WO1992013306A1

WO1992013306A1 - Vector associative map system

Info

Publication number: WO1992013306A1
Application number: PCT/US1991/009820
Authority: WO
Inventors: Stephen Grossberg; Paolo Gaudiano
Original assignee: Trustees Of Boston University
Priority date: 1991-01-15
Filing date: 1991-12-30
Publication date: 1992-08-06
Also published as: EP0567557A4; EP0567557A1

Abstract

A vector associative map system (10) for unsupervised real-time error-based learning and performance includes a vector associative map which is implemented as an Adaptive Vector Integration To Endpoint System for intramodal operation. AVITE (10) includes a target position command (12) representing the location of target, and present position command (14) for encoding the present hand-arm configuration. The difference vector (16) computes the difference between the (14) and the (12). A speed controlling signal (18) multiplies the result from DV (16). PPC (14) generates the product of (16) and (18) and an outflow command on line (20) to the object to be operated. Integration at (14) continues at a rate dependent on the signal from Go (18) until DV (16) reaches zero, at which time PPC (14) equals TPC (12) and the object has moved to the target position.

Description

VECTOR ASSOCIATIVE MAP SYSTEM

FIELD OF INVENTION This invention relates to a neural network for adaptiv control of movement trajectories, and more generally to a network using an unsupervised, real-time, error-based syste for learning and performance in which the present state of the system is compared to a target state to detect an error, which error in the performance phase is used to align the present state with the target state while in the learning phase is used to calibrate the system.

BACKGROUND OF INVENTION

Some conventional sensory-motor control systems for th control of arm movement trajectories and other trajectories attempt to calculate the entire trajectory for every possible movement of the arm. Since there is an infinite number of possible movements this approach can lead to a combinatorial explosion. As a result, such systems limit their field of operation to a narrow segment of space within which the equations describing the operation of the arm can be approximated by a linear system.

Other sensory motor control systems are able to generat arm movements without calculating specific trajectories. This is accomplished in a system in which the target positio command (TPC) represents the location of the desired target and the present position command (PPC) encodes the present hand-arm configuration. The difference vector (DV) population continuously computes the difference between the PPC and the TPC. A speed-controlling GO signal multiplies D output. The PPC integrates the (DV) ^• (GO) product and generates an outflow command to the arm. Integration at the PPC continues at a rate dependent on GO signal size until th DV reaches zero, at which time the PPC equals the TPC, and the arm has moved to the desired target. In such an approac the TPC and PPC must be in the same coordinates, or there has to be a calibrated adaptive filter interfaced between the two to transform the TPC signal into the same coordinates as the PPC so they can be properly compared at the DV.

Still other sensory-motor control systems attempt to learn the correct arm movement parameters using error-based learning techniques. Error-based learning systems use an external observer to detect the error between the commanded position and the actual final position, and feed that error back to the system to correct the calibration of the system.

However, all error-based sensory-motor control systems, and more generally all present error-based learning systems that attempt to learn a mapping from an input vector to an output vector, require external supervision and careful tailoring of the environment and the input signals. Such systems are incapable of adapting to a new or changed environment unless an external supervisor instructs them to do so.

SUMMARY OF INVENTION

It is therefore an object of this invention to provide an improved error-based learning system for the control of movement trajectories, and more generally to such an error-based learning system for transforming signals between arbitrary maps.

It is a further object of this invention to provide such an improved error-based learning system which employs a vecto associative map system.

It is a further object of this invention to provide such an improved error-based learning system which functions in real-time and unsupervised.

It is a further object of this invention to provide suc an improved error-based learning system which carries out learning and performance operations using the same circuitry

It is a further object of this invention to provide suc an improved error-based learning system which can autonomously shift between learning and performance operations.

It is a further object of this invention to provide suc an improved error-based learning system which includes an endogenous source of training signals to initiate learning i the uncalibrated system.

It is a further object of this invention to provide suc an improved error-based learning system which continues learning even after initial calibration upon occurrence of internal error conditions due to changes in the environment or in system parameters.

It is a further object of this invention to provide suc an improved error-based learning system in which learning is not relegated solely to an externally determined special training phase but may occur during operation of the system without interrupting normal on-line performance.

This invention results from the realization that a trul effective error-based learning system can be constructed using a vector associative map system which functions unsupervised in real-time, can carry out learning and performance operations using the same circuitry and can autonomously shift between learning and performance operations, so that learning is not relegated solely to an externally determined special training phase but may occur a any time during operation of the system without interrupting normal performance by using a technique in which in the performance phase the state of the system identified by the present map controller is aligned with the target goal set by the target map controller and in the learning phase the adaptive filter is adjusted to zero the difference vector when the target goal and present states are aligned, thereby enabling the learning and performance phases to occur throughout the operation of the system in the initial uncalibrated condition of the system .or later under calibrated conditions.

This invention features a vector associative map system for unsupervised real-time error-based learning and performance. There is a target map controller for setting a predetermined goal for the system. The target map controller includes an adaptive filter. A present map controller identifies the present state of the system. A difference vector network aligns the state of the system identified by the present map controller with the target goal set by the target map controller. The difference vector network also calibrates the adaptive filter to zero the difference vector when the target goal and present states are aligned.

In a preferred embodiment the target map controller includes means for transforming the current state of the present map controller into a command representative of the predetermined goal corresponding to the current state of the present map controller. The target map controller may include means for enabling calibration of the adaptive filter only when the target goal and present state are aligned. The means for transforming may include gate means interconnected between the target map controller and the present map controller for intermodal learning. The means for transforming may also include means for feedback through the environment for intermodal learning. The target map controller and the present map controllers may have different coordinate systems. The target map controller and the present map controller may encode positions in different coordinates. The means for enabling calibration may include a learning gate. The difference vector network aligns the state of the system identified by the present map controller with the target goal in the performance phase, and in the learning phase modifies the adaptive filter to zero the difference vector when the target goal and present states ar aligned. The phases may be interleaved during operation of the system. The system may further include means for endogenously generating sample training signals for modifyin the present state of the system to initiate learning in the uncalibrated system. The system may further include means for determining the rate at which the present map controller aligns with the target map controller.

The invention also features multidimensional learning and performance apparatus which includes a plurality of interconnected vector associated map systems, each including a target map controller, present map controller, and difference network as above.

DISCLOSURE OF PREFERRED EMBODIMENT

Other objects, features and advantages will occur to those skilled in the art from the following description of a preferred embodiment and the accompanying drawings, in which

Fig. 1 is a block diagram of an adaptive vector integration to endpoint circuit implementing a vector- associative map system according to this invention;

Figs. 2A-D are diagrammatic illustrations of a babbling cycle in the AVITE of Fig. 1;

Fig. 3 is a more detailed schematic diagram of the AVIT circuit of Fig. 1 showing the opposing agonist and antagonis channels;

Fig. 4 is a more detailed schematic view of the endogenous random generator (ERG) of Fig. 2;

Figs. 5A and B illustrate the ERG ON and ERG OFF waveforms;

Fig. 5C is a diagram of the joint angle distribution obtained using the ERG of Fig. 4;

Figs. 5D and 5E are histograms showing the magnitude an phase distribution, respectively, of the joint angle distribution of Fig. 5C;

Fig. 6 is a more detailed schematic diagram of a comple ERG AVITE system for a two-joint arm;

Figs. 7A, B and C illustrate the decrease in absolute error as training progresses in the agonist, antagonist and the total error, respectively, measured during the quiet or learning phases of the system;

Figs. 8A-E show arm reaching performance subsequent to different training levels;

Fig. 8F is a view similar to Fig. 8E but for the target in a different position;

Fig. 9 is a block diagram similar to Fig 1 showing both the agonist and antagonist channels and employing a TPC which encodes spatial coordinates;

Fig. 10A shows the linear function that transforms a PPC in amplitude, or agonist-antagonist coordinates into a TPC in spatial coordinates;

Fig. 10B shows the synaptic weights in the adaptive filter interconnecting the spatial TPC and the agonist-antagonist DV in Fig. 9 having learned th correct reverse linear transformation;

Fig. 11A is a plot of a nonlinear sigmoidal transformation similar to that shown in Fig. 10;

Fig. 11B shows the synaptic weights for the nonlinear transformation of Fig. 11A;

Fig. 12 shows synaptic weights for the same nonlinear transformation as in Fig. 11B when distributed activation occurs in a spatial TPC as in Fig. 9;

Fig. 13 illustrates the effect of distributed TPC coding after twenty thousand simulation steps: A: winner-take-all TPC; B: activation spread over two nodes on either side of the TPC peak; C: activation spread to five nodes on either side of the TPC peak;

Fig. 14 is a block diagram of a VAM cascade showing intermodal and intramodal VAMs coupled in series;

Fig. 15 illustrates the synaptic weights of the intermodal adaptive filter in Fig. 14;

Fig. 16 is a block diagram showing a VAM cascade where the intermodal VAM depends on two spatial maps; and

Figs. 17A and B show the synaptic weights of the two adaptive filters for the intermodal VAM in Fig. 16.

A child, or untrained robot, can learn to reach for objects that it sees, or to repeat sounds that it hears through a circular reaction: as an infant makes internally generated movements of his hand, the eyes automatically follow this motion. A transformation is learned between the visual representation of hand position and the motor representation of hand position. Learning of this transformation eventually enables the child to accurately reach for visually detected targets. A similar circular reaction is found in the babbling phase of speech acquisition in infants: as the child produces an internally generated sound, the internal representation of the command that gave rise to the sound coexists with the auditory representation of the same sound, thus allowing a transformation to be learned between production (speech) and perception (hearing) systems. This transformation eventually enables the child to reproduce sounds emitted by other speakers.

This type of circular reaction is intermodal, that is, it combines signals in different modalities. In order for an intermodal circular reaction to lead to stable learning of a transformation between two modalities, it is necessary that the internal representation of each modality is already stable. For example, in order to learn a transformation from visual targets to arm movements it is important that the visual system can consistently follow the arm's motion, and that the internal representation of movement commands can be stably correlated to the actual arm movement. This suggests that stable learning of intramodal control parameters must take place prior to intermodal learning.

There is shown in Fig. 1 a vector associative map according to this invention which is implemented as an Adaptive Vector Integration To Endpoint (AVITE) 10 system for intramodal operation. AVITE 10 may be used for arm or speech trajectory generation. AVITE 10 includes a target position command, TPC 12, which represents the location of the desired target, and a present position command, PPC 14, which encodes the present hand-arm configuration in the case of an arm trajectory controller. The difference vector DV 16 continuously computes the difference between the PPC 14 and the TPC 12. A speed controlling GO signal 18 multiplies the output from DV 16. PPC 14 integrates the product of DV 16 and GO 18 and generates an outflow command on line 20 to the arm or other object to be operated. Integration at PPC 14 continues at a rate dependent upon the size of the signal from GO circuit 18 until DV 16 reaches zero, at which time PPC 14 equals TPC 12 and the arm or other object has moved to the desired target position.

In order for AVITE 10 to generate correct arm trajectories, TPC 12 and PPC 14 must be able to activate dimensionally consistent signals from TPC 12 to DV 16 and from PPC 14 to DV 16, for comparison at the DV 16. There is no reason to assume that the gains or even the coordinates o these signals are initially correctly matched. Learning of an adaptive coordinate transformation is needed to achieve self-consistent matching of TPC 12 and PPC 14 signals at DV 16. This is accomplished through the use of adaptive filter 22 which forms a part of TPC 12.

In the performance phase, as previously explained, DV 1 represents the difference between the desired state of the system represented by TPC 12, and the actual state of the system represented by PPC 14. As the difference or error is driven to zero, PPC 14 approaches and finally equals TPC 12. In the learning phase, with now print gate 24 open, the current state of the PPC 14 is transformed into a command at TPC 12 representative of the predetermined goal correspondin to the current state of the PPC. With the TPC 12 and the PP 14 thus aligned, any error occurring in DV 16 is an internal representation of miscalibration. Such internal miscalibration is eliminated by modification of the adaptive filter 22. TPC 12 includes learning gate 25, which enables modification of adaptive filter 22 only when the target goal of TPC 12 and the present state of TPC 14 are aligned. While the means for transforming the state representative of the PPC 14 to the state of the TPC 12 is shown as including a hard-wired feedback loop, this is not a necessary limitation of the invention. For example, in a speech application the feedback loop 23 may be from mouth to ear or from speaker to microphone.

The initial learning phase of AVITE 10 is regulated by activation of an autonomous endogenous random generator of training vectors, ERG 26, Figs. 2A-D. ERG 26 consists of two complementary channels ERG ON 26a and ERG OFF 26b. ERG ON 26a generates random vectors on line 28a, Fig. 2A. Each vector 28a is integrated at PPC 14, giving rise to the generation of each movement command as illustrated by arm 30, Fig. 2A. The generation of each movement induces a complementary postural phase during which the output of ERG ON 26a stops and learning occurs. Then a new vector is generated and the cycle is repeated. The output of ERG ON 26a autonomously stops in such a way that across trials a broad range of arm positions is generated, thus sampling the full extent of the arm's workspace.

The complementary phases of activation of ERG 26 demarcate the learning and performance modes of an intramodal circular reaction within AVITE 10. When ERG 26 is on, 26a, Fig. 2A, it generates activation 29 at PPC 14, thus randomly moving arm 30. When ERG 26 autonomously shuts off, 26b, Fig. 2B, arm 30 stops moving and modulator gate or now print gate 24 is activated, copying the current PPC 14 into the TPC 12 through a fixed transformation to generate the command 32 which represents the predetermined goal corresponding to the current activation 29 of the PPC 14. The resulting activation 32 of TPC 12 is compared to the corresponding activation 29 of PPC 14 at DV 16, Fig. 2C. In a perfectly calibrated AVITE 10, the resulting TPC 12 and PPC 14 activations 32 and 29, respectively, are perfectly matched so that the activation of DV 16 will be zero. Thus any nonzero activation 34, Fig. 2C, of DV 16 when ERG 26 is off, 26b, represents an internal measure of mismatch. Learning of a correct intramodal transformation drives the activation 34 of DV 16 to zero during this postural phase, Fig. 2D, by changing the synaptic weights in adaptive filter between TPC 12 and DV 16.

In accordance with the invention both the learning and performance phases use the same circuitry of AVITE 10, notabl the same DV 16 for the respective functions. Thus learning and performance can be carried out unsupervised on-line in a real-time setting unlike most traditional off-line supervise error correction schemes. The operation whereby an endogenously generated PPC 14 activates a corresponding TPC 12, as in Fig. 2B, back-propagates information for use in learning, but does so using local operations without the intervention of an external teacher or a break in on-line processing. The class of models that uses this on-line learning performance scheme is referred to as a vector associative map (VAM) because it is used to both learn and perform an associative mapping between internal representation. AVITE 10 is one form of VAM.

Opponent processing is needed to realize many propertie of AVITE 10. The primary need for opponency arises from the fact that each PPC 12 integrates the net positive or excitatory output of the corresponding DV 16. Once the PPC 12 has grown to a positive value, it cannot decrease without receiving some form of inhibition. Within AVITE 10, two controlling channels for each agonist-antagonist controller pair are coupled in a push-pull fashion: each component in Fig. 1 actually consists of agonist and antagonist component as shown schematically in Fig. 3, where the agonist and antagonist channels have been given like numbers accompanied by lower case a's and b's, respectively, and each black dot represents a neuron or population of neurons.

In Fig. 3, the "+" and "-" superscripts represent the agonist and antagonist components, respectively. The capita letters E, F, T, Z, V and P which bear those superscripts, represent the variables used in the mathematical description associated with each of the components. E represents externa inputs from elsewhere in the system; F represents the feedback from Now Print gates 24a and 24b; T represents the activation of TPC 12; Z is the synaptic weight of the adaptive filter 22; V is the activation of DV 16; and P is the activation of PPC 14. 0⁺2i-i and 0⁺2i represent the outputs from two distinct ERG ON channels.

The unaccompanied plus and minus signs in Fig. 3 represent excitatory and inhibitory connections. For example, arrow 40 extending from antagonist TPC 12b to agonist TPC 12a has a minus sign at its arrow head indicating that this input to TPC 12a is inhibitory. Each AVITE 10 requires the input of two ERG ON channels coupled in a push-pull fashion to insure that contraction of the agonist controller is accompanied by relaxation of the antagonist. For PPC 14 let variable Pi⁺ obey the equation

*£-=(1- P ) (G [v.*]*+o ) - P (G [Vr]^R + oή , (1) where [w]^R = max(w,0) represents rectification. This is the rate-determining equation for the entire system: assume an integration rate of 1 and adjust the time constant of all other equations relative to this one.

In equation (1) , PPC 14 acts to integrate its inputs via a shunting, or multiplicative, on-center off-surround network. Adding a small leaky integrator term Pi⁺ to the right hand side of (1) does not qualitatively change the results. Terms G[Vi⁺]^R and G[Vi~]^R are agonist and antagonist components, respectively, gated by the nonspecific GO signal G. Terms 0⁺2i-l and 0⁺2i are ERG ON channel outputs, respectively, to the agonist and antagonist PPC 14a and b. Excitatory inputs coming from the agonist DV 16a and ERG channels (Vi⁺, 0⁺2i-ι) are counteracted by inputs from the antagonist DV 16b and ERG channels (Vi^~, 0⁺2i) . This creates a push-pull mechanism that insures proper antisymmetrical activation in the agonist and antagonist muscles. The multiplicative factors (1 - Pi⁺) and Pi⁺ in the excitatory and inhibitory terms of (1) are shunting terms that interact with the opponent inputs to normalize activations of PPC 14a and b within the range [0,1], and to make Pi⁺ compute the ratio of opponent inputs. This is shown by solving equation (1) at equilibrium (dP/dt = 0) :

Activation in the antagonist channel appears in the denominator, thus reducing agonist activation, and vice versa. Furthermore, activation in either channel is bounded in the interval [0,1], and total activation is normalized to 1: that is, Pi⁺ + Pi^" = 1.

For DV 16, variable Vi⁺ obeys the additive equation:

DV 16 tracks the difference between a filtered copy of TPC 12, namely Ti⁺Zi⁺, and PPC 14 variable Pi⁺ at rate α.

For adaptive filter 22 the synaptic weight or LTM trace Zi⁺ from TPC 12 Ti⁺ to DV 16 Vi⁺ obeys the learning equation

where

Equations (4) and (5) define a gated vector learning la whereby changes in adaptive weights are driven by deviations of the DV 16 from zero when the learning gate g_n is opened and the presynaptic node Ti is active. Other types of f(Ti) would work as long as learning is prevented when Ti = 0.

As the correct scaling factors from PPC to TPC channels are learned, the DV values converge to zero. Term gn in (4) represents learning gate LG 25, Fig. 1, which is coupled to the now print Gate 24. Now print gate 24 enables PPC 14 to activate TPC 12 that represents the same target position of the arm. As explained later, NP 24 may be coupled to the pauser gate (equation (18)).

For TPC 14 variable Ti⁺ obeys the equation:

d 3T? = 6 [→T? + (1 - T?)(Et + F? + T?) - T?(E + Fr + Tr)] (6) dt

Equation (6) is a shunting competitive equation that normalizes TPC activities for dimensionally consistent matching against PPC activities at the DV; see equation (1) . A small leaky integrator term - εTi⁺ was also included to illustrate that either a leaky integrator or a perfect integrator, as in (1) , can be used. The input terms to each TPC are of three types:

(i) intermodal target commands, which are feedforward external inputs (Ei⁺,Ei⁺) that instate new TPCs from other modalities, e.g. from visual inspection of a moving target;

(ii) PPC-to-TPC conversions, which are feedback units (Fi Fi~) from the PPC to the TPC. These inputs instate the TPC corresponding to the PPC attained during an active phase of ERG input integration. Terms Fi⁺ and Fi~ turn on when the now print gate gn turns on: that is,

Fi⁺ = l(Pi⁺,g_n) (7)

and Fi^" = l (Pi^" , gn) ( 8 )

where function 1 represents a fixed mapping; and

(iii) short-term memory storage, which are feedback signa (Ti⁺, Ti^") from TPCs to themselves such that each agonist excites itself and inhibits its antagonist via a linear function of its activity. Such on-center off-surround linea shunting feedback signals store the normalized TPCs in short-term memory until they are updated by new intermodal inputs or PPC feedback. The ratio scale established by thes shunting terms also allows PPC feedback to occur after the PPC integrates the TPC, without changing the TPC. In other words, if Fi⁺ and Fi^" turn on with values Fi⁺ = θ Ti⁺ and Fi^" = θTi^", for some scaling parameter θ>0, then Ti⁺ and Ti^" are essentially unchanged. Similarly, instatement of an intermodal target command (Ei⁺ = ΘTi⁺, Ei~ = ΘTi^") will not change the TPC activation. Any changes that do occur are du to the finite integration rate and the small passive decay term, but will typically be small and transient in nature.

ERG 26 embodies another example of the need for opponen interactions. The motor babbling cycle is controlled by two complementary phases in ERG 26: an active and a quiet phase. During the active phase ERG ON 26a generates random vectors to PPC 14. During the quiet phase, input to PPC 14 from ERG ON 26a is zero, thereby providing the opportunity to learn a stable (TPC, PPC) relationship. In addition, there must be a way for ERG 26 to signal onset of the quiet phase, so that N gate 24 can open and copy the PPC into the TPC. The NP gate 24 must not be open at other times: if it were always open, any incoming commands to TPC 12 could be distorted by contradictory inputs from PPC 14. Therefore, offset of the active ERG phase must be accompanied by onset of a complementary mechanism, ERG OFF 26b, whose output energizes opening of gate 24. ERG 26 may be implemented as a specialized gated dipole, Fig. 4. A gated dipole is a neural network model for the type of opponent processing during which a sudden input offset within one channel can trigger activation, or antagonistic rebound, within the opponent channel. Habituating transmitter gates in each opponent channel regulate the rebound property by modulating the signal in their respective channels. In applications to biological rhythms, each channel's offset can trigger an antagonistic rebound in the other channel, leading to a rhythmic temporal succession of rebound events.

The gated dipole implementation of ERG 26 includes an O channel 26a and OFF channel 26b. There is a pauser gate PG 42, a J⁺ phasic input to ON channel 26a, an I tonic input to both channels, and an X⁺, X^" input layer activation to each channel. Y⁺ and Y~ represent transmitter gates, 0⁺ represent the ERG ON, 26a, output to the PPC, O^" represents the ERG OFF, 26b, output which controls pause gate PG 42 activation.

Each channel includes an input neuron 44a, an intermediate neuron 46a, and an output neuron 48a. OFF channel 26b includes a similar set of neurons labelled with a lower case b. Each channel also includes gates 50a and 50b. Neuron 44a receives J⁺ input from neuron 41 and I input and has an activation parameter X⁺. The J⁺ input is controlled by pauser gate PG 42. Neuron 44b receives only a single input I. Its activation parameter is X^~. Gates 50a and b modulate activations X⁺ and X~ to yield activations Y⁺ and Y~. Neuron 46a receives a modulated signal through gate 50a and delivers an excitatory signal to neuron 48a and an inhibitory signal to neuron 48b. Similarly, neuron 46b receives a modulated signal from gate 50b and delivers an excitatory signal to neuron 48b and an inhibitory signal to neuron 48a. Neuron 48a generates the ERG ON output 0⁺ and neuron 48b generates the ERG OFF output O^", which it supplies to pauser gate PG 4

The key difference between the present design of ERG 26 and prior gated dipole models is the chemical gates 50a,b. In the simple feedforward gated dipole the modulated signal through the gate is a monotone increasing function of the input signal. Hence rebounds can only occur as a result of externally-imposed changes in inputs. In the present design the ERG 26 must convert a steady input signal I+J⁺ into a series of output bursts. This can be achieved if the gate can spontaneously crash in response to a steady input signal so that the modulated signal through gate 50a is an inverted-U function of input. After the gate has crashed in resonse to a steady input, it must be allowed to recover so that a new output burst can be generated. This is achieved by letting ERG OFF 26b activation shut off phasic input J⁺, which will cause a transient increase in activity O^" while the gate recovers. This mechanism is represented as the feedback pathway from PG 42 to neuron 41. Once the gate 50a has replenished, ERG OFF 26b activation will go to zero, causing PG 42 to become inactive, and allowing phasic input J⁺ to be reinstated at input population 44a, starting a new cycle.

ERG 26 may be implemented by the following set of equations.

Let the tonic input I to the ERG ON 26a channel and ERG OFF 26b channel obey the equations

I = constant (9)

The tonic input I provides a constant baseline of activation in both ERG channels 26a and 26b, Fig. 4. This provides the energy for the transient rebound in the k^tn OFF channel afte the random input Jk⁺ to the k^tn ON channel is gated off by the pauser gate PG 42, whose activation is described by the variable gp. Without tonic input, OFF channel activation could never exceed zero.

Let input Jk⁺ to the k^tn ERG ON 26a obey the equation _{J+ =} ( 6 [μj - - °i, μj +~ ^σϊ\^R with prob. -}- ^k \ μj with prob.1 - - - (io)

Random noise values k⁺ are chosen from an interval of size σ centered around the average level μj. The term TTJ represent the average time that elapses between activation "spikes". Equation (10) represents one type of internal noise; namely, randomly distributed activation within a fixed interval. Other types of noise have also been shown to work. The ERG OFF 26b channels receive no random input (Jk^~≡0) .

The k^tn ERG ON 26a channel input layer 44a, whose activity is Xk⁺, obeys the equation dX£

= -ζXΪ + tø - X_fc ⁺)[J + ? (1 - g,)]. dt (11)

This equation describes leaky-integrator shunting dynamics. The Xk⁺ populations 44a receive a tonic input I and a random input J]⁺ 41. The input Jk⁺ 41 is gated shut by term (1-gp) when the PG 42 (gp) turns on, since gp switches from 0 to 1 at that time (Equation (18)). The relative values of the leakage rate ξ and saturation limit η compared to the magnitude of the inputs determine how sensitive the cell will be to fluctuations in the input noise.

Let the transmitter gate 50a, with activation Yk⁺ in the k^tn ON channel obey the equation

dY⁺

■= κ(λ - Y_k ⁺) - h(XΪ)Y,?. (12) dt

In (12) , transmitter Yk⁺ accumulates to a maximal level λ at the constant rate k and is inactivated, or habituates, at the activity-dependent rate h(Xk⁺), where h(X) = vx²+ ξX. (13)

The net ON channel signal 46a through the gate is

Xk⁺Yk (14)

which is proportional to the rate of. transmitter release. When solved at equilibrium, the system (12) , (13) and (14) gives rise to an inverted-U function of Xk⁺, namely,

χ+γ+ - ^{κ X}k _{( 15 )} ^{k k} ~ i ( Y+\² ι C Y+ ' ^{ '

The net output Ok⁺ of the k^tn ERG ON 26a channel, after opponent processing, obeys the equation

ot- ^χtv--τ;v]'- ₍₁₆₎

The outputs Ok⁺ are the inputs to the PPC 14, Fig. 3, populations of the AVITE 10 model, as in equation (1). The ERG OFF 26b outputs Ok^" obey the analogous equation

- _{1 a ∑_k o- > θ_P otherwise ( 17 )

The pauser gate PG 42 (activation gp) obeys the equation

o = [ ζY_k- -χtY_k ⁺]^Rr (is)

where θp is a fixed threshold. When multiple ERG modules are simulated, all of the ERG OFF 26b outputs Ok^" are summed at PG 42 via term ∑kOk^" in (18). This insures that all AVITE 10 modules are in their quiet phase at the same time, and that learning is synchronous across all movement-controlling joints.

The ERG 26 described by the preceding equations produce sufficiently random ERG ON vectors, 28a, Fig. 5A, to produce a random joint angle distribution. Fig. 5C, which is uniformly distributed both with respect to magnitude as indicated by histogram 5D, and phase, as indicated by histogram 5E. At the same time, ERG OFF activations 28b, Fig. 5B, which occur when ERG ON 26a is autonomously shut off, are of approximately uniform duration, so as to generate the opportunity for stable learning in AVITE 10.

For a two-joint configuration, two AVITEs, 10s, Fig. 6, for the shoulder joint and lOe for the elbow joint, may be connected in parallel. A single GO circuit 18 and a single now print gate 24 are used to operate both AVITEs 10s and lOe synchronously. Each AVITE 10s and lOe is associated with two ERGs 26. The 0⁺ output from each ERG channel 26a is supplied to the TPC and the 0^" output from each ERG is supplied to PG 42, so that both the shoulder and the elbow joint move and learn synchronously. The output PPC 14s drives shoulder joint 60, whose sweep is denoted ι. Elbow joint 62 is driven by the output from PPC 14e and its sweep is denoted by 62. It is this double joint angle distribution which is depicted in Fig. 5C.

As the number of random movements increases during the learning phase, the system adapts by decreasing the internal error as shown in Figs. 7A and B, representative of the agonist DV error and the antagonist DV error. Fig. 7C shows the decrease in total error of the system with respect to the number of random movements.

Simulations were generated on Sun and Silicon Graphics workstations. The code is written in C using double- precision floating point accuracy. A fourth order Runge- Kutta ODE solver was used for numerical integration. Step size was fixed for each simulation, but was varied occasionally to ensure accuracy of the numerical integration. A standard simulation was run with the LSODA integration package of Livermore Laboratories to confirm accuracy. The LSODA package uses adaptive step size and can automatically switch between stiff and non-stiff methods. The discontinuous nature of the input actually made the simple Runge-Kutta integrator significantly faster.

The equations describing the networks disclosed herein can be implemented in various ways. The simulations presented here are based on numerical integration on digital computers. The numerical results were used to drive a graphical interface. Alternatively, commercially available hardware could be used to transform signals from the computer's output channel into a format suitable for the control of a mechanical arm or robot. Such a graphical demonstration of correct adaptive control by AVITE 10 is shown in Fig. 8, where each small grid, Figs. A-F, illustrate the graphical display of the program with the arm in a position determined by two joint angles as indicated in Fig. 6. Figs. 8A-E show that the terminal reaching behavior improves at increasing levels of training. The learning rate is the same as in Fig. 7. Note that the same terminal reaching behavior can be achieved with a much higher learning rate and requires only a few hundred steps (a couple of babbled movements) without compromising stability. Each grid shows a visualization of two PPC agonist-antagonist pairs. In each part the arm 70 is started at rest: (P+ = p- = 0.5). Two pairs of agonist/antagonist TPC values are selected, mapped as a black triangle 72 in each grid, the GO signal is turned on (G = 1.0), and the PPC populations are allowed to integrate until they have equilibrated. Each part shows reaching performance after a certain number of babbling phases: Fig. 8A, after about for babbled movements, B after about eighty movements, C after about 120 movements, D after about 160 movements, E after 20 movements. Fig. 8F shows reaching after about 200 movements but for a different target, to demonstrate that as the error is reduced the ability of the arm to approach a target is independent of the target's position.

While thus far the disclosure has been cached in terms of agonist and antagonist pairs, this is not a necessary limitation of the invention. For example, as shown in Fig. while DV 16 and PPC 14 include agonist and antagonist pairs, TPC 12 does not. Rather, TPC 12 in Fig. 9 includes a plurality of neurons or neuron populations 80. Each neuron 80 is connected to NP 24b and NP 24a. Each neuron is also connected to both DV 16a and 16b as indicated by lines 22a and 22b in adaptive filter 22 with respect to neuron 80a. With the introduction of the spatially distributed TPC 12a in Fig. 9, equations (6) and (3) must be replaced with equations (19) and (20) as follows:

TPC may operate in a winner-take-all mode, in which one and only one neuron 80 responds with constant activation 90, or there could be distributive coding in which neuron 80a would receive greatest activation but neighboring ones would also be activated, with the degree of activation decreasing with lateral distance from neuron 80a. That different coordinates can be combined in the same AVITE 10a can be seen from the fact that neurons 80 of TPC 12a constitute a spatial map, whereas the neurons at DV 16 and PPC 14 are agonist/ antagonist pairs such as represent muscle coordinates as illustrated in earlier figures. While activation of agonist/antagonist pairs such as 14a and 14b are complementary, as indicated by bar graphs 100 and 102, the activation 90 in a spatial map is constant while the positio of the activation changes.

A winner-take-all, or maximal compression, spatial PPC →-TPC map is shown in Fig. 10. For these simulations, the spatial mapping and ensuing TPC recurrent competition were replaced by a spatially linear algorithm for computational simplicity. Thus, when the NP gate opens, the PPC activation (P⁺,P~) activates the TPC spatial position j (i.e., Tj = 1.0) according to the equation

j = N ^• P⁺ (21)

where N represents the total number of TPC nodes (N = 40 in the figure). Equation (21) maps (0,1) to the leftmost node and (1,0) to the rightmost TPC node, with a linear interpolation for intermediate (P⁺,P^") pairs. This is show in Fig. 10A.

Fig. 10B shows the LTM values Zj⁺ (synapse from Tj to V⁺, plotted as a solid line marked by '*'), and Zj~ (synapse from Tj to V^", plotted as a dashed line marked by 'x') after 100,000 steps (about 400 movements). Since Tj⁺ = 1, the input to (V⁺,V^~) equals (Zj⁺,Zj"). The plot confirms that th LTM traces have learned the correct linear transformation. The LTM traces near the extremities of the TPC field are zer because these positions have not been sampled during babbling. When the PPC activation range is transformed through a nonlinear spatial map, Fig. 11, the TPC node that becomes active when the NP gate is open is determined by:

S =

Equation (22) describes a sigmoidal shift function. This nonlinear shift causes central TPCs to be more densely sampled than extreme TPCs. Fig. 11A shows the transformati generated by equation (22) , and Fig. 11B shows that the VAM able to learn the reverse transformation.

We now consider map learning when TPC competition allo more than a single TPC node to be active during learning. Equations (4) and (5) imply that the synapses from all acti TPCs grow at the same rate to cancel the (V⁺,V^") activation. For the present simulations, we allowed the amplitude of TP activation to determine the rate of learning; namely, we replaced (5) by f(Tj)=Tj, so that (4) becomes

- L ₌- _gnT-{-βZf -₁V+). (23)

In this case, the synapses from all active TPC nodes will be driven to the same pattern (P⁺,P~) , but at different rates. Using a distributed map allows nodes to learn approximately correct synaptic gains even if their exact spatial locations have never been sampled through motor babbling. If a node has never been directly sampled, but its neighbors on both sides have, then that node learns a pattern that is an average of its neighbor's patterns, with a bias for the more frequently sampled pattern. If sampling only happened for neighbors to one side, that node will learn the same pattern as its neighbor.

Fig. 12 shows the results using the same sigmoidal ma as in Fig. 11A. When the NP gate opens, several TPC nodes become active, with activity decaying inversely with distance from the central peak. The reverse transformatio is learned correctly, and the distributed spatial map leads faster learning, as illustrated in Fig. 13: Fig. 13A shows LTM traces for the maximal compression sigmoidal map after 20,000 simulation steps (about 80 movements) . Many of the traces are still near zero. Fig. 13B shows results when th spatial map activates two nodes on each side of the central peak, and Fig. 13C when the activation includes five nodes either side. Comparison of these figures show that more distributed maps can more closely approximate the correct inverse sigmoid transformation at intermediate training levels.

The AVITE 10 must be able to distinguish between learning and performance trials without losing its ability remain on-line at all times. The ability to copy a stationary PPC into the TPC for learning could potentially lead to destabilizing effects: if the NP gate were open at all times, the PPC would be continuously copied into the TP even when it does not represent the same position in space the TPC. To prevent this, the NP gate and the ERG are inhibited whenever a voluntary movement occurs.

In order to autonomously carry out these control functions, there must exist internal states capable of discriminating between endogenous babbling, learning, and planned performance phase?- The babbling and learning phase are demarcated by specific events in the ERG: the pauser gate, or PG, becomes active at the onset of the quite phase and enables babbling to resume by becoming inactive. Henc the NP gate can be coupled to the PG, so that PPC will onl be copied into the TPC stage during the quiet phase (Fig. 6) In addition, a nonspecific arousal signal from the PG can b used to modulate learning, so that the TPC →- DV synapses are only plastic while the NP gate is open, as in equations (4) and (23) .

Gating of the learning signal is not required under so circumstances. If the learning rate is slower than the integration rate of PPC and TPC, then the amount of learnin that takes place during the quiet phases will be statistical significant, whereas learning of incorrect (PPC, PC) pairin will be statistically insignificant. This is due primarily to the symmetry of the learning law (4) . Because the LTM traces can increase and decrease at equal rates in response to negative or positive DV fluctuations, and because the movements during babbling tend to be random, errors due to learning during active babbling tend to zero.

In addition to gating learning off during endogenous movements, it is equally important to gate learning during reactive or planned movements. The ERG must also be shut of when an external target command (E⁺,E~) is instated at the TPC, as in equation (6) . This can be accomplished in two wa

In a TPC-mediated gate the populations that input a target command to the TPC can simultaneously send a nonspecific gating signal to shut off the NP and ERG gates.

In a GO-mediated gate the GO signal shuts off the ERG a prevents the current PPC from degrading the desired TPC. In this scheme, however, if the TPC becomes active before the G signal turns on, then the TPC may be altered by PPC feedback through the NP gate if passive or endogenous movements occur before activation of the GO signal. Notwithstanding this difficulty, a GO-activated gate is conceptually attractive, because the GO signal seems to be the counterpart, for reactive and planned movements, of the activity source which energizes the ERG during endogenous movements. Inhibition the ERG by the GO signal thus describes a competition betwe two complementary sources of arousal.

Simulations have shown that either alternative is workable. The following section shows how a GO-mediated ga can be used without causing a problem of spurious AVITE learning during motor priming.

Now assume that the AVITE TPC encodes agonist- antagonist, or muscle coordinates, and that there exists a processing level prior to the TPC that transforms spatially-encoded targets into muscle coordinates. In particular, show that an intermodal VAM can be used to learn this spatial-to-motor transformation, Fig. 14.

In order to unambiguously describe such a VAM cascade, in which spatial-to-motor and motor-to-motor transformation occur among TPCs, DVs and PPCs, we introduce the following notation. Let TPCs, 12s, denote a TPC coded in spatial coordinates, and TPCm, 12m, denote a TPC coded in motor coordinates. Correspondingly, let DVsm 16s, denote a DV that transforms TPCs, 12s, into TPCm, 12m. For notational simplicity, let DVm, 16m (rather than DVm ) denote a DV tha transforms TPCm into PPC within an AVITE 10m module. Thus t subsequent discussion considers the sequence of VAM transformations TPC_S →-DV_Sm →-TPCm →-DVm →-PPC, as shown in Fig 14. This example illustrates the need to ensure that TPC a PPC obey homologous equations. Within the intermodal VAM 1 TPCm acts as a PPC, 14s, while simultaneously acting as the TPC for the intramodal VAM (AVITE) 10m.

Assume that movements of the arm during babbling are tracked by the visual system. For simplicity, we first ass that a single population encodes the arm's position in spatial coordinates. During the quiet phase of each babbled movement, the P is directly copied into TPCm (motor TPC) , so that the latte accurately reflects the current outflow movement command signals for tuning the intramodal adaptive filter, 22s, of the TPCm "^DVm pathways. The intermodal VAM 10s, Fig. 14, transforms TPCs (spatial TPC) into TPCm via the intermodal DVsm. If the visual system accurately tracks the moving hand this DVsm approaches zero as the TPCs→- DVsm LTM traces learn the correct spatial-to-motor transformation. Fig. 15 shows learning by the intermodal LTM traces of the correct linear transformation from spatial position to motor coordinates. In this example, activation of the TPC_S was distributed to five nodes on either side of the activation peak, using a linear mapping as in Fig. 10A. Nonlinear transformations, such as those presented in earlier sections, have also been shown to work. In all cases, learning was driven by a DV equation such as equation (3) , with activity-dependent gatin as in equation (23) .

The intermodal VAM circuit 10s performs the same function as a standard AVITE 10 module, meaning that instatement of a spatial target at the TPCs with a non-zero GO signal leads to integration of the correct muscle-coordinate target by the TPCm, which in turn gives rise to a synchronous arm movement trajectory by the intramodal VAM, or AVITE, module. Instatement of a TPC_S command when the GO signal is zero primes a DVsm without disrupting the previously stored TPCm.

In addition to showing the versatility of the VAM, this scheme segregates intermodal and intramodal learning, and illustrates the principle of supercession of control in sensory-motor systems. The intramodal AVITE is the first to become trained, and it relies entirely on a measure of error based on internal feedback. Learning enables target command in muscle coordinates to generate correct feed-forward trajectory commands. At a higher level, the intermodal VAM requires feedback through the environment for learning, but is eventually able to generate feed-forward commands from TPCs to TPCm which are capable, in turn, of controlling arm movements through the calibrated AVITE.

This segregation of intermodal. and intramodal control simplifies gating in the AVITE. Because primed targets at the TPCs are unable to perturb the AVITE TPcm unless the GO signal is active, the NP gate can be left open whenever the GO signal is zero. The TPCm can thus continuously be updat to reflect the PPC at all times, except when the GO signal active, at which time the NP gate closes to avoid conflicts between intermodal target commands and intramodal training signals. Similarly, because the fast integration at the TP keeps it always similar to the PPC even during movement, th intramodal learning rate can be kept high and requires no gating. In fact, because TPCm and PPC are almost always equal — instead of only being equal during the quiet phase — error convergence in the DVm is significantly faster tha in the example of the previous sections even with the same learning rate. Furthermore, segregation of intermodal and intramodal target commands allows priming of target commands in spatial coordinates even during active AVITE babbling.

The GO-mediated gate also allows the AVITE circuit to continue its calibration of TPCm →-DVm LTM traces long afte the ERG is no longer spontaneously active. If for example the intramodal AVITE parameters become inadequate after initial calibration because of changes in the system due to growth or injury, the DV will exhibit nonzero activation whenever the GO signal is off, which will cause autonomous recalibration of the adaptive filter. Moreover, the intramodal learning rate can be chosen large, because the probability of spurious (TPC_m,PPC) correlations is small.

This scheme suggests how best to gate intermodal learning: intermodal learning between TPCs should be gated shut except when the DVm of the intramodal AVITE is small. The DVm is large in two cases: (1) if the PPC differs significantly from the TPC_m, or (2) if the pathways TPCm→- DV are incorrectly calibrated. In the former case, the arm has not yet approached its desired target; in the latter case, the target representation is unreliable. Neither of these cases is suited for intermodal learning. Thus the DVm stage can be used to gate learning at the next, intermodal DVsm stage. These observations show that the VAM scheme requires internal gating signals to denote periods during which stabl learning can take place, but that different gating schemes may be necessary at different levels.

The act of reaching for visually-detected targets in space is known to involve a number of different modalities: for instance, the position of the target on the retina and the position of the eyes in the head are needed to calibrate an eye movement. In addition, the position of the head in the body, and the position of the arm with respect to the body are needed for correct execution of an arm movement. I particular, the position of a target with respect to the bod can be represented by many combinations of eye positions in the head and target positions on the retina. We now show that a VAM is able to learn an invariant multimodal mapping; that is, it can learn to generate a correct moveπent command for all combinations of retinal and eye positions corresponding to a single target position. In Fig. 16 the t top spatial maps 12r, 12e represent the horizontal position of the target on the retina, and the horizontal position of the eyes within the head. For simplicity, consider one-dimensional spatial maps, and assume a linear relationship between the change in arm position and the tot change in retinal position and eye position. That is,

iE + JR = H (24)

where i∑ represents activation of the i^tn node of the eye position map; JR represents activation of the j^tn node in t retinal map; and H is linearly related to arm position. In particular, if there are N nodes in the eye-position map an M nodes in the retinal map, we let

H = (N + M)P⁺ (25)

By (24) , each fixed target position H can be represented by many combinations of eye position and retinal position. In particular equations (24) and (25) indicate that for a fixe AVITE outflow command (P⁺,P~), a rightward shift in eye position (iE increases) is cancelled by a leftward shift in retinal position (JR decreases) , and vice versa. Our resul herein show how to learn such a map using a VAM cascade. For the simulations, the arm position H during each quiet phase of babbling is mapped into one or more random (JE,JR) pairs that satisfy equations (24) and (25). These equations embody the assumption that intramodal learning ha already taken place in the eye movement system, so that the eyes can reliably track the moving arm. Then the active no iE in the eye position map and JR in the retinal position m can sample the current arm position registered at the AVITE TPCm. However, the intermodal VAM activation is affected b activity in both populations, so that the filtered signal from each population only needs to be half as strong as it would be if only one population were present. This is reflected in Fig. 17. Here the LTM traces have learned t correct linear map, but their values are half those achiev with a single map (Fig. 15) . After training, instatement a target (iE,JR) when the GO signal is positive, moves the arm to the correct location according to equations (24) an (25) . Changes in iε and JR such that iE + JR remains unchanged do not change the position of the arm.

Similar results hold if the two intermodal population are not in the same coordinate system. For example, the horizontal eye position could be coded by a pair of nodes that represent the muscle lengths for an agonist-antagonis pair of oculomotor muscles.

Although specific features of the invention are shown some drawings and not others, this is for convenience only each feature may be combined with any or all of the other features in accordance with the invention.

Other embodiments will occur to those skilled in the a and are within the following claims: What is claimed is:

Claims

1. An error based multidimensional learning and performance apparatus comprising a plurality of interconnected vector associative map systems, each including: a target map controller for setting a predetermined goal for the system; said target map controller including an adaptive filter; a present map controller for identifying the presen state of the system; and a difference vector network for aligning the state of the system identified by said present map controller with the target goal set by the target map controller and for calibrating said adaptive filter to zero the difference vector when said target goal and present states are aligned.

2. The error-based multidimensional learning and performance apparatus of claim 1 in which said vector associative map systems are connected in parallel.

3. The error-based multidimensional learning and performance apparatus of claim 1 in which said vector associative map systems are connected in series.

4. The error-based multidimensional learning and performance apparatus of claim 1 in which said target map controller includes means for transforming the current state of said present map controller into a command representative of the predetermined goal corresponding to that current stat of said present map controller.

5. The error-based multidimensional learning and performance apparatus of claim 1 in which said target map controller includes gating means for enabling modification of said adaptive filter only when said target goal and present state are aligned.

6. The error-based multidimensional learning and performance apparatus of claim 4 in which which said means for transforming includes gate means interconnected between said target map controller and said present map controller for intramodal learning.

7. The error-based multidimensional learning and performance apparatus of claim 4 in which said means for transforming includes means for feedback through the environment for intermodal learning.

8. The error-based multidimensional learning and performance apparatus of claim 1 in which said target map controller and said present map controller have different coordinate systems.

9. The error-based multidimensional learning and performance apparatus of claim 1 in which said target map controller and said present map controller encode commands in spatial coordinates.

10. The error-based multidimensional learning and performance apparatus of claim 1 in which said target map controller and said present map controller encode commands in amplitude coordinates.

11. The error-based multidimensional learning and performance apparatus of claim 1 in which said target map controller and said present map controller encode commands in agonist-antagonist coordinates.

12. The error-based multidimensional learning and performance apparatus of claim 1 in which said difference vector network aligns the state of the system identified by said present map controller with the target goal in the performance phase and calibrates said adaptive filter to zero the difference vector when said target goal and present states are aligned in the learning phase and said phases occur interstitially during operation of the system.

13. The error-based multidimensional learning and performance apparatus of claim 1 further including means for determining the rate at which said present map controller aligns with said target map controller.

14. The error-based multidimensional learning and performance apparatus of claim 1 further including means for endogenously generating sample training signals for modifying the present state of the system to inititate learning in the uncalibrated system.

15. The error-based multidimensional learning and performance apparatus of claim 14 in which said means for endogenously generating sample training signals includes complementary mechanisms for generation of active and quiet phases.

16. The error-based multidimensional learning and performance apparatus of claim 14 in which said means for endogenously generating sample training signals includes autonomous means for switching said mechanisms for generatio of said active and quite phases.

17. The error-based multidimensional learning and performance apparatus of claim 16 in which said means for switching includes means for repetitive, alternating activation of said mechanisms for generating active and quit phases.

18. The error-based multidimensional learning and performance apparatus of claim 14 in which said means for generating includes means for establishing a full sample of representative vectors for modifying the present state of the system.

19. The error-based multidimensional learning and performance apparatus of claim 15 in which said mechanism for generating quite phases include means for controlling the means of transforming.

20. The error-based multidimensional learning and performance apparatus of claim 15 in which said means for endogenously generating sample training signals further includes means for inactivating itself after an initial training period.

21. A vector associative map system for unsupervised, real-time, error-based learning and performance comprising: a target map controller for setting a predetermined goal for the system; said target map controller including an adaptive filter; a present map controller for identifying the present state of the system; and a difference vector network for aligning the state of the system identified by said present map controller with the target goal set by the target map controller and for calibration of said adaptive filter by zeroing the difference vector when said target goal and present states are aligned.