WO2007020456A2

WO2007020456A2 - Neural network method and apparatus

Info

Publication number: WO2007020456A2
Application number: PCT/GB2006/003093
Authority: WO
Inventors: Heige Nareid
Original assignee: Axeon Limited
Priority date: 2005-08-19
Filing date: 2006-08-18
Publication date: 2007-02-22
Also published as: WO2007020456A3; WO2007020466A3; WO2007020466A2

Abstract

The invention relates to a method of training a neural network apparatus (10). The neural network apparatus (10) comprises a neural network (12), which has a plurality of neurons (14) and at least one function processor (16) operable to receive an output (28) from at least one of the plurality of neurons and to provide a processor output (40) in dependence upon the received output . The method comprises receiving a first set of training data in the neural network, the neural network being operative to adopt a trained response characteristic in dependence upon the received first set of training data. A second set of training data is received in the function processor, the function processor being operative to adopt a trained response characteristic in dependence upon the received second set of training data. The function processor is operative to adopt its trained response characteristic after the neural network is operative to adopt its trained response characteristic.

Description

Title: Neural Network Method and Apparatus

Field of the invention

The present invention relates to neural network apparatus. More specifically, the present invention relates to a method of training a neural network apparatus, to a control apparatus comprising a neural network and to change detection apparatus comprising a neural network.

Background to the invention

Neural network technology is used in a variety of applications to which conventional computer programming and processing techniques can be unsuited. Such applications include data classification, pattern recognition, control and function approximation. Function approximation using artificial neural network (ANN) techniques is data driven, in the sense that the function being approximated is derived from data generated by the function. When using a neural network for function approximation, the network will initially be trained on data representative of the state space of the function.

Different applications have seen the development of different neural network architectures, such as SeIf- Organising Map (SOM) networks described in Kohonen, T., ISBN 3-540-67921-9, published by Springer, or Learning Vector Quantisation (LVQ) networks, and a number of derived models .

An alternative neural network architecture is the subject of International Patent Publication number WO 00/45333 in the name of Axeon Limited, and is marketed under the Vindax® technology brand. The technology described in WO 00/45333 reflects a modular approach to neural network architecture, based on an adaptation of the Kohonen SOM algorithm. The technology is generally referred to as the modular map processor or architecture.

An application for modular map technology is in data classification, in which a neural network apparatus is operative to select a discrete value from a set of possible output values. Modular map technology has also been used in control applications involving control of mechanical actuators. In such control applications, the modular map technology is used to provide an approximation of a function that models a physical system, .such as a machine or a plant, which forms part of a control system of which the mechanical actuator being controlled forms an integral part. US Patent Publication Number US 2003/0167095 Al in the name of Axeon Limited describes such a function approximation application. According to US 2003/0167095, each processing element or neuron is associated with a specific function value. The possible output states are therefore limited to discrete values, with the number of output states being limited by the number of neurons in the network. This results in a granular output, which can be unacceptable in many applications.

Methods have been proposed to mitigate granularity effects. Such methods include averaging, function separation and interpolation. Each method can improve the numerical resolution of the modular map. However, each method has its disadvantages.

Averaging typically requires multiple passes of the data through the neural network for each output. This has a consequential reduction in the effective output rate of the system.

Function separation is not possible in all cases and may have the undesirable effect of adding processing overheads .

Interpolation can add considerable post-processing complexity to the function estimation process, thus partially negating the advantages of the neural network function approximation technique.

The present inventors have appreciated the shortcomings of prior art approaches to the use of neural network technology in a range of applications, including function estimation and control. It is therefore an object of the invention to provide a method and apparatus, which makes use of a neural network and which addresses the disadvantages of the prior art .

It is a further object of the invention to provide a method and apparatus which makes use of a neural network and which is configured for function estimation.

It is a further object of the invention to provide an improved method and apparatus which makes use of a neural network to model a physical system.

Further objects of the invention will become apparent from a reading of the following description.

Statement of invention

According to a first aspect of the present invention, there is provided a method of training a neural network apparatus, the neural network apparatus comprising a neural network, which has a plurality of neurons, and at least one function processor operable to receive an output from at least one of the plurality of neurons and to provide a processor output in dependence upon the received output, the method comprising: receiving a first set of training data in the neural network, the neural network being operative to adopt a trained response characteristic in dependence upon the received first set of training data, and receiving a second set of training data in the function processor, the function processor being operative to adopt a trained response characteristic in dependence upon the received second set of training data, in which the function processor is operative to adopt its trained response characteristic after the neural network is operative to adopt its trained response characteristic.

The neural network apparatus has what may be considered to be a two layer structure, with the first layer comprising the neural network and the second layer comprising the function processor. In an example of an application, the neural network apparatus may be intended for use in modelling a physical system, such as a machine. As such, the neural network and the function processor may be configured, by means of their respective response characteristics, to model the operational envelope (or state space) of the machine. The number of neurons in the neural network imposes a limit on the accuracy of the function approximated (or the model provided) by the neural network of itself. Thus, the function processor provides a means whereby an increase in accuracy (i.e. a reduction in granularity) may be obtained. The increase in accuracy may be obtained by the function processor providing a further function approximation within a subspace (of the total state space) associated with at least one neuron of the neural network.

The method of training a neural network apparatus according to the present invention takes advantage of the architecture described in the immediately preceding paragraph by training the neural network on a first set of training data and thereafter training the function processor on a second set of training data. Thus the two training stages can have independent dynamics. This means that more rapid convergence can be obtained during training compared with an approach in which the neural network and function processor are trained at the same time.

More specifically, the second set of training data may be received in the function processor after the first set of training data is received in the neural network.

Alternatively or in addition, the first set of training data may be different from the second set of training data. Thus, data contained in the first and second sets may be determined to provide for at least one of: an improved rate of convergence during training; and an improvement in a degree of accuracy of a function approximated by the neural network apparatus.

More specifically, the second set of training data may be a subset of the first set of training data. For example, the second set of training data may comprise data of the first set of training data, which is associated with a subspace of the neuron from which the function processor is operative to receive an output. Thus, the second set of training data may be determined in dependence upon the first set of training data.

Alternatively or in addition, the method may further comprise a step of receiving a third set of training data in the function processor, the function processor being operative to modify its trained response characteristic in dependence upon the received third set of training data . More specifically, the third set of training data may comprise at least one data element not comprised in the second set of training data.

More specifically, the at least one data element not comprised in the second set of training data may be determined based on an analysis of the trained response characteristic adopted in dependence upon the received second set of training data. For example, where the analysis determines that the response characteristic is based upon insufficient data elements to properly characterise a function, further appropriate data elements may be determined and be comprised in the third data set.

Alternatively or in addition, the at least one data element not comprised in the second set of training data may be determined based on a response characteristic of at least one further function processor associated with at least one neuron neighbouring the neuron from which the output is received by the function processor. Thus, the content of the third data set can be determined to reduce a discontinuity that may be present in a transition between the subspace associated with the neuron from which the output is received by the function processor and at least one neighbouring subspace.

Alternatively or in addition, the neural network apparatus may comprise a plurality of function processors, each of the plurality of function processors being operable to receive an output from a respective neuron of the neural network. In a first form, the neural network apparatus may comprise at least a same number of function processors as neurons in the neural network, with each of the function processors being operative to receive an output from a respective neuron of the neural network. Thus, a set of weights of a reference vector of a neuron may be stored in its associated function processor.

In a second form, the neural network apparatus may comprise a plurality of function processors, each of the function processors being operable to receive outputs from a plurality but not all of the neurons of the neural network (e.g. four) neurons. Thus, sets of weights of reference vectors of the plurality of neurons may be stored in the associated function processor and the neural network apparatus may be operative to select, for use, a corresponding one of the sets of weights. The selection may be in dependence upon selection of one of the plurality of neurons, i.e. operation of the neural ^■ network that determines the so-called "winning" neuron. For example, the selection may be by means of a so-called "pointer", which is a form of software or firmware function, to one of the sets of weights.

In a third form, the neural network apparatus may comprise one function processor operable to receive an output from each of the plurality of neurons in the neural network. Thus, sets of weights for the function processor may be stored in the neural network apparatus, and the function processor may be operative, in use, to receive a set of weights corresponding to an operative one of the plurality of neurons. Alternatively or in addition, the neural network apparatus may be operative such that a location of an input to the neural network apparatus within a subspace associated with a neuron is passed to the function processor.

Alternatively or in addition, the neural network may be comprised in an unsupervised neural network.

More specifically, the neural network may be comprised in a modified Kohonen Self-Organising Map neural network.

Alternatively, the neural network may be comprised in one of a Self-Organising Map (SOM) neural network and a Learning Vector Quantization (LVQ) neural network.

When the neural network apparatus has been trained according to the present invention, an overall response characteristic of the neural network apparatus may correspond to a function that defines a model, e.g. of a physical system such as a machine or a plant.

More specifically, the neural network (i.e. what may be considered to be the first layer of the neural network apparatus) may be operative to provide a first approximation to the model. Thus, the at least one function processor may be operative to provide an improved approximation to the model in relation to the first approximation and in a subspace of the model associated with the neuron of the neural network that provides an output to the function processor. Alternatively or in addition, the trained response characteristic of the function processor may comprise a numerical function. For example, the numerical function may be a linear polynomial. Thus, the trained response characteristic of the function processor, which defines a part of the model defined by an overall response characteristic of the neural network apparatus, can be simple comparison to the model defined by the overall response characteristic. Hence, complicated models can be accommodated by the neural network apparatus by means of the neural network and function processor structure whilst reducing processing demands.

Alternatively or in addition, the at least one function processor may comprise at least one perceptron of a further neural network.

The present inventors have appreciated that a neural network architecture comprising a neural network and at least one function processor can have wider application than hitherto described.

Thus, according to a second aspect of the present invention there is provided a control apparatus comprising: a neural network having a plurality of neurons, the neural network being configured to receive an input corresponding to at least one measured physical parameter and being operative to generate an output from one of the plurality of neurons in dependence on the received input and a trained response characteristic of the neural network; a function processor operable to receive the output from the neuron and to provide a processor output in dependence upon the received output and a trained response characteristic of the function processor; and an actuator that, in use, is controlled in dependence upon the processor output .

More specifically, the control apparatus may comprise a plurality of function processors.

More specifically, the control apparatus may comprise fewer function processors than neurons in the neural network. Therefore, the neural network apparatus may comprise a plurality of function processors, each of the function processors being operable to receive outputs from a plurality of (e.g. four) neurons. Thus, sets of weights of reference vectors of the plurality of neurons may be stored in the associated function processor and the neural network apparatus is operative to select, for use, a corresponding one of the sets of weights. The selection may be in dependence upon operation of one of the plurality of neurons. For example, the selection may be by means of a pointer to one of the sets of weights.

Alternatively, the neural network apparatus may comprise at least a same number of function processors as neurons in the neural network, with a function processor being operative to receive an output from a respective neuron of the neural network. Thus, a set of weights of a reference vector of a neuron may be stored in the associated function processor. In another form, the neural network apparatus may comprise one function processor operable to receive an output from each of the plurality of neurons in the neural network. Thus, sets of weights for the function processor may be stored in the neural network apparatus, and the function processor may be operative, in use, to receive a set of weights corresponding to an operative one of the plurality of neurons.

Alternatively or in addition, the control apparatus may be configured such that the output from the one neuron is received in a neighbouring function processor, the neighbouring function processor being operative to provide a neighbourhood processor output. In use, the processor output and neighbourhood processor output may be used to provide for an improvement in approximation accuracy towards a transition between the subspaces of the neighbouring function processors.

An overall response characteristic of the neural network apparatus may correspond to a function that defines a model of at least part of a system, e.g. a machine or a plant, to which the actuator belongs and which is controlled by means of the method.

More specifically, the neural network (i.e. what may be considered to be the first layer of the neural network apparatus) may be operative to provide a first approximation to the model. Thus, the at least one function processor may be operative to provide an improved approximation to the model in relation to the first approximation and in a subspace of the model associated with the neuron of the neural network that provides an output to the function processor.

Alternatively or in addition, the trained response characteristic of the function processor may comprise a numerical function. For example, the numerical function may be a linear polynomial. Thus, the trained response characteristic of the function processor, which defines a part of the model defined by an overall response characteristic of the neural network apparatus, can be simple in comparison to the model defined by the overall response characteristic. Hence, complicated models can be accommodated by the neural network apparatus by means of the neural network and function processor structure whilst reducing processing demands.

The control apparatus may be configured for operation with at least one of an internal combustion engine and oil/gas apparatus.

Further embodiments of the second aspect of the present invention may comprise one or more features of the first aspect of the present invention.

According to a third aspect of the present invention there is provided an automobile comprising control apparatus according to the second aspect of the present invention.

Embodiments of the third aspect of the present invention may comprise one or more features of the second aspect of the present invention. According to a fourth aspect of the present invention there is provided a method of controlling an actuator, the method comprising receiving an input corresponding to at least one measured physical parameter in a neural network having a plurality of neurons, the neural network operating to generate an output from one of the plurality of neurons in dependence on the received input and a trained response characteristic of the neural network; receiving the output from the one neuron in a function processor, the function processor operating to provide a processor output in dependence upon the received output and a trained response characteristic of the function processor; and controlling an actuator in dependence upon the processor output.

Embodiments of the fourth aspect of the present invention may comprise one or more features of the second aspect of the present invention.

According to a further aspect of the present invention, there is provided a change detection apparatus comprising: a neural network having a plurality of neurons, the neural network being configured to receive an input and being operable to generate an output from one of the plurality of neurons in dependence on the received input and on a trained response characteristic of the neural network; a function processor operable to receive the output from the one neuron and to provide a processor output in dependence upon the received output and a trained response characteristic of the function processor; and an indicator module operative to determine if an input received by the neural network is outside a state space defined by the trained response characteristic of the neural network and provide an indication output in dependence thereon.

More specifically, the indicator module may be operative in dependence upon at least one distance metric of the neural network. Thus, the received input may be compared with the at least one distance metric. The change detection apparatus may be operative to determine a confidence level metric in dependence upon the received input and the at least one distance metric. The determination may be based upon a comparison between the received input and the confidence level metric.

Alternatively or in addition, the function processor may be configured, in dependence on a determination by the indicator module that an input is outside the state space defined by the trained response characteristic, to provide a processor output in dependence upon an extrapolation based on its trained response characteristic.

Further embodiments of the further aspect of the present invention may comprise at least one feature of any of the previous aspects of the present invention.

According to a yet further aspect of the present invention there is provided a method of approximating a multi-dimensional function, the method comprising the steps of: receiving, in a first neural network, an input vector from an input space; deriving location data representing the location of the input vector within a subspace of the input space; presenting the location data to a numerical estimator; and calculating, in the numerical estimator, a numerical output value using the location data.

More specifically, the method may model a physical system having a number of variables, the physical system being represented by the multi-dimensional function. The input vector may represent parameters of the system from an input space representing an operational envelope of the system. The numerical output value may represent an output of the system.

According to a yet further aspect of the present invention, there is provided apparatus for approximating a multi-dimensional function, the apparatus comprising: a first processing layer comprising a first neural network having a plurality of processing elements; and a second processing layer comprising at least one numerical estimator; wherein the first processing layer is adapted to receive an input vector and the second processing layer is adapted to provide a numerical output value in response to data received from the first processing layer.

More specifically, the second processing layer may comprise a second neural network. There will now be described, by way of example only, various embodiments of the invention with reference to the following drawings, in which:

Figure 1 is a schematic representation of the components of an embodiment of the invention;

Figure 2 is a block diagram showing steps forming part of a method according to an embodiment of the invention; and

Figure 3 is a representation of a two-dimensional input space with subspaces associated with processing elements.

Referring firstly to Figure 1, there is shown a schematic representation of components of a neural network architecture according to an embodiment of the invention. The system, generally depicted at 10, is a two-layered neural network, where data are passed sequentially from the first layer to the second layer. The first layer is referred to as the selector layer 12, and the second layer as the estimator layer 16.

The selector layer 12 comprises a neural network 13 consisting of a plurality of processing elements or neurons 14. The neural network 13 is, in this example, a neural network modular map using a modified Kohonen SOM, of the type described in WO 00/45333.

The primary function of the selector layer is to determine which region of the input space an input vector belongs to. It can also be used for extracting additional information, described in more detail below.

The estimator layer 16 comprises a plurality of numerical estimators 18, which, are in this example perceptron processing elements of a second neural network. The numerical estimator provides a single numerical output 40 for a multi-dimensional input vector, such as a polynomial of first, second or higher order or a sum of sigmoid functions. The numerical estimator 18 will normally be characterised by a set of coefficients, often called weights in neural network terminology. Each numerical estimator 18 is associated with a processing element 14 of the selector layer 12.

The neural network 13 is trained according to the normal method on training data representing the state space of the function to be estimated, and each processing element in a trained network will have an associated reference vector. The reference vector will be of the same dimension as input vectors 22 presented to the system.

The estimator layer 16 is trained using a data set identical or similar to the data set used to train the selector layer, and is provided with associated actual numerical values for each input vector of the training data. The numerical estimator is, for example, trained using an optimising technique, where the numerical estimator coefficients are optimised so that they minimise the errors between the actual numerical values and the values calculated by the numerical estimator from the input vector. The errors can be evaluated using a merit function, such as a Root Mean Square (RMS) error estimate. Further details of the training of the estimator layer 16 are given below.

Figure 2 is a block diagram representing steps of the method carried out in the selector layer 12 and the estimator layer 16.

Initially, the trained selector layer 12 is presented with an input vector 22. The input vector 22 is compared to the reference vectors of all the processing elements in this layer, according to the algorithm implemented in the neural network modular map 13. The reference vector which is most similar to the input vector 22 is selected, and the processing element with which this reference vector is associated is identified (step 24) as the winning processing element 15.

Each processing element 14 will be the winning processing element for a subset of input vectors from the set of possible input vectors. Each processing element 14 may thus be associated with a localised subspace within the multidimensional hyperspace spanned by the set of possible input vectors. This subspace will contain the reference vector of the processing element 14. This is an inherent property of modular map networks and related neural network architectures such as the SOM and LVQ architectures.

Figure 3 is a graphical representation of a two- dimensional input space, generally depicted at 30. Reference vectors for the individual processing elements are shown as points 31, while the area (which in the general, higher dimensional case is a subspace) associated with each processing element is shown as an irregular polygon 32.

For any input vector, there will be a responding processing element with an associated subspace of the entire input space. In this technique, the selector layer 12 of the system is used to determine which subspace an input vector 22 is associated with.

When the associated subspace has been identified, the location of the input vector within that subspace is determined (step 26) . The location of the input vector 22 within the localised subspace can either be represented relative to the reference vector of the processing element 15 associated with the subspace, or relative to another fixed point within the total input space. Although either technique is valid, it is likely that using a local reference point will be advantageous from a numerical computation perspective, since the numerical values will be smaller.

The location of the input vector within the localised subspace of the input space is input (step 28) to the numerical estimator 19 that is associated with the winning processing element 15. Other information generated by the selector layer, such as a distance value (the distance of the input vector from the local reference vector) , may be used as an additional input (step 28a) for the estimator layer.

The additional input could include an indication of whether the input vector is located within the state space represented by the training data. This indication can be derived using the distance metric inherent in SOM- type networks. The indication can also be used to indicate whether the system is interpolating or extrapolating.

The system may use a reinforced metric, being the result or product of the distance metric of the selector layer and a numerical label applied to each of the selector layer processing elements. This numeric label provides further information relative to defining the input space. Thus, the distance metric alone, or a metric including or derived from the distance metric can be used.

The numerical estimator 19 is in this example implemented as a perceptron, which is trained on the subset of the data training set which activates the processing element in the selector layer with which it is associated. That is, it is trained on data which would cause the processing element to be identified as the winning processing element. The training data for the numerical estimator thus is representative of a subspace of the input space.

The numerical estimator 19 calculates a numerical value (step 29) and provides a numerical output (step 40), corresponding to the original input vector.

The system operates on the assumption that the complexity of the function within each subspace of the input space is less than complexity of the function over the entire input space. This allows acceptable numerical accuracy to be achieved with a simpler estimator function than would be required for adequate estimation over the entire input space. The estimator will calculate an estimated numerical function value for the input vector it has received. Since the estimator function will be a relatively simple function, it will be well suited for hardware implementations, but could equally be implemented in software.

The estimator layer is trained after the selector layer.

In an alternative training method, the training data may also include those data which activate a neighbourhood of processing elements around the associated processing element 15, during all or part of the training. The definition of a neighbourhood in this context may be similar to the definition of a neighbourhood in a modular map given in WO 00/45333 (the neighbourhood comprises those processing elements 14 with reference vectors falling within a predefined distance metric) , or may correspond to a logical cluster of processing elements. This enables the system to map the probability density distribution of the input data with better definition at the extremes or transitions between of the local subspace(s).

The accuracy of the estimator can be assessed during the training process. Where the accuracy of a particular estimator is insufficient, it is possible to bias the training data for the selector layer in such a way that the particular subspace represents a greater proportion of the training data. This can be used to "subdivide" the problematic subspace and potentially achieve better accuracy in the problem areas. This will result in another training cycle for the network; this process can be repeated until an optimum selector network configuration and size has been found.

The network configuration of this embodiment may be implemented fully or partially in hardware. For the selector layer, a hardware implementation is preferred, and the hardware will be substantially similar to the hardware described in WO 00/45333. The estimator layer may be implemented in software, e.g. as software operating on a general purpose computer platform or in hardware. Possible implementations include the following: i . The estimator layer has a dedicated estimator for each processing element in the selector layer. In this case, the weights for the reference vector of the associated processing element are permanently stored in the estimator. ii. The estimator layer comprises a single generic estimator which is able to receive both its weights and its inputs from the selector layer (which stores the associated weights for each of its processing element) . iii. The estimator may comprise a number of estimators, each of which serves a cluster of selector processing elements (e.g. 4). In this case, the weights are stored in the estimator, and the selector layer provides an input with a pointer to the correct set of weights to be used.

The present invention has numerous applications in the modelling and control of physical systems. In an application, a requirement will typically be to model a non-linear multi-dimensional function that represents a relationship amongst parameters of a physical system, for example a machine or plant. Such a function will be of the general form: Y = f(x_v..x_n)

The function value Y is assumed to be a numerical value. The set of input values xχ...x_n is termed the input vector, and the number of components n in the input vector is the dimensionality of the vector. The full set of values which can be potentially held by the input vector is the input space of the input vector, which can be visualised as an n-dimensional hyperspace. The state space of the function Y is the subspace of the input space which contains the actual range of function inputs, and will normally be significantly smaller than the potential input space. For a model of a physical system or plant, the state space will effectively be the full operational envelope of the system or plant.

An optimised input vector will consist of the minimum number of linearly independent components required to map the complete state space of the function. Full linear independence of the vector components is not a requirement, and indeed in most practical applications, some interdependence among input vector components is to be expected. The only necessary requirement for the input vector is that it completely fills the state space of the function, and that will in many cases result in a higher number of vector components, and thus dimensionality than strictly necessary. The function f(...) is assumed to be at least partially continuous, that is, continuous over discrete areas of the input space. The function will also be deterministic, that is that the function has a single output value for any given input vector xi...x_n. If the latter requirement is not fully satisfied, the situation may frequently be remedied by increasing the dimensionality of the input vector.

Beyond the above-mentioned limitations, which are necessary in order to establish a meaningful functional relationship, there may not be anything further known about the function. In particular, the function need not typically be known in an analytical form, nor need an algorithm be known (or found) to calculate the function value. A function estimation technique will typically be required to operate on the basis of the above information and assumptions alone.

A particular application is the estimation of mass airflow in an internal combustion engine. Accurate estimation allows control of the air/fuel ratio fed into the cylinders as closely as possible, which impacts on engine performance, fuel economy and emissions to the environment.

It has been shown that the mass air flow can be estimated, and control effected based on such estimates, from the measurement of various parameters of the engine, such as engine speed, manifold air pressure, intake air temperature and throttle position. The present invention allows results to be achieved using a significantly smaller network than the networks used in the previous proposals. This will facilitate implementation in an embedded control system where resources may be limited.

The invention can also be used in the control system described in US Patent Publication Number US 2003/0167095 Al in the name of Axeon Limited. The present invention can provide better accuracy, a smaller required network size or both in combination when compared with the implementation of US 2003/0167095 Al. The input parameters in this case can for example be desired actuator position, actual actuator position, actuator velocity, hydraulic pressure and temperature. The output in this implementation is used to provide an actuator control signal. Again, a solution providing appropriate accuracy with reduced overheads and resources is significant for implementation of embedded control systems .

The present invention also finds application in virtual sensing in alternative application areas, such as oil/gas wellhead control systems, where sensor replacement may be prohibitively expensive. Typical input parameters include valve position indicators, temperatures, other pressure signals, and flow rates. In such applications, the system behaviour is controlled to not vary significantly and thus large quantities of "similar" data are produced. However, when requirements do change, the periods of transients are relatively brief. Thus the transients are not well represented by the available data, and the ability to extrapolate from incomplete data sets becomes significant. The ability of the present invention to provide the high accuracy required in the transient regions of system behaviour is significant in this kind of application.

In an alternative application, the apparatus and method are used as a novelty filter or change detector. In this application, the selector layer is used to determine whether a specific input vector is within the state space on which the network has been trained.

The input vector is presented to the selector layer 12, which will determine which processing element 14 responds to the input vector, that is, which is the winning processing element 15. The input vector is determined to be located within the subspace of the total input hyperspace that is associated with the processing element 19. The location of the input vector within that subspace is subsequently passed to the estimator layer 16, where the numerical estimator function associated with this particular subspace is used to provide a numerical output.

By comparing the input vector with training data distance metrics, its location with respect to the state space can be determined. In the event that the input vector is outside the state space on which the network has been trained, a distance metric is obtained from the selector layer 12 and used to provide an out-of-range indicator. Alternatively the estimator 19 can provide an extrapolated output value for the input vector. The two methods can also be combined, so that the extrapolated numerical output value for the input vector can be associated with a confidence level derived from the out- of-range indicator. Although the above-described embodiment refers to a selector layer having a modular map network implementation, in an alternatively embodiment another network implementation of a similar type, such as a SOM or LVQ network could be used.

The modular map implementation is preferred as it has a number of advantages for function estimation. An advantage of this class of neural network architectures is that it maps the n-dimensional state space to a two- dimensional surface. The mapping retains the statistical distribution of the training data used, so that the area occupied on the modular map by a region in the state space is roughly proportional to the cumulative probability of the region within the training data. This property ensures that the entire state space of the training data will be properly mapped.

Another important property is that relationships between data points are retained, in the sense that points which are close to each other in the original input space remain close to each other in the trained modular map. This is one reason why neural networks of the self- organising map family are frequently used for visualisation of complex. multi-dimensional state spaces.

The embodiment described above has the estimator layer implemented as a perceptron. In an alternative embodiment of the invention, the numerical estimator comprises a numerical function which outputs a plurality of numbers, for instance by performing a numerical transform, such as a Fourier or wavelet transform on the input vector, or a data set associated with the input vector, with coefficients for the transform provided by the selector network.

In the embodiment described above, the location of the input vector is passed 28 to only one of the numerical estimators 15 in the estimator layer 16, being the numerical estimator associated with the winning processing element in the selector layer 12. In a variation of the invention may also pass the data to estimators 18 neighbouring the estimator 19.

Claims

CLAIMS :

1. A method of training a neural network apparatus, the neural network apparatus comprising: a neural network, which has a plurality of neurons; and at least one function processor operable to receive an output from at least one of the plurality of neurons and to provide a processor output in dependence upon the received output, the method comprising: receiving a first set of training data in the neural network, the neural network being operative to adopt a trained response characteristic in dependence upon the _ received first set of training data, and receiving a second set of training data in the function processor, the function processor being operative to adopt a trained response characteristic in dependence upon the received second set of training data, in which the function processor is operative to adopt its trained response characteristic after the neural network is operative to adopt its trained response characteristic.

2. A method according to claim 1, in which the second set of training data is received in the function processor after the first set of training data is received in the neural network.

3. A method according to claim 1 or claim 2, in which the first set of training data is different from the second set of training data.

4. A method according to claim 3 , in which the second set of training data is a subset of the first set of training data.

5. A method according to claim 4, in which the second set of training data comprises data of the first set of training data, which is associated with a subspace of the neuron from which the function processor is operative to receive an output .

6. A method according to any preceding claim, in which the method further comprises a step of receiving a third set of training data in the function processor, the function processor being operative to modify its trained response characteristic in dependence upon the received third set of training data.

7. A method according to claim 6, in which the third set of training data comprises at least one data element not comprised in the second set of training data.

8. A method according to claim 7, in which the at least one data element not comprised in the second set of training data is determined based on an analysis of the trained response characteristic adopted in dependence upon the received second set of training data.

9. A method according to claim 7 or claim 8, in which the at least one data element not comprised in the second set of training data is determined based on a response characteristic of at least one further function processor associated with at least one neuron neighbouring the neuron from which the output is received by the function processor.

10. A method according to any preceding claim, in which the neural network apparatus comprises a plurality of function processors, each of the plurality of function processors being operable to receive an output from a respective neuron of the neural network.

11. A method according to any preceding claim, in which the neural network apparatus comprises at least a same number of function processors as neurons in the neural network, with each of the function processors being operative to receive an output from a respective neuron of the neural network.

12. A method according to any one of claims 1 to 10, in which the neural network apparatus comprises a plurality of function processors, each of the function processors being operable to receive outputs from a plurality but not all of the neurons of the neural network.

13. A method according to claim 12, in which sets of weights of reference vectors of the plurality of neurons are stored in the associated function processor and the neural network apparatus is operative to select, for use, a corresponding one of the sets of weights.

14. A method according to claim 13, in which the selection of a set of weights is in dependence upon selection of one of the plurality of neurons during operation of the neural network apparatus.

15. A method according to any one of claims 1 to 10, in which the neural network apparatus comprises one function processor operable to receive an output from each of the plurality of neurons in the neural network.

16. A method according to claim 15, in which sets of weights for the function processor are stored in the neural network apparatus, and the function processor is operative, in use, to receive a set of weights corresponding to an operative one of the plurality of neurons.

17. .A method according to any preceding claim, in which the neural network apparatus is operative such that a location of an input to the neural network apparatus within a subspace associated with a neuron is passed to the function processor.

18. A method according to any preceding claim, in which the neural network is comprised in an unsupervised neural network.

19. A method according to claim 18, in which the neural network is comprised in a modified Kohonen SeIf- Organising Map neural network.

20. A method according to any preceding claim, in which the neural network is comprised in one of a SeIf- Organising Map (SOM) neural network and a Learning Vector Quantization (LVQ) neural network.

21. A method according to any preceding claim, in which the trained response characteristic of the function processor comprises a numerical function.

22. A method according to any preceding claim, in which the at least one function processor comprises at least one perceptron of a further neural network.

23. A control apparatus comprising: a neural network having a plurality of neurons, the neural network being configured to receive an input corresponding to at least one measured physical parameter and being operative to generate an output from one of the plurality of neurons in dependence on the received input and a trained response characteristic of the neural network; a function processor operable to receive the output from the neuron and to provide a processor output in dependence upon the received output and a trained response characteristic of the function processor; and an actuator that, in use, is controlled in dependence upon the processor output.

24. A control apparatus according to claim 23, in which the control apparatus comprises a plurality of function processors.

25. A control apparatus according to claim 24, in which the control apparatus comprises fewer function processors than neurons in the neural network.

26. A control apparatus according to claim 25, in which each of the function processors is operable to receive outputs from a plurality of neurons .

27. A control apparatus according to claim 23, in which the neural network apparatus comprises at least a same number of function processors as neurons in the neural network, with each function processor being operable to receive an output from a respective neuron of the neural network.

28. A control apparatus according to claim 23, in which the neural network apparatus comprises one function processor operable to receive an output from each of the plurality of neurons in the neural network.

29. A control apparatus according to any one of claims 23 to 28, in which the control apparatus is configured such that the output from the one neuron is received in a neighbouring function processor, the neighbouring function processor being operable to provide a neighbourhood processor output.

30. A control apparatus according to any one of claims 23 to 29, in which the control apparatus is configured for operation with at least one of an internal combustion engine and oil/gas apparatus.

31. An automobile comprising control apparatus according any one of claims 23 to 30.