WO2016033708A1 - Apparatus and methods for image data classification - Google Patents

Apparatus and methods for image data classification

Info

Publication number
WO2016033708A1
Authority
WO
WIPO (PCT)
Prior art keywords
training data
target code
neural network
training
network system
Application number
PCT/CN2014/000825
Other languages
French (fr)
Inventor
Xiaoou Tang
Shuo YANG
Ping Luo
Chen Change Loy
Original Assignee
Xiaoou Tang
Application filed by Xiaoou Tang
Priority to PCT/CN2014/000825 (WO2016033708A1)
Priority to CN201480081756.1A (CN106687993B)
Publication of WO2016033708A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 1/00 - Arrangements for detecting or preventing errors in the information received
    • H04L 1/004 - Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L 1/0056 - Systems characterized by the type of code used
    • H04L 1/0057 - Block codes

Abstract

Disclosed is an apparatus for image data classification. The apparatus may comprise: a target code generator configured to retrieve a plurality of training data samples and to generate a target code for each of the retrieved training data samples, wherein the training data samples are grouped into different classes and the generated target code has a dimension identical to the number of classes; a target prediction generator configured to receive a plurality of arbitrary data samples and to generate a target prediction for each of the received arbitrary data samples; and a predictor configured to predict a class for each of the received arbitrary data samples based on the generated target code and the generated target prediction. A method for image data classification is also disclosed.

Description

APPARATUS AND METHODS FOR IMAGE DATA CLASSIFICATION
Technical Field
The present application generally relates to the field of target identification, and more particularly to an apparatus and a method for image data classification.
Background
Learning robust and invariant representation has been a long-standing goal in computer vision. In comparison to hand-crafted visual features, such as SIFT or HoG, features learned by deep models have recently been shown to be more capable of capturing abstract concepts invariant to various phenomena in the visual world, e.g. viewpoint, illumination, and clutter. Hence, an increasing number of studies are now exploring the use of deep representation on vision problems, particularly on classification tasks.
Rather than using deep models for direct classification, many vision studies choose to follow a multistage technique. This technique has been shown effective in combining the good invariance of deep features and the discriminative power of standard classifiers. Typically, they first learn a deep model, e.g. a convolutional neural network, in a supervised manner. The 1-of-K coding, containing vectors of length K with the k-th element set to one and the remaining elements set to zero, is used along with a softmax function for classification. Each element in a 1-of-K code essentially represents the probability of a specific class. Subsequently, the features of a raw image are extracted from the penultimate layer or shallower layers to form a high-dimensional feature vector as input to classifiers such as SVM.
In neural network training, prior arts often adopt a 1-of-K coding scheme. However, the discriminative hidden features formed in a neural network system trained with 1-of-K coding are limited, and the predictions generated by the neural network system do not have error-correcting capability. Thus, there is a need for a more effective target coding scheme with better performance in neural network training.
Summary
According to an embodiment of the present application, disclosed is an apparatus for data classification. The apparatus may comprise: a target code generator configured to retrieve a plurality of training data samples and to generate a target code for each of the retrieved training data samples, wherein the training data samples are grouped into different classes; a target prediction generator configured to receive a plurality of arbitrary data samples and to generate a target prediction for each of the received arbitrary data samples; and a predictor configured to predict a class for each of the received arbitrary data samples based on the generated target code and the generated target prediction.
According to another embodiment of the present application, disclosed is a method for data classification. The method may comprise: retrieving a plurality of training data samples, wherein the training data samples are grouped into different classes; generating a target code for each of the retrieved training data samples; for an unclassified data sample, generating a target prediction for the unclassified data sample; and predicting a class for the unclassified data sample based on the generated target code and the generated target prediction.
The present invention brings extra benefits to neural network training. On one hand, more discriminative hidden features can form in the neural network system. On the other hand, the predictions generated by the neural network system have error-correcting capability.
Brief Description of the Drawing
Exemplary non-limiting embodiments of the present invention are described below with reference to the attached drawings. The drawings are illustrative and generally not to an exact scale. The same or similar elements on different figures are referenced with the same reference numbers.
Fig. 1 is a schematic diagram illustrating an apparatus for image data classification according to an embodiment of the present application.
Fig. 2 is a schematic diagram illustrating a target code generator according to an embodiment of the present application.
Fig. 3 is a schematic diagram illustrating an apparatus with a training unit according to another embodiment of the present application.
Fig. 4. is a schematic diagram illustrating the training unit according to another embodiment of the present application.
Fig. 5. is a schematic diagram illustrating a predictor according to an embodiment of the present application.
Fig. 6. is a schematic diagram illustrating a training unit according to another embodiment of the present application.
Fig. 7. is a schematic diagram illustrating a predictor according to another embodiment of the present application.
Fig. 8 is a schematic flowchart illustrating a method for image data classification according to an embodiment of the present application.
Fig. 9 is a schematic flowchart illustrating a process for generating a target code according to an embodiment of the present application.
Fig. 10 is a schematic flowchart illustrating a process for training a neural network system according to an embodiment of the present application.
Fig. 11 is a schematic flowchart illustrating a process for predicting a class for an unclassified data sample according to an embodiment of the present application.
Fig. 12 is a schematic flowchart illustrating a process for training a neural network system according to another embodiment of the present application.
Fig. 13 is a schematic flowchart illustrating a process for predicting a class for an unclassified data sample according to another embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When appropriate, the same reference numbers are used throughout the drawings to refer to the same or like parts. Fig. 1 is a schematic diagram illustrating an exemplary apparatus 1000 for data classification consistent with some disclosed embodiments.
It shall be appreciated that the apparatus 1000 may be implemented using certain hardware, software, or a combination thereof. In addition, the embodiments of the present application may take the form of a computer program product embodied on one or more computer readable storage media (including but not limited to disk storage, CD-ROM, optical memory and the like) containing computer program codes.
In the case that the apparatus 1000 is implemented with software, the apparatus 1000 can be run on one or more systems, which may include a general purpose computer, a computer cluster, a mainstream computer, a computing device dedicated to providing online contents, or a computer network comprising a group of computers operating in a centralized or distributed fashion.
Referring to Fig. 1 again, where the apparatus 1000 is implemented by hardware or by a combination of hardware and software, it may comprise a target code generator 100, a neural network system 200 and a predictor 300. In the embodiment shown in Fig. 1, the target code generator 100 may be configured to retrieve a plurality of training data samples and to generate a target code for each of the retrieved training data samples, wherein the training data samples are grouped into different classes. The target prediction generator 200 may be configured to receive a plurality of arbitrary data samples and to generate a target prediction for each of the received arbitrary data samples. In some embodiments, the target prediction generator 200 may comprise a neural network system. In some embodiments, the neural network system may comprise at least one of a deep belief network and a convolutional network. For example, the neural network may consist of convolutional filters, pooling layers, and locally or fully connected layers, which are well known in the art, and thus the detailed configurations thereof are omitted herein. The predictor 300 may be configured to predict a class for each of the received arbitrary data samples based on the generated target code and the generated target prediction.
Hereinafter the definition of a target code (or target coding) will be described. Let T be a set of integers, called the alphabet set. An element in T is called a symbol. For example, T = {0, 1} is a binary alphabet set. A target code S is a matrix S ∈ T^(n×l), wherein each row of a target code is called a codeword, l denotes the number of symbols in each codeword and n denotes the total number of codewords. The target code can be constructed with a deterministic method, which is built on the Hadamard matrix. For a target code S, we denote {α1, α2, …, αn} to be the set of empirical distributions of symbols in the rows of S, i.e., for i = 1, 2, …, n, αi is a vector of length |T|, with the t-th component of αi counting the number of occurrences of the t-th symbol in the i-th row of S. Similarly, we let {β1, β2, …, βl} be the set of empirical distributions of symbols in the columns of S. Given two distinct row indices i and i', the Hamming distance between row i and row i' of a target code S is defined as |{j : Sij ≠ Si'j}|, i.e., it counts the number of column indices at which the corresponding symbols in row i and row i' are not equal. For simplicity, we call it the pairwise Hamming distance.
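By way of illustration only (not part of the disclosed apparatus), a minimal Python sketch of these definitions is shown below; the helper names and the NumPy dependency are assumptions of this example:

```python
import numpy as np

def symbol_distributions(S, alphabet=(0, 1)):
    """Per-row (alpha_i) and per-column (beta_j) symbol counts of a target code S."""
    alpha = np.stack([(S == t).sum(axis=1) for t in alphabet], axis=1)  # shape (n, |T|)
    beta = np.stack([(S == t).sum(axis=0) for t in alphabet], axis=1)   # shape (l, |T|)
    return alpha, beta

def pairwise_hamming(S):
    """Hamming distance between every pair of codewords (rows of S)."""
    return (S[:, None, :] != S[None, :, :]).sum(axis=2)

S = np.eye(4, dtype=int)        # the 1-of-K code for K = 4 classes
alpha, beta = symbol_distributions(S)
print(alpha)                    # each row: 3 occurrences of '0' and 1 occurrence of '1'
print(pairwise_hamming(S))      # off-diagonal entries are all 2
```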
Table 1 shows an example of a 1-of-K target code, which is typically used in deep learning for representing K classes. Each of the K symbols, either '0' or '1', indicates the probability of a specific class. The target code here can be written as S = I, where I ∈ T^(K×K) is an identity matrix. It is easy to obtain some properties of the 1-of-K coding. For instance, for i = 1, 2, …, K, we have αi = (K−1, 1), i.e., each codeword contains K−1 symbols '0' and a single symbol '1', since only one symbol in each codeword has a value '1'. Similarly, for the columns we have βj = (K−1, 1) for j = 1, 2, …, K. The pairwise Hamming distance is two.
Table 1 [the 1-of-K target code; reproduced as an image in the original publication]
Instead of merely representing classes, the target coding can play additional roles, such as error correction or facilitating better feature representation. To enable these additional roles, a target code S fulfilling specific requirements should be constructed.
The specific requirements a good target code should satisfy will be introduced hereinafter. Generally, the specific requirements can be summarized in three aspects: uniformness in each column, redundancy in each row, and constant pairwise Hamming distance. Hereinafter, how to generate a target code as shown in Table 2, which is also referred to as the Balanced Code (BC) and denoted S_BC, will be described based on the above requirements.
Table 2 [the balanced target code S_BC; reproduced as an image in the original publication]
As shown in Fig. 2, the target code generator 100 further comprises a matrix generating module 110, a removing module 120, a changing module 130, and a selecting module 140.
The matrix generating module 110 is configured to generate a Hadamard matrix, wherein the entries of the Hadamard matrix are either "+1" or "-1", and the dimension of the Hadamard matrix is larger than the number of classes of the training data samples. Particularly, a square m×m matrix H whose entries are either '+1' or '-1' is called a Hadamard matrix if HH^T = mI. In some embodiments, we can use '+' to represent '+1' and '-' to represent '-1'. The definition of the Hadamard matrix requires that any pair of distinct rows, and any pair of distinct columns, are orthogonal, respectively. A possible way to generate the Hadamard matrix is by Sylvester's method (Hedayat and Wallis, 1978), where a new Hadamard matrix is produced from the old one by the Kronecker (or tensor) product. For example, given a Hadamard matrix H2 = [+ +; + -], we can produce H4 by H4 = H2 ⊗ H2, where ⊗ denotes the Kronecker product. Similarly, H8 is computed by H8 = H2 ⊗ H4 (equations 1-2).
The removing module 120 is configured to obtain S_BC ∈ T^((m-1)×(m-1)) by removing the first row and the first column of H. The changing module 130 is configured to change '+1' and '-1' in the resulting matrix to '0' and '1', respectively. The above formulation yields the balanced target code S_BC of size (m-1)×(m-1), with row sum m/2, column sum m/2, and a constant pairwise Hamming distance of m/2.
The selecting module 140 is configured to randomly select a plurality of rows of the changed Hadamard matrix as the target codes, wherein the number of rows is identical to that of the classes of the training data samples. In some embodiments, the target code may be represented as a vector. Particularly, the selecting module 140 is configured to randomly select c rows as balanced target codes for c classes, wherein each of the selected rows corresponds to one target code. In some embodiments, the class labels C_BC ∈ T^(K×(m-1)) are constructed by choosing K codewords randomly from S_BC ∈ T^((m-1)×(m-1)).
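As an illustrative sketch of the construction described above (Sylvester-type Hadamard generation, removal of the first row and column, the '+1'/'-1' to '0'/'1' mapping, and random row selection), the following Python code is one possible realization; the function names, the power-of-two assumption for m, and the NumPy dependency are assumptions of this sketch rather than requirements of the disclosure:

```python
import numpy as np

def hadamard(m):
    """Sylvester construction: H_{2n} = H_2 (kron) H_n, so m is assumed to be a power of two."""
    H = np.array([[1]])
    H2 = np.array([[1, 1], [1, -1]])
    while H.shape[0] < m:
        H = np.kron(H2, H)
    return H

def balanced_code(num_classes, m, seed=0):
    """Return one balanced codeword of length m-1 per class, as rows of a matrix."""
    assert m - 1 >= num_classes, "Hadamard dimension must exceed the number of classes"
    H = hadamard(m)
    S = H[1:, 1:]                      # remove the first row and the first column
    S = np.where(S == 1, 0, 1)         # map +1 -> 0 and -1 -> 1
    rng = np.random.default_rng(seed)
    rows = rng.choice(S.shape[0], size=num_classes, replace=False)
    return S[rows]                     # shape: (num_classes, m-1)

C = balanced_code(num_classes=10, m=64)
print(C.shape)   # (10, 63); every row and column of the full 63x63 code sums to m/2 = 32
```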
As shown in Fig. 3, the apparatus 1000’ according to another embodiment of the present application comprises a target code generator 100, a neural network system 200, a predictor 300, and a training unit 400. The functions of the target code generator 100, the neural network system 200, and the predictor 300 have been described with reference to Fig. 1, and thus will be omitted hereinafter. The training unit 400 is configured to train the neural  network system with the retrieved training data samples such that the trained neural network system is capable of applying the convolutional filters, pooling layers, and locally or fully connected layers to the retrieved training data samples to generate said target predictions. In some embodiments, the target prediction may be represented as a vector.
As shown in Fig. 4, the training unit 400 further comprises a drawing module 410, an error computing module 420, and a back-propagating module 430. The drawing module 410 is configured to draw a training data sample from the training data samples, wherein each of the training data samples is associated with a corresponding ground-truth target code, for example, based on a class label of the training data sample. For example, the above association with a ground-truth target code based on a class label may have a form in which class label = '1' corresponds to target code = '1010101', and class label = '2' corresponds to target code = '0110011'. In some embodiments, the target code may be a ground-truth target code. The error computing module 420 is configured to compute an error, such as a Hamming distance, between the generated target prediction of the training data sample and the ground-truth target code. The back-propagating module 430 is configured to back-propagate the computed error through the neural network system so as to adjust weights on connections between neurons of the neural network system. In order to obtain a convergent result, the drawing module, the error computing module and the back-propagating module repeat the processes of drawing, computing and back-propagating until the error is less than a predetermined value.
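A minimal training-loop sketch corresponding to the drawing, error-computing and back-propagating modules is given below. It assumes PyTorch, uses dummy data, and substitutes a differentiable mean-squared error for the Hamming-distance error, since a Hamming distance cannot be back-propagated directly; the network architecture and all hyper-parameters are illustrative only:

```python
import torch
import torch.nn as nn

code_len, num_classes = 63, 10
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
                    nn.Linear(256, code_len), nn.Sigmoid())
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = nn.MSELoss()            # differentiable surrogate for the Hamming-distance error
codes = torch.randint(0, 2, (num_classes, code_len)).float()  # ground-truth target codes
threshold = 0.05                  # the "predetermined value" for the error

# Dummy training set: 100 random images with random class labels, purely illustrative.
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, num_classes, (100,))

for step in range(10000):                        # repeat drawing / computing / back-propagating
    idx = torch.randint(0, images.shape[0], (8,))  # draw a batch of training samples
    target = codes[labels[idx]]                  # ground-truth codeword of each drawn sample
    pred = net(images[idx])                      # target prediction from the network
    loss = loss_fn(pred, target)                 # error between prediction and codeword
    optimizer.zero_grad()
    loss.backward()                              # back-propagate the computed error
    optimizer.step()                             # adjust weights on the connections
    if loss.item() < threshold:                  # stop once the error is below the threshold
        break
```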
As shown in Fig. 5, the predictor 300 further comprises a distance computing module 310 and an assigning module 320. The distance computing module 310 is configured to compute Hamming distances between a target prediction of an unclassified data sample and the corresponding ground-truth target code of each class of the training samples. Since both the target prediction and the ground-truth target code are vectors of the same length, the distance between the target prediction and the ground-truth target code can be determined by calculating the Hamming distance. For example, if the target prediction is '1110111' and the ground-truth target code is '1010101', the Hamming distance is determined by counting the number of positions at which the corresponding values differ. In this example, the Hamming distance is 2. The assigning module 320 is configured to assign the unclassified data sample to a class corresponding to the minimum Hamming distance among the computed Hamming distances. That is to say, if the unclassified data sample is closest to a particular class (based on the Hamming distance between its target prediction and the ground-truth target code of that class), then the unclassified data sample is considered to belong to that class.
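A possible sketch of the distance computing module 310 and the assigning module 320 is shown below, assuming the target prediction is first binarized at 0.5 (an assumption of this example, not stated in the disclosure), and using the '1010101' codeword from the example above together with a second, hypothetical codeword:

```python
import numpy as np

def predict_class(target_prediction, class_codes):
    """Assign the class whose ground-truth codeword is nearest in Hamming distance."""
    bits = (np.asarray(target_prediction) >= 0.5).astype(int)  # binarize the prediction
    distances = (class_codes != bits).sum(axis=1)              # Hamming distance per class
    return int(np.argmin(distances))

class_codes = np.array([[1, 0, 1, 0, 1, 0, 1],   # ground-truth code of class 0 ('1010101')
                        [0, 1, 0, 1, 0, 1, 0]])  # hypothetical code of class 1
print(predict_class([1, 1, 1, 0, 1, 1, 1], class_codes))  # -> 0 (distances 2 vs 5)
```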
As shown in Fig. 6, the training unit 400’ according to another embodiment of the present application comprises a drawing module 410, an error computing module 420, a back-propagating module 430, and an extracting module 440. The drawing module 410 may be configured to draw a training data sample from the training data samples, wherein each of the training data samples is associated with a corresponding ground-truth target code, for example, based on a class label of the training data sample. The error computing module 420 may be configured to compute an error, such as a Hamming distance, between the generated target prediction of the training data sample and the ground-truth target code. The back-propagating module 430 may be configured to back-propagate the computed error through the neural network system so as to adjust weights on connections between neurons of the neural network system. The drawing module 410, the error computing module 420 and the back-propagating module 430 repeat the processes of drawing, computing and back-propagating until the error is less than a predetermined value. The extracting module 440 may be configured to extract hidden layer features from the penultimate layer of the neural network system and to train a multiclass classifier based on the extracted hidden layer features and class labels of the training data samples, after the error is less than the predetermined value. Particularly, the hidden layer features are used as the training input of the multiclass classifier, the class labels are used as the training target of the multiclass classifier, and the training input and the training target are used to train the multiclass classifier by optimizing the classifier's objective function. Given an unclassified data sample, its hidden layer features may be extracted by the trained neural network system and then fed into the multiclass classifier. Then, the multiclass classifier may output a class prediction for the unclassified data sample.
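For this multistage variant, one possible sketch of training and applying the multiclass classifier on the extracted penultimate-layer features is given below; the use of scikit-learn's LinearSVC and the random placeholder features are assumptions of this example:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder penultimate-layer features: in practice these would be extracted from
# the trained neural network system for each training sample.
hidden_features = np.random.randn(200, 256)
class_labels = np.random.randint(0, 10, size=200)

clf = LinearSVC()                       # an example multiclass classifier (linear SVM)
clf.fit(hidden_features, class_labels)  # hidden features = training input, labels = target

test_features = np.random.randn(1, 256) # features of an unclassified data sample
print(clf.predict(test_features))       # class prediction for the unclassified sample
```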
As shown in Fig. 7, the predictor 300’ according to another embodiment of the  present application comprises a receiving module 340, a retrieving module 350, and a prediction generating module 360. The receiving module 340 may be configured to receive an unclassified data sample. The retrieving module 350 may be configured to retrieve the trained multiclass classifier from the training unit. The prediction generating module 360 may be configured to generate a class prediction for the unclassified data sample by the trained multiclass classifier.
Fig. 8 is a schematic flowchart illustrating a method 2000 for data classification. Hereinafter, the method 2000 may be described in detail with respect to Fig. 8.
At step S210, a plurality of training data samples is retrieved and a target code for each of the retrieved training data samples is generated by a target code generator, wherein the training data samples are grouped into different classes.
At step S220, for an unclassified data sample, a target prediction for the unclassified data sample is generated by a neural network system. In some embodiments, as stated in the above, the neural network system may consist of multiple layers of convolutional filters, pooling layers, and locally or fully connected layers. In some embodiments, the neural network system may comprise at least one of a deep belief network and a convolutional network. In some embodiments, the method further comprises a step S240 of training the neural network system with the retrieved training data samples such that the trained neural network system is capable of applying the convolutional filters, pooling layers, and locally or fully connected layers to the retrieved training data samples to generate said target predictions.
As shown in Fig. 9, the step S210 of generating a target code comprises the following steps. To be specific, at step S310, a Hadamard matrix whose entries are either "+1" or "-1" is generated. At step S320, a first row and a first column of the Hadamard matrix are removed. At step S330, "+1" is changed to "0" and "-1" is changed to "1". At step S340, a number of rows of the changed Hadamard matrix are randomly selected as the target codes, wherein the number of the selected rows is identical to that of the classes of the training data samples and each of the selected rows corresponds to one target code.
And then the method 2000 proceeds with step S230, at which a class for an  unclassified data sample is predicted by a predictor based on the generated target code and the generated target prediction.
As shown in Fig. 10, in the case of following the nearest neighbor classification paradigm, the step S240 of training a neural network system comprises following steps.
At step S410, a training data sample is drawn from a predetermined training set, wherein the training data sample is associated with a corresponding target code, particularly a ground-truth target code, for example, based on a class label of the training data sample. For example, above association with a ground truth target code based on a class label may have a form in which class label = ‘1’ , target code = ‘1010101’ , and class label = ‘2’ , target code = ‘0110011’ .
At step S420, an error such as a Hamming distance between the generated target prediction and the ground-truth target code is computed.
At step S430, the computed error is back-propagated through the neural network system so as to adjust weights on connections between neurons of the neural network system.
At step S440, the steps S410-S430 are repeated until the error is less than a predetermined value, i.e., until the training process has converged.
As shown in Fig. 11, in the case of following the nearest neighbor classification paradigm, the step S230 of predicting a class for an unclassified data sample comprises following steps.
At step S510, an unclassified data sample is received.
At step S520, Hamming distances between a target prediction of the unclassified data sample and the corresponding ground-truth target code of each class of the training samples are computed. As discussed above, since both the target prediction and the ground-truth target code are vectors of the same length, the distance between the target prediction and the ground-truth target code can be computed by calculating the Hamming distance. For example, if the target prediction is '1110111' and the ground-truth target code is '1010101', the Hamming distance is computed by counting the number of positions at which the corresponding values differ. In this example, the Hamming distance is 2.
At step S530, the unclassified data sample is assigned to a class corresponding to the minimum Hamming distance among the computed Hamming distances. That is to say, if the unclassified data sample is closest to a particular class (based on the Hamming distance between its target prediction and the ground-truth target code of that class), then the unclassified data sample is considered to belong to that class.
As shown in Fig. 12, according to another embodiment of the present application, in the case of following the multistage paradigm, the step S240’ of training a neural network system further comprises following steps.
At step S410, a training data sample is drawn from a predetermined training set, wherein the training data sample is associated with a corresponding target code, particularly a ground-truth target code, for example, based on a class label of the training data sample.
At step S420, an error between the generated target prediction and the ground-truth target code is computed.
At step S430, the computed error is back-propagated through the neural network system so as to adjust weights on connections between neurons of the neural network system.
At step S440’, it is determined whether the error is less than a predetermined value, i.e., whether the training process has converged. If not, the steps S410-S430 are repeated; otherwise, the method proceeds with step S450 of extracting hidden layer features from the penultimate layer of the neural network system and training a multiclass classifier based on the extracted hidden layer features and class labels of the training data samples. Particularly, the hidden layer features are used as the training input of the multiclass classifier, the class labels are used as the training target of the multiclass classifier, and the training input and the training target are used to train the multiclass classifier by optimizing the classifier's objective function. Given an unclassified data sample, its hidden layer features may be extracted by the trained neural network system and then fed into the multiclass classifier. Then, the multiclass classifier may output a class prediction for the unclassified data sample.
As shown in Fig. 13, according to another embodiment of the present application, in the case of following the multistage paradigm, the step S230’ of predicting the class for an unclassified data sample comprises following steps.
At step S540, an unclassified data sample is received.
At step S550, the multiclass classifier trained in step S450 is retrieved.
At step S560, a class prediction is generated for the unclassified data sample by the trained multiclass classifier.
The present application provides a neural network system with a balanced target coding unit to represent the target codes of different data classes. Such target codes are employed in the learning of a neural network along with a predetermined set of training data.
Prior arts often adopt a 1-of-K coding scheme in neural network training. In contrast to the conventional 1-of-K coding scheme, the balanced coding unit brings extra benefits to neural network training. On one hand, more discriminative hidden features can form in the neural network system. On the other hand, the predictions generated by the neural network system have error-correcting capability.
Although the preferred examples of the present invention have been described, those skilled in the art can make variations or modifications to these examples upon knowing the basic inventive concept. The appended claims are intended to cover the preferred examples and all variations or modifications that fall within the scope of the present invention.
Interestingly, even in just a two-dimensional embedding space, the features induced by Balanced Code-based learning are already easily separable. In contrast, the feature clusters induced by 1-of-K are overlapping, such that separation of such clusters may only be possible at higher dimensions. By replacing 1-of-K with the Balanced Code in deep feature learning, some classes, which are confused in 1-of-K coding, can be separated. A longer balanced code leads to more separable and distinct feature clusters.
Obviously, those skilled in the art can make variations or modifications to the present invention without departing from the spirit and scope of the present invention. As such, if these variations or modifications fall within the scope of the claims and their equivalents, they also fall within the scope of the present invention.

Claims (20)

  1. An apparatus for image data classification, comprising:
    a target code generator configured to retrieve a plurality of training data samples and to generate a target code for each of the retrieved training data samples, wherein the training data samples are grouped into different classes;
    a target prediction generator configured to receive a plurality of arbitrary data samples and to generate a target prediction for each of the received arbitrary data samples; and
    a predictor configured to predict a class for each of the received arbitrary data samples based on the generated target code and the generated target prediction.
  2. An apparatus of claim 1, wherein the target code generator further comprises:
    a matrix generating module configured to generate a Hadamard matrix, wherein entries of the Hadamard matrix are either “+1” or “-1” , and a dimension of the Hadamard matrix is larger than the number of classes of the training data samples;
    a removing module configured to remove a first row and a first column of the Hadamard matrix;
    a changing module configured to change “+1” and “-1” in the Hadamard matrix to “0” and “1” , respectively; and
    a selecting module configured to randomly select a plurality of rows of the changed Hadamard matrix as the target codes, wherein the number of the selected rows is identical to that of the classes of the training data samples and each of the selected rows corresponds to one target code.
  3. An apparatus of claim 2, wherein the prediction generator comprises a neural network system, and
    wherein the apparatus further comprises a training unit configured to train the neural network system with the retrieved training data samples such that the trained neural network  system is capable of generating said target predictions.
  4. An apparatus of claim 3, wherein the target code is a ground-truth target code.
  5. An apparatus of claim 4, wherein the training unit further comprises:
    a drawing module configured to draw a training data sample from the training data samples, wherein each of the training data samples is associated with a corresponding ground-truth target code;
    an error computing module configured to compute an error between the generated target prediction of the training data sample and the ground-truth target code; and
    a back-propagating module configured to back-propagate the computed error through the neural network system so as to adjust weights on connections between neurons of the neural network system,
    wherein the drawing module, the error computing module and the back-propagating module repeat processes of drawing, computing and back-propagating until the error is less than a predetermined value.
  6. An apparatus of claim 5, wherein the predictor further comprises:
    a receiving module configured to receive an unclassified data sample;
    a distance computing module configured to compute Hamming distances between a target prediction of an unclassified data sample and the corresponding ground-truth target code of each class of the training samples; and
    an assigning module configured to assign the unclassified data sample to a class corresponding to the minimum Hamming distance among the computed Hamming distances.
  7. An apparatus of claim 4, wherein the training unit further comprises:
    a drawing module configured to draw a training data sample from the training data samples, wherein each of the training data samples is associated with a corresponding ground-truth target code;
    an error computing module configured to compute an error between the generated target prediction of the training data sample and the ground-truth target code;
    a back-propagating module configured to back-propagate the computed error through the neural network system so as to adjust weights on connections between neurons of the neural network system; and
    an extracting module configured to extract hidden layer features from a penultimate layer of the neural network system and train a multiclass classifier based on the extracted hidden layer features and class labels of the training data samples, after the error is less than a predetermined value,
    wherein the drawing module, the error computing module and the back-propagating module repeat processes of drawing, computing and back-propagating until the error is less than a predetermined value.
  8. An apparatus of claim 5 or 7, wherein the error is a Hamming distance.
  9. An apparatus of claim 7, wherein the predictor further comprises:
    a receiving module configured to receive an unclassified data sample;
    a retrieving module configured to retrieve the trained multiclass classifier from the training unit;
    a prediction generating module configured to generate a class prediction for the unclassified data sample by the trained multiclass classifier.
  10. An apparatus of claim 3, wherein the neural network system comprises at least one of a deep belief network and a convolutional network.
  11. A method for image data classification, comprising:
    retrieving a plurality of training data samples, wherein the training data samples are grouped into different classes;
    generating a target code for each of the retrieved training data samples;
    for an unclassified data sample, generating a target prediction for the unclassified data sample; and
    predicting a class for the unclassified data sample based on the generated target code and the generated target prediction.
  12. A method of claim 11, wherein the step of generating a target code comprises:
    generating a Hadamard matrix, wherein entries of the Hadamard matrix are either “+1” or “-1” , and a dimension of the Hadamard matrix is larger than the number of classes of the training data samples;
    removing a first row and a first column of the Hadamard matrix;
    changing “+1” and “-1” in the Hadamard matrix to “0” and “1” , respectively; and
    randomly selecting a plurality of rows of the changed Hadamard matrix as the target codes, wherein the number of the selected rows is identical to that of the classes of the training data samples and each of the selected rows corresponds to one target code.
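
    As an illustrative, non-limiting sketch of the target code generation in claim 12, the Python fragment below builds a Hadamard matrix with SciPy, removes its first row and column, maps +1/-1 to 0/1, and samples one row per class; the function name and the choice of the smallest power-of-two matrix size exceeding the class count are assumptions.

    import numpy as np
    from scipy.linalg import hadamard

    def generate_target_codes(num_classes, seed=0):
        rng = np.random.default_rng(seed)
        n = 1
        while n <= num_classes:                  # dimension must exceed the number of classes
            n *= 2
        H = hadamard(n)                          # entries are +1 / -1
        H = H[1:, 1:]                            # remove first row and first column
        H = (1 - H) // 2                         # map +1 -> 0 and -1 -> 1
        rows = rng.choice(H.shape[0], size=num_classes, replace=False)
        return H[rows]                           # one (n-1)-bit target code per class

    codes = generate_target_codes(num_classes=10)
    print(codes.shape)                           # -> (10, 15) for a 16x16 Hadamard matrix
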
  13. A method of claim 12, wherein the target prediction is generated by a neural network system, the method further comprises training the neural network system with the retrieved training data samples such that the trained neural network system is capable of generating said target predictions.
  14. A method of claim 13, wherein the target code is a ground-truth target code.
  15. A method of claim 14, wherein the step of training a neural network system comprises:
    1) drawing a training data sample from the training data samples, wherein each of the training data samples is associated with a corresponding ground-truth target code;
    2) computing an error between the generated target prediction of the training data sample and the ground-truth target code;
    3) back-propagating the computed error through the neural network system so as to adjust weights on connections between neurons of the neural network system; and
    4) repeating steps 1)-3) until the error is less than a predetermined value.
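
    A minimal PyTorch-style sketch of this training loop follows; it is not the filed embodiment. Because a Hamming distance (claim 18) is not differentiable, the sketch substitutes a per-bit binary cross-entropy as the error that is back-propagated, and net, samples and ground_truth_codes are assumed to be defined elsewhere.

    import torch

    def train_target_code_network(net, samples, ground_truth_codes, threshold=0.05, lr=0.01):
        optimizer = torch.optim.SGD(net.parameters(), lr=lr)
        loss_fn = torch.nn.BCEWithLogitsLoss()        # differentiable surrogate for the per-bit error
        error = float("inf")
        while error > threshold:                      # 4) repeat until error < predetermined value
            for x, code in zip(samples, ground_truth_codes):
                pred = net(x.unsqueeze(0))            # 1) draw a sample, generate its target prediction
                loss = loss_fn(pred, code.unsqueeze(0))   # 2) error vs. ground-truth target code
                optimizer.zero_grad()
                loss.backward()                       # 3) back-propagate through the network
                optimizer.step()                      # adjust weights on the connections
                error = loss.item()
        return net
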
  16. A method of claim 15, wherein the step of predicting a class for an unclassified data sample comprises:
    receiving an unclassified data sample;
    computing Hamming distances between the target prediction of the unclassified data sample and the ground-truth target code of each class of the training data samples; and
    assigning the unclassified data sample to a class corresponding to the minimum Hamming distance among the computed Hamming distances.
  17. A method of claim 14, wherein the step of training a neural network system further comprises:
    1) drawing a training data sample from the training data samples, wherein each of the training data samples is associated with a corresponding ground-truth target code;
    2) computing an error between the generated target prediction of the training data sample and the ground-truth target code;
    3) back-propagating the computed error through the neural network system so as to adjust weights on connections between neurons of the neural network system;
    4) determining whether the error is larger than a predetermined value,
    if yes, repeating steps 1)-3),
    if no, proceeding with 5) extracting hidden layer features from the penultimate layer of the neural network system and training a multiclass classifier based on the extracted hidden layer features and class labels of the training data samples.
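
    The following sketch is offered only as an assumption of how step 5) might look in practice: it extracts penultimate-layer activations from a converged network and fits a separate multiclass classifier on them. Scikit-learn's LinearSVC stands in for the unspecified classifier, and net.penultimate() is a hypothetical hook returning hidden layer features.

    import numpy as np
    from sklearn.svm import LinearSVC

    def fit_multiclass_classifier(net, samples, class_labels):
        # Hidden layer features taken from the penultimate layer of the trained network
        features = np.stack([net.penultimate(x) for x in samples])
        clf = LinearSVC()
        clf.fit(features, class_labels)               # train on extracted features and class labels
        return clf

    # Prediction for an unclassified sample (cf. claims 9 and 19):
    #   clf.predict(net.penultimate(new_sample).reshape(1, -1))
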
  18. A method of claim 15 or 17, wherein the error is a Hamming distance.
  19. A method of claim 17, wherein the step of predicting the class for an unclassified data sample further comprises:
    receiving an unclassified data sample;
    retrieving the multiclass classifier trained in step 5); and
    generating a class prediction for the unclassified data sample by the trained multiclass classifier.
  20. A method of claim 13, wherein the neural network system comprises at least one of a deep belief network and a convolutional network.
PCT/CN2014/000825 2014-09-03 2014-09-03 Apparatus and methods for image data classification WO2016033708A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2014/000825 WO2016033708A1 (en) 2014-09-03 2014-09-03 Apparatus and methods for image data classification
CN201480081756.1A CN106687993B (en) 2014-09-03 2014-09-03 Device and method for image data classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/000825 WO2016033708A1 (en) 2014-09-03 2014-09-03 Apparatus and methods for image data classification

Publications (1)

Publication Number Publication Date
WO2016033708A1 (en) 2016-03-10

Family

ID=55438961

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/000825 WO2016033708A1 (en) 2014-09-03 2014-09-03 Apparatus and methods for image data classification

Country Status (2)

Country Link
CN (1) CN106687993B (en)
WO (1) WO2016033708A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767735A (en) * 2019-03-26 2020-10-13 北京京东尚科信息技术有限公司 Method, apparatus and computer readable storage medium for executing task

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050031219A1 (en) * 2002-09-06 2005-02-10 The Regents Of The University Of California Encoding and decoding of digital data using cues derivable at a decoder
US7831531B1 (en) * 2006-06-22 2010-11-09 Google Inc. Approximate hashing functions for finding similar content
CN103246893A (en) * 2013-03-20 2013-08-14 西交利物浦大学 ECOC (error-correcting output code) encoding classification method based on rejected random subspace
CN103426004A (en) * 2013-07-04 2013-12-04 西安理工大学 Vehicle type recognition method based on error correction output code

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909151A (en) * 2017-07-02 2018-04-13 小蚁科技(香港)有限公司 Method and system for implementing an attention mechanism in an artificial neural network
CN107909151B (en) * 2017-07-02 2020-06-02 小蚁科技(香港)有限公司 Method and system for implementing an attention mechanism in an artificial neural network
CN109472274B (en) * 2017-09-07 2022-06-28 富士通株式会社 Training device and method for deep learning classification model
CN109472274A (en) * 2017-09-07 2019-03-15 富士通株式会社 Training device and method for deep learning classification model
CN109753978A (en) * 2017-11-01 2019-05-14 腾讯科技(深圳)有限公司 Image classification method, device and computer readable storage medium
CN109753978B (en) * 2017-11-01 2023-02-17 腾讯科技(深圳)有限公司 Image classification method, device and computer readable storage medium
CN109946669A (en) * 2019-03-18 2019-06-28 西安电子科技大学 Method for recovering high-resolution range profile of morphing aircraft based on deep belief network
CN109946669B (en) * 2019-03-18 2022-12-02 西安电子科技大学 Method for recovering high-resolution range profile of morphing aircraft based on deep belief network
CN112765034A (en) * 2021-01-26 2021-05-07 四川航天系统工程研究所 Software defect prediction method based on neural network
CN112765034B (en) * 2021-01-26 2023-11-24 四川航天系统工程研究所 Software defect prediction method based on neural network
CN116794975A (en) * 2022-12-20 2023-09-22 维都利阀门有限公司 Intelligent control method and system for electric butterfly valve
CN116794975B (en) * 2022-12-20 2024-02-02 维都利阀门有限公司 Intelligent control method and system for electric butterfly valve
CN115797710A (en) * 2023-02-08 2023-03-14 成都理工大学 Neural network image classification performance improving method based on hidden layer feature difference
CN115797710B (en) * 2023-02-08 2023-04-07 成都理工大学 Neural network image classification performance improving method based on hidden layer feature difference

Also Published As

Publication number Publication date
CN106687993B (en) 2018-07-27
CN106687993A (en) 2017-05-17

Similar Documents

Publication Publication Date Title
WO2016033708A1 (en) Apparatus and methods for image data classification
US10438091B2 (en) Method and apparatus for recognizing image content
CN106599900B (en) Method and device for recognizing character strings in image
Stewart et al. End-to-end people detection in crowded scenes
CN105447498B (en) Client device, system and server system configured with neural network
CN107330127B (en) Similar text detection method based on text picture retrieval
RU2693916C1 (en) Character recognition using a hierarchical classification
Charalampous et al. On-line deep learning method for action recognition
Xi et al. Deep prototypical networks with hybrid residual attention for hyperspectral image classification
CN110019652B (en) Cross-modal Hash retrieval method based on deep learning
Liu et al. Deep ordinal regression based on data relationship for small datasets.
Cohen et al. DNN or k-NN: That is the Generalize vs. Memorize Question
CN107004140A (en) Text recognition method and computer program product
Katiyar et al. A hybrid recognition system for off-line handwritten characters
US20200074273A1 (en) Method for training deep neural network (dnn) using auxiliary regression targets
Bansal et al. mRMR-PSO: a hybrid feature selection technique with a multiobjective approach for sign language recognition
CN114730398A (en) Data tag validation
CN109522432B (en) Image retrieval method integrating adaptive similarity and Bayes framework
CN111898703A (en) Multi-label video classification method, model training method, device and medium
Huang et al. Accelerate learning of deep hashing with gradient attention
EP3876236A1 (en) Extracting chemical structures from digitized images
Alam et al. A multi-view convolutional neural network approach for image data classification
CN110199300A (en) Indistinct Input for autocoder
Kokkinos et al. Breaking ties of plurality voting in ensembles of distributed neural network classifiers using soft max accumulations
Mitrović et al. Flower classification with convolutional neural networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14901041

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14901041

Country of ref document: EP

Kind code of ref document: A1