US8463045B2 - Hierarchical sparse representation for image retrieval - Google Patents

Hierarchical sparse representation for image retrieval Download PDF

Info

Publication number
US8463045B2
Authority
US
United States
Prior art keywords
image
level
features
feature
nodal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US12/943,805
Other versions
US20120114248A1 (en)
Inventor
Linjun Yang
Qi Tian
Bingbing Ni
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp filed Critical Microsoft Corp
Priority to US12/943,805
Assigned to MICROSOFT CORPORATION reassignment MICROSOFT CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: NI, BINGBING, TIAN, QI, YANG, LINJUN
Publication of US20120114248A1
Application granted granted Critical
Publication of US8463045B2
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC reassignment MICROSOFT TECHNOLOGY LICENSING, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MICROSOFT CORPORATION

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5854Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using shape and object relationship
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour

Definitions

  • CBIR Content-based image retrieval
  • Some websites or search engines offer content-based image search services to Internet users. Specifically, a user submits a query image which is similar to his/her desired image to a website or search engine that provides CBIR services. Based on the query image, the website or search engine subsequently returns one or more stored images to the user.
  • the website or search engine represents or encodes the stored images in terms of image features. The website or search engine compares the image features of the stored images with image features of the query image, and retrieves one or more stored images that have image features similar to the image features of the query image.
  • This application describes example techniques for generating a hierarchical sparse codebook.
  • training image features are received.
  • a hierarchical sparse codebook is then generated based at least upon the training image features.
  • the generated hierarchical sparse codebook includes multiple levels, with each level being associated with a sparseness factor.
  • FIG. 1 illustrates an exemplary environment including an example hierarchical sparse coding system 110 .
  • FIG. 2 illustrates the example hierarchical sparse coding system 110 of FIG. 1 in more detail.
  • FIG. 3 illustrates a first example hierarchical sparse codebook.
  • FIG. 4 illustrates an exemplary method of generating a hierarchical sparse codebook.
  • FIG. 5 illustrates a second example hierarchical sparse codebook.
  • FIG. 6 illustrates an exemplary method of representing an image using a hierarchical sparse codebook.
  • This disclosure describes a hierarchical sparse coding using a hierarchical sparse codebook.
  • the described codebook includes multiple levels.
  • the described codebook allows a gradual determination/classification of an image feature into one or more groups or nodes by traversing the image feature through one or more paths to the one or more groups or nodes. That is, the described codebook compares an image feature of an image with nodes or nodal features of the nodes, beginning from a root level down to a leaf level of the codebook. Furthermore, the image feature is only compared with a subset of nodes at each level of the codebook, and therefore processing time is significantly reduced relative to existing image search strategies.
  • the number of determined/classified groups for the image feature is small/sparse in comparison with the total number of available groups or nodes in the codebook.
  • Using the described codebook allows an efficient determination or classification of an image feature, and therefore provides an efficient and time-saving way of representing an image in terms of image features.
  • image retrieval can be enhanced by comparing extracted features of an image with the codebook to obtain a representation of the image that can be used as an index or a reference for retrieving one or more stored images in a database.
  • FIG. 1 illustrates an exemplary environment 100 usable to implement hierarchical sparse representation for image retrieval.
  • the environment 100 includes one or more users 102 - 1 , 102 - 2 , . . . 102 -N (which are collectively referred to as 102 ), a search engine 104 , a website 106 , an image database 108 , a hierarchical sparse coding system 110 , and a network 112 .
  • the user 102 communicates with the search engine 104 , the website 106 or the hierarchical sparse coding system 110 through the network 112 using one or more devices 114 - 1 , 114 - 2 , . . . 114 -M, which are collectively referred to as 114 .
  • the devices 114 may be implemented as a variety of conventional computing devices including, for example, a server, a desktop personal computer, a notebook or portable computer, a workstation, a mainframe computer, a mobile computing device, a handheld device, a mobile phone, an Internet appliance, a network router, etc. or a combination thereof.
  • the network 112 may be a wireless or a wired network, or a combination thereof.
  • the network 112 may be a collection of individual networks interconnected with each other and functioning as a single large network (e.g., the Internet or an intranet). Examples of such individual networks include, but are not limited to, Local Area Networks (LANs), Wide Area Networks (WANs), and Metropolitan Area Networks (MANs). Further, the individual networks may be wireless or wired networks, or a combination thereof.
  • the device 114 includes a processor 116 coupled to a memory 118 .
  • the memory 118 includes a browser 120 and other program data 122 .
  • the memory 118 may be coupled to or associated with, and/or accessible to other devices, such as network servers, routers, and/or other devices 114 .
  • the user 102 uses the browser 120 of the device 114 to submit an image query to the search engine 104 or the website 106 .
  • the search engine 104 or the website 106 compares the image query with images stored in the image database 108 and retrieves one or more stored images from the image database 108 using a hierarchical sparse codebook that is generated by the hierarchical sparse coding system 110 .
  • the search engine 104 or the website 106 then presents the one or more stored images to the user 102 .
  • the hierarchical sparse coding system 110 generates a hierarchical sparse codebook using images stored in the image database 108 either upon request from the search engine 104 or the website 106 , or on a regular basis.
  • the hierarchical sparse coding system 110 encodes or represents an image received from the user 102 , the search engine 104 or the website 106 based on the hierarchical sparse codebook.
  • the hierarchical sparse coding system 110 may return a representation of the received image to the user 102 , the search engine 104 or the website 106 .
  • the hierarchical sparse coding system 110 may store the representation of the received image or send the image representation to the image database 108 for storage. This image representation may further be stored as an index or a reference for the received image in the image database 108 .
  • FIG. 2 illustrates various components of the exemplary hierarchical sparse coding system 110 in more detail.
  • the system 110 can include, but is not limited to, a processor 202 , a network interface 204 , a system memory 206 , and an input/output interface 208 .
  • the memory 206 includes computer-readable media in the form of volatile memory, such as Random Access Memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM.
  • the memory 206 includes program modules 210 and program data 212 .
  • the program data 212 may include a hierarchical sparse codebook 214 and other program data 216 .
  • the memory 206 may further include a feature database 218 storing training image features that are used for generating the hierarchical sparse codebook 214 .
  • the hierarchical sparse codebook 214 may include a hierarchical tree.
  • FIG. 3 shows an example of a hierarchical sparse codebook 214 in the form of a hierarchical tree.
  • the hierarchical codebook may comprise L number of levels, including a root level 302 - 1 , one or more intermediate levels 302 - 2 , . . . , 302 -(L−1), and a leaf level 302 -L.
  • Each node of the root level and the one or more intermediate levels may include K number of child nodes.
  • Each node of the hierarchical codebook is associated with a nodal feature.
  • a nodal feature is a trained image feature associated with a node of the hierarchical codebook.
  • the nodal feature may be in the form of a vector, for example. Additionally, each node may further be assigned a subset of the training image features.
  • each level of the hierarchical sparse codebook is associated with a sparseness factor to determine a degree of sparseness for each level.
  • a degree of sparseness for a level is defined as an average number of nodes or nodal features used to represent each training image feature at that level divided by the total number of nodal features at that same level.
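The definition above can be sketched as a short computation. The function name and the sample numbers below are illustrative, not from the patent:

```python
# Degree of sparseness for one level of the codebook: the average number of
# nodal features used to represent each training image feature, divided by
# the total number of nodal features at that level.

def degree_of_sparseness(nodes_used_per_feature, total_nodes_at_level):
    """nodes_used_per_feature: for each training image feature, how many
    nodes at this level were used to represent it."""
    avg_used = sum(nodes_used_per_feature) / len(nodes_used_per_feature)
    return avg_used / total_nodes_at_level

# Example: 4 features represented by 2, 1, 3, and 2 nodes at a 100-node level.
print(degree_of_sparseness([2, 1, 3, 2], 100))  # 0.02
```

A smaller value means each feature touches fewer nodes at that level, i.e., a sparser representation.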
  • the program module 210 may further include an image receiving module 220 .
  • the image receiving module 220 may receive an image from the user 102 , the search engine 104 or the website 106 .
  • the image may be a query image that the user 102 uses to find his/her desired image(s).
  • the image receiving module 220 may transfer the image to a feature extraction module 222 , which extracts features that are representative of the image.
  • the feature extraction module 222 may adopt one or more feature extraction techniques such as singular vector decomposition (SVD), Bag of Visual Words (BoW), etc. Examples of the features include, but are not limited to, scale-invariant feature transform (SIFT) features and intensity histograms.
  • the feature extraction module 222 may send the extracted features to a feature determination module 224 , the feature database 218 , or both.
  • the feature determination module 224 determines one or more leaf nodes of the hierarchical sparse codebook 214 to represent each extracted feature. Specifically, the feature determination module 224 compares each extracted feature with nodal features associated with a subset of nodes of the hierarchical sparse codebook 214 level by level.
  • Table 1 shows a first example algorithm for representing an image using the hierarchical sparse codebook 214 .
  • the hierarchical sparse codebook 214 in FIG. 3 is one example.
  • the feature determination module 224 compares the extracted feature with each nodal feature associated with each node at the next level 302 - 2 , i.e., level 1 in Table 1.
  • the feature determination module 224 may employ a distance measurement module 226 to determine a distance or a degree of overlap between the extracted feature and each nodal feature.
  • the distance measurement module 226 may measure the distance or the degree of overlap according to a predetermined distance metric. For example, if features (i.e., the extracted feature and the nodal feature) are expressed in terms of feature vectors, the predetermined distance metric may include computing a normalized Lp-distance between the extracted feature and the nodal feature, where p can be any integer greater than zero.
  • the predetermined distance metric may include computing a normalized L2-distance (i.e., Euclidean distance) or a normalized L1-distance (i.e., Manhattan distance) between the extracted feature and the nodal feature.
  • the predetermined distance metric may include computing an inner product of the extracted feature and the nodal feature to determine a degree of overlap therebetween.
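The metrics described above can be sketched for features expressed as vectors. Note the patent only names a "normalized Lp-distance"; the plain (unnormalized) form shown here is a simplification:

```python
# Distance and overlap metrics between an extracted feature x and a nodal
# feature v, both expressed as vectors.

def lp_distance(x, v, p=2):
    """Lp-distance: p=2 gives Euclidean, p=1 gives Manhattan."""
    return sum(abs(a - b) ** p for a, b in zip(x, v)) ** (1.0 / p)

def inner_product(x, v):
    """Degree of overlap between the extracted and nodal features."""
    return sum(a * b for a, b in zip(x, v))

x = [1.0, 0.0, 0.0]
v = [0.0, 1.0, 0.0]
print(lp_distance(x, v, p=2))  # Euclidean distance, ~1.414
print(lp_distance(x, v, p=1))  # Manhattan distance, 2.0
print(inner_product(x, v))     # orthogonal features: zero overlap
```

A larger inner product indicates a greater degree of overlap, whereas a smaller Lp-distance indicates greater similarity; the selection rules in the following paragraphs use opposite inequality directions for the two metrics accordingly.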
  • the feature determination module 224 may select a node at level 302 - 2 whose parent has a distance from the extracted feature that is less than a predetermined distance threshold (e.g., 0.2). Alternatively, the feature determination module 224 may select a node at level 302 - 2 whose parent has a degree of overlap with the extracted feature that is greater than a predetermined overlap threshold (e.g., zero).
  • a predetermined distance threshold or the predetermined overlap threshold can be adaptively adjusted for each level in order to control a degree of sparseness for each level.
  • a degree of sparseness for a level is defined as an average number of nodes or nodal features used to represent each training image feature at that particular level divided by the total number of nodes or nodal features at that same level.
  • the feature determination module 224 repeats distance measurement for those selected nodes at level 302 - 2 and node selection for child nodes of the selected nodes at level 302 - 3 . In the above algorithm 1, the feature determination module 224 leaves those unselected nodes at level 302 - 2 and respective child nodes or branches untouched. More specifically, the feature determination module 224 does not perform any distance determination or node selection for the child nodes of the unselected nodes of level 302 - 2 .
  • leaf nodes are selected according to the above algorithm and are used to represent the extracted feature by the feature determination module 224 .
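The traversal of algorithm 1 can be sketched as follows. The `Node` class, the Euclidean metric, and the threshold value are illustrative assumptions; the point is that children of unselected nodes are never visited:

```python
# A minimal sketch of algorithm 1: starting from the root, the extracted
# feature is compared only with children of nodes selected at the previous
# level, leaving unselected branches untouched.

def euclidean(x, v):
    return sum((a - b) ** 2 for a, b in zip(x, v)) ** 0.5

class Node:
    def __init__(self, feature, children=()):
        self.feature = feature        # nodal feature (a vector)
        self.children = list(children)

def select_leaves(root, x, distance_threshold=0.5):
    """Return leaf nodes reached by traversing only selected branches."""
    active = list(root.children)      # level 1: all children of the root
    leaves = []
    while active:
        selected = [n for n in active
                    if euclidean(x, n.feature) < distance_threshold]
        leaves.extend(n for n in selected if not n.children)
        active = [c for n in selected for c in n.children]
    return leaves

# Tiny two-level codebook: two level-1 nodes, each with two leaf children.
root = Node(None, [
    Node([0.0, 0.0], [Node([0.1, 0.0]), Node([0.0, 0.9])]),
    Node([1.0, 1.0], [Node([1.0, 0.9]), Node([0.9, 1.0])]),
])
x = [0.05, 0.05]
print(len(select_leaves(root, x)))  # only the branch near x is explored
```

Here the second level-1 node fails the distance test, so its entire subtree is skipped, which is the source of the processing-time savings described earlier.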
  • the feature determination module 224 may generate a histogram representation of the image.
  • the histogram representation of the image may be generated by counting a number of times each node or nodal feature at the leaf level (i.e., level 302 -L in FIG. 3 or level L−1 in Table 1) of the codebook 214 is selected for the extracted features of the image.
  • the histogram representation may be used to represent the image, and may be stored in the image database 108 as an index or a comparison reference for the image.
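The counting step above can be sketched directly; leaf nodes are identified by index here for simplicity:

```python
# Histogram representation of an image: count how often each leaf node of the
# codebook is selected across all extracted features of the image.

from collections import Counter

def histogram_representation(selected_leaves_per_feature, num_leaf_nodes):
    """selected_leaves_per_feature: for each extracted feature of the image,
    the list of leaf-node indices selected to represent it."""
    counts = Counter(
        leaf for leaves in selected_leaves_per_feature for leaf in leaves)
    return [counts.get(i, 0) for i in range(num_leaf_nodes)]

# Three extracted features mapped to leaves of a 5-leaf codebook.
print(histogram_representation([[0, 2], [2], [4, 2]], 5))  # [1, 0, 3, 0, 1]
```

The resulting fixed-length vector is what gets stored in the image database 108 as an index or comparison reference.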
  • the feature determination module 224 may additionally or alternatively employ a cost module 228 to determine which nodes are selected and which nodes are not selected for the extracted feature at each level of the codebook 214 .
  • the cost module 228 may include a cost function. Table 2 (below) shows a second example algorithm for representing an image using the hierarchical sparse codebook 214 .
  • the hierarchical sparse codebook in FIG. 3 is used for illustration.
  • when an extracted feature x_i arrives at the root level 302 - 1 in FIG. 3 or level 0 in Table 2, an active set A_1 is initially set to include each nodal feature associated with each node at the next level 302 - 2 , i.e., level 1.
  • the cost function is then minimized with respect to a response u_i.
  • each entry, u_i^j, in the response u_i represents a response of the extracted feature x_i to the corresponding nodal feature v_j.
  • a new active set A_2 may be created by selecting a node or a nodal feature at level 302 - 3 in FIG. 3 or level 2 in Table 2 whose parent at level 302 - 2 or level 1 gives a response u_i^j greater than a predetermined response threshold.
  • the processes of cost function minimization and nodal feature selection are repeated for level 302 - 3 in FIG. 3 or level 2 in Table 2 and below, until the leaf level (i.e., level 302 -L in FIG. 3 or level L−1 in Table 2) of the codebook 214 is reached.
  • one or more nodes or nodal features at the leaf level of the codebook 214 having a response to the extracted feature x_i greater than a predetermined response threshold may be selected to represent the extracted feature x_i.
  • a parameter ⁇ which controls the degree of sparseness, may be different for different levels of the codebook 214 .
  • the parameter ⁇ may be smaller for levels closer to the root level to allow more nodes or nodal features to be selected at those levels, and may gradually increase towards the leaf level of the codebook 214 to avoid over-number of selected nodes or nodal features at the leaf level.
  • the parameter ⁇ will not be modified until the codebook 214 is reconstructed or representations of the images are redone.
  • an image may be represented using a combination of the above two algorithms.
  • algorithm 1 may first be used to find an active set up to a predetermined level of the codebook 214 for each image feature of the image.
  • Algorithm 2 may then be used for the rest of the levels of the codebook 214 to obtain one or more nodes or nodal features at the leaf level of the codebook 214 for each image feature.
  • algorithm 1 can allow more nodes or nodal features to be selected for an image feature at each level, and therefore permits a broader exploration of nodal features to represent the image feature. This avoids pre-mature elimination of nodes or nodal features that are actually good candidates for representing the image feature.
  • algorithm 2 may be employed to limit the number of selected nodes or nodal features at subsequent levels in order to prevent the number of selected nodes or nodal features (i.e., the active set in Table 1) from growing too large.
  • the feature determination module 224 may save the representation in the image database 108 and use this representation as an index for retrieving the image. Additionally or alternatively, this representation can be saved as a reference for comparison with representations of other images such as a query image during image retrieval.
  • a representation (e.g., a histogram representation) of the query image may be used to retrieve one or more stored images in the image database 108 .
  • the representation of the query image may be compared with representations of images stored in the image database 108 .
  • a classifier may be used to classify the query image into one of a plurality of classes (e.g., automobile class) based on the representation of the query image.
  • the classifier may include a neural network, a Bayesian belief network, support vector machines (SVMs), fuzzy logic, Hidden Markov Model (HMM), or any combination thereof, etc.
  • the classifier may be trained on a subset of the representations of the images stored in the image database 108 .
  • stored images within that class may be retrieved and presented to the user 102 according to respective frequencies of retrieval within a certain interval (e.g., the past one day, past one week, past one month, etc).
  • the representation of the query image may be compared with the representations of the stored images according to an image similarity metric.
  • the image similarity metric is a measure of similarity between two images, and may return a similarity score to represent a relative resemblance of a stored image with respect to the query image.
  • a similarity measurement module 230 may be used to calculate a similarity score of a stored image with respect to the query image based upon the representation of the query image. For example, the similarity measurement module 230 calculates the similarity score based on a ratio of the number of common features in the representations of the query image and the stored image with respect to their average number of features.
  • the similarity measurement module 230 may compute a correlation between the representation of the query image and the representation of a stored image. For example, if an image is represented in the form of a histogram as described above, a correlation between a histogram representation of the query image and a histogram representation of a stored image may be computed to obtain a similarity score therebetween. In one embodiment, each of these histogram representations may first be normalized such that its area integral equals one, for example.
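The two similarity measures described in the preceding bullets can be sketched as follows. The exact definitions of "common features" and of the correlation form are assumptions, since the patent leaves them open:

```python
# Two similarity scores between image representations: a ratio of common
# features, and a correlation of area-normalized histograms.

def common_feature_ratio(features_a, features_b):
    """Number of features shared by the two representations, divided by
    their average number of (distinct) features."""
    common = len(set(features_a) & set(features_b))
    average = (len(set(features_a)) + len(set(features_b))) / 2.0
    return common / average

def histogram_correlation(h1, h2):
    """Correlation of two histograms, each first normalized so that its
    area (sum of bins) is one."""
    n1 = [v / sum(h1) for v in h1]
    n2 = [v / sum(h2) for v in h2]
    return sum(a * b for a, b in zip(n1, n2))

print(common_feature_ratio([1, 2, 3], [2, 3, 4]))  # 2 shared / 3 avg ~ 0.667
print(histogram_correlation([1, 0, 1], [1, 0, 1]))
```

Either score can then be used to rank stored images against the query image, as in the bullet that follows.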
  • one or more stored images may be presented to the user 102 , and arranged according to their similarity scores, for example, in a descending order of their similarity scores.
  • the program module 210 may further include a codebook generation module 232 .
  • the codebook generation module 232 generates the hierarchical sparse codebook 214 based on the training image features that are stored in the feature database 218 . Additionally or alternatively, the codebook generation module 232 generates the hierarchical sparse codebook 214 based on images stored in the image database 108 . In one embodiment, the codebook generation module 232 generates or reconstructs the hierarchical sparse codebook 214 on a regular basis, e.g., each day, each week, each month, or each year. Alternatively, the hierarchical sparse codebook 214 may be generated upon request, for example, from the search engine 104 or the website 106 .
  • the hierarchical sparse codebook 214 is reconstructed based on performance of the codebook 214 in retrieving stored images in response to query images submitted from the user 102 .
  • the program data 212 may further include image query data 234 .
  • the image query data 234 may include query images that have been submitted by one or more users 102 and stored images that were returned in response to the query images. Additionally or alternatively, the image query data 234 may include one or more stored images that have been selected by the users 102 in response to the query images. In one embodiment, the image query data 234 may further include similarity scores of the one or more selected images with respect to the query images.
  • the codebook 214 may be reconstructed in response to an average similarity score of the selected images in the image query data 234 being less than a predetermined similarity threshold.
  • the predetermined similarity threshold may be set by an administrator or operator of the system 110 according to the accuracy and/or computing requirements, for example. For example, if a perfect match between a query image and a stored image has a similarity score of one, the codebook 214 may be reconstructed in response to the average similarity score being less than 0.7, for example.
  • the codebook generation module 232 may receive a plurality of training image features from the feature database 218 . Additionally or alternatively, the codebook generation module 232 may receive a plurality of images from the image database 108 and use the feature extraction module 222 to extract a plurality of image features for training purposes. Upon receiving the plurality of training image features, the codebook generation module 232 generates a hierarchical sparse codebook 214 according to a codebook generation algorithm. An example algorithm is illustrated in Table 3 (below).
  • k number of nodes at level 1 are branched out from a root node at level 0.
  • Each node at level 1 is associated with a nodal feature which is a training image feature randomly selected from the plurality of training image features.
  • the plurality of training image features are then compared with each nodal feature at level 1 in order to assign a subset of training image features to the corresponding node at level 1.
  • the subset of training image features assigned to a node includes a training image feature that has a response (e.g., a degree of overlap) to a nodal feature associated with that node greater than a predetermined response threshold, e.g., zero.
  • a set of k nodal features is trained with respect to the assigned subset of training image features for the node. Specifically, based on the assigned subset of training image features, a cost function is minimized with respect to the set of k nodal features: Σ_i ||x_i − V·u_i||² + λ Σ_i ||u_i||_1, where x_i denotes an assigned training image feature, V denotes the set of k nodal features, u_i denotes the sparse response of x_i, and λ is the sparseness factor.
  • this set of k nodal features is assigned to child nodes of the node at the next level, i.e., level 2. These processes of cost function minimization and nodal feature assignment are repeated for each node at each level until each node at the leaf level of the codebook is assigned a nodal feature and a subset of training image features, i.e., until the leaf level of the codebook is reached. At this point, the hierarchical sparse codebook is generated.
  • for each node j at level l, i.e., o_l^j, collect the subset of X which has a response with the nodal feature vector associated with node o_l^j greater than a predetermined response threshold; this subset is denoted as X_l^j.
  • the parameter ⁇ l (which is also called a sparseness factor for level l) can be adaptively adjusted to change a degree of sparseness for the level l.
  • the parameter ⁇ l or the degree of sparseness for a level is adjusted to be less than a predetermined threshold level.
  • the parameter ⁇ l or the degree of sparseness for a level is adjusted to be within a predetermined range.
  • the parameter ⁇ l or the degree of sparseness for each level is collectively adjusted to obtain an overall degree of sparseness for the codebook and the plurality of training image features that is less than a predetermined overall threshold or within a predetermined overall range.
  • the predetermined threshold level or the predetermined range may be the same or different for different levels.
  • the above algorithm may further be modified. Specifically, after randomly assigning k number of training image features to be nodal features associated with the nodes at level 1, the algorithm may further train these nodal features to minimize the above cost function for level 1. Upon obtaining a set of optimized nodal features that minimize the cost function of level 1, the algorithm may assign these optimized nodal features to the nodes of level 1. The algorithm further assigns a subset of training image features that have responses greater than a predetermined response threshold to each node of level 1.
  • the algorithm may further specify that a training image feature that is assigned to a node is also a training image feature that has been assigned to the parent of the node.
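The generation procedure can be sketched recursively. Here, each node's k child nodal features are chosen from its assigned training features (the random initialization described above), and each training feature is assigned to every child whose response (an inner product here) exceeds a threshold. The cost-function training of the nodal features is omitted for brevity; a full implementation would optimize the child features as in the equation above:

```python
# Simplified recursive codebook generation: random nodal-feature
# initialization plus response-threshold assignment, without the
# sparse-coding refinement step.

import random

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_codebook(features, k, depth, response_threshold=0.0, seed=0):
    rng = random.Random(seed)

    def build(assigned, level):
        # stop at the leaf level, or when too few features remain to branch
        if level == depth or len(assigned) < k:
            return {"features": assigned, "children": []}
        nodal = rng.sample(assigned, k)  # random initialization of k children
        children = []
        for v in nodal:
            subset = [f for f in assigned
                      if inner(f, v) > response_threshold]
            children.append({"nodal": v, **build(subset, level + 1)})
        return {"features": assigned, "children": children}

    return build(features, 0)

feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
tree = build_codebook(feats, k=2, depth=2)
print(len(tree["children"]))  # k = 2 branches at level 1
```

Note how a feature can be assigned to more than one child when its response exceeds the threshold for several nodal features, which is what makes the resulting codebook "sparse" rather than a hard partition.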
  • although the codebook 214 is described as including a hierarchical tree in the foregoing embodiments, the codebook 214 is not limited thereto.
  • the hierarchical sparse codebook 214 can include any hierarchical structure.
  • the hierarchical sparse codebook 214 may initially include a hierarchical tree. After or during the training phase of the hierarchical sparse codebook 214 , however, a node (i.e., a node at an intermediate level and/or a leaf level of the codebook 214 ) may be purged based on an average degree of overlap between associated training image features and corresponding nodal feature of the node.
  • a node may be purged if corresponding average degree of overlap between associated training image features and corresponding nodal feature is less than a predetermined threshold.
  • this predetermined threshold may vary among different levels.
  • the predetermined threshold for average degree of overlap is lower at a higher level (i.e., a level closer to the root level of the codebook 214 ), and increases towards the leaf level of the codebook 214 . This is because the number of training image features assigned to a node at the higher level is usually greater and a nodal feature associated with the node is more generalized with respect to the assigned training image features. Having a lower threshold therefore avoids pre-mature purging of the node at the higher level.
  • a node at a lower level is usually assigned with a fewer number of training image features, and a corresponding nodal feature may be more specific to the assigned training image features. Therefore, the predetermined threshold associated with the node at the lower level can be higher to reflect a change from generality to specificity of nodal features from a high level to a low level of the codebook 214 .
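The purging rule above can be sketched with a level-dependent threshold. The linear threshold schedule and its endpoint values are illustrative assumptions; the patent only specifies that the threshold is lower near the root and increases towards the leaf level:

```python
# Node purging with a threshold that grows from the root toward the leaves:
# a node is purged when the average overlap between its assigned training
# features and its nodal feature falls below the level's threshold.

def purge_threshold(level, num_levels, low=0.1, high=0.5):
    """Lower threshold near the root (level 0), higher near the leaves."""
    return low + (high - low) * level / (num_levels - 1)

def should_purge(avg_overlap, level, num_levels):
    return avg_overlap < purge_threshold(level, num_levels)

# The same average overlap survives near the root but is purged near a leaf.
print(should_purge(0.3, level=1, num_levels=5))  # False (threshold 0.2)
print(should_purge(0.3, level=4, num_levels=5))  # True  (threshold 0.5)
```

This reflects the generality-to-specificity shift described above: generalized nodal features near the root tolerate lower average overlap than specific ones near the leaves.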
  • the hierarchical sparse codebook may be a hierarchical structure having a plurality of levels, with each level having a predetermined number of nodes. Rather than having an equal number of intermediate child nodes for each node at one level, the number of intermediate child nodes of a node at that level may be determined based upon the number of training image features assigned to that particular node. For example, the number of intermediate child nodes of a first node at one level is greater than the number of intermediate child nodes of a second node at the same level if the number of training image features assigned to the first node is greater than the number of training image features assigned to the second node.
  • a node having a greater number of training image features is allocated more resources (i.e., child nodes) to represent these training image features while a node having a fewer number of training image features is allocated fewer resources, thereby optimizing the use of resources which are usually limited.
  • Exemplary methods for generating a hierarchical sparse codebook or representing an image using the hierarchical sparse codebook are described with reference to FIGS. 4-6 .
  • These exemplary methods can be described in the general context of computer executable instructions.
  • computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, and the like that perform particular functions or implement particular abstract data types.
  • the methods can also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communication network.
  • computer executable instructions may be located both in local and remote computer storage media, including memory storage devices.
  • the exemplary methods are illustrated as a collection of blocks in a logical flow graph representing a sequence of operations that can be implemented in hardware, software, firmware, or a combination thereof.
  • the order in which the methods are described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the methods, or alternate methods. Additionally, individual blocks may be deleted from the methods without departing from the spirit and scope of the subject matter described herein.
  • the blocks represent computer instructions that, when executed by one or more processors, perform the recited operations.
  • FIG. 4 illustrates an exemplary method 400 of generating a hierarchical sparse codebook.
  • a plurality of training image features are received.
  • This plurality of training image features may be obtained from one or more databases and/or one or more search engines.
  • the plurality of training image features may be extracted from a plurality of images that are stored in the one or more databases and/or the one or more search engines.
  • a hierarchical sparse codebook is generated based at least upon the plurality of training image features.
  • the hierarchical sparse codebook may be generated to include a plurality of levels.
  • each of the plurality of levels may be associated with a sparseness factor as shown in FIG. 3 , for example.
  • Each level of the hierarchical sparse codebook is generated by adjusting corresponding sparseness factors to be less than respective predetermined thresholds or within respective predetermined ranges.
  • the hierarchical sparse codebook may be generated by adjusting the sparseness factor of each level to obtain an overall degree of sparseness for the codebook and the plurality of training image features.
  • the sparseness factor of each level is adjusted to obtain an overall degree of sparseness that is less than a predetermined overall threshold or within a predetermined overall range.
  • This predetermined overall threshold or predetermined overall range may be set by an administrator or an operator of the system 112 based on specified computing requirements or needs.
  • generating the hierarchical sparse codebook at block 404 may include representing each training image feature by a sparse number of leaf nodes or nodal features that are associated with the leaf nodes of the hierarchical sparse codebook.
  • FIG. 5 shows an example of this hierarchical sparse codebook.
  • each training image feature j is represented by a sparse number of nodes or nodal features at the leaf level of the codebook.
  • FIG. 6 illustrates an exemplary method 600 of representing or encoding an image using a hierarchical sparse codebook.
  • an image is received.
  • This image may be received from a user for image query.
  • this image may be received from a search engine or a website for encoding the image.
  • a plurality of image features are extracted from the image.
  • each image feature of the image is compared with a hierarchical sparse codebook to obtain one or more leaf-level features (i.e., nodal features at leaf level) of the codebook.
  • the one or more leaf-level features represent a sparse code representation of the respective image feature.
  • a histogram for the image is generated based upon the one or more leaf-level features of each image feature of the image.
  • the histogram represents the respective number of times that each leaf-level feature of the codebook is encountered by the plurality of image features of the image.
  • the image is represented by the histogram.
  • the histogram may further be stored in a database as an index for the image. Additionally or alternatively, the histogram may act as a reference for comparison with another image, such as a query image, during image retrieval.
  • the histogram of the query image may be compared with histograms of a subset of stored images in the database. In one embodiment, the comparison may be performed by computing correlations between the histogram of the query image and the histograms of the subset of stored images.
  • One or more stored images having a correlation greater than a predetermined correlation threshold may be retrieved and presented to the user.
  • Computer-readable media can be any available media that can be accessed during generation of the hierarchical sparse codebook or encoding an image using the hierarchical sparse codebook.
  • Computer-readable media may comprise volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data.
  • Computer-readable media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information. Combinations of any of the above should also be included within the scope of computer-readable media.

Abstract

A hierarchical sparse codebook allows efficient search and comparison of images in image retrieval. The hierarchical sparse codebook includes multiple levels and allows a gradual determination/classification of an image feature of an image into one or more groups or nodes by traversing the image feature through one or more paths to the one or more groups or nodes of the codebook. The image feature is compared with a subset of nodes at each level of the codebook, thereby reducing processing time.

Description

BACKGROUND
Content-based image retrieval (CBIR) is gaining momentum among Internet users. Some websites and search engines offer content-based image search services to Internet users. Specifically, a user submits a query image that is similar to his/her desired image to a website or search engine that provides CBIR services. Based on the query image, the website or search engine subsequently returns one or more stored images to the user. In order to allow efficient retrieval of stored images, the website or search engine represents or encodes the stored images in terms of image features. The website or search engine compares the image features of the stored images with image features of the query image, and retrieves one or more stored images that have image features similar to those of the query image.
Given the increasing popularity of CBIR services, academic or business communities have conducted significant research to determine an image representation that can provide efficient comparison and retrieval of images. A number of algorithms and strategies such as Bags of Words (BOW) have been proposed. However, these proposed algorithms or strategies are either restricted to a small set of images or are too computationally intensive to be performed in real time.
SUMMARY
This summary introduces simplified concepts of a hierarchical sparse codebook that may be used for content-based image retrieval, which is further described below in the Detailed Description. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining the scope of the claimed subject matter.
This application describes example techniques for generating a hierarchical sparse codebook. In one embodiment, training image features are received. A hierarchical sparse codebook is then generated based at least upon the training image features. The generated hierarchical sparse codebook includes multiple levels, with each level being associated with a sparseness factor.
BRIEF DESCRIPTION OF THE DRAWINGS
The detailed description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.
FIG. 1 illustrates an exemplary environment including an example hierarchical sparse coding system 110.
FIG. 2 illustrates the example hierarchical sparse coding system 110 of FIG. 1 in more detail.
FIG. 3 illustrates a first example hierarchical sparse codebook.
FIG. 4 illustrates an exemplary method of generating a hierarchical sparse codebook.
FIG. 5 illustrates a second example hierarchical sparse codebook.
FIG. 6 illustrates an exemplary method of representing an image using a hierarchical sparse codebook.
DETAILED DESCRIPTION
Overview
As noted above, existing image search algorithms or strategies are limited to a small set of images, and are not scalable to include a large number of images. Furthermore, these algorithms or strategies require significant processing time and power, and therefore cannot be performed in real time.
This disclosure describes hierarchical sparse coding using a hierarchical sparse codebook. The described codebook includes multiple levels. The described codebook allows a gradual determination/classification of an image feature into one or more groups or nodes by traversing the image feature through one or more paths to those groups or nodes. That is, the described codebook compares an image feature of an image with nodes or nodal features of the nodes, beginning from a root level down to a leaf level of the codebook. Furthermore, the image feature is only compared with a subset of nodes at each level of the codebook, and therefore processing time is significantly reduced relative to existing image search strategies. The number of determined/classified groups for the image feature is small/sparse in comparison with the total number of available groups or nodes in the codebook. Using the described codebook allows an efficient determination or classification of an image feature, and therefore provides an efficient and time-saving way of representing an image in terms of image features. Furthermore, image retrieval can be enhanced by comparing extracted features of an image with the codebook to obtain a representation of the image that can be used as an index or a reference for retrieving one or more stored images in a database.
Multiple and varied implementations and embodiments are described below. The following section describes an exemplary environment that is suitable for practicing various implementations. After this discussion, representative implementations of systems, devices, and processes for generating a hierarchical sparse codebook or representing an image using the hierarchical sparse codebook are described.
Exemplary Architecture
FIG. 1 illustrates an exemplary environment 100 usable to implement hierarchical sparse representation for image retrieval. The environment 100 includes one or more users 102-1, 102-2, . . . 102-N (which are collectively referred to as 102), a search engine 104, a website 106, an image database 108, a hierarchical sparse coding system 110, and a network 112. The user 102 communicates with the search engine 104, the website 106 or the hierarchical sparse coding system 110 through the network 112 using one or more devices 114-1, 114-2, . . . 114-M, which are collectively referred to as 114.
The devices 114 may be implemented as a variety of conventional computing devices including, for example, a server, a desktop personal computer, a notebook or portable computer, a workstation, a mainframe computer, a mobile computing device, a handheld device, a mobile phone, an Internet appliance, a network router, etc. or a combination thereof.
The network 112 may be a wireless or a wired network, or a combination thereof. The network 112 may be a collection of individual networks interconnected with each other and functioning as a single large network (e.g., the Internet or an intranet). Examples of such individual networks include, but are not limited to, Local Area Networks (LANs), Wide Area Networks (WANs), and Metropolitan Area Networks (MANs). Further, the individual networks may be wireless or wired networks, or a combination thereof.
In one embodiment, the device 114 includes a processor 116 coupled to a memory 118. The memory 118 includes a browser 120 and other program data 122. The memory 118 may be coupled to or associated with, and/or accessible to other devices, such as network servers, router, and/or other devices 114.
In one embodiment, the user 102 uses the browser 120 of the device 114 to submit an image query to the search engine 104 or the website 106. Upon receiving the image query from the user 102, the search engine 104 or the website 106 compares the image query with images stored in the image database 108 and retrieves one or more stored images from the image database 108 using a hierarchical sparse codebook that is generated by the hierarchical sparse coding system 110. The search engine 104 or the website 106 then presents the one or more stored images to the user 102.
In another embodiment, the hierarchical sparse coding system 110 generates a hierarchical sparse codebook using images stored in the image database 108 either upon request from the search engine 104 or the website 106, or on a regular basis.
In still another embodiment, the hierarchical sparse coding system 110 encodes or represents an image received from the user 102, the search engine 104 or the website 106 based on the hierarchical sparse codebook. The hierarchical sparse coding system 110 may return a representation of the received image to the user 102, the search engine 104 or the website 106. Additionally or alternatively, the hierarchical sparse coding system 110 may store the representation of the received image or send the image representation to the image database 108 for storage. This image representation may further be stored as an index or a reference for the received image in the image database 108.
FIG. 2 illustrates various components of the exemplary hierarchical sparse coding system 110 in more detail. In one embodiment, the system 110 can include, but is not limited to, a processor 202, a network interface 204, a system memory 206, and an input/output interface 208.
The memory 206 includes computer-readable media in the form of volatile memory, such as Random Access Memory (RAM), and/or non-volatile memory, such as read only memory (ROM) or flash RAM. The memory 206 includes program modules 210 and program data 212. The program data 212 may include a hierarchical sparse codebook 214 and other program data 216. Additionally, the memory 206 may further include a feature database 218 storing training image features that are used for generating the hierarchical sparse codebook 214. In one embodiment, the hierarchical sparse codebook 214 may include a hierarchical tree. For example, FIG. 3 shows an example of a hierarchical sparse codebook 214 in the form of a hierarchical tree. The hierarchical codebook may comprise L levels, including a root level 302-1, one or more intermediate levels 302-2, . . . 302-L−1, and a leaf level 302-L. Each node of the root level and the one or more intermediate levels may include K child nodes. Each node of the hierarchical codebook is associated with a nodal feature. As used herein, a nodal feature is a trained image feature associated with a node of the hierarchical codebook. The nodal feature may be in the form of a vector, for example. Additionally, each node may further be assigned a subset of the training image features. In one embodiment, each level of the hierarchical sparse codebook is associated with a sparseness factor that determines a degree of sparseness for that level. A degree of sparseness for a level is defined as the average number of nodes or nodal features used to represent each training image feature at that level, divided by the total number of nodal features at that same level.
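For example, the tree structure and the per-level degree of sparseness described above can be sketched as follows; the class and function names are illustrative and are not taken from the patent:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CodebookNode:
    """One node of a hierarchical sparse codebook: a trained nodal
    feature vector plus up to K child nodes."""
    nodal_feature: List[float]
    children: List["CodebookNode"] = field(default_factory=list)

def degree_of_sparseness(nodes_used_per_feature, total_nodes_at_level):
    """Average number of nodal features used to represent each training
    image feature at a level, divided by the total number of nodal
    features at that level, per the definition in the text."""
    avg_used = sum(nodes_used_per_feature) / len(nodes_used_per_feature)
    return avg_used / total_nodes_at_level
```

For instance, if three training image features are represented by 2, 4, and 3 nodal features at a level with 100 nodes, the degree of sparseness for that level is 0.03.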
The program module 210 may further include an image receiving module 220. The image receiving module 220 may receive an image from the user 102, the search engine 104 or the website 106. The image may be a query image that the user 102 uses to find his/her desired image(s). Upon receiving the image, the image receiving module 220 may transfer the image to a feature extraction module 222, which extracts features that are representative of the image. The feature extraction module 222 may adopt one or more feature extraction techniques such as singular vector decomposition (SVD), Bag of Visual Words (BoW), etc. Examples of the features include, but are not limited to, scale-invariant feature transform (SIFT) features and intensity histograms.
Depending on which mode the system 110 is performing, the feature extraction module 222 may send the extracted features to a feature determination module 224, the feature database 218, or both.
Representing an Image Using a Hierarchical Sparse Codebook
In one embodiment, in response to receiving the extracted features, the feature determination module 224 determines one or more leaf nodes of the hierarchical sparse codebook 214 to represent each extracted feature. Specifically, the feature determination module 224 compares each extracted feature with nodal features associated with a subset of nodes of the hierarchical sparse codebook 214 level by level.
Table 1 shows a first example algorithm for representing an image using the hierarchical sparse codebook 214. The hierarchical sparse codebook 214 in FIG. 3 is one example. When an extracted feature arrives at the root level 302-1 in FIG. 3 or level 0 in Table 1 (below), the feature determination module 224 compares the extracted feature with each nodal feature associated with each node at next level 302-2, i.e., level 1 in Table 1.
TABLE 1
Algorithm 1: Encode an Image using a hierarchical sparse codebook
[Input]: Feature vector set X = {x1, x2, x3, ... xM}, e.g., M
feature vectors extracted from the image l; Constructed
hierarchical sparse codebook T
[Output]: Histogram representation h
[initialization]: Active set Al = {v1, v2, v3, ... vk} for tree level l = 1
[Main]
For i = 1 to M
     For l = 1 to L−1
       1. Measure a distance or a degree of overlap between
feature vector xi and each nodal feature vector in the active set Al;
       2. Generate active set Al+1 by selecting a node or a
nodal feature at tree level l+1 whose parent has a distance from or a
degree of overlap with the feature vector xi respectively less than a
predetermined distance threshold or greater than a predetermined
overlap threshold in step 1;
       l = l + 1
     End For
     i = i + 1
End For
The histogram representation h is calculated by counting number of times each node or nodal feature at level l = L of the codebook T is selected for all X = {x1, x2, x3, ... xM}.
The feature determination module 224 may employ a distance measurement module 226 to determine a distance or a degree of overlap between the extracted feature and each nodal feature. The distance measurement module 226 may measure the distance or the degree of overlap according to a predetermined distance metric. For example, if features (i.e., the extracted feature and the nodal feature) are expressed in terms of feature vectors, the predetermined distance metric may include computing a normalized Lp-distance between the extracted feature and the nodal feature, where p can be any integer greater than zero. In one embodiment, the predetermined distance metric may include computing a normalized L2-distance (i.e., Euclidean distance) or a normalized L1-distance (i.e., Manhattan distance) between the extracted feature and the nodal feature. Alternatively, the predetermined distance metric may include computing an inner product of the extracted feature and the nodal feature to determine a degree of overlap therebetween.
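The distance and overlap measurements above can be sketched as follows; the function names are illustrative, and scaling each vector to unit L2 length is one plausible reading of "normalized":

```python
import math

def normalized_lp_distance(x, v, p=2):
    """Lp-distance between an extracted feature x and a nodal feature v,
    after scaling each to unit L2 length (p=2 gives the Euclidean
    distance, p=1 the Manhattan distance)."""
    def unit(u):
        n = math.sqrt(sum(c * c for c in u)) or 1.0
        return [c / n for c in u]
    x, v = unit(x), unit(v)
    return sum(abs(a - b) ** p for a, b in zip(x, v)) ** (1.0 / p)

def degree_of_overlap(x, v):
    """Inner product of the extracted feature and the nodal feature."""
    return sum(a * b for a, b in zip(x, v))
```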
In response to determining the distance or the degree of overlap between the extracted feature and each nodal feature at level 302-2, the feature determination module 224 may select a node at level 302-2 whose parent has a distance from the extracted feature that is less than a predetermined distance threshold (e.g., 0.2). Alternatively, the feature determination module 224 may select a node at level 302-2 whose parent has a degree of overlap with the extracted feature that is greater than a predetermined overlap threshold (e.g., zero). The predetermined distance threshold or the predetermined overlap threshold can be adaptively adjusted for each level in order to control a degree of sparseness for each level. A degree of sparseness for a level is defined as an average number of nodes or nodal features used to represent each training image feature at that particular level divided by the total number of nodes or nodal features at that same level. The feature determination module 224 repeats distance measurement for those selected nodes at level 302-2 and node selection for child nodes of the selected nodes at level 302-3. In the above algorithm 1, the feature determination module 224 leaves those unselected nodes at level 302-2 and respective child nodes or branches untouched. More specifically, the feature determination module 224 does not perform any distance determination or node selection for the child nodes of the unselected nodes of level 302-2.
Once the leaf level 302-L of the codebook 214 is reached, one or more leaf nodes are selected according to the above algorithm and are used to represent the extracted feature by the feature determination module 224.
After comparing each extracted feature with the hierarchical sparse codebook 214, the feature determination module 224 may generate a histogram representation of the image. The histogram representation of the image may be generated by counting the number of times each node or nodal feature at the leaf level (i.e., level 302-L in FIG. 3 or level l = L in Table 1) of the codebook 214 is selected across the extracted features of the image. The histogram representation may be used to represent the image, and may be stored in the image database 108 as an index or a comparison reference for the image.
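By way of illustration, the level-by-level traversal of algorithm 1 and the histogram generation can be sketched as follows, using the degree-of-overlap variant; the node structure, the names, and the final leaf-level filter are our assumptions rather than the patent's:

```python
from collections import Counter

class Node:
    """Minimal codebook node for this sketch."""
    def __init__(self, nodal_feature, children=(), index=None):
        self.nodal_feature = nodal_feature   # trained feature vector
        self.children = list(children)
        self.index = index                   # leaf identifier

def _overlap(x, v):
    return sum(a * b for a, b in zip(x, v))

def encode_image(features, root, overlap_threshold=0.0):
    """For each extracted feature, expand only the children of nodes
    whose overlap with the feature exceeds the threshold, descend to
    the leaf level, and count the selected leaves into a histogram.
    Assumes a balanced tree (all nodes of a level have children or
    all are leaves)."""
    histogram = Counter()
    for x in features:
        active = list(root.children)             # active set A1
        while active and active[0].children:     # descend toward leaves
            next_active = []
            for node in active:
                if _overlap(x, node.nodal_feature) > overlap_threshold:
                    next_active.extend(node.children)  # keep this branch
            active = next_active
        for leaf in active:                      # select leaf nodes
            if _overlap(x, leaf.nodal_feature) > overlap_threshold:
                histogram[leaf.index] += 1
    return histogram
```

Note that unselected branches are never visited, which is the source of the processing-time savings described above.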
In some embodiments, the feature determination module 224 may additionally or alternatively employ a cost module 228 to determine which nodes are selected and which nodes are not selected for the extracted feature at each level of the codebook 214. Specifically, the cost module 228 may include a cost function. Table 2 (below) shows a second example algorithm for representing an image using the hierarchical sparse codebook 214.
The hierarchical sparse codebook in FIG. 3 is used for illustration. When an extracted feature xi arrives at the root level 302-1 in FIG. 3 or level 0 in Table 2, an active set A1 is initially set to include each nodal feature associated with each node at the next level 302-2, i.e., level 1. A cost function |xi − uiAl|L1 + λ|ui|L1 is then minimized with respect to a response ui. Each entry, ui j, in the response ui represents a response of the extracted feature xi to a corresponding nodal feature vj. After minimizing the cost function, a new active set A2 may be created by selecting a node or a nodal feature in level 302-3 in FIG. 3 or level 2 in Table 2 whose parent at level 302-2 or level 1 gives a response ui j greater than a predetermined response threshold. The processes of cost function minimization and nodal feature selection are repeated for level 302-3 in FIG. 3 or level 2 in Table 2, and so on, until the leaf level (i.e., level 302-L in FIG. 3 or level l = L in Table 2) of the codebook 214 is reached. One or more nodes or nodal features at the leaf level of the codebook 214 having a response to the extracted feature xi greater than a predetermined response threshold may be selected to represent the extracted feature xi.
Upon representing each extracted feature of the image using one or more nodes or nodal features at the leaf level of the codebook 214, the feature determination module 224 may generate a histogram representation of the image by summing and normalizing all responses of all X = {x1, x2, x3, . . . xM} at the leaf level of the codebook 214.
TABLE 2
Algorithm 2: Encode an Image using a hierarchical sparse codebook
[Input]: Feature vector set X = {x1, x2, x3, ... xM}, e.g., M feature
vectors extracted from the image l; Constructed hierarchical sparse
codebook T
[Output]: Histogram representation h
[initialization]: Active set Al = {v1, v2, v3, ... vk} for tree level l = 1
[Main]
For i = 1 to M
    For l = 1 to L−1
      1. Encode feature vector xi using the active set Al by
minimizing a cost function |xi − uiAl|L1 + λ|ui|L1, where λ is a parameter
to control a degree of sparseness for representing the feature vector xi
in terms of nodal feature vectors in Al, and | |L1 represents L1-norm.
      2. Generate active set Al+1 by selecting a node or a nodal
feature at tree level l+1 whose parent gives a response ui j greater than a
predetermined response threshold in step 1;
      l = l + 1
    End For
    i = i + 1
End For
The histogram representation h is calculated by summing and normalizing all responses of all X = {x1, x2, x3, ... xM} at level l = L of the codebook tree T.
In some embodiments, a parameter λ, which controls the degree of sparseness, may be different for different levels of the codebook 214. For example, the parameter λ may be smaller for levels closer to the root level to allow more nodes or nodal features to be selected at those levels, and may gradually increase towards the leaf level of the codebook 214 to avoid selecting an excessive number of nodes or nodal features at the leaf level. However, once the parameter λ is determined for each level, the parameter λ is not modified until the codebook 214 is reconstructed or the representations of the images are redone.
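The role of λ in the Table 2 cost can be illustrated with a deliberately small brute-force minimizer; a practical system would use an L1 (lasso-style) solver, and the coarse response grid here is purely illustrative:

```python
from itertools import product

def cost(x, A, u, lam):
    """Table 2 cost: L1 reconstruction error |x - uA|_L1 plus the
    sparsity penalty lam * |u|_L1, with A a list of nodal features."""
    recon = [sum(u[j] * A[j][d] for j in range(len(A)))
             for d in range(len(x))]
    return (sum(abs(a - b) for a, b in zip(x, recon))
            + lam * sum(abs(c) for c in u))

def best_response(x, A, lam, grid=(0.0, 0.5, 1.0)):
    """Exhaustively search a coarse grid of responses; a larger lam
    drives more entries of the winning response u to zero."""
    return min(product(grid, repeat=len(A)),
               key=lambda u: cost(x, A, u, lam))
```

With nodal features [[1, 0], [0, 1]] and feature x = [1, 0], a small λ (e.g., 0.1) yields the response (1.0, 0.0), while a large λ (e.g., 3.0) collapses the response to (0.0, 0.0), i.e., a sparser code, consistent with increasing λ toward the leaf level.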
Although two example algorithms for representing an image are described above, the present disclosure is not limited thereto. Any algorithm that takes advantage of the described hierarchical sparse codebook 214 and represents each extracted feature of an image in terms of a sparse representation of one or more nodes or nodal features of the codebook 214 is covered by the present disclosure.
In one embodiment, an image may be represented using a combination of the above two algorithms. For example, algorithm 1 may first be used to find an active set up to a predetermined level of the codebook 214 for each image feature of the image. Algorithm 2 may then be used for the rest of the levels of the codebook 214 to obtain one or more nodes or nodal features at the leaf level of the codebook 214 for each image feature. Depending on the values of the thresholds employed in algorithm 1, algorithm 1 can allow more nodes or nodal features to be selected for an image feature at each level, and therefore permits a broader exploration of nodal features to represent the image feature. This avoids premature elimination of nodes or nodal features that are actually good candidates for representing the image feature. As the image feature traverses toward the leaf level, however, algorithm 2 may be employed to limit the number of selected nodes or nodal features at subsequent levels, in order to prevent the active set (see Table 1) from growing too large.
Upon obtaining a representation (e.g., histogram representation) of the image using one of the above algorithms, the feature determination module 224 may save the representation in the image database 108 and use this representation as an index for retrieving the image. Additionally or alternatively, this representation can be saved as a reference for comparison with representations of other images such as a query image during image retrieval.
In one embodiment, if the image is a query image submitted by the user 102, or forwarded by the search engine 104 or the website 106, the representation of the query image may be used to retrieve one or more stored images in the image database 108. For example, the representation of the query image may be compared with representations of images stored in the image database 108.
In another embodiment, a classifier may be used to classify the query image into one of a plurality of classes (e.g., an automobile class) based on the representation of the query image. The classifier may include a neural network, a Bayesian belief network, support vector machines (SVMs), fuzzy logic, a Hidden Markov Model (HMM), or any combination thereof. The classifier may be trained on a subset of the representations of the images stored in the image database 108. Upon classifying the query image into a class, stored images within that class may be retrieved and presented to the user 102 according to respective frequencies of retrieval within a certain interval (e.g., the past day, past week, past month, etc.).
Additionally or alternatively, the representation of the query image may be compared with the representations of the stored images according to an image similarity metric. The image similarity metric is a measure of similarity between two images, and may return a similarity score to represent a relative resemblance of a stored image with respect to the query image. In one embodiment, a similarity measurement module 230 may be used to calculate a similarity score of a stored image with respect to the query image based upon the representation of the query image. For example, the similarity measurement module 230 calculates the similarity score based on a ratio of the number of common features in the representations of the query image and the stored image with respect to their average number of features.
In another embodiment, the similarity measurement module 230 may compute a correlation between the representation of the query image and the representation of a stored image. For example, if an image is represented in the form of a histogram as described above, a correlation between the histogram representation of the query image and the histogram representation of a stored image may be computed to obtain a similarity score therebetween. In one embodiment, each of these histogram representations may first be normalized such that its area integral is one, for example.
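The normalization and correlation described above may be sketched as follows, assuming Pearson correlation as the correlation measure (the patent does not fix a particular one) and illustrative function names:

```python
import math

def normalize(h):
    """Scale a histogram so its entries (its 'area') sum to one."""
    total = sum(h) or 1.0
    return [v / total for v in h]

def histogram_correlation(h_query, h_stored):
    """Pearson correlation between two normalized histograms; a score
    near 1 means the stored image closely resembles the query image."""
    a, b = normalize(h_query), normalize(h_stored)
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a)
                    * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0
```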
Based on the similarity scores of these stored images with respect to the query image, one or more stored images may be presented to the user 102, and arranged according to their similarity scores, for example, in a descending order of their similarity scores.
The program module 210 may further include a codebook generation module 232. The codebook generation module 232 generates the hierarchical sparse codebook 214 based on the training image features that are stored in the feature database 218. Additionally or alternatively, the codebook generation module 232 generates the hierarchical sparse codebook 214 based on images stored in the image database 108. In one embodiment, the codebook generation module 232 generates or reconstructs the hierarchical sparse codebook 214 on a regular basis, e.g., each day, each week, each month, or each year. Alternatively, the hierarchical sparse codebook 214 may be generated upon request, for example, from the search engine 104 or the website 106.
In still another embodiment, the hierarchical sparse codebook 214 is reconstructed based on performance of the codebook 214 in retrieving stored images in response to query images submitted from the user 102. For example, the program data 212 may further include image query data 234. The image query data 234 may include query images that have been submitted by one or more users 102 and stored images that were returned in response to the query images. Additionally or alternatively, the image query data 234 may include one or more stored images that have been selected by the users 102 in response to the query images. In one embodiment, the image query data 234 may further include similarity scores of the one or more selected images with respect to the query images. In an event that the image query data 234 includes the similarity scores of the selected images, the codebook 214 may be reconstructed in response to an average similarity score of the selected images in the image query data 234 being less than a predetermined similarity threshold. The predetermined similarity threshold may be set by an administrator or operator of the system 110 according to the accuracy and/or computing requirements, for example. For example, if a perfect match between a query image and a stored image has a similarity score of one, the codebook 214 may be reconstructed in response to the average similarity score being less than 0.7, for example.
Generating a Hierarchical Sparse Codebook
When a hierarchical sparse codebook 214 is generated or reconstructed, the codebook generation module 232 may receive a plurality of training image features from the feature database 218. Additionally or alternatively, the codebook generation module 232 may receive a plurality of images from the image database 108 and use the feature extraction module 222 to extract a plurality of image features for training purposes. Upon receiving the plurality of training image features, the codebook generation module 232 generates a hierarchical sparse codebook 214 according to a codebook generation algorithm. An example algorithm is illustrated in Table 3 (below).
For example, k nodes at level 1 are branched out from a root node at level 0. Each node at level 1 is associated with a nodal feature, which is a training image feature randomly selected from the plurality of training image features. The plurality of training image features are then compared with each nodal feature at level 1 in order to assign a subset of the training image features to the corresponding node at level 1. The subset of training image features assigned to a node includes any training image feature that has a response (e.g., a degree of overlap) to the nodal feature associated with that node greater than a predetermined response threshold, e.g., zero. Upon assigning a subset of training image features to a node at level 1, a set of k nodal features is trained with respect to the assigned subset of training image features for that node. Specifically, based on the assigned subset of training image features, a cost function is minimized with respect to the set of k nodal features:
Σ_i | x_{li}^j − u_{li}^j V_l^j |_{L1} + λ_l Σ_i | u_{li}^j |_{L1}  (1)
where
    • x_{li}^j represents a training image feature in a subset X_l^j,
    • V_l^j represents the set of k nodal features that are trained for node j at level l, i.e., o_l^j,
    • u_{li}^j represents a response of x_{li}^j to V_l^j, and
    • λ_l represents a parameter to control a degree of sparseness for level l.
Upon obtaining the set of k nodal features that minimizes the above cost function for the node, this set of k nodal features is assigned to the child nodes of the node at the next level, i.e., level 2. This process of cost-function minimization and nodal-feature assignment is repeated for each node at each level until every node at the leaf level of the codebook has been assigned a nodal feature and a subset of training image features, i.e., until the leaf level of the codebook is reached. At this point, the hierarchical sparse codebook is generated.
TABLE 3
Algorithm 3: Generate a hierarchical sparse codebook
[Input]: Feature vector set X = {x_1, x_2, x_3, ... x_N}, e.g., N feature vectors from a set of training images
[Output]: K-branch tree T with levels l = 0, 1, 2, ... L, each node being associated with a nodal feature vector v
[Initialization]: Branch a root node (at level l = 0) into K nodes (at level l = 1); the nodal feature of each of the K nodes at level l = 1 is randomly selected from the feature vector set X
[Main]
For l = 1 to L−1
    1. For each node j at level l, i.e., o_l^j, collect the subset of X whose response to the nodal feature vector associated with node o_l^j is greater than a predetermined response threshold; denote this subset X_l^j;
    2. For each node j at level l, o_l^j, based on X_l^j, train a set of K nodal features V_l^j by minimizing the cost function Σ_i | x_{li}^j − u_{li}^j V_l^j |_{L1} + λ_l Σ_i | u_{li}^j |_{L1} with respect to the visual codebook associated with node o_l^j, i.e., V_l^j; the child nodes of node o_l^j at level l+1 are then associated with the nodal features of V_l^j;
End For
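Algorithm 3 can be sketched in pure Python on toy feature vectors. The patent trains each node's K nodal features by minimizing the L1 cost of Equation (1); exact L1 minimization is involved, so this sketch substitutes a simple averaging update over each node's assigned subset as a stand-in for that training step, and uses a dot product as the response function. Both substitutions, and all names below, are illustrative assumptions, not the claimed method.

```python
import random

def response(feature, nodal_feature):
    # Degree of overlap between a training feature and a nodal feature;
    # a plain dot product is used here as an illustrative response function.
    return sum(a * b for a, b in zip(feature, nodal_feature))

def build_codebook(features, k=2, levels=2, threshold=0.0, seed=0):
    """Grow a K-branch tree: each node holds a nodal feature and the subset
    of training features whose response to it exceeds the threshold."""
    random.seed(seed)
    root = {"feature": None, "subset": features, "children": []}
    frontier = [root]
    for _level in range(1, levels + 1):
        next_frontier = []
        for node in frontier:
            pool = node["subset"]
            if not pool:
                continue
            # Stand-in for minimizing Eq. (1): seed k children with randomly
            # chosen features, then average each child's assigned subset.
            for s in random.sample(pool, min(k, len(pool))):
                subset = [x for x in pool if response(x, s) > threshold]
                avg = [sum(x[i] for x in subset) / len(subset)
                       for i in range(len(s))]
                child = {"feature": avg, "subset": subset, "children": []}
                node["children"].append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return root

feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
tree = build_codebook(feats, k=2, levels=2)
print(len(tree["children"]))  # 2: the root branches into k nodes at level 1
```

Note that, consistent with the modification described below, a feature assigned to a child here is always drawn from the subset already assigned to its parent.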
The parameter λ_l (also called the sparseness factor for level l) can be adaptively adjusted to change the degree of sparseness for level l. In one embodiment, the parameter λ_l, or the degree of sparseness for a level, is adjusted to be less than a predetermined threshold level. In another embodiment, the parameter λ_l, or the degree of sparseness for a level, is adjusted to be within a predetermined range. In still another embodiment, the parameter λ_l, or the degree of sparseness, for each level is collectively adjusted to obtain an overall degree of sparseness for the codebook and the plurality of training image features that is less than a predetermined overall threshold or within a predetermined overall range. The predetermined threshold level or the predetermined range may be the same or different for different levels.
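One way to read the adaptive adjustment: raise the sparseness factor until the average number of nonzero responses per feature drops to a target. The toy sketch below assumes a soft-thresholded dot product as the sparse code, an illustrative stand-in for the full Equation (1) minimization; all names and values are hypothetical.

```python
def soft_threshold(v, lam):
    # Responses smaller in magnitude than lam are driven exactly to zero.
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def sparse_code(x, nodal_features, lam):
    return [soft_threshold(sum(a * b for a, b in zip(x, v)), lam)
            for v in nodal_features]

def tune_sparseness(features, nodal_features, target_nonzeros,
                    lam=0.0, step=0.05):
    """Raise lam until the average number of nonzero responses per
    feature is at or below the target degree of sparseness."""
    while True:
        codes = [sparse_code(x, nodal_features, lam) for x in features]
        avg_nnz = sum(sum(1 for c in code if c != 0.0)
                      for code in codes) / len(codes)
        if avg_nnz <= target_nonzeros:
            return lam, avg_nnz
        lam += step

V = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]  # toy nodal features for one level
X = [[0.9, 0.1], [0.2, 0.8]]              # toy training features
lam, avg_nnz = tune_sparseness(X, V, target_nonzeros=2)
print(lam > 0 and avg_nnz <= 2)  # True: lam was raised until codes were sparse
```

A per-level target corresponds to the per-level threshold embodiment; summing the nonzero counts across levels before comparing would correspond to the overall-sparseness embodiment.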
In one embodiment, the above algorithm may further be modified. Specifically, after randomly assigning k number of training image features to be nodal features associated with the nodes at level 1, the algorithm may further train these nodal features to minimize the above cost function for level 1. Upon obtaining a set of optimized nodal features that minimize the cost function of level 1, the algorithm may assign these optimized nodal features to the nodes of level 1. The algorithm further assigns a subset of training image features that have responses greater than a predetermined response threshold to each node of level 1.
Additionally or alternatively, the algorithm may further specify that a training image feature that is assigned to a node is also a training image feature that has been assigned to the parent of the node.
Alternative Embodiments
Although the hierarchical sparse codebook 214 is described as including a hierarchical tree in the foregoing embodiments, the codebook 214 is not limited thereto. The hierarchical sparse codebook 214 can include any hierarchical structure. In one embodiment, the hierarchical sparse codebook 214 may initially include a hierarchical tree. After or during the training phase of the hierarchical sparse codebook 214, however, a node (i.e., a node at an intermediate level and/or a leaf level of the codebook 214) may be purged based on an average degree of overlap between its associated training image features and the corresponding nodal feature of the node. For example, a node may be purged if the corresponding average degree of overlap between the associated training image features and the corresponding nodal feature is less than a predetermined threshold. Furthermore, this predetermined threshold may vary among different levels. In one embodiment, the predetermined threshold for the average degree of overlap is lower at a higher level (i.e., a level closer to the root level of the codebook 214), and increases toward the leaf level of the codebook 214. This is because the number of training image features assigned to a node at a higher level is usually greater, and a nodal feature associated with such a node is more generalized with respect to the assigned training image features. Having a lower threshold therefore avoids premature purging of the node at the higher level. On the other hand, a node at a lower level is usually assigned fewer training image features, and its corresponding nodal feature may be more specific to the assigned training image features. Therefore, the predetermined threshold associated with the node at the lower level can be higher, reflecting the change from generality to specificity of nodal features from a high level to a low level of the codebook 214.
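The level-dependent purging can be sketched as follows; the linear threshold schedule, the `avg_overlap` field, and the specific 0.2/0.6 endpoints are illustrative assumptions, chosen only to show a threshold that rises from the root toward the leaf level.

```python
def purge(node, level, max_level, low=0.2, high=0.6):
    """Drop child nodes whose average overlap falls below a threshold that
    rises linearly from `low` near the root to `high` at the leaf level."""
    threshold = low + (high - low) * (level / max_level)
    node["children"] = [c for c in node["children"]
                        if c["avg_overlap"] >= threshold]
    for child in node["children"]:
        purge(child, level + 1, max_level, low, high)
    return node

toy_tree = {"children": [
    {"avg_overlap": 0.5, "children": [{"avg_overlap": 0.3, "children": []},
                                      {"avg_overlap": 0.9, "children": []}]},
    {"avg_overlap": 0.1, "children": []},
]}
purge(toy_tree, 1, 2)
print(len(toy_tree["children"]))  # 1: the 0.1 node fails the 0.4 threshold
```

At level 1 the threshold is 0.4, so only a weakly matched node is purged; at level 2 it rises to 0.6, purging the more specific 0.3 node while keeping the 0.9 node.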
In another embodiment, the hierarchical sparse codebook may be a hierarchical structure having a plurality of levels, with each level having a predetermined number of nodes. Rather than having an equal number of intermediate child nodes for each node at one level, the number of intermediate child nodes of a node at that level may be determined based upon the number of training image features assigned to that particular node. For example, the number of intermediate child nodes of a first node at one level is greater than the number of intermediate child nodes of a second node at the same level if the number of training image features assigned to the first node is greater than the number assigned to the second node. In essence, a node having a greater number of training image features is allocated more resources (i.e., child nodes) to represent those training image features, while a node having fewer training image features is allocated fewer resources, thereby optimizing the use of resources, which are usually limited.
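This proportional allocation can be sketched with integer arithmetic; the largest-remainder style scheme and the one-child minimum per parent are assumptions for illustration, since the text only requires that busier parents receive more children.

```python
def allocate_children(assignment_counts, total_children):
    """Split a level's child-node budget among parent nodes in proportion
    to the number of training features assigned to each parent."""
    total = sum(assignment_counts)
    alloc = [1] * len(assignment_counts)  # every parent keeps >= 1 child
    # Hand out the remaining budget one child at a time, each time to the
    # parent that is furthest below its proportional share (integer math).
    for _ in range(total_children - len(assignment_counts)):
        i = max(range(len(alloc)),
                key=lambda j: assignment_counts[j] * total_children
                              - alloc[j] * total)
        alloc[i] += 1
    return alloc

# A parent holding 60 features gets more children than one holding 10.
print(allocate_children([60, 30, 10], total_children=10))  # [6, 3, 1]
```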
Exemplary Methods
Exemplary methods for generating a hierarchical sparse codebook or representing an image using the hierarchical sparse codebook are described with reference to FIGS. 4-6. These exemplary methods can be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, and the like that perform particular functions or implement particular abstract data types. The methods can also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communication network. In a distributed computing environment, computer executable instructions may be located both in local and remote computer storage media, including memory storage devices.
The exemplary methods are illustrated as a collection of blocks in a logical flow graph representing a sequence of operations that can be implemented in hardware, software, firmware, or a combination thereof. The order in which the methods are described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the methods, or alternate methods. Additionally, individual blocks may be deleted from the methods without departing from the spirit and scope of the subject matter described herein. In the context of software, the blocks represent computer instructions that, when executed by one or more processors, perform the recited operations.
FIG. 4 illustrates an exemplary method 400 of generating a hierarchical sparse codebook.
At block 402, a plurality of training image features are received. This plurality of training image features may be obtained from one or more databases and/or one or more search engines. The plurality of training image features may be extracted from a plurality of images that are stored in the one or more databases and/or the one or more search engines.
At block 404, a hierarchical sparse codebook is generated based at least upon the plurality of training image features. The hierarchical sparse codebook may be generated to include a plurality of levels. In one embodiment, each of the plurality of levels may be associated with a sparseness factor as shown in FIG. 3, for example. Each level of the hierarchical sparse codebook is generated by adjusting the corresponding sparseness factor to be less than a respective predetermined threshold or within a respective predetermined range. Additionally or alternatively, the hierarchical sparse codebook may be generated by adjusting the sparseness factor of each level to obtain an overall degree of sparseness for the codebook and the plurality of training image features. In one embodiment, the sparseness factor of each level is adjusted to obtain an overall degree of sparseness that is less than a predetermined overall threshold or within a predetermined overall range. This predetermined overall threshold or predetermined overall range may be set by an administrator or an operator of the system 110 based on specified computing requirements or needs.
Additionally or alternatively, generating the hierarchical sparse codebook at block 404 may include representing each training image feature by a sparse number of leaf nodes or nodal features that are associated with the leaf nodes of the hierarchical sparse codebook. FIG. 5 shows an example of this hierarchical sparse codebook. Upon generating the hierarchical sparse codebook, each training image feature is represented by a sparse number of nodes or nodal features at the leaf level of the codebook.
FIG. 6 illustrates an exemplary method 600 of representing or encoding an image using a hierarchical sparse codebook.
At block 602, an image is received. This image may be received from a user for image query. Alternatively, this image may be received from a search engine or a website for encoding the image.
At block 604, a plurality of image features are extracted from the image.
At block 606, each image feature of the image is compared with a hierarchical sparse codebook to obtain one or more leaf-level features (i.e., nodal features at leaf level) of the codebook. The one or more leaf-level features represent a sparse code representation of the respective image feature.
At block 608, a histogram for the image is generated based upon the one or more leaf-level features of each image feature of the image. In one embodiment, the histogram represents the respective number of times that each leaf-level feature of the codebook is encountered by the plurality of image features of the image.
At block 610, the image is represented by the histogram. The histogram may further be stored in a database as an index for the image. Additionally or alternatively, the histogram may serve as a reference for comparison with another image, such as a query image, during image retrieval. For example, the histogram of the query image may be compared with histograms of a subset of stored images in the database. In one embodiment, the comparison may be performed by computing correlations between the histogram of the query image and the histograms of the subset of stored images. One or more stored images having a correlation greater than a predetermined correlation threshold may be retrieved and presented to the user.
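The encode-and-compare flow of blocks 602-610 can be sketched end to end. The thresholded dot-product code, the cosine-style correlation, and the toy leaf features below are illustrative choices, not the patent's exact functions.

```python
def encode(image_features, leaf_features, threshold=0.5):
    """Blocks 604-608: count, for each leaf-level nodal feature, how many
    of the image's features respond to it above the threshold."""
    hist = [0] * len(leaf_features)
    for x in image_features:
        for i, v in enumerate(leaf_features):
            if sum(a * b for a, b in zip(x, v)) > threshold:
                hist[i] += 1  # leaf i encountered by this image feature
    return hist

def correlation(h1, h2):
    # Cosine correlation between two histograms (an illustrative choice).
    dot = sum(a * b for a, b in zip(h1, h2))
    n1 = sum(a * a for a in h1) ** 0.5
    n2 = sum(b * b for b in h2) ** 0.5
    return dot / (n1 * n2) if n1 and n2 else 0.0

leaves = [[1.0, 0.0], [0.0, 1.0]]                  # toy leaf-level features
query = encode([[0.9, 0.1], [0.8, 0.2]], leaves)   # histogram [2, 0]
stored = {"a": encode([[1.0, 0.0]], leaves),       # histogram [1, 0]
          "b": encode([[0.1, 0.9]], leaves)}       # histogram [0, 1]
# Block 610: retrieve stored images whose correlation exceeds 0.7.
hits = [name for name, h in stored.items() if correlation(query, h) > 0.7]
print(hits)  # ['a']
```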
Any of the acts of any of the methods described herein may be implemented at least partially by a processor or other electronic device based on instructions stored on one or more computer-readable media. Computer-readable media can be any available media that can be accessed during generation of the hierarchical sparse codebook or encoding an image using the hierarchical sparse codebook. By way of example, and not limitation, computer-readable media may comprise volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer-readable media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information. Combinations of any of the above should also be included within the scope of computer-readable media.
CONCLUSION
Although the invention has been described in language specific to structural features and/or methodological acts, it is to be understood that the invention is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the invention.

Claims (20)

What is claimed is:
1. One or more memory storage devices storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:
receiving a plurality of training image features; and
generating a hierarchical sparse codebook based at least upon the plurality of training image features, the generating comprising creating a plurality of levels for the hierarchical sparse codebook, each level being associated with a respective sparseness factor.
2. The one or more memory storage devices as recited in claim 1, wherein the generating further comprises adjusting the respective sparseness factor of each level to obtain a degree of sparseness which is less than a predetermined threshold or within a predetermined range.
3. The one or more memory storage devices as recited in claim 1, wherein the generating further comprises adjusting the respective sparseness factor of each level to obtain an overall degree of sparseness for the hierarchical sparse codebook, the overall degree of sparseness being less than a predetermined threshold or within a predetermined range.
4. The one or more memory storage devices as recited in claim 1, wherein the generating further comprises assigning an image feature set to a node of a level of the plurality of levels, the image feature set comprising a subset of the plurality of training image features.
5. The one or more memory storage devices as recited in claim 4, wherein the assigning comprises assigning a training image feature of the plurality of training image features to the node if a degree of overlap between the training image feature and a nodal feature of the node is greater than or equal to a predetermined threshold.
6. The one or more memory storage devices as recited in claim 1, wherein the plurality of levels comprises at least a root level, a first level and a second level.
7. The one or more memory storage devices as recited in claim 6, wherein the generating further comprises:
creating a first number of nodes at the first level, the first number of nodes comprising a first number of first-level nodal features;
adjusting the first number of first-level nodal features to minimize a cost of a first cost function based at least upon the plurality of training image features, the first cost function comprising a first sparseness function to control a degree of sparseness associated with representations of the plurality of training image features at the first level; and
for each of the first number of nodes at the first level, assigning a first-level nodal feature set based at least upon the plurality of training image features and an assignment scheme.
8. The one or more memory storage devices as recited in claim 7, wherein the generating further comprises:
for each of the first number of nodes at the first level,
generating a second number of nodes at the second level, the second number of nodes comprising a second number of second-level nodal features,
adjusting the second number of second-level nodal features to minimize a cost of a second cost function based at least upon the respective first-level nodal feature set, the second cost function comprising a second sparseness function to control a degree of sparseness associated with representations of the plurality of training image features at the second level; and
for each of the second number of nodes at the second level, assigning a second-level nodal feature set based at least upon the respective first-level nodal feature set and the assignment scheme.
9. The one or more memory storage devices as recited in claim 8, wherein the assignment scheme comprises a predetermined threshold, and wherein assigning the first-level nodal feature or assigning the second-level nodal feature comprises assigning a training image feature to a node if a degree of overlap between the training image feature and a nodal feature associated with the node is greater than or equal to the predetermined threshold.
10. The one or more memory storage devices as recited in claim 1, further comprising:
receiving an image query comprising an example image from a user;
extracting one or more image features from the example image;
comparing each of the one or more image features with the hierarchical sparse codebook to obtain one or more leaf-level features of the codebook;
generating a histogram for the example image based on the one or more leaf-level features of each of the one or more image features from the example image;
retrieving one or more database images based at least upon the histogram; and
presenting the one or more database images to the user.
11. The one or more memory storage devices as recited in claim 10, wherein the retrieving comprises:
computing correlations between the histogram of the example image and histograms of a subset of database images stored in a database; and
retrieving the one or more database images that have corresponding correlations greater than a predetermined correlation threshold.
12. A computer-implemented method for generating a hierarchical sparse codebook, the method comprising:
receiving a plurality of training image features; and
generating a hierarchical sparse codebook based at least upon the plurality of training image features, the generating comprising encoding each training image feature using a sparse number of nodal features that are associated with leaf nodes of the hierarchical sparse codebook.
13. The computer-implemented method as recited in claim 12, wherein the generating further comprises:
generating a plurality of levels for the hierarchical sparse codebook, each level comprising a predetermined number of nodes;
associating each node of each level with a nodal feature;
adjusting each nodal feature of each node to minimize a cost of a cost function of the respective level based at least upon an image feature set of a parent node of the respective node; and
assigning, to each node, a subset of the image feature set of the respective parent node.
14. The computer-implemented method as recited in claim 13, wherein the cost function of each level comprises a sparseness function defining a degree of sparseness of representation of each training image feature at the respective node level.
15. The computer-implemented method as recited in claim 13, wherein the assigning comprises:
for each node, obtaining a degree of overlap between the respective nodal feature and an image feature of the image feature set of the respective parent node;
assigning, to the respective node, the image feature of the image feature set of the respective parent node if the degree of overlap is greater than a predetermined threshold.
16. The computer-implemented method as recited in claim 12, further comprising:
receiving an image query comprising an example image from a user;
extracting one or more image features from the example image;
comparing each of the one or more image features with the hierarchical sparse codebook to obtain one or more leaf-level features of the codebook;
generating a histogram for the example image based on the one or more leaf-level features of each of the one or more image features from the example image;
retrieving one or more database images based at least upon the histogram; and
presenting the one or more database images to the user.
17. The computer-implemented method as recited in claim 16, wherein the retrieving comprises:
computing correlations between the histogram of the example image and histograms of a subset of database images stored in a database; and
retrieving the one or more database images that have corresponding correlations greater than a predetermined correlation threshold.
18. A computer-implemented method comprising:
receiving an image;
extracting a plurality of image features from the image;
comparing each image feature with a hierarchical sparse codebook to obtain one or more leaf-level features of the codebook, the one or more leaf-level features representing a sparse code representation of the respective image feature;
generating a histogram for the image based at least upon the one or more leaf-level features of each image feature of the image; and
representing the image by the histogram.
19. The computer-implemented method as recited in claim 18, further comprising:
generating an index for the image based at least upon the histogram; and
storing the index and the image in a database.
20. The computer-implemented method as recited in claim 18, wherein the image comprises an example image received from a user for image query, and wherein the method further comprises:
retrieving one or more database images based at least upon the histogram; and
presenting the one or more database images to the user.
US12/943,805 2010-11-10 2010-11-10 Hierarchical sparse representation for image retrieval Active 2031-08-12 US8463045B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/943,805 US8463045B2 (en) 2010-11-10 2010-11-10 Hierarchical sparse representation for image retrieval


Publications (2)

Publication Number Publication Date
US20120114248A1 US20120114248A1 (en) 2012-05-10
US8463045B2 true US8463045B2 (en) 2013-06-11

Family

ID=46019687

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/943,805 Active 2031-08-12 US8463045B2 (en) 2010-11-10 2010-11-10 Hierarchical sparse representation for image retrieval

Country Status (1)

Country Link
US (1) US8463045B2 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130142401A1 (en) * 2011-12-05 2013-06-06 Canon Kabushiki Kaisha Image processing apparatus and image processing method
US20140355880A1 (en) * 2012-03-08 2014-12-04 Empire Technology Development, Llc Image retrieval and authentication using enhanced expectation maximization (eem)
US20170132826A1 (en) * 2013-09-25 2017-05-11 Heartflow, Inc. Systems and methods for controlling user repeatability and reproducibility of automated image annotation correction
US20170364740A1 (en) * 2016-06-17 2017-12-21 International Business Machines Corporation Signal processing
US10007679B2 (en) 2008-08-08 2018-06-26 The Research Foundation For The State University Of New York Enhanced max margin learning on multimodal data mining in a multimedia database
US10319035B2 (en) 2013-10-11 2019-06-11 Ccc Information Services Image capturing and automatic labeling system
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8577131B1 (en) * 2011-07-12 2013-11-05 Google Inc. Systems and methods for visual object matching
US9311337B2 (en) * 2012-12-20 2016-04-12 Broadcom Corporation Image subset determination and processing
CN103324954B (en) * 2013-05-31 2017-02-08 中国科学院计算技术研究所 Image classification method based on tree structure and system using same
CN103559683B (en) * 2013-09-24 2016-08-10 浙江大学 The damaged image restorative procedure represented based on the popular low-rank of multi views
CN103678551B (en) * 2013-12-05 2017-09-26 银江股份有限公司 A kind of large-scale medical image retrieval encoded based on Random sparseness
CN103679150B (en) * 2013-12-13 2016-12-07 深圳市中智科创机器人有限公司 A kind of facial image sparse coding method and apparatus
KR102024867B1 (en) * 2014-09-16 2019-09-24 삼성전자주식회사 Feature extracting method of input image based on example pyramid and apparatus of face recognition
EP3166021A1 (en) * 2015-11-06 2017-05-10 Thomson Licensing Method and apparatus for image search using sparsifying analysis and synthesis operators
CN104765878A (en) * 2015-04-27 2015-07-08 合肥工业大学 Sparse coding algorithm suitable for multi-modal information and application thereof
US10102448B2 (en) * 2015-10-16 2018-10-16 Ehdp Studios, Llc Virtual clothing match app and image recognition computing device associated therewith
US10977565B2 (en) * 2017-04-28 2021-04-13 At&T Intellectual Property I, L.P. Bridging heterogeneous domains with parallel transport and sparse coding for machine learning models
CN107634787A (en) * 2017-08-22 2018-01-26 南京邮电大学 A kind of method of extensive MIMO millimeter wave channel estimations
CN109815355A (en) * 2019-01-28 2019-05-28 网易(杭州)网络有限公司 Image search method and device, storage medium, electronic equipment
CN111125577A (en) * 2019-11-22 2020-05-08 百度在线网络技术(北京)有限公司 Webpage processing method, device, equipment and storage medium

Citations (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999045483A1 (en) 1998-03-04 1999-09-10 The Trustees Of Columbia University In The City Of New York Method and system for generating semantic visual templates for image and video retrieval
US6285995B1 (en) 1998-06-22 2001-09-04 U.S. Philips Corporation Image retrieval system using a query image
US20010056415A1 (en) 1998-06-29 2001-12-27 Wei Zhu Method and computer program product for subjective image content smilarity-based retrieval
US6408293B1 (en) 1999-06-09 2002-06-18 International Business Machines Corporation Interactive framework for understanding user's perception of multimedia data
US20020136468A1 (en) 2001-03-20 2002-09-26 Hung-Ming Sun Method for interactive image retrieval based on user-specified regions
US6744935B2 (en) 2000-11-02 2004-06-01 Korea Telecom Content-based image retrieval apparatus and method via relevance feedback by using fuzzy integral
US20040111453A1 (en) 2002-12-06 2004-06-10 Harris Christopher K. Effective multi-class support vector machine classification
US20040175041A1 (en) 2003-03-06 2004-09-09 Animetrics, Inc. Viewpoint-invariant detection and identification of a three-dimensional object from two-dimensional imagery
US20040249801A1 (en) 2003-04-04 2004-12-09 Yahoo! Universal search interface systems and methods
US6847733B2 (en) 2001-05-23 2005-01-25 Eastman Kodak Company Retrieval and browsing of database images based on image emphasis and appeal
US20050057570A1 (en) 2003-09-15 2005-03-17 Eric Cosatto Audio-visual selection process for the synthesis of photo-realistic talking-head animations
US20050192992A1 (en) 2004-03-01 2005-09-01 Microsoft Corporation Systems and methods that determine intent of data and respond to the data based on the intent
WO2006005187A1 (en) 2004-07-09 2006-01-19 Parham Aarabi Interactive three-dimensional scene-searching, image retrieval and object localization

Patent Citations (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999045483A1 (en) 1998-03-04 1999-09-10 The Trustees Of Columbia University In The City Of New York Method and system for generating semantic visual templates for image and video retrieval
US6285995B1 (en) 1998-06-22 2001-09-04 U.S. Philips Corporation Image retrieval system using a query image
US20010056415A1 (en) 1998-06-29 2001-12-27 Wei Zhu Method and computer program product for subjective image content similarity-based retrieval
US6408293B1 (en) 1999-06-09 2002-06-18 International Business Machines Corporation Interactive framework for understanding user's perception of multimedia data
US7624337B2 (en) 2000-07-24 2009-11-24 Vmark, Inc. System and method for indexing, searching, identifying, and editing portions of electronic multimedia files
US7099860B1 (en) 2000-10-30 2006-08-29 Microsoft Corporation Image retrieval systems and methods with semantic and feature based relevance feedback
US6744935B2 (en) 2000-11-02 2004-06-01 Korea Telecom Content-based image retrieval apparatus and method via relevance feedback by using fuzzy integral
US20020136468A1 (en) 2001-03-20 2002-09-26 Hung-Ming Sun Method for interactive image retrieval based on user-specified regions
US7113944B2 (en) 2001-03-30 2006-09-26 Microsoft Corporation Relevance maximizing, iteration minimizing, relevance-feedback, content-based image retrieval (CBIR).
US6847733B2 (en) 2001-05-23 2005-01-25 Eastman Kodak Company Retrieval and browsing of database images based on image emphasis and appeal
US20060112092A1 (en) 2002-08-09 2006-05-25 Bell Canada Content-based image retrieval method
US7240075B1 (en) 2002-09-24 2007-07-03 Exphand, Inc. Interactive generating query related to telestrator data designating at least a portion of the still image frame and data identifying a user is generated from the user designating a selected region on the display screen, transmitting the query to the remote information system
US20040111453A1 (en) 2002-12-06 2004-06-10 Harris Christopher K. Effective multi-class support vector machine classification
US20070104378A1 (en) 2003-03-05 2007-05-10 Seadragon Software, Inc. Method for encoding and serving geospatial or other vector data as images
US20040175041A1 (en) 2003-03-06 2004-09-09 Animetrics, Inc. Viewpoint-invariant detection and identification of a three-dimensional object from two-dimensional imagery
US7065521B2 (en) 2003-03-07 2006-06-20 Motorola, Inc. Method for fuzzy logic rule based multimedia information retrieval with text and perceptual features
US20040249801A1 (en) 2003-04-04 2004-12-09 Yahoo! Universal search interface systems and methods
US20050057570A1 (en) 2003-09-15 2005-03-17 Eric Cosatto Audio-visual selection process for the synthesis of photo-realistic talking-head animations
US20050192992A1 (en) 2004-03-01 2005-09-01 Microsoft Corporation Systems and methods that determine intent of data and respond to the data based on the intent
WO2006005187A1 (en) 2004-07-09 2006-01-19 Parham Aarabi Interactive three-dimensional scene-searching, image retrieval and object localization
US7801893B2 (en) 2005-09-30 2010-09-21 Iac Search & Media, Inc. Similarity detection and clustering of images
US20070214172A1 (en) 2005-11-18 2007-09-13 University Of Kentucky Research Foundation Scalable object recognition using hierarchical quantization with a vocabulary tree
US20070259318A1 (en) 2006-05-02 2007-11-08 Harrison Elizabeth V System for interacting with developmentally challenged individuals
US20070288453A1 (en) 2006-06-12 2007-12-13 D&S Consultants, Inc. System and Method for Searching Multimedia using Exemplar Images
KR100785928B1 (en) 2006-07-04 2007-12-17 삼성전자주식회사 Method and system for searching photograph using multimodal
US20090125510A1 (en) 2006-07-31 2009-05-14 Jamey Graham Dynamic presentation of targeted information in a mixed media reality recognition system
US20090171929A1 (en) 2007-12-26 2009-07-02 Microsoft Corporation Toward optimized query suggestion: user interfaces and algorithms
US20100088342A1 (en) 2008-10-04 2010-04-08 Microsoft Corporation Incremental feature indexing for scalable location recognition

Non-Patent Citations (120)

* Cited by examiner, † Cited by third party
Title
Abdel-Mottaleb, et al., "Performance Evaluation of Clustering Algorithms for Scalable Image Retrieval", retrieved on Jul. 30, 2010 at <<http://www.umiacs.umd.edu/˜gopal/Publications/cvpr98.pdf>>, John Wiley—IEEE Computer Society, Empirical Evaluation Techniques in Computer Vision, Santa Barbara, CA, 1998, pp. 45-56.
Baumberg, "Reliable Feature Matching Across Widely Separated Views", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21.1666&rep=rep1&type=pdf>>, IEEE, Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR), Hilton Head Island, SC, vol. 1, 2000, pp. 774-781.
Beckmann, et al., "The R-Tree: An Efficient and Robust Access Method for Points and Rectangles", retrieved on Jul. 30, 2010 at <<http://epub.ub.uni-muenchen.de/4256/1/31.pdf>>, ACM, SIGMOD Record, vol. 19, No. 2, Jun. 1990, pp. 322-331.
Belussi, et al., "Estimating the Selectivity of Spatial Queries Using the ‘Correlation’ Fractal Dimension", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.4521&rep=rep1&type=pdf>>, Morgan Kaufmann Publishers, Proceedings of International Conference on Very Large Data Bases, 1995, pp. 299-310.
Bengio, et al., "Group Sparse Coding", retrieved on Jul. 7, 2010 at <<http://books.nips.cc/papers/files/nips22/NIPS2009—0865.pdf>>, MIT Press, Advances in Neural Information Processing Systems (NIPS), 2009, pp. 1-8.
Berchtold, et al., "Fast Parallel Similarity Search in Multimedia Databases", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.6847&rep=repl&type=pdf>>, ACM, Proceedings of International Conference on Management of Data, Tucson, Arizona, 1997, pp. 1-12.
Berchtold, et al., "The X-Tree: An Index Structure for High-Dimensional Data", retrieved on Jul. 30, 2010 at <<http://eref.uqu.edu.sa/files/the-x-tree-an-index-structure-for-high.pdf>>, Morgan Kaufmann Publishers, Proceedings of Conference on Very Large Data Bases, Mumbai, India, 1996, pp. 28-39.
Berg, et al., "Shape Matching and Object Recognition using Low Distortion Correspondences", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.128.1762&rep=rep1&type=pdf>>, IEEE Computer Society, Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2005, pp. 26-33.
Berker, et al., "Very-Large Scale Incremental Clustering", retrieved on Jul. 30, 2010 at <<http://www.google.com/search?q=Berker%2C+Very-Large+Scale+Incremental+Clustering&rls=com.microsoft:en-us:IE-SearchBox&ie=UTF-8&oe=UTF-8&sourceid=ie7&rlz=117ADBF>>, Mar. 2007, pp. 1-24.
Can, et al., "Concepts and Effectiveness of the Cover Coefficient Based Clustering Methodology for Text Databases", retrieved on Jul. 30, 2010 at <<http://sc.lib.muohio.edu/bitstream/handle/2374.MIA/246/fulltext.pdf?sequence=1>>, Miami University Libraries, Oxford, Ohio, Technical Report MU-SEAS-CSA-1987-002, Dec. 1987, pp. 1-45.
Cui, et al., "Combining Stroke-Based and Selection-Based Relevance Feedback for Content-Based Image Retrieval", at <<http://portal.acm.org/citation.cfm?id=1291304#abstract>>, ACM, 2007, pp. 329-332.
Datar, et al., "Locality-Sensitive Hashing Scheme Based on p-Stable Distributions", retrieved on Jul. 30, 2010 at <<http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/p253-datar.pdf>>, ACM, Proceedings of Symposium on Computational Geometry (SCG), Brooklyn, New York, 2004, pp. 253-262.
Datta, et al., "Image Retrieval: Ideas, Influences, and Trends of the New Age", at <<http://infolab.stanford.edu/˜wangz/project/imsearch/review/JOUR/datta.pdf>>, ACM, 2006, pp. 65.
Extended European Search Report mailed Aug. 11, 2011 for European patent application No. 09755475.2, 9 pages.
Faloutsos, et al., "Efficient and Effective Querying by Image Content", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.9013&rep=rep1&type=pdf>>, Kluwer Academic Publishers, Hingham, MA, Journal of Intelligent Information Systems, vol. 3, No. 3-4, Jul. 1994, pp. 231-262.
Fraundorfer, et al., "Evaluation of local detectors on non-planar scenes", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.3077&rep=rep1&type=pdf>>, Proceedings of Workshop of the Austrian Association for Pattern Recognition, 2004, pp. 125-132.
Friedman, et al., "An Algorithm for Finding Best Matches in Logarithmic Expected Time", retrieved on Jun. 29, 2010 at <<http://delivery.acm.org/10.1145/360000/355745/p209-freidman.pdf?key1=355745&key2=3779787721&coll=GUIDE&d1=GUIDE&CFID=93370504&CFTOKEN=86954411>>, ACM Transactions on Mathematical Software, vol. 3, No. 3, Sep. 1977, pp. 209-226.
Furao, et al., "A Self-controlled Incremental Method for Vector Quantization", retrieved on Jul. 3, 2010 at <<http://www.isl.titech.ac.jp/˜hasegawalab/papers/shen—prmu—sept—2004.pdf>>, Japan Science and Technology Agency, Journal: IEIC Technical Report, Institute of Electronics, Information and Communication Engineers, vol. 104, No. 290, 2004, pp. 93-100.
Gevers et al., "The PicToSeek WWW Image Search System," Proceedings of the IEEE International Conference on Multimedia Computing and Systems, vol. 1, Jun. 7, 1999, Florence, Italy, pp. 264-269.
Grauman, et al., "The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.253&rep=rep1&type=pdf>>, IEEE, Proceedings of International Conference on Computer Vision (ICCV), Beijing, China, Oct. 2005, pp. 1458-1465.
Guttman, "R-Trees: A Dynamic Index Structure for Spatial Searching", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.131.7887&rep=rep1&type=pdf>>, ACM, 1984, pp. 47-57.
He, et al., "An Investigation of Using K-d Tree to Improve Image Retrieval Efficiency", retrieved on Jul. 30, 2010 at <<http://www.aprs.org.au/dicta2002/dicta2002—proceedings/He128.pdf>>, Digital Image Computing Techniques and Application (DICTA), Melbourne, Australia, Jan. 2002, pp. 1-6.
He, et al., "Learning and Inferring a Semantic Space from User's Relevance Feedback for Image Retrieval", retrieved on Jul. 30, 2010 at <<http://research.microsoft.com/pubs/69949/tr-2002-62.doc>>, ACM, Proceedings of International Multimedia Conference, Juan-les-Pins, France, 2002, pp. 343-346.
Henrich, et al., "The LSD Tree: Spatial Access to Multidimensional Point and Non-Point Objects", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.68.9784&rep=rep1&type=pdf>>, Proceedings of the International Conference on Very large Data Bases, Amsterdam, 1989, pp. 45-54.
Indyk, et al., "Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.249&rep=rep1&type=pdf>>, ACM, Proceedings of Symposium on Theory of Computing, Dallas, TX, Apr. 1998, pp. 604-613.
Javidi, et al., "A Semantic Feedback Framework for Image Retrieval", retrieved on Jul. 30, 2010 at <<http://www.ijcee.org/papers/171.pdf>>, International Journal of Computer and Electrical Engineering, vol. 2, No. 3, Jun. 2010, pp. 417-425.
Jeong, et al., "Automatic Extraction of Semantic Relationships from Images Using Ontologies and SVM Classifiers", Proceedings of the Korean Information Science Society Conference, vol. 34, No. 1(c), Jun. 2006, pp. 13-18.
Katayama, et al., "The SR-Tree: An Index Structure for High-Dimensional Nearest Neighbor Queries", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.64.4381&rep=rep1&type=pdf>>, ACM, Proceedings of International Conference on Management of Data, Tucson, Arizona, 1997, pp. 369-380.
Kim, et al., "A Hierarchical Grid Feature Representation Framework for Automatic Image Annotation", retrieved on Jul. 7, 2010 at <<http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4959786>>, IEEE Computer Society, Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009, pp. 1125-1128.
Kushki, et al., "Query Feedback for Interactive Image Retrieval", at <<http://ieeexplore.ieee.org/xpl/freeabs—all.jsp?arnumber=1294956>>, IEEE, vol. 14, No. 15, May 2004, pp. 644-655.
Lepetit, et al., "Randomized Trees for Real-Time Keypoint Recognition", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.4902&rep=rep1&type=pdf>>, IEEE Computer Society, Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2005, pp. 775-781.
Li, et al., "An Adaptive Relevance Feedback Image Retrieval Method Based on Possibilistic Clustering Algorithm", retrieved on Jul. 30, 2010 at <<http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04021676>>, IEEE Computer Society, Proceedings of International Conference on Intelligent Systems Design and Applications (ISDA), 2006, pp. 295-299.
Likas, et al., "The global k-means clustering algorithm", retrieved on Jul. 30, 2010 at <<http://www.cs.uoi.gr/˜arly/papers/PR2003.pdf>>, Elsevier Science Ltd., Pattern Recognition, vol. 36, 2003, pp. 451-461.
Linde, et al., "An Algorithm for Vector Quantizer Design", retrieved on Jul. 30, 2010 at <<http://148.204.64.201/paginas%20anexas/voz/articulos%20interesantes/reconocimiento%20de%20voz/cuantificacion%20vectorial/An%20algorithm%20for%20Vector%20Quantizer%20Design.pdf>>, IEEE Transactions on Communications, vol. COM-28, No. 1, Jan. 1980, pp. 84-95.
Lindeberg, et al., "Shape-Adapted Smoothing in Estimation of 3-D Depth Cues from Affine Distortions of Local 2-D Brightness Structure", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.55.5090&rep=rep1&type=pdf>>, Springer-Verlag New York, Proceedings of European Conference on Computer Vision, Stockholm, Sweden, vol. 1, 1994, pp. 389-400.
Lloyd, "Least Squares Quantization in PCM", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=8A3C215650DD1680BE51B35B421D21D7?doi=10.1.1.131.1338&rep=rep1&type=pdf>>, IEEE Transactions on Information Theory, vol. IT-28, No. 2, Mar. 1982, pp. 129-137.
Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu. edu/viewdoc/download?doi=10.1.1.157.3843&rep=rep1&type=pdf>>, Kluwer Academic Publishers, Hingham, MA, vol. 60, No. 2, International Journal of Computer Vision, 2004, pp. 91-110.
Magalhaes, et al., "High-Dimensional Visual Vocabularies for Image Retrieval", retrieved on Jul. 7, 2010 at <<http:// citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.7027&rep=rep1&type=pdf>>, ACM, Proceedings of Conference on Research and Development in Information Retrieval, Amsterdam, NL, Jul. 27, 2007, pp. 815-816.
Mairal, et al., "Online Dictionary Learning for Sparse Coding", retrieved on Jul. 7, 2010 at <<http://www.di.ens.fr/˜fbach/mairal—icm109.pdf>>, ACM, Proceedings of International Conference on Machine Learning, Montreal, CA, vol. 382, 2009, pp. 1-8.
Matas, et al., "Robust Wide Baseline Stereo from Maximally Stable Extremal Regions", retrieved on Jul. 30, 2010 at <<http://cmp.felk.cvut.cz/˜matas/papers/matas-bmvc02.pdf>>, Proceedings of British Machine Vision Conference (BMVC), Cardiff, UK, 2002, pp. 384-393.
Max, "Quantizing for Minimum Distortion", retrieved on Jul. 30, 2010 at <<http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1057548&userType=inst>>, IEEE Transactions on Information Theory, vol. 6, No. 3, Mar. 1960, pp. 7-12.
Mehrotra, et al., "Feature-Based Retrieval of Similar Shapes", retrieved on Jul. 30, 2010 at <<http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=344072>>, IEEE Computer Society, Proceedings of International Conference on Data Engineering, 1993, pp. 108-115.
Mikolajczyk, et al., "A Comparison of Affine Region Detectors", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.108.595&rep=repl&type=pdf>>, Kluwer Academic Publishers Hingham, MA, International Journal of Computer Vision, vol. 65, No. 1-2, 2005, pp. 43-72.
Mikolajczyk, et al., "A performance evaluation of local descriptors", retrieved on Jul. 30, 2010 at <<http://www.ai.mit.edu/courses/6.891/handouts/mikolajczyk—cvpr2003.pdf>>, IEEE Computer Society, Transactions on Pattern Analysis and Machine Intelligence, vol. 27, No. 10, 2005, pp. 1615-1630.
Mikolajczyk, et al., "Scale and Affine Invariant Interest Point Detectors", retrieved on Jul. 30, 2010 at <<http://www.robots.ox.ac.uk/~vgg/research/affine/det-eval-files/mikolajczyk-ijcv2004.pdf>>, Kluwer Academic Publishers, The Netherlands, International Journal of Computer Vision, vol. 60, No. 1, 2004, pp. 63-86.
Murthy, et al., "Content Based Image Retrieval using Hierarchical and K-Means Clustering Techniques", retrieved on Jul. 30, 2010 at <<http://www.ijest.info/docs/IJEST10-02-03-13.pdf>>, International Journal of Engineering Science and Technology, vol. 2, No. 3, 2010, pp. 209-212.
Nister, et al., "Scalable Recognition with a Vocabulary Tree", retrieved on Jul. 7, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.61.9520&rep=rep1&type=pdf>>, IEEE Computer Society, Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2006, pp. 2161-2168.
Obdrzalek, et al., "Sub-linear Indexing for Large Scale Object Recognition", retrieved on Jul. 30, 2010 at <<http://cmp.felk.cvut.cz/˜matas/papers/obdrzalek-tree-bmvc05.pdf>>, Proceedings of the British Machine Vision Conference (BMVC), London, UK, vol. 1, Sep. 2005, pp. 1-10.
Office Action for U.S. Appl. No. 12/938,310, mailed on Apr. 11, 2012, Linjun Yang, "Adaptive Image Retrieval Database," 12 pages.
Office action for U.S. Appl. No. 12/938,310, mailed on Oct. 25, 2012, Yang et al., "Adaptive Image Retrieval Database", 11 pages.
Patane, et al., "The Enhanced LBG Algorithm", retrieved on Jul. 30, 2010 at <<http://www.google.com.sg/url?sa=t&source=web&cd=1&ved=0CBcQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload% 3Fdoi%3D10.1.1.74.1995%26rep%3Drep1%26type%3Dpdf&rct=j&q=The%20enhanced%20LBG%20algorithm&ei=QXpSTOyFEoyevQOQ35Qa&usg=AFQjCNGkfxm5Kgm4BalKO42-FpgsDADtyw>>, Pergamon, Neural Networks, vol. 14, 2001, pp. 1219-1237.
Qi et al., "Image Retrieval Using Transaction-Based and SVM-Based Learning in Relevance Feedback Sessions," Image Analysis and Recognition; (Lecture Notes in Computer Science), Aug. 22, 2007, Heidelbert, Berlin, pp. 638-649.
Qian, et al., "Gaussian Mixture Model for Relevance Feedback in Image Retrieval", at <<http://research.microsoft.com/asia/dload—files/group/mcomputing/2003P/ICME02-qf.pdf>>, In Proceedings of International Conference on Multimedia and Expo (ICME '02), Aug. 2002, pp. 1-4.
Rothganger, et al., "3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints", retrieved on Jul. 30, 2010 at <<http://www-cvr.ai.uiuc.edu/ponce—grp/publication/paper/ijcv04d.pdf>>, Kluwer Academic Publishers Hingham, MA, International Journal of Computer Vision, vol. 66, No. 3, Mar. 2006, pp. 231-259.
Samet, "The Quadtree and Related Hierarchical Data Structures", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.75.5407&rep=rep1&type=pdf>>, ACM, Computing Surveys, vol. 16, No. 2, Jun. 1984, pp. 187-260.
Sawhney, et al., "Efficient Color Histogram Indexing", retrieved on Jul. 30, 2010 at <<http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=413532>>, IEEE, Proceedings of International Conference on Image Processing (ICIP), Austin, TX, vol. 2, Nov. 1994, pp. 66-70.
Schmid, et al., "Evaluation of Interest Point Detectors", retrieved on Jul. 30, 2010 at <<http://cs.grnu.edu/˜zduric/cs774/Papers/Schmid-Evaluation-IJCV.pdf>>, Kluwer Academic Publishers, The Netherlands, International Journal of Computer Vision, vol. 37, No. 2, 2000, pp. 151-172.
Sclaroff et al., "Unifying Textual and Visual Cues for Content-Based Image Retrieval on the World Wide Web", at <<http://www.csai.unipa.it/lacascia/papers/cviu99.pdf>>, Academic Press, vol. 75, No. 1/2, Aug. 1999, pp. 86-98.
Sellis, et al., "The R+-Tree: A Dynamic Index for Multidimensional Objects", retrieved on Jul. 30, 2010 at <<http://www.vldb.org/conf/1987/P507.PDF>>, Proceedings of the Conference on Very Large Data Bases, Brighton, 1987, pp. 507-518.
Sivic, et al., "Video Google: A Text Retrieval Approach to Object Matching in Videos", retrieved on Jul. 30, 2010 at <<http://www.robots.ox.ac.uk/˜vgg/publications/papers/sivic03.pdf>>, IEEE Computer Society, Proceedings of International Conference on Computer Vision (ICCV), vol. 2, 2003, pp. 1-8.
Sproull, "Refinements to Nearest-Neighbor Searching in k-Dimensional Trees", Springer-Verlag NY, Algorithmica, vol. 6, 1991, pp. 579-589.
Torres et al., "Semantic Image Retrieval Using Region-Base Relevance Feedback," Adaptive Multimedia Retrieval: User, Context, and Feedback (Lecture Notes in Computer Science; LNCS), Heidelberg, Berlin, 2007, pp. 192-206.
Tuytelaars, et al., "Matching Widely Separated Views Based on Affine Invariant Regions", retrieved on Jul. 30, 2010 at <<http://www.vis.uky.edu/˜dnister/Teaching/CS684Fall2005/tuytelaars—ijcv2004.pdf>>, Kluwer Academic Publishers, The Netherlands, International Journal of Computer Vision, vol. 59, No. 1, 2004, pp. 61-85.
van Rijsbergen, "Information Retrieval", Butterworth-Heinemann, 1979, pp. 1-151.
White, et al., "Similarity Indexing: Algorithms and Performance", retrieved on Jul. 30, 2010 at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.48.5758&rep=rep1&type=pdf>>, Proceedings of Conference on Storage and Retrieval for Image and Video Databases (SPIE), vol. 2670, San Jose, CA, 1996, pp. 62-75.
Yang et al., "Learning Image Similarities and Categories from Content Analysis and Relevance Feedback," Proceedings ACM Multimedia 2000 Workshops, Marina Del Rey, CA, Nov. 4, 2000, vol. CONF. 8, pp. 175-178.
Yang, et al., "Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification", retrieved on Jul. 7, 2010 at <<http://www.ifp.illinois.edu/˜jyang29/papers/CVPR09-ScSPM.pdf>>, IEEE Computer Society, Conference on Computer Vision and Pattern Recognition (CVPR), 2009, Miami, FLA, pp. 1-8.
Yu, et al., "Adaptive Document Clustering", retrieved on Jul. 30, 2010 at <<http://74.125.155.132/scholar?q=cache:nleqYBgXXhMJ:scholar.google.com/&hl=en&as_sdt=2000>>, ACM, Proceedings of International Conference on Research and Development in Information Retrieval, Montreal, Quebec, 1985, pp. 197-203.
Zhou, et al., "Unifying Keywords and Visual Contents in Image Retrieval", at <<http://www.ifp.uiuc.edu/˜xzhou2/Research/papers/Selected_papers/IEEE_MM.pdf>>, IEEE, 2002, 11 pages.

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10007679B2 (en) 2008-08-08 2018-06-26 The Research Foundation For The State University Of New York Enhanced max margin learning on multimodal data mining in a multimedia database
US20130142401A1 (en) * 2011-12-05 2013-06-06 Canon Kabushiki Kaisha Image processing apparatus and image processing method
US9245206B2 (en) * 2011-12-05 2016-01-26 Canon Kabushiki Kaisha Image processing apparatus and image processing method
US20140355880A1 (en) * 2012-03-08 2014-12-04 Empire Technology Development, Llc Image retrieval and authentication using enhanced expectation maximization (eem)
US9158791B2 (en) * 2012-03-08 2015-10-13 New Jersey Institute Of Technology Image retrieval and authentication using enhanced expectation maximization (EEM)
US9870634B2 (en) * 2013-09-25 2018-01-16 Heartflow, Inc. Systems and methods for controlling user repeatability and reproducibility of automated image annotation correction
US20170132826A1 (en) * 2013-09-25 2017-05-11 Heartflow, Inc. Systems and methods for controlling user repeatability and reproducibility of automated image annotation correction
US10546403B2 (en) 2013-09-25 2020-01-28 Heartflow, Inc. System and method for controlling user repeatability and reproducibility of automated image annotation correction
US11742070B2 (en) 2013-09-25 2023-08-29 Heartflow, Inc. System and method for controlling user repeatability and reproducibility of automated image annotation correction
US10319035B2 (en) 2013-10-11 2019-06-11 Ccc Information Services Image capturing and automatic labeling system
US20170364740A1 (en) * 2016-06-17 2017-12-21 International Business Machines Corporation Signal processing
US9928408B2 (en) * 2016-06-17 2018-03-27 International Business Machines Corporation Signal processing
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis

Also Published As

Publication number Publication date
US20120114248A1 (en) 2012-05-10

Similar Documents

Publication Publication Date Title
US8463045B2 (en) Hierarchical sparse representation for image retrieval
US9201903B2 (en) Query by image
JP5749279B2 (en) Join embedding for item association
US7693865B2 (en) Techniques for navigational query identification
USRE47340E1 (en) Image retrieval apparatus
Zhang et al. Query specific rank fusion for image retrieval
US20180276250A1 (en) Distributed Image Search
US9460122B2 (en) Long-query retrieval
US20130110829A1 (en) Method and Apparatus of Ranking Search Results, and Search Method and Apparatus
US9317533B2 (en) Adaptive image retrieval database
US11138479B2 (en) Method for valuation of image dark data based on similarity hashing
US20120310864A1 (en) Adaptive Batch Mode Active Learning for Evolving a Classifier
WO2013129580A1 (en) Approximate nearest neighbor search device, approximate nearest neighbor search method, and program
Lin et al. Association rule mining with a correlation-based interestingness measure for video semantic concept detection
US11755671B2 (en) Projecting queries into a content item embedding space
Chang et al. Semantic clusters based manifold ranking for image retrieval
Tang et al. Remote sensing image retrieval based on semi-supervised deep hashing learning
CN109902129A (en) Insurance agent&#39;s classifying method and relevant device based on big data analysis
CN117349512B (en) User tag classification method and system based on big data
Kozhushko et al. Using hierarchical temporal memory for document ranking system identification
CN115344734A (en) Image retrieval method, image retrieval device, electronic equipment and computer-readable storage medium
Mali et al. Image Retrieval using Hash Code and Relevance Feedback Technique
Saad et al. Mining visual web knowledge utilizing multiple classifier architecture
Mirza-Mohammadi et al. Ranking error-correcting output codes for class retrieval
Hosseini Optimizing the Construction of Information Retrieval Test Collections

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YANG, LINJUN;TIAN, QI;NI, BINGBING;SIGNING DATES FROM 20101009 TO 20101010;REEL/FRAME:025352/0530

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034544/0001

Effective date: 20141014

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8