CA2310321A1 - Method and system for determining approximate hamming distance and approximate nearest neighbors in an electronic storage device - Google Patents
- Publication number
- CA2310321A1
- Authority
- CA
- Canada
- Prior art keywords
- trace
- database
- hamming distance
- approximate
- query
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99935—Query augmenting and refining, e.g. inexact access
Abstract
A method and system identify in a database one or more data entries that are the nearest neighbors of a query. The database prebuilds a first set of strings by probabilistically selecting the value of each bit in each string, using a probability that depends on a first Hamming distance. Based on the first set of strings, the database predetermines the trace value of each data entry in the database and stores the predetermined trace values as entries in a trace table. For each trace value entry, the database identifies the data entries whose trace values are within a second Hamming distance of the trace value entry, and stores the addresses of the identified data entries in the trace value entry. When the database receives a query, it identifies the trace value entry in the trace table that matches the trace value of the query, and thereby identifies the data entries that are within the first Hamming distance of the query. In addition, a method and system estimate the Hamming distance between two strings in a network.
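The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions the abstract does not state: each test-string bit is set with probability 1/(2*d1), a trace bit is the parity of the bitwise AND of a data entry with a test string, and the trace table only holds trace values that occur among the stored entries. All function and parameter names (`build_test_strings`, `trace`, `build_trace_table`, `query`, `estimate_distance`, `d1`, `d2`) are hypothetical.

```python
import math
import random


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def build_test_strings(n_bits: int, n_tests: int, d1: int, rng: random.Random) -> list[int]:
    """Draw n_tests random strings; each bit is 1 with probability 1/(2*d1).

    The exact probability is an assumption; the abstract only says it
    depends on the first Hamming distance d1.
    """
    p = 1.0 / (2 * d1)
    return [
        sum(1 << i for i in range(n_bits) if rng.random() < p)
        for _ in range(n_tests)
    ]


def trace(x: int, tests: list[int]) -> int:
    """Pack one bit per test string: bit j is the parity of (x AND tests[j])."""
    t = 0
    for j, r in enumerate(tests):
        t |= (bin(x & r).count("1") & 1) << j
    return t


def build_trace_table(entries: list[int], tests: list[int], d2: int) -> dict[int, list[int]]:
    """Map each stored trace value to the addresses (list indices) of all
    entries whose trace lies within Hamming distance d2 of it."""
    traces = [trace(x, tests) for x in entries]
    return {
        t: [addr for addr, u in enumerate(traces) if hamming(t, u) <= d2]
        for t in set(traces)
    }


def query(q: int, tests: list[int], table: dict[int, list[int]]) -> list[int]:
    """Return candidate addresses for q; empty if q's trace is not in the table."""
    return table.get(trace(q, tests), [])


def estimate_distance(x: int, y: int, tests: list[int], d1: int) -> float:
    """Estimate Hamming distance from the fraction f of differing trace bits.

    Under the assumptions above, two strings at distance d differ in a given
    trace bit with probability (1 - (1 - 1/d1)**d) / 2; this inverts that
    relation. Requires d1 >= 2 so the logarithm is defined.
    """
    f = hamming(trace(x, tests), trace(y, tests)) / len(tests)
    if f >= 0.5:
        return math.inf
    return math.log(1 - 2 * f) / math.log(1 - 1 / d1)


if __name__ == "__main__":
    rng = random.Random(0)
    tests = build_test_strings(n_bits=64, n_tests=16, d1=4, rng=rng)
    entries = [rng.getrandbits(64) for _ in range(20)]
    table = build_trace_table(entries, tests, d2=2)
    print(query(entries[0], tests, table))
```

Because the trace is a parity of masked bits, it is linear over XOR, so the trace of `x ^ y` equals the XOR of the two traces. Only the traces, not the full strings, need to be compared to estimate a distance, which is presumably what makes the estimate useful between two parties in a network.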
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US6693697P | 1997-11-17 | 1997-11-17 | |
US60/066,936 | 1997-11-17 | | |
PCT/US1998/024452 WO1999026235A2 (en) | 1997-11-17 | 1998-11-17 | Method and system for determining approximate hamming distance and approximate nearest neighbors in an electronic storage device |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2310321A1 (en) | 1999-05-27 |
CA2310321C (en) | 2004-11-16 |
Family
ID=22072682
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002310321A Expired - Fee Related CA2310321C (en) | 1997-11-17 | 1998-11-17 | Method and system for determining approximate hamming distance and approximate nearest neighbors in an electronic storage device |
Country Status (3)
Country | Link |
---|---|
US (1) | US6226640B1 (en) |
CA (1) | CA2310321C (en) |
WO (1) | WO1999026235A2 (en) |
Families Citing this family (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6446068B1 (en) * | 1999-11-15 | 2002-09-03 | Chris Alan Kortge | System and method of finding near neighbors in large metric space databases |
US7305380B1 (en) | 1999-12-15 | 2007-12-04 | Google Inc. | Systems and methods for performing in-context searching |
US7318053B1 (en) * | 2000-02-25 | 2008-01-08 | International Business Machines Corporation | Indexing system and method for nearest neighbor searches in high dimensional data spaces |
US7227511B2 (en) * | 2000-04-24 | 2007-06-05 | Microsoft Corporation | Method for activating an application in context on a remote input/output device |
US7030837B1 (en) * | 2000-04-24 | 2006-04-18 | Microsoft Corporation | Auxiliary display unit for a computer system |
US6917373B2 (en) * | 2000-12-28 | 2005-07-12 | Microsoft Corporation | Context sensitive labels for an electronic device |
US6611837B2 (en) * | 2000-06-05 | 2003-08-26 | International Business Machines Corporation | System and method for managing hierarchical objects |
US6931393B1 (en) * | 2000-06-05 | 2005-08-16 | International Business Machines Corporation | System and method for enabling statistical matching |
US7010606B1 (en) | 2000-06-05 | 2006-03-07 | International Business Machines Corporation | System and method for caching a network connection |
US6745189B2 (en) * | 2000-06-05 | 2004-06-01 | International Business Machines Corporation | System and method for enabling multi-indexing of objects |
US6963876B2 (en) * | 2000-06-05 | 2005-11-08 | International Business Machines Corporation | System and method for searching extended regular expressions |
US6823328B2 (en) * | 2000-06-05 | 2004-11-23 | International Business Machines Corporation | System and method for enabling unified access to multiple types of data |
US7016917B2 (en) * | 2000-06-05 | 2006-03-21 | International Business Machines Corporation | System and method for storing conceptual information |
US6950855B2 (en) * | 2002-01-18 | 2005-09-27 | International Business Machines Corporation | Master node selection in clustered node configurations |
US6909358B2 (en) * | 2002-03-21 | 2005-06-21 | International Business Machines Corporation | Hamming distance comparison |
US7817820B2 (en) * | 2003-11-25 | 2010-10-19 | Florida State University | Method and system for generating and using digital fingerprints for electronic documents |
WO2006094016A2 (en) | 2005-02-28 | 2006-09-08 | The Regents Of The University Of California | Method for low distortion embedding of edit distance to hamming distance |
US8818980B2 (en) * | 2010-01-12 | 2014-08-26 | Intouchlevel Corporation | Connection engine |
US8543598B2 (en) * | 2010-03-01 | 2013-09-24 | Microsoft Corporation | Semantic object characterization and search |
CN103020321B (en) * | 2013-01-11 | 2015-08-19 | 广东图图搜网络科技有限公司 | Neighbor search method and system |
US10496900B2 (en) * | 2013-05-08 | 2019-12-03 | Seagate Technology Llc | Methods of clustering computational event logs |
JP6187237B2 (en) * | 2013-12-19 | 2017-08-30 | 富士通株式会社 | Document image retrieval apparatus, method, and program |
US10489589B2 (en) | 2016-11-21 | 2019-11-26 | Cylance Inc. | Anomaly based malware detection |
US10496706B2 (en) | 2017-04-17 | 2019-12-03 | International Business Machines Corporation | Matching strings in a large relational database |
US11593412B2 (en) | 2019-07-22 | 2023-02-28 | International Business Machines Corporation | Providing approximate top-k nearest neighbours using an inverted list |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2293741A1 (en) * | 1974-12-04 | 1976-07-02 | Anvar | METHOD AND SYSTEM FOR ITERATIVE AND SIMULTANEOUS RECONCILIATION OF DATA WITH A SET OF REFERENCE DATA |
US4084260A (en) * | 1976-07-12 | 1978-04-11 | Sperry Rand Corporation | Best match content addressable memory |
IL119444A (en) * | 1995-10-20 | 2001-10-31 | Yeda Res & Dev | Private information retrieval |
US5870754A (en) * | 1996-04-25 | 1999-02-09 | Philips Electronics North America Corporation | Video retrieval of MPEG compressed sequences using DC and motion signatures |
US5890151A (en) * | 1997-05-09 | 1999-03-30 | International Business Machines Corporation | Method and system for performing partial-sum queries on a data cube |
- 1998-11-17: CA application CA002310321A, patent CA2310321C (en), status: Expired - Fee Related (not active)
- 1998-11-17: US application US09/193,207, patent US6226640B1 (en), status: Expired - Lifetime (not active)
- 1998-11-17: WO application PCT/US1998/024452, publication WO1999026235A2 (en), status: Application Filing (active)
Also Published As
Publication number | Publication date |
---|---|
WO1999026235A2 (en) | 1999-05-27 |
CA2310321C (en) | 2004-11-16 |
US6226640B1 (en) | 2001-05-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CA2310321A1 (en) | Method and system for determining approximate hamming distance and approximate nearest neighbors in an electronic storage device | |
WO1997020419A3 (en) | Method of determining the topology of a network of objects | |
WO2001090926A3 (en) | System and method for determining affinity using objective and subjective data | |
WO2000023863A3 (en) | Determining differences between two or more metadata models | |
EP1552425A4 (en) | A link generation system | |
ATE205029T1 (en) | COMPUTER SORTING SYSTEM FOR DATA COMPRESSION | |
WO2004104729A3 (en) | System and method for query result caching | |
WO2004015524A3 (en) | System, method and computer program product for guaranteeing electronic transactions | |
WO2002067146A3 (en) | Query resolution system | |
WO2003107127A3 (en) | System and method for personalized information retrieval based on user expertise | |
WO2004051555A3 (en) | Method and apparatus for improved information transactions | |
WO2002065102A3 (en) | System and method for grouping reflectance data | |
ATE324001T1 (en) | METHOD AND APPARATUS IN A WIRELESS TRANSCEIVER FOR SEARCHING AND TRANSMITTING INFORMATION AVAILABLE FROM A NETWORK SERVER | |
WO2003017133A3 (en) | System and method for retrieving location based site data | |
EP2211520A3 (en) | Presence and availability tracking | |
CA2049133A1 (en) | Methods and apparatus for implementing data bases to provide object-oriented invocation of applications | |
EP1211845A3 (en) | Method of determining a connection between a data emitting device and a data receiving device | |
CA2248911A1 (en) | System and method for locating resources on a network using resource evaluations derived from electronic messages | |
WO1999036864A3 (en) | Method for finding and retrieving electronic information in a network using interest agents | |
CA2226647A1 (en) | Session cache and rule caching method for a dynamic filter | |
WO2004040475A3 (en) | Improved audio data fingerprint searching | |
DE60037318D1 (en) | METHOD AND DEVICE FOR SELECTION OF MULTIPLE IP DATA TRANSMITTED WITHIN A RADIO CIRCUIT | |
WO1999017242A3 (en) | On-line recruiting system with improved candidate and position profiling | |
DE60320002D1 (en) | DEVICE AND METHOD FOR ACCESSING CONTACT INFORMATION IN A COMMUNICATION DEVICE | |
WO2002052382A3 (en) | Method and system for sharing investor information over an electronic network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | EEER | Examination request | |
| | MKLA | Lapsed | |