CA2310321A1 - Method and system for determining approximate hamming distance and approximate nearest neighbors in an electronic storage device

Publication number
CA2310321A1
Authority
CA
Canada
Prior art keywords
trace
database
hamming distance
approximate
query
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002310321A
Other languages
French (fr)
Other versions
CA2310321C (en)
Inventor
Rafail Ostrovsky
Yuval Rabani
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telcordia Licensing Co LLC
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1997-11-17
Filing date
1998-11-17
Publication date
1999-05-27
Application filed by Individual
Publication of CA2310321A1
Application granted
Publication of CA2310321C
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99935 Query augmenting and refining, e.g. inexact access

Abstract

A method and system identify, in a database, one or more data entries that are the nearest neighbors of a query. The database prebuilds a first set of strings by probabilistically selecting the value of each bit in each string, using a probability that depends on a first Hamming distance. Based on the first set of strings, the database precomputes the trace value of each data entry and stores the trace values as entries in a trace table. For each trace value entry, the database identifies the data entries whose trace values are within a second Hamming distance of that entry and stores the addresses of the identified data entries in the entry. When the database receives a query, it identifies the trace value entry in the trace table that matches the trace value of the query, and thereby identifies the data entries that are within the first Hamming distance of the query. In addition, a method and system estimate the Hamming distance between two strings in a network.
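
The table construction described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. The abstract does not say how a trace is computed from the first set of strings, so the sketch assumes that each trace bit is the parity (inner product mod 2) of a data entry with one random test string, and that each test-string bit is set with probability 1/(2*ell), where ell stands in for the first Hamming distance. The function names, the encoding of bit strings as Python integers, and the use of list indices as addresses are all hypothetical.

```python
import random


def make_test_strings(n_bits, n_tests, ell, rng):
    """Draw n_tests random bit strings of width n_bits (stored as ints).

    Assumption: each bit is 1 independently with probability 1/(2*ell),
    where ell plays the role of the "first Hamming distance".
    """
    p = 1.0 / (2 * ell)
    return [
        sum((rng.random() < p) << i for i in range(n_bits))
        for _ in range(n_tests)
    ]


def hamming(a, b):
    """Hamming distance between two bit strings stored as ints."""
    return bin(a ^ b).count("1")


def trace(x, tests):
    """Trace of x: one parity bit per test string (assumed definition)."""
    t = 0
    for j, r in enumerate(tests):
        parity = bin(x & r).count("1") & 1
        t |= parity << j
    return t


def build_trace_table(entries, tests, trace_radius):
    """Precompute the trace table.

    Each distinct trace value maps to the addresses (list indices) of all
    entries whose traces lie within trace_radius of it; trace_radius plays
    the role of the "second Hamming distance".
    """
    traces = [trace(e, tests) for e in entries]
    return {
        t: [addr for addr, u in enumerate(traces) if hamming(t, u) <= trace_radius]
        for t in set(traces)
    }


def lookup(query, tests, table):
    """Return candidate addresses for a query via an exact trace match."""
    return table.get(trace(query, tests), [])
```

In this sketch a query whose trace does not occur in the table yields no candidates. Because the assumed trace is linear over GF(2), the traces of two strings differ exactly where the trace of their XOR is nonzero, which is why a small trace distance suggests a small distance between the original strings.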
CA002310321A 1997-11-17 1998-11-17 Method and system for determining approximate hamming distance and approximate nearest neighbors in an electronic storage device Expired - Fee Related CA2310321C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US6693697P 1997-11-17 1997-11-17
US60/066,936 1997-11-17
PCT/US1998/024452 WO1999026235A2 (en) 1997-11-17 1998-11-17 Method and system for determining approximate hamming distance and approximate nearest neighbors in an electronic storage device

Publications (2)

Publication Number Publication Date
CA2310321A1 (en) 1999-05-27
CA2310321C (en) 2004-11-16

Family

ID=22072682

Family Applications (1)

Application Number Priority Date Filing Date Title
CA002310321A Expired - Fee Related CA2310321C (en) 1997-11-17 1998-11-17 Method and system for determining approximate hamming distance and approximate nearest neighbors in an electronic storage device

Country Status (3)

Country Link
US (1) US6226640B1 (en)
CA (1) CA2310321C (en)
WO (1) WO1999026235A2 (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6446068B1 (en) * 1999-11-15 2002-09-03 Chris Alan Kortge System and method of finding near neighbors in large metric space databases
US7305380B1 (en) 1999-12-15 2007-12-04 Google Inc. Systems and methods for performing in-context searching
US7318053B1 (en) * 2000-02-25 2008-01-08 International Business Machines Corporation Indexing system and method for nearest neighbor searches in high dimensional data spaces
US7227511B2 (en) * 2000-04-24 2007-06-05 Microsoft Corporation Method for activating an application in context on a remote input/output device
US7030837B1 (en) * 2000-04-24 2006-04-18 Microsoft Corporation Auxiliary display unit for a computer system
US6917373B2 (en) * 2000-12-28 2005-07-12 Microsoft Corporation Context sensitive labels for an electronic device
US6611837B2 (en) * 2000-06-05 2003-08-26 International Business Machines Corporation System and method for managing hierarchical objects
US6931393B1 (en) * 2000-06-05 2005-08-16 International Business Machines Corporation System and method for enabling statistical matching
US7010606B1 (en) 2000-06-05 2006-03-07 International Business Machines Corporation System and method for caching a network connection
US6745189B2 (en) * 2000-06-05 2004-06-01 International Business Machines Corporation System and method for enabling multi-indexing of objects
US6963876B2 (en) * 2000-06-05 2005-11-08 International Business Machines Corporation System and method for searching extended regular expressions
US6823328B2 (en) * 2000-06-05 2004-11-23 International Business Machines Corporation System and method for enabling unified access to multiple types of data
US7016917B2 (en) * 2000-06-05 2006-03-21 International Business Machines Corporation System and method for storing conceptual information
US6950855B2 (en) * 2002-01-18 2005-09-27 International Business Machines Corporation Master node selection in clustered node configurations
US6909358B2 (en) * 2002-03-21 2005-06-21 International Business Machines Corporation Hamming distance comparison
US7817820B2 (en) * 2003-11-25 2010-10-19 Florida State University Method and system for generating and using digital fingerprints for electronic documents
WO2006094016A2 (en) 2005-02-28 2006-09-08 The Regents Of The University Of California Method for low distortion embedding of edit distance to hamming distance
US8818980B2 (en) * 2010-01-12 2014-08-26 Intouchlevel Corporation Connection engine
US8543598B2 (en) * 2010-03-01 2013-09-24 Microsoft Corporation Semantic object characterization and search
CN103020321B (en) * 2013-01-11 2015-08-19 广东图图搜网络科技有限公司 Neighbor search method and system
US10496900B2 (en) * 2013-05-08 2019-12-03 Seagate Technology Llc Methods of clustering computational event logs
JP6187237B2 (en) * 2013-12-19 2017-08-30 富士通株式会社 Document image retrieval apparatus, method, and program
US10489589B2 (en) 2016-11-21 2019-11-26 Cylance Inc. Anomaly based malware detection
US10496706B2 (en) 2017-04-17 2019-12-03 International Business Machines Corporation Matching strings in a large relational database
US11593412B2 (en) 2019-07-22 2023-02-28 International Business Machines Corporation Providing approximate top-k nearest neighbours using an inverted list

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2293741A1 (en) * 1974-12-04 1976-07-02 Anvar METHOD AND SYSTEM FOR ITERATIVE AND SIMULTANEOUS RECONCILIATION OF DATA WITH A SET OF REFERENCE DATA
US4084260A (en) * 1976-07-12 1978-04-11 Sperry Rand Corporation Best match content addressable memory
IL119444A (en) * 1995-10-20 2001-10-31 Yeda Res & Dev Private information retrieval
US5870754A (en) * 1996-04-25 1999-02-09 Philips Electronics North America Corporation Video retrieval of MPEG compressed sequences using DC and motion signatures
US5890151A (en) * 1997-05-09 1999-03-30 International Business Machines Corporation Method and system for performing partial-sum queries on a data cube

Also Published As

Publication number Publication date
WO1999026235A2 (en) 1999-05-27
CA2310321C (en) 2004-11-16
US6226640B1 (en) 2001-05-01

Similar Documents

Publication Publication Date Title
CA2310321A1 (en) Method and system for determining approximate hamming distance and approximate nearest neighbors in an electronic storage device
WO1997020419A3 (en) Method of determining the topology of a network of objects
WO2001090926A3 (en) System and method for determining affinity using objective and subjective data
WO2000023863A3 (en) Determining differences between two or more metadata models
EP1552425A4 (en) A link generation system
ATE205029T1 (en) COMPUTER SORTING SYSTEM FOR DATA COMPRESSION
WO2004104729A3 (en) System and method for query result caching
WO2004015524A3 (en) System, method and computer program product for guaranteeing electronic transactions
WO2002067146A3 (en) Query resolution system
WO2003107127A3 (en) System and method for personalized information retrieval based on user expertise
WO2004051555A3 (en) Method and apparatus for improved information transactions
WO2002065102A3 (en) System and method for grouping reflectance data
ATE324001T1 (en) METHOD AND APPARATUS IN A WIRELESS TRANSCEIVER FOR SEARCHING AND TRANSMITTING INFORMATION AVAILABLE FROM A NETWORK SERVER
WO2003017133A3 (en) System and method for retrieving location based site data
EP2211520A3 (en) Presence and availability tracking
CA2049133A1 (en) Methods and apparatus for implementing data bases to provide object-oriented invocation of applications
EP1211845A3 (en) Method of determining a connection between a data emitting device and a data receiving device
CA2248911A1 (en) System and method for locating resources on a network using resource evaluations derived from electronic messages
WO1999036864A3 (en) Method for finding and retrieving electronic information in a network using interest agents
CA2226647A1 (en) Session cache and rule caching method for a dynamic filter
WO2004040475A3 (en) Improved audio data fingerprint searching
DE60037318D1 (en) METHOD AND DEVICE FOR SELECTION OF MULTIPLE IP DATA TRANSMITTED WITHIN A RADIO CIRCUIT
WO1999017242A3 (en) On-line recruiting system with improved candidate and position profiling
DE60320002D1 (en) DEVICE AND METHOD FOR ACCESSING CONTACT INFORMATION IN A COMMUNICATION DEVICE
WO2002052382A3 (en) Method and system for sharing investor information over an electronic network

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed