WO2000007117A3 - An index to a semi-structured database - Google Patents

Publication number
WO2000007117A3
Authority
WO
WIPO (PCT)
Prior art keywords
semi
entries
index
structured
structured database
Prior art date
Application number
PCT/GB1999/002517
Other languages
French (fr)
Other versions
WO2000007117A2 (en)
Inventor
Samuel William Dyne Steel
Udo Kruschwitz
Nicholas John Webb
Roeck Anne Nellie De
Paul David Scott
Raymond Turner
Kwok Ching Tsui
Wayne Raymond Wobcke
Behnam Azvine
Original Assignee
British Telecomm
Samuel William Dyne Steel
Udo Kruschwitz
Nicholas John Webb
Roeck Anne Nellie De
Paul David Scott
Raymond Turner
Kwok Ching Tsui
Wayne Raymond Wobcke
Behnam Azvine
Priority claimed from GBGB9816648.1A (GB9816648D0)
Application filed by British Telecomm, Samuel William Dyne Steel, Udo Kruschwitz, Nicholas John Webb, Roeck Anne Nellie De, Paul David Scott, Raymond Turner, Kwok Ching Tsui, Wayne Raymond Wobcke, Behnam Azvine
Priority to US09/744,393 (US7409381B1)
Priority to EP99936836A (EP1099171B1)
Priority to DE69933123T (DE69933123T2)
Priority to AU51810/99A (AU5181099A)
Publication of WO2000007117A2
Publication of WO2000007117A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/83 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/912 Applications of a database
    • Y10S707/917 Text
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/953 Organization of data
    • Y10S707/955 Object-oriented
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching

Abstract

The present invention relates to a method of generating an index (2) to a semi-structured database (1). Semi-structured databases contain a number of items, each of which is stored as a set of semi-structured data including a number of related entries. The presence of these entries is determined by comparing the sets of data to a number of selection criteria, which define one or more predetermined characteristics of various entries. A set of indices is then generated representing a concordance between the determined entries and the respective items.
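The indexing step described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say what form the selection criteria take, so the sketch assumes they are regular expressions, and the names `CRITERIA` and `build_index` and the sample items are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical selection criteria: each names a kind of entry and gives a
# pattern describing a predetermined characteristic of that kind of entry.
CRITERIA = {
    "phone": re.compile(r"\b\d{5} \d{6}\b"),
    "postcode": re.compile(r"\b[A-Z]{1,2}\d{1,2} \d[A-Z]{2}\b"),
}


def build_index(items, criteria=CRITERIA):
    """Return a concordance mapping (entry kind, entry text) to item ids."""
    index = defaultdict(set)
    for item_id, text in items.items():
        for kind, pattern in criteria.items():
            # Every match is an entry whose presence in this item is recorded.
            for match in pattern.finditer(text):
                index[(kind, match.group())].add(item_id)
    return dict(index)


# Made-up items standing in for sets of semi-structured data.
items = {
    1: "Smith Plumbing, 12 High St, Colchester CO1 1AA, tel 01206 123456",
    2: "Jones Bakery, 3 Mill Rd, Colchester CO1 1AA",
}
index = build_index(items)
```

Looking up `index[("postcode", "CO1 1AA")]` then returns the ids of every item containing that entry, which is the concordance between entries and items that the abstract refers to.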
PCT/GB1999/002517 1998-07-30 1999-07-30 An index to a semi-structured database WO2000007117A2 (en)

Priority Applications (4)

Application Number Publication Priority Date Filing Date Title
US09/744,393 US7409381B1 (en) 1998-07-30 1999-07-30 Index to a semi-structured database
EP99936836A EP1099171B1 (en) 1998-07-30 1999-07-30 Accessing a semi-structured database
DE69933123T DE69933123T2 (en) 1998-07-30 1999-07-30 ACCESS TO A SEMI-STRUCTURED DATABASE
AU51810/99A AU5181099A (en) 1998-07-30 1999-07-30 An index to a semi-structured database

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB9816648.1 1998-07-30
GBGB9816648.1A GB9816648D0 (en) 1998-07-30 1998-07-30 An index to a semi-structured database
EP98306106 1998-07-31
EP98306106.0 1998-07-31

Publications (2)

Publication Number Publication Date
WO2000007117A2 WO2000007117A2 (en) 2000-02-10
WO2000007117A3 WO2000007117A3 (en) 2000-05-18

Family

Family ID: 26151377

Family Applications (1)

Application Number Publication Priority Date Filing Date Title
PCT/GB1999/002517 WO2000007117A2 (en) 1998-07-30 1999-07-30 An index to a semi-structured database

Country Status (5)

Country Link
US (1) US7409381B1 (en)
EP (1) EP1099171B1 (en)
AU (1) AU5181099A (en)
DE (1) DE69933123T2 (en)
WO (1) WO2000007117A2 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NO325313B1 (en) * 2003-12-10 2008-03-25 Kurt Arthur Seljeseth Intentional addressing and resource request in computer networks
US7769579B2 (en) 2005-05-31 2010-08-03 Google Inc. Learning facts from semi-structured text
US9208229B2 (en) 2005-03-31 2015-12-08 Google Inc. Anchor text summarization for corroboration
US7587387B2 (en) 2005-03-31 2009-09-08 Google Inc. User interface for facts query engine with snippets from information sources that include query terms and answer terms
US8682913B1 (en) 2005-03-31 2014-03-25 Google Inc. Corroborating facts extracted from multiple sources
US8996470B1 (en) 2005-05-31 2015-03-31 Google Inc. System for ensuring the internal consistency of a fact repository
US7831545B1 (en) 2005-05-31 2010-11-09 Google Inc. Identifying the unifying subject of a set of facts
US8260785B2 (en) 2006-02-17 2012-09-04 Google Inc. Automatic object reference identification and linking in a browseable fact repository
US8122026B1 (en) 2006-10-20 2012-02-21 Google Inc. Finding and disambiguating references to entities on web pages
US8347202B1 (en) 2007-03-14 2013-01-01 Google Inc. Determining geographic locations for place names in a fact repository
US7970766B1 (en) 2007-07-23 2011-06-28 Google Inc. Entity type assignment
US8812435B1 (en) 2007-11-16 2014-08-19 Google Inc. Learning objects and facts from documents
US8219407B1 (en) 2007-12-27 2012-07-10 Great Northern Research, LLC Method for processing the output of a speech recognizer
US20090259670A1 (en) * 2008-04-14 2009-10-15 Inmon William H Apparatus and Method for Conditioning Semi-Structured Text for use as a Structured Data Source
US8473279B2 (en) * 2008-05-30 2013-06-25 Eiman Al-Shammari Lemmatizing, stemming, and query expansion method and system
US10048934B2 (en) * 2015-02-16 2018-08-14 International Business Machines Corporation Learning intended user actions

Citations (4)

Publication number Priority date Publication date Assignee Title
EP0268367A2 (en) * 1986-11-18 1988-05-25 Nortel Networks Corporation A domain-independent natural language database interface
EP0522591A2 (en) * 1991-07-11 1993-01-13 Mitsubishi Denki Kabushiki Kaisha Database retrieval system for responding to natural language queries with corresponding tables
US5377103A (en) * 1992-05-15 1994-12-27 International Business Machines Corporation Constrained natural language interface for a computer that employs a browse function
US5671425A (en) * 1990-07-26 1997-09-23 Nec Corporation System for recognizing sentence patterns and a system recognizing sentence patterns and grammatical cases

Family Cites Families (36)

Publication number Priority date Publication date Assignee Title
US5727196A (en) * 1992-05-21 1998-03-10 Borland International, Inc. Optimized query interface for database management systems
US6055531A (en) * 1993-03-24 2000-04-25 Engate Incorporated Down-line transcription system having context sensitive searching capability
US5799268A (en) * 1994-09-28 1998-08-25 Apple Computer, Inc. Method for extracting knowledge from online documentation and creating a glossary, index, help database or the like
US6061675A (en) * 1995-05-31 2000-05-09 Oracle Corporation Methods and apparatus for classifying terminology utilizing a knowledge catalog
US6026388A (en) * 1995-08-16 2000-02-15 Textwise, Llc User interface and other enhancements for natural language information retrieval system and method
US5963940A (en) * 1995-08-16 1999-10-05 Syracuse University Natural language information retrieval system and method
US6076088A (en) * 1996-02-09 2000-06-13 Paik; Woojin Information extraction system and method using concept relation concept (CRC) triples
US5995963A (en) * 1996-06-27 1999-11-30 Fujitsu Limited Apparatus and method of multi-string matching based on sparse state transition list
US6052693A (en) * 1996-07-02 2000-04-18 Harlequin Group Plc System for assembling large databases through information extracted from text sources
US5920854A (en) * 1996-08-14 1999-07-06 Infoseek Corporation Real-time document collection search engine with phrase indexing
US6026410A (en) * 1997-02-10 2000-02-15 Actioneer, Inc. Information organization and collaboration tool for processing notes and action requests in computer systems
US5987447A (en) * 1997-05-20 1999-11-16 Inventec Corporation Method and apparatus for searching sentences by analyzing words
US6038560A (en) * 1997-05-21 2000-03-14 Oracle Corporation Concept knowledge base search and retrieval system
US5937408A (en) * 1997-05-29 1999-08-10 Oracle Corporation Method, article of manufacture, and apparatus for generating a multi-dimensional record structure foundation
US5991758A (en) * 1997-06-06 1999-11-23 Madison Information Technologies, Inc. System and method for indexing information about entities from different information sources
US6081774A (en) * 1997-08-22 2000-06-27 Novell, Inc. Natural language information retrieval system and method
US5983216A (en) * 1997-09-12 1999-11-09 Infoseek Corporation Performing automated document collection and selection by providing a meta-index with meta-index values indentifying corresponding document collections
US6026398A (en) * 1997-10-16 2000-02-15 Imarket, Incorporated System and methods for searching and matching databases
US6182066B1 (en) * 1997-11-26 2001-01-30 International Business Machines Corp. Category processing of query topics and electronic document content topics
US6298343B1 (en) * 1997-12-29 2001-10-02 Inventec Corporation Methods for intelligent universal database search engines
ID29592A (en) * 1998-04-01 2001-09-06 William Peterman SYSTEM AND METHOD OF ELECTRONIC DOCUMENT SEARCHING MADE WITH INTRODUCTION OF OPTICAL CHARACTERS
US6216123B1 (en) * 1998-06-24 2001-04-10 Novell, Inc. Method and system for rapid retrieval in a full text indexing system
US6470333B1 (en) * 1998-07-24 2002-10-22 Jarg Corporation Knowledge extraction system and method
US6363377B1 (en) * 1998-07-30 2002-03-26 Sarnoff Corporation Search data processor
US6243713B1 (en) * 1998-08-24 2001-06-05 Excalibur Technologies Corp. Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types
US6487546B1 (en) * 1998-08-27 2002-11-26 Oracle Corporation Apparatus and method for aggregate indexes
US6519597B1 (en) * 1998-10-08 2003-02-11 International Business Machines Corporation Method and apparatus for indexing structured documents with rich data types
US6584459B1 (en) * 1998-10-08 2003-06-24 International Business Machines Corporation Database extender for storing, querying, and retrieving structured documents
US6453312B1 (en) * 1998-10-14 2002-09-17 Unisys Corporation System and method for developing a selectably-expandable concept-based search
US6460029B1 (en) * 1998-12-23 2002-10-01 Microsoft Corporation System for improving search text
US6513031B1 (en) * 1998-12-23 2003-01-28 Microsoft Corporation System for improving search area selection
JP3022539B1 (en) * 1999-01-07 2000-03-21 富士ゼロックス株式会社 Document search device
US6457014B1 (en) * 1999-03-26 2002-09-24 Computer Associates Think, Inc. System and method for extracting index key data fields
US6374241B1 (en) * 1999-03-31 2002-04-16 Verizon Laboratories Inc. Data merging techniques
US6421662B1 (en) * 1999-06-04 2002-07-16 Oracle Corporation Generating and implementing indexes based on criteria set forth in queries
US6353825B1 (en) * 1999-07-30 2002-03-05 Verizon Laboratories Inc. Method and device for classification using iterative information retrieval techniques

Non-Patent Citations (7)

Title
AIRI SALMINEN ET AL: "FROM TEXT TO HYPERTEXT BY INDEXING", ACM TRANSACTIONS ON INFORMATION SYSTEMS, vol. 13, no. 1, January 1995 (1995-01-01), pages 69 - 99, XP000501182 *
FRANK SHOU-CHENG TSENG ET AL: "EXTENDING THE E-R CONCEPTS TO CAPTURE NATURAL LANGUAGE SEMANTICS FOR DATABASE ACCESS", PROCEEDINGS OF THE INTERNATIONAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE. ( COMPSAC ),US,LOS ALAMITOS, IEEE. COMP. SOC. PRESS, vol. CONF. 15, 1991, pages 30 - 35, XP000260517 *
HAMMER J ET AL: "Extracting semistructured information from the Web", PROCEEDINGS OF THE WORKSHOP ON MANAGEMENT OF SEMI-STRUCTURED DATA, PROCEEDINGS OF WORKSHOP ON MANAGEMENT OF SEMI-STRUCTURED DATA, TUCSON, AZ, USA, 16 MAY 1997, 1997, Murray Hill, NJ, USA, AT & T Labs - Research, USA, pages 18 - 25, XP002099172 *
HUFFMAN S ET AL: "Notes Explorer: entity-based retrieval in shared, semi-structured information spaces", PROCEEDINGS OF THE 1996 ACM CIKM. INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, PROCEEDINGS OF 5TH INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, ROCKVILLE, MD, USA, 12-16 NOV. 1996, ISBN 0-89791-873-8, 1996, New York, NY, USA, ACM, USA, pages 99 - 106, XP002088420 *
KOPEC G E ET AL: "Document image decoding using Markov source models", CHARACTER RECOGNITION TECHNOLOGIES, SAN JOSE, CA, USA, 1-2 FEB. 1993, vol. 1906, ISSN 0277-786X, Proceedings of the SPIE - The International Society for Optical Engineering, 1993, USA, pages 134 - 145, XP002088421 *
LUNIEWSKI, A. ET AL: "RUFUS: Managing Semi-structured information", ALMADEN COMPUTER SCIENCE SHOW AND TELL, ALMADEN RESEARCH CENTER, SAN JOSE, CALIFORNIA, USA, 4 October 1994 (1994-10-04), http://www.almaden.ibm.com/cs/showtell/rufus/overview.html, XP002088419 *
MCHUGH J ET AL: "LORE: A DATABASE MANAGEMENT SYSTEM FOR SEMISTRUCTURED DATA", SIGMOD RECORD,US,ASSOCIATION FOR COMPUTING MACHINERY, NEW YORK, vol. 26, no. 3, 1 September 1997 (1997-09-01), pages 54 - 66, XP000701384 *

Also Published As

Publication number Publication date
AU5181099A (en) 2000-02-21
DE69933123T2 (en) 2007-03-15
WO2000007117A2 (en) 2000-02-10
EP1099171A2 (en) 2001-05-16
DE69933123D1 (en) 2006-10-19
EP1099171B1 (en) 2006-09-06
US7409381B1 (en) 2008-08-05

Similar Documents

Publication Publication Date Title
WO2000007117A3 (en) An index to a semi-structured database
Derks et al. A Shapley value for games with restricted coalitions
WO2004013772A3 (en) System and method for indexing non-textual data
CA2092629A1 (en) Database searching system and method using a two dimensional marking matrix
CA2302264A1 (en) Methods and/or systems for selecting data sets
EA200300522A1 (en) DATABASE
EP0955592A3 (en) A system and method for querying a music database
CA2246949A1 (en) Method and means for encoding storing and retrieving hierarchical data processing information for a computer system
IL140906A0 (en) System and method for selectively defining access to application features
EP0961211A3 (en) Database method and apparatus using hierarchical bit vector index structure
EP1221693A3 (en) Prosody template matching for text-to-speech systems
GB2377300A (en) Temporal updates of relevancy rating of retrieved information in an information search system
SE0004043D0 (en) Method and apparatus for document indexing and searching
EP0860786A3 (en) System and method for hierarchically grouping and ranking a set of objects in a query context
CA2240155A1 (en) Specifying indexes for relational databases
CA2323650A1 (en) A database useful for configuring and/or optimizing a system and a method for generating the database
WO2004042604A3 (en) Intelligent data management system and method
WO2001069527A3 (en) Trainable, extensible, automated data-to-knowledge translator
EP0775963A3 (en) Indexing a database by finite-state transducer
WO2001088656A3 (en) Apparatus and method for performing transformation-based indexing of high-dimensional data
WO2003038671A3 (en) Adaptive web pages
Coene et al. A financially Balanced Bonus/Malus System
EP0994409A3 (en) Index tabs
WO2002033571A3 (en) Method of operating a plurality of electronic databases
CA2170452A1 (en) Fuzzy Thesaurus Generator

Legal Events

Date Code Title Description
AK Designated states; kind code of ref document: A2; designated state(s): AU CA NZ SG US
AL Designated countries for regional patents; kind code of ref document: A2; designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE
121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
AK Designated states; kind code of ref document: A3; designated state(s): AU CA NZ SG US
AL Designated countries for regional patents; kind code of ref document: A3; designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE
WWE WIPO information: entry into national phase; ref document number: 1999936836; country of ref document: EP
WWE WIPO information: entry into national phase; ref document number: 09744393; country of ref document: US
WWP WIPO information: published in national office; ref document number: 1999936836; country of ref document: EP
WWG WIPO information: grant in national office; ref document number: 1999936836; country of ref document: EP