WO2007146107A2 - Methods for enhancing efficiency and cost effectiveness of first pass review of documents - Google Patents
- Publication number
- WO2007146107A2 (PCT/US2007/013483)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- documents
- search
- subset
- responsive
- collection
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3338—Query expansion
Definitions
- the present disclosure relates to review of documents, and, more specifically, to techniques for reviewing a collection of documents to identify relevant documents from the collection, efficiently and with a relatively high level of cost effectiveness.
- It has been proposed to use search engine technology to make the document review process more manageable.
- quality and completeness of search results from conventional search engine techniques are indeterminable and therefore unreliable. For example, one does not know whether the search engine has indeed found every relevant document, at least not with any certainty.
- the main search engine technique currently used is keyword or free-text search coupled with indexing of terms in the documents.
- a user enters a search query consisting of one or a few words or phrases and the search system returns all of the documents that have been indexed as having one or more of those words or phrases in the search query.
- as more documents are indexed, more documents are expected to contain the specified search terms.
- such a search technique only marginally reduces the number of documents to be reviewed, and the large quantities of documents returned cannot be usefully examined by the user. There is no guarantee that the desired information is contained in any of the returned documents.
- search queries are typically developed with the object of finding every relevant document regardless of the specific nomenclature used in the document. This necessitates developing lists of synonyms and phrases that encompass every imaginable word usage combination. In practice, the total number of documents returned by these queries is very large.
- This disclosure describes assorted techniques which can be applied in the review of a collection of documents to identify relevant documents from the collection.
- a search of the collection can be run based on query terms, to return a subset of responsive documents.
- a probability of relevancy is determined for a document in the returned subset, and the document is removed from the subset if it does not reach a threshold probability of relevancy.
- a statistical technique can be applied to determine whether remaining documents (that is, not in the responsive documents subset) in the collection meet a predetermined acceptance level.
- documents in a thread of a correspondence (for example, an e-mail) in the responsive documents subset can be added to the responsive documents subset.
- the responsive documents in the responsive documents subset are scanned to automatically identify a correspondence (for example, an e-mail) in the responsive documents subset, additional documents in a thread of the correspondence are automatically identified, and the additional documents are added to the responsive documents subset.
- the responsive documents in the responsive documents subset are scanned to automatically determine whether any of the responsive documents include an attachment that is not in the subset, and any such attachment is added to the responsive documents subset.
- (a) a predetermined number of documents are randomly selected from a remainder of the collection of documents not in the responsive documents subset, (b) the randomly selected documents are reviewed to determine whether the randomly selected documents include additional relevant documents, and (c) if there are additional relevant documents, one or more specific terms in the additional responsive documents that render the documents relevant are identified, the query terms are expanded with the specific terms, and the search is re-run with the expanded query terms.
- Fig. 1: A block diagram of a computer or information terminal on which programs can run to implement the methodologies of this disclosure.
- Fig. 2: A flow chart for a method for reviewing a collection of documents to identify relevant documents from the collection, according to an exemplary embodiment.
- Fig. 3: A flow chart for a method for reviewing a collection of documents to identify relevant documents from the collection, according to another exemplary embodiment.
- Fig. 4: A flow chart for a method for reviewing a collection of documents to identify relevant documents from the collection, according to another exemplary embodiment.
- Fig. 5: A flow chart for a method for reviewing a collection of documents to identify relevant documents from the collection, according to another exemplary embodiment.
- Fig. 6: A flow chart for a method for reviewing a collection of documents to identify relevant documents from the collection, according to another exemplary embodiment.
- Figs. 7A and 7B: A flow chart for a workflow of a process including application of some of the techniques discussed herein.
- Computer 10 includes CPU 11, program and data storage 12, hard disk (and controller) 13, removable media drive (and controller) 14, network communications controller 15 (for communications through a wired or wireless network), display (and controller) 16 and I/O controller 17, all of which are connected through system bus 19.
- a method for reviewing a collection of documents to identify relevant documents from the collection can comprise running a search of the collection of documents based on a plurality of query terms and returning a subset of responsive documents from the collection (step S21), determining a corresponding probability of relevancy for each document in the responsive documents subset (step S23) and removing from the responsive documents subset, documents that do not reach a threshold probability of relevancy (step S25).
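As a rough illustration of steps S23 and S25, the following Python sketch filters a scored subset against a threshold. The function name `filter_by_relevancy`, the document identifiers, and the probability values are all hypothetical; the disclosure does not prescribe a data structure or a particular threshold.

```python
from typing import Dict, List


def filter_by_relevancy(probabilities: Dict[str, float], threshold: float) -> List[str]:
    """Keep the documents whose probability of relevancy reaches the threshold (step S25)."""
    return [doc_id for doc_id, p in probabilities.items() if p >= threshold]


# Hypothetical output of step S23: document id -> probability of relevancy.
scored = {"doc-1": 0.92, "doc-2": 0.40, "doc-3": 0.75}

# A document exactly at the threshold is treated as reaching it (an assumption).
kept = filter_by_relevancy(scored, threshold=0.75)
print(kept)
```

Documents that fall below the threshold are simply dropped from the returned list; how the probabilities themselves are produced is left to the later parts of the disclosure.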
- the search is preferably applied through a search engine.
- the search can include a concept search, and the concept search is applied through a concept search engine.
- Such searches and other automated steps or actions can be coordinated through appropriate programming, as would be appreciated by one skilled in the art.
- the probability of relevancy of a document can be scaled according to a measure of obscurity of the search terms found in the document.
- the method can further comprise randomly selecting a predetermined number of documents from a remaining subset of the collection of documents not in the responsive documents subset, and determining whether the randomly selected documents include additional relevant documents, and in addition, optionally, identifying one or more specific terms in the additional relevant documents that render the documents relevant, expanding the query terms with the specific terms, and re-running at least the search with the expanded query terms. If the randomly selected documents include one or more additional relevant documents, the query terms can be expanded and the search re-run with the expanded query terms.
- the method can additionally comprise comparing a ratio of the additional relevant documents and the randomly selected documents to a predetermined acceptance level, to determine whether to apply a refined set of query terms.
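The sampling check described above might be sketched as follows in Python. The helper names are hypothetical, and the sketch assumes that the acceptance level is the largest tolerated fraction of relevant documents in the random sample; the disclosure does not fix that interpretation.

```python
import random
from typing import Iterable, List, Optional


def sample_remainder(collection_ids: Iterable, responsive_ids: Iterable,
                     sample_size: int, seed: Optional[int] = None) -> List:
    """Randomly select documents that the search did NOT return."""
    remainder = sorted(set(collection_ids) - set(responsive_ids))
    rng = random.Random(seed)
    return rng.sample(remainder, min(sample_size, len(remainder)))


def needs_refined_query(relevant_in_sample: int, sample_size: int,
                        acceptance_level: float) -> bool:
    """Compare the ratio of relevant documents in the sample to the acceptance level.

    Assumption: exceeding the acceptance level means too many relevant documents
    were missed, so the query terms should be refined and the search re-run.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return relevant_in_sample / sample_size > acceptance_level


sample = sample_remainder(range(10), [0, 1, 2], sample_size=4, seed=1)
print(sample, needs_refined_query(3, 100, acceptance_level=0.01))
```

The review of the sampled documents (deciding which are relevant) is a human step and is represented here only by the count passed to `needs_refined_query`.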
- the method can further comprise selecting two or more search terms, identifying synonyms of the search terms, and forming the query terms based on the search terms and synonyms.
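A minimal Python sketch of forming query terms from search terms and their synonyms is shown below. The `thesaurus` mapping and the function name are hypothetical stand-ins for whatever synonym source is actually used.

```python
from typing import Dict, List


def build_query_terms(search_terms: List[str], thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return the search terms followed by their synonyms, without duplicates."""
    terms: List[str] = []
    for term in search_terms:
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in terms:
                terms.append(candidate)
    return terms


# Hypothetical thesaurus entries, for illustration only.
thesaurus = {"contract": ["agreement"], "payment": ["remittance", "contract"]}
query_terms = build_query_terms(["contract", "payment"], thesaurus)
print(query_terms)
```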
- the method can further comprise identifying a correspondence between a sender and a recipient, in the responsive documents subset, automatically determining one or more additional documents which are in a thread of the correspondence, the additional documents not being in the responsive documents subset, and adding the additional documents to the responsive documents subset.
- the term "correspondence" is used herein to refer to a written or electronic communication (for example, letter, memo, e-mail, text message, etc.) between a sender and a recipient, and optionally with copies going to one or more copy recipients.
- the method can further comprise determining whether any of the documents in the responsive documents subset includes an attachment that is not in the responsive documents subset, and adding the attachment to the responsive documents subset.
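The thread and attachment expansion described above could look roughly like the following Python sketch. The `Document` record, its `thread_id` and `attachment_ids` fields, and the function name are assumptions made for illustration; the disclosure does not specify how threads or attachments are represented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Document:
    doc_id: str
    thread_id: Optional[str] = None          # hypothetical thread identifier
    attachment_ids: List[str] = field(default_factory=list)


def expand_with_threads_and_attachments(collection: Dict[str, Document],
                                        responsive: Set[str]) -> Set[str]:
    """Add thread members and attachments of responsive documents to the subset."""
    expanded = set(responsive)
    # Threads that contain at least one responsive correspondence.
    threads = {collection[d].thread_id for d in responsive if collection[d].thread_id}
    for doc in collection.values():
        if doc.thread_id in threads:
            expanded.add(doc.doc_id)
    # Attachments of the originally responsive documents.
    for d in responsive:
        expanded.update(collection[d].attachment_ids)
    return expanded


collection = {
    "e1": Document("e1", thread_id="t1", attachment_ids=["a1"]),
    "e2": Document("e2", thread_id="t1"),
    "e3": Document("e3", thread_id="t2"),
    "a1": Document("a1"),
    "m1": Document("m1"),
}
print(sorted(expand_with_threads_and_attachments(collection, {"e1"})))
```

In this sketch only attachments of the originally responsive documents are added; whether attachments of newly added thread members should also be pulled in is a design choice the text leaves open.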
- the method can further comprise applying a statistical technique (for example, zero-defect testing) to determine whether remaining documents not in the responsive documents subset meet a predetermined acceptance level.
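One common formulation of zero-defect (acceptance number zero) sampling is sketched below in Python. It is an assumption that this formulation matches what the disclosure intends: if n randomly drawn documents from the remainder contain no relevant document, then with the chosen confidence the fraction of relevant documents in the remainder is below the chosen rate, where n is at least ln(1 - confidence) / ln(1 - rate). The function names are hypothetical.

```python
import math
from typing import Iterable


def zero_defect_sample_size(max_defect_rate: float, confidence: float) -> int:
    """Smallest n such that (1 - max_defect_rate) ** n <= 1 - confidence."""
    if not (0 < max_defect_rate < 1 and 0 < confidence < 1):
        raise ValueError("rate and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_defect_rate))


def passes_zero_defect(sample_is_relevant: Iterable[bool]) -> bool:
    """The remainder is accepted only if no sampled document is relevant."""
    return not any(sample_is_relevant)


n = zero_defect_sample_size(max_defect_rate=0.01, confidence=0.95)
print(n)
```

The binomial approximation used here assumes the remainder is large relative to the sample; for small collections a hypergeometric calculation would be more appropriate.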
- the search can include (a) a Boolean search of the collection of documents based on the plurality of query terms, the Boolean search returning a first subset of responsive documents from the collection, and (b) a second search by applying a recall query based on the plurality of query terms to remaining ones of the collection of documents which were not returned by the Boolean search, the second search returning a second subset of responsive documents in the collection, and wherein the responsive documents subset is constituted by the first and second subsets.
- the first Boolean search may apply a measurable precision query based on the plurality of query terms.
- the method can optionally further include automatically tagging each document in the first subset with a precision tag, reviewing the document bearing the precision tag to determine whether the document is properly tagged with the precision tag, and determining whether to narrow the precision query and rerun the Boolean search with the narrowed query terms.
- the method can optionally further comprise automatically tagging each document in the second subset with a recall tag, reviewing the document bearing the recall tag to determine whether the document is properly tagged with the recall tag, and determining whether to narrow the recall query and rerun the second search with the narrowed query terms.
- the method can optionally further include reviewing the first and second subsets to determine whether to modify the query terms and rerun the Boolean search and second search with modified query terms.
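The two-stage search and tagging described above can be illustrated with the Python sketch below. It makes strong simplifying assumptions that are not in the disclosure: the precision query is modeled as "all terms present" and the recall query as "any term present", with naive whitespace tokenization. The function names and tag strings are hypothetical.

```python
from typing import Dict, Iterable, Set


def tokenize(text: str) -> Set[str]:
    return set(text.lower().split())


def two_stage_search(collection: Dict[str, str], query_terms: Iterable[str]) -> Dict[str, str]:
    """Return a mapping of document id -> tag ("precision" or "recall")."""
    terms = {t.lower() for t in query_terms}
    tags: Dict[str, str] = {}
    # First search: precision query (assumed here to require every term).
    for doc_id, text in collection.items():
        if terms <= tokenize(text):
            tags[doc_id] = "precision"
    # Second search: recall query, applied only to documents not already returned.
    for doc_id, text in collection.items():
        if doc_id not in tags and terms & tokenize(text):
            tags[doc_id] = "recall"
    return tags


docs = {"d1": "merger price agreement", "d2": "price list", "d3": "lunch menu"}
tagged = two_stage_search(docs, ["merger", "price"])
print(tagged)
```

The responsive documents subset is the set of keys of the returned mapping, and the tags give reviewers a way to check each query separately before deciding whether to narrow or modify it.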
- a method for reviewing a collection of documents to identify relevant documents from the collection includes running a search of the collection of documents, based on a plurality of query terms, the search returning a subset of responsive documents in the collection (step S31), automatically identifying a correspondence between a sender and a recipient, in the responsive documents subset (step S33), automatically determining one or more additional documents which are in a thread of the correspondence, the additional documents not being in the responsive documents subset (step S35), and adding the additional documents to the responsive documents subset (step S37).
- the method can further comprise determining for each document in the responsive documents subset, a corresponding probability of relevancy, and removing from the responsive documents subset documents that do not reach a threshold probability of relevancy.
- the probability of relevancy of a document can be scaled according to a measure of obscurity of the search terms found in the document.
- the method can further comprise applying a statistical technique to determine whether a remaining subset of the collection of documents not in the responsive documents subset meets a predetermined acceptance level.
- the method can additionally comprise randomly selecting a predetermined number of documents from a remainder of the collection of documents not in the responsive documents subset, determining whether the randomly selected documents include additional relevant documents, identifying one or more specific terms in the additional relevant documents that render the documents relevant, expanding the query terms with the specific terms, and re-running the search with the expanded query terms.
- the method can further include randomly selecting a predetermined number of documents from a remainder of the collection of documents not in the responsive documents subset, determining whether the randomly selected documents include additional relevant documents, comparing a ratio of the additional relevant documents and the randomly selected documents to a predetermined acceptance level, and expanding the query terms and rerunning the search with the expanded query terms, if the ratio does not meet the predetermined acceptance level.
- the method can further comprise selecting two or more search terms, identifying synonyms of the search terms, and forming the query terms based on the search terms and synonyms.
- the method can additionally include determining whether any of the responsive documents in the responsive documents subset includes an attachment that is not in the subset, and adding the attachment to the subset.
- a method for reviewing a collection of documents to identify relevant documents from the collection can comprise running a search of the collection of documents, based on a plurality of query terms, the search returning a subset of responsive documents in the collection (step S41), automatically determining whether any of the responsive documents in the responsive documents subset includes an attachment that is not in the subset (step S43), and adding the attachment to the responsive documents subset (step S45).
- the method can further comprise determining for each document in the responsive documents subset, a corresponding probability of relevancy, and removing from the responsive documents subset documents that do not reach a threshold probability of relevancy.
- the probability of relevancy of a document is preferably scaled according to a measure of obscurity of the search terms found in the document.
- the method can additionally comprise applying a statistical technique to determine whether a remaining subset of the collection of documents not in the responsive documents subset meets a predetermined acceptance level.
- the method can further include randomly selecting a predetermined number of documents from a remainder of the collection of documents not in the responsive documents subset, determining whether the randomly selected documents include additional relevant documents, identifying one or more specific terms in the additional responsive documents that render the documents relevant, expanding the query terms with the specific terms, and re-running the search with the expanded query terms.
- the method can further include selecting two or more search terms, identifying synonyms of the search terms, and forming the query terms based on the search terms and synonyms.
- the method can further comprise identifying a correspondence between a sender and a recipient, in the responsive documents subset, automatically determining one or more additional documents which are in a thread of the correspondence, the additional documents not being in the responsive documents subset, and adding the additional documents to the responsive documents subset.
- a method for reviewing a collection of documents to identify relevant documents from the collection comprises running a search of the collection of documents, based on a plurality of query terms, the search returning a subset of responsive documents from the collection (step S51), randomly selecting a predetermined number of documents from a remainder of the collection of documents not in the responsive documents subset (step S52), determining whether the randomly selected documents include additional relevant documents (step S53), identifying one or more specific terms in the additional responsive documents that render the documents relevant (step S54), expanding the query terms with the specific terms (step S55), and re-running the search with the expanded query terms (step S56).
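Steps S51 through S56 can be read as a loop, sketched below in Python. Everything named here is hypothetical: `search` is a naive any-term keyword match standing in for a real search engine, and `review` is a callback standing in for the human reviewer, returning the specific terms that make a sampled document relevant (or an empty list if it is not relevant). The `max_rounds` cap is an added safeguard, not part of the disclosure.

```python
import random
from typing import Callable, Dict, Iterable, List, Set, Tuple


def search(collection: Dict[str, str], query_terms: Iterable[str]) -> Set[str]:
    """Naive stand-in for a search engine: match documents containing any term."""
    terms = {t.lower() for t in query_terms}
    return {d for d, text in collection.items() if terms & set(text.lower().split())}


def review_loop(collection: Dict[str, str], query_terms: List[str], sample_size: int,
                review: Callable[[str], List[str]], max_rounds: int = 5,
                seed: int = 0) -> Tuple[Set[str], List[str]]:
    rng = random.Random(seed)
    terms = list(query_terms)
    responsive = search(collection, terms)                                   # S51
    for _ in range(max_rounds):
        remainder = sorted(set(collection) - responsive)
        sample = rng.sample(remainder, min(sample_size, len(remainder)))     # S52
        found: List[str] = []
        for doc_id in sample:                                                # S53, S54
            found.extend(review(collection[doc_id]))
        new_terms = [t for t in dict.fromkeys(found) if t not in terms]
        if not new_terms:
            break                     # the sample surfaced nothing new
        terms.extend(new_terms)                                              # S55
        responsive = search(collection, terms)                               # S56
    return responsive, terms


docs = {"d1": "merger talks", "d2": "acquisition talks", "d3": "lunch menu"}
reviewer = lambda text: ["acquisition"] if "acquisition" in text else []
result, final_terms = review_loop(docs, ["merger"], sample_size=2, review=reviewer)
print(sorted(result), final_terms)
```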
- a method for reviewing a collection of documents to identify relevant documents from the collection can comprise specifying a set of tagging rules to extend query results to include attachments and email threads (step S61), expanding search query terms based on synonyms (step S62), running a precision Boolean search of the collection of documents, based on two or more search terms and returning a first subset of potentially relevant documents in the collection (step S63), calculating the probability that the results of each Boolean query are relevant by multiplying the probability of relevancy of each search term, where those individual probabilities are determined using an algorithm constructed from the proportion of relevant synonyms for each search term (step S64), applying a recall query based on the two or more search terms to run a second concept search of remaining ones of the collection of documents which were not returned by the first Boolean search, the second search returning a second subset of potentially relevant documents in the collection (step S65), calculating the probability that each search result in the recall query is relevant to a given topic
- the probability that results of a simple Boolean search (word search) are relevant to a given topic is directly related to the probability that the query terms themselves are relevant, i.e. that those terms are used within a relevant definition or context in the documents.
- the likelihood that a complex Boolean query will return relevant documents is a function of the probability that the query terms themselves are relevant.
- the following factors can be used to determine the probability that a word has been used in the defined context within a document: (1) the number of possible definitions of the word as compared to the number of relevant definitions; and (2) the relative obscurity of relevant definitions as compared to other definitions.
- a social networking approach can be taken to measure obscurity.
- the following method is consistent with the procedure generally used in the legal field currently for constructing query lists: (i) a list of potential query terms (keywords) is developed by the attorney team; (ii) for each word, a corresponding list of synonyms is created using a thesaurus; (iii) a social network is drawn (using software) between all synonyms and keywords; (iv) a count of the number of ties at each node in the network is taken (each word is a node); (v) an obscurity factor is determined as the ratio between the number of ties at any word node and the greatest number of ties at any word node, or alternatively their respective z scores; and (vi) this obscurity factor is applied to the definitional probability calculated above.
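Steps (iii) through (vi) above can be sketched in Python as follows. Several points are assumptions rather than statements from the disclosure: the network is modeled as undirected ties between each keyword and its thesaurus synonyms, the definitional probability is taken as the ratio of relevant definitions to possible definitions, and "applied" is read as multiplication. All names and thesaurus entries are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def tie_counts(thesaurus: Dict[str, List[str]]) -> Dict[str, int]:
    """Count the ties at each word node of the keyword/synonym network (step iv)."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for word, synonyms in thesaurus.items():
        for syn in synonyms:
            if syn != word:
                neighbors[word].add(syn)
                neighbors[syn].add(word)
    return {word: len(adjacent) for word, adjacent in neighbors.items()}


def obscurity_factors(thesaurus: Dict[str, List[str]]) -> Dict[str, float]:
    """Ratio of each node's ties to the greatest number of ties at any node (step v)."""
    counts = tie_counts(thesaurus)
    if not counts:
        return {}
    most = max(counts.values())
    return {word: count / most for word, count in counts.items()}


def definitional_probability(relevant_definitions: int, possible_definitions: int) -> float:
    """Assumed form: share of a word's definitions that are relevant."""
    return relevant_definitions / possible_definitions


def term_probability(definitional: float, obscurity: float) -> float:
    """Apply the obscurity factor to the definitional probability (step vi, assumed product)."""
    return definitional * obscurity


thesaurus = {"contract": ["agreement", "deal", "pact"], "deal": ["bargain"]}
factors = obscurity_factors(thesaurus)
print(factors["contract"], term_probability(definitional_probability(1, 2), factors["deal"]))
```

The z-score alternative mentioned in step (v) is not shown; it would replace the ratio in `obscurity_factors` with a standardized tie count.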
- Boolean queries usually consist of multiple words, and thus a method of calculating how the query terms interact with each other is required.
- the simplest complex queries consist of query terms separated by the Boolean operators AND and/or OR.
- for query terms separated by an AND operator, the individual probabilities of each word in the query are multiplied together to yield the probability that the complex query will return responsive results.
- for query terms separated by an OR operator, the probability of the query yielding relevant results is equal to the probability of the lowest ranked search term in the query string.
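The AND and OR rules above reduce to a product and a minimum, as in this short Python sketch. It assumes that "lowest ranked" means the term with the smallest individual probability; the function names and the example probabilities are hypothetical.

```python
import math
from typing import Sequence


def and_probability(term_probabilities: Sequence[float]) -> float:
    """AND: multiply the individual term probabilities together."""
    return math.prod(term_probabilities)


def or_probability(term_probabilities: Sequence[float]) -> float:
    """OR: take the probability of the lowest ranked term (assumed to be the minimum)."""
    return min(term_probabilities)


# Hypothetical per-term probabilities of relevancy.
p_terms = [0.5, 0.4]
print(and_probability(p_terms), or_probability(p_terms))
```

Under these rules an AND query can never score higher than its weakest term, and an OR query is scored conservatively by that weakest term as well.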
- Query words strung together within quotation marks are typically treated as a single phrase in Boolean engines (i.e. they are treated as if the string is one word).
- a document is returned as a result if and only if the entire phrase exists within the document.
- the phrase is translated to its closest synonym and the probability of that word is assigned to the phrase.
- Complex Boolean queries can take the form of "A within X words B", where A and B are query terms and X is the number of words separating them in a document, which is usually a small number.
- the purpose of this type of query, called a proximity query, is to define the terms in relation to one another. This increases the probability that the words will be used responsively. The probability that a proximity query will return responsive documents equals the probability that the highest-ranked term in the query will be responsive.
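A proximity query of the form "A within X words B" might be evaluated as in the Python sketch below. The sketch assumes whitespace tokenization, treats X as the maximum number of words between the two terms, accepts the terms in either order, and reads "highest" as the term with the largest individual probability. The function names are hypothetical.

```python
def matches_proximity(text: str, a: str, b: str, max_gap: int) -> bool:
    """True if some occurrence of a and of b are separated by at most max_gap words."""
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) - 1 <= max_gap
               for i in positions_a for j in positions_b if i != j)


def proximity_probability(p_a: float, p_b: float) -> float:
    """Probability assigned to the proximity query: that of the highest term (assumed max)."""
    return max(p_a, p_b)


text = "the merger was priced at a premium"
print(matches_proximity(text, "merger", "premium", 4),
      matches_proximity(text, "merger", "premium", 3),
      proximity_probability(0.3, 0.6))
```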
- A workflow of a process including application of some of the techniques discussed herein, according to one example, is shown in Figs. 7A and 7B.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0822623A GB2457121A (en) | 2006-06-07 | 2008-12-11 | Methods for enhancing efficiency and cost effectiveness of first pass review of documents |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/449,400 | 2006-06-07 | ||
US11/449,400 US8150827B2 (en) | 2006-06-07 | 2006-06-07 | Methods for enhancing efficiency and cost effectiveness of first pass review of documents |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2007146107A2 (en) | 2007-12-21 |
WO2007146107A3 (en) | 2008-08-14 |
Family
ID=38823116
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2007/013483 WO2007146107A2 (en) | 2006-06-07 | 2007-06-07 | Methods for enhancing efficiency and cost effectiveness of first pass review of documents |
Country Status (3)
Country | Link |
---|---|
US (1) | US8150827B2 (en) |
GB (1) | GB2457121A (en) |
WO (1) | WO2007146107A2 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011072172A1 (en) | 2009-12-09 | 2011-06-16 | Renew Data Corp. | System and method for quickly determining a subset of irrelevant data from large data content |
JP2014109852A (en) * | 2012-11-30 | 2014-06-12 | Ubic:Kk | Document management system and document management method and document management program |
JP2014109871A (en) * | 2012-11-30 | 2014-06-12 | Ubic:Kk | Document management system and document management method and document management program |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8375008B1 (en) | 2003-01-17 | 2013-02-12 | Robert Gomes | Method and system for enterprise-wide retention of digital or electronic data |
US8943024B1 (en) | 2003-01-17 | 2015-01-27 | Daniel John Gardner | System and method for data de-duplication |
US7191175B2 (en) | 2004-02-13 | 2007-03-13 | Attenex Corporation | System and method for arranging concept clusters in thematic neighborhood relationships in a two-dimensional visual display space |
US8527468B1 (en) | 2005-02-08 | 2013-09-03 | Renew Data Corp. | System and method for management of retention periods for content in a computing system |
US20080189273A1 (en) * | 2006-06-07 | 2008-08-07 | Digital Mandate, Llc | System and method for utilizing advanced search and highlighting techniques for isolating subsets of relevant content data |
US8615490B1 (en) | 2008-01-31 | 2013-12-24 | Renew Data Corp. | Method and system for restoring information from backup storage media |
US8301619B2 (en) * | 2009-02-18 | 2012-10-30 | Avaya Inc. | System and method for generating queries |
US8713018B2 (en) | 2009-07-28 | 2014-04-29 | Fti Consulting, Inc. | System and method for displaying relationships between electronically stored information to provide classification suggestions via inclusion |
US8707243B2 (en) * | 2009-08-03 | 2014-04-22 | Virginia Panel Corporation | Interface configuration system and method |
EP2471009A1 (en) | 2009-08-24 | 2012-07-04 | FTI Technology LLC | Generating a reference set for use during document review |
US8738668B2 (en) | 2009-12-16 | 2014-05-27 | Renew Data Corp. | System and method for creating a de-duplicated data set |
US8296290B2 (en) * | 2010-02-05 | 2012-10-23 | Fti Consulting, Inc. | System and method for propagating classification decisions |
JP5552448B2 (en) * | 2011-01-28 | 2014-07-16 | 株式会社日立製作所 | Retrieval expression generation device, retrieval system, and retrieval expression generation method |
CN103699495A (en) * | 2013-12-27 | 2014-04-02 | 乐视网信息技术(北京)股份有限公司 | Transmission device and transmission system for splitting data |
US9805141B2 (en) | 2014-12-31 | 2017-10-31 | Ebay Inc. | Dynamic content delivery search system |
CN106021463B (en) * | 2016-05-17 | 2019-07-09 | 北京百度网讯科技有限公司 | Method, intelligent service system and the intelligent terminal of intelligent Service are provided based on artificial intelligence |
AU2017274558B2 (en) | 2016-06-02 | 2021-11-11 | Nuix North America Inc. | Analyzing clusters of coded documents |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5717913A (en) * | 1995-01-03 | 1998-02-10 | University Of Central Florida | Method for detecting and extracting text data using database schemas |
US20050144157A1 (en) * | 2003-12-29 | 2005-06-30 | Moody Paul B. | System and method for searching and retrieving related messages |
Family Cites Families (136)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5107419A (en) * | 1987-12-23 | 1992-04-21 | International Business Machines Corporation | Method of assigning retention and deletion criteria to electronic documents stored in an interactive information handling system |
US5350303A (en) * | 1991-10-24 | 1994-09-27 | At&T Bell Laboratories | Method for accessing information in a computer |
US5689699A (en) | 1992-12-23 | 1997-11-18 | International Business Machines Corporation | Dynamic verification of authorization in retention management schemes for data processing systems |
US5813015A (en) * | 1993-06-07 | 1998-09-22 | International Business Machine Corp. | Method and apparatus for increasing available storage space on a computer system by disposing of data with user defined characteristics |
US5535381A (en) * | 1993-07-22 | 1996-07-09 | Data General Corporation | Apparatus and method for copying and restoring disk files |
US5617566A (en) * | 1993-12-10 | 1997-04-01 | Cheyenne Advanced Technology Ltd. | File portion logging and arching by means of an auxilary database |
US7069451B1 (en) * | 1995-02-13 | 2006-06-27 | Intertrust Technologies Corp. | Systems and methods for secure transaction management and electronic rights protection |
US5742807A (en) * | 1995-05-31 | 1998-04-21 | Xerox Corporation | Indexing system using one-way hash for document service |
US5813009A (en) * | 1995-07-28 | 1998-09-22 | Univirtual Corp. | Computer based records management system method |
US5778395A (en) * | 1995-10-23 | 1998-07-07 | Stac, Inc. | System for backing up files from disk volumes on multiple nodes of a computer network |
US5732265A (en) * | 1995-11-02 | 1998-03-24 | Microsoft Corporation | Storage optimizing encoder and method |
US5926811A (en) * | 1996-03-15 | 1999-07-20 | Lexis-Nexis | Statistical thesaurus, method of forming same, and use thereof in query expansion in automated text searching |
US20020120925A1 (en) * | 2000-03-28 | 2002-08-29 | Logan James D. | Audio and video program recording, editing and playback systems using metadata |
US6182029B1 (en) * | 1996-10-28 | 2001-01-30 | The Trustees Of Columbia University In The City Of New York | System and method for language extraction and encoding utilizing the parsing of text data in accordance with domain parameters |
US5937401A (en) * | 1996-11-27 | 1999-08-10 | Sybase, Inc. | Database system with improved methods for filtering duplicates from a tuple stream |
US6157931A (en) | 1997-02-11 | 2000-12-05 | Connected Corporation | Database/template driven file selection for backup programs |
US6278992B1 (en) * | 1997-03-19 | 2001-08-21 | John Andrew Curtis | Search engine using indexing method for storing and retrieving data |
US5982370A (en) | 1997-07-18 | 1999-11-09 | International Business Machines Corporation | Highlighting tool for search specification in a user interface of a computer system |
US6442533B1 (en) * | 1997-10-29 | 2002-08-27 | William H. Hinkle | Multi-processing financial transaction processing system |
US6023710A (en) * | 1997-12-23 | 2000-02-08 | Microsoft Corporation | System and method for long-term administration of archival storage |
US7117227B2 (en) | 1998-03-27 | 2006-10-03 | Call Charles G | Methods and apparatus for using the internet domain name system to disseminate product information |
US6047294A (en) * | 1998-03-31 | 2000-04-04 | Emc Corp | Logical restore from a physical backup in a computer storage system |
US6216123B1 (en) * | 1998-06-24 | 2001-04-10 | Novell, Inc. | Method and system for rapid retrieval in a full text indexing system |
US6256633B1 (en) * | 1998-06-25 | 2001-07-03 | U.S. Philips Corporation | Context-based and user-profile driven information retrieval |
US6199081B1 (en) * | 1998-06-30 | 2001-03-06 | Microsoft Corporation | Automatic tagging of documents and exclusion by content |
US6226630B1 (en) * | 1998-07-22 | 2001-05-01 | Compaq Computer Corporation | Method and apparatus for filtering incoming information using a search engine and stored queries defining user folders |
US6240409B1 (en) * | 1998-07-31 | 2001-05-29 | The Regents Of The University Of California | Method and apparatus for detecting and summarizing document similarity within large document sets |
US7346580B2 (en) * | 1998-08-13 | 2008-03-18 | International Business Machines Corporation | Method and system of preventing unauthorized rerecording of multimedia content |
US6226618B1 (en) * | 1998-08-13 | 2001-05-01 | International Business Machines Corporation | Electronic content delivery system |
US6389403B1 (en) * | 1998-08-13 | 2002-05-14 | International Business Machines Corporation | Method and apparatus for uniquely identifying a customer purchase in an electronic distribution system |
US7228437B2 (en) * | 1998-08-13 | 2007-06-05 | International Business Machines Corporation | Method and system for securing local database file of local content stored on end-user system |
US6611812B2 (en) * | 1998-08-13 | 2003-08-26 | International Business Machines Corporation | Secure electronic content distribution on CDS and DVDs |
US6243713B1 (en) * | 1998-08-24 | 2001-06-05 | Excalibur Technologies Corp. | Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types |
US6269382B1 (en) * | 1998-08-31 | 2001-07-31 | Microsoft Corporation | Systems and methods for migration and recall of data from local and remote storage |
IL126373A (en) * | 1998-09-27 | 2003-06-24 | Haim Zvi Melman | Apparatus and method for search and retrieval of documents |
US6226759B1 (en) * | 1998-09-28 | 2001-05-01 | International Business Machines Corporation | Method and apparatus for immediate data backup by duplicating pointers and freezing pointer/data counterparts |
US20030069873A1 (en) * | 1998-11-18 | 2003-04-10 | Kevin L. Fox | Multiple engine information retrieval and visualization system |
US6189002B1 (en) * | 1998-12-14 | 2001-02-13 | Dolphin Search | Process and system for retrieval of documents using context-relevant semantic profiles |
US6199067B1 (en) * | 1999-01-20 | 2001-03-06 | Mightiest Logicon Unisearch, Inc. | System and method for generating personalized user profiles and for utilizing the generated user profiles to perform adaptive internet searches |
US20020019814A1 (en) * | 2001-03-01 | 2002-02-14 | Krishnamurthy Ganesan | Specifying rights in a digital rights license according to events |
US6493711B1 (en) | 1999-05-05 | 2002-12-10 | H5 Technologies, Inc. | Wide-spectrum information search engine |
US6591261B1 (en) * | 1999-06-21 | 2003-07-08 | Zerx, Llc | Network search engine and navigation tool and method of determining search results in accordance with search criteria and/or associated sites |
US20020178176A1 (en) | 1999-07-15 | 2002-11-28 | Tomoki Sekiguchi | File prefetch contorol method for computer system |
US6477544B1 (en) | 1999-07-16 | 2002-11-05 | Microsoft Corporation | Single instance store for file systems |
US6324548B1 (en) | 1999-07-22 | 2001-11-27 | Unisys Corporation | Database backup and recovery using separate history files for database backup and audit backup |
US20040193695A1 (en) * | 1999-11-10 | 2004-09-30 | Randy Salo | Secure remote access to enterprise networks |
US6810395B1 (en) | 1999-11-22 | 2004-10-26 | Hewlett-Packard Development Company, L.P. | Method and apparatus for query-specific bookmarking and data collection |
US6834110B1 (en) | 1999-12-09 | 2004-12-21 | International Business Machines Corporation | Multi-tier digital TV programming for content distribution |
US7213005B2 (en) * | 1999-12-09 | 2007-05-01 | International Business Machines Corporation | Digital content distribution using web broadcasting services |
US6915435B1 (en) * | 2000-02-09 | 2005-07-05 | Sun Microsystems, Inc. | Method and system for managing information retention |
US7412462B2 (en) * | 2000-02-18 | 2008-08-12 | Burnside Acquisition, Llc | Data repository and method for promoting network storage of data |
US6421767B1 (en) | 2000-02-23 | 2002-07-16 | Storage Technology Corporation | Method and apparatus for managing a storage system using snapshot copy operations with snap groups |
US7137065B1 (en) | 2000-02-24 | 2006-11-14 | International Business Machines Corporation | System and method for classifying electronically posted documents |
US6859800B1 (en) * | 2000-04-26 | 2005-02-22 | Global Information Research And Technologies Llc | System for fulfilling an information need |
CA2307404A1 (en) * | 2000-05-02 | 2001-11-02 | Provenance Systems Inc. | Computer readable electronic records automated classification system |
US7089286B1 (en) * | 2000-05-04 | 2006-08-08 | Bellsouth Intellectual Property Corporation | Method and apparatus for compressing attachments to electronic mail communications for transmission |
US7577834B1 (en) | 2000-05-09 | 2009-08-18 | Sun Microsystems, Inc. | Message authentication using message gates in a distributed computing environment |
US6636848B1 (en) * | 2000-05-31 | 2003-10-21 | International Business Machines Corporation | Information search using knowledge agents |
DE60123442D1 (en) * | 2000-08-31 | 2006-11-09 | Ontrack Data Internat Inc | SYSTEM AND METHOD FOR DATA MANAGEMENT |
US6678679B1 (en) * | 2000-10-10 | 2004-01-13 | Science Applications International Corporation | Method and system for facilitating the refinement of data queries |
US6804662B1 (en) * | 2000-10-27 | 2004-10-12 | Plumtree Software, Inc. | Method and apparatus for query and analysis |
US6751628B2 (en) * | 2001-01-11 | 2004-06-15 | Dolphin Search | Process and system for sparse vector and matrix representation of document indexing and retrieval |
US7178099B2 (en) | 2001-01-23 | 2007-02-13 | Inxight Software, Inc. | Meta-content analysis and annotation of email and other electronic documents |
GB0104227D0 (en) * | 2001-02-21 | 2001-04-11 | Ibm | Information component based data storage and management |
US6745197B2 (en) * | 2001-03-19 | 2004-06-01 | Preston Gates Ellis Llp | System and method for efficiently processing messages stored in multiple message stores |
US7174368B2 (en) * | 2001-03-27 | 2007-02-06 | Xante Corporation | Encrypted e-mail reader and responder system, method, and computer program product |
JP4111685B2 (en) * | 2001-03-27 | 2008-07-02 | コニカミノルタビジネステクノロジーズ株式会社 | Image processing apparatus, image transmission method, and program |
JP2002288214A (en) | 2001-03-28 | 2002-10-04 | Hitachi Ltd | Search system and search service |
US6976016B2 (en) * | 2001-04-02 | 2005-12-13 | Vima Technologies, Inc. | Maximizing expected generalization for learning complex query concepts |
US20020147733A1 (en) | 2001-04-06 | 2002-10-10 | Hewlett-Packard Company | Quota management in client side data storage back-up |
US20020194324A1 (en) | 2001-04-26 | 2002-12-19 | Aloke Guha | System for global and local data resource management for service guarantees |
US7047386B1 (en) * | 2001-05-31 | 2006-05-16 | Oracle International Corporation | Dynamic partitioning of a reusable resource |
US6996580B2 (en) * | 2001-06-22 | 2006-02-07 | International Business Machines Corporation | System and method for granular control of message logging |
EP1410258A4 (en) * | 2001-06-22 | 2007-07-11 | Inc Nervana | System and method for knowledge retrieval, management, delivery and presentation |
US7188085B2 (en) * | 2001-07-20 | 2007-03-06 | International Business Machines Corporation | Method and system for delivering encrypted content with associated geographical-based advertisements |
US7793326B2 (en) | 2001-08-03 | 2010-09-07 | Comcast Ip Holdings I, Llc | Video and digital multimedia aggregator |
US6778979B2 (en) * | 2001-08-13 | 2004-08-17 | Xerox Corporation | System for automatically generating queries |
US7284191B2 (en) * | 2001-08-13 | 2007-10-16 | Xerox Corporation | Meta-document management system with document identifiers |
US6662198B2 (en) | 2001-08-30 | 2003-12-09 | Zoteca Inc. | Method and system for asynchronous transmission, backup, distribution of data and file sharing |
US6978274B1 (en) * | 2001-08-31 | 2005-12-20 | Attenex Corporation | System and method for dynamically evaluating latent concepts in unstructured documents |
AUPR797501A0 (en) | 2001-09-28 | 2001-10-25 | BlastMedia Pty Limited | A method of displaying content |
US7363425B2 (en) * | 2001-12-28 | 2008-04-22 | Hewlett-Packard Development Company, L.P. | System and method for securing drive access to media based on medium identification numbers |
US20030126247A1 (en) | 2002-01-02 | 2003-07-03 | Exanet Ltd. | Apparatus and method for file backup using multiple backup devices |
US7134020B2 (en) * | 2002-01-31 | 2006-11-07 | Peraogulne Corp. | System and method for securely duplicating digital documents |
US7693830B2 (en) * | 2005-08-10 | 2010-04-06 | Google Inc. | Programmable search engine |
US20030233455A1 (en) | 2002-06-14 | 2003-12-18 | Mike Leber | Distributed file sharing system |
US6947954B2 (en) * | 2002-06-17 | 2005-09-20 | Microsoft Corporation | Image server store system and method using combined image views |
US6941297B2 (en) * | 2002-07-31 | 2005-09-06 | International Business Machines Corporation | Automatic query refinement |
US20040064447A1 (en) * | 2002-09-27 | 2004-04-01 | Simske Steven J. | System and method for management of synonymic searching |
US7188173B2 (en) | 2002-09-30 | 2007-03-06 | Intel Corporation | Method and apparatus to enable efficient processing and transmission of network communications |
US6920523B2 (en) | 2002-10-07 | 2005-07-19 | Infineon Technologies Ag | Bank address mapping according to bank retention time in dynamic random access memories |
US7792832B2 (en) * | 2002-10-17 | 2010-09-07 | Poltorak Alexander I | Apparatus and method for identifying potential patent infringement |
US6928526B1 (en) * | 2002-12-20 | 2005-08-09 | Datadomain, Inc. | Efficient data storage system |
US20040143609A1 (en) * | 2003-01-17 | 2004-07-22 | Gardner Daniel John | System and method for data extraction in a non-native environment |
US7287025B2 (en) * | 2003-02-12 | 2007-10-23 | Microsoft Corporation | Systems and methods for query expansion |
JP4265245B2 (en) * | 2003-03-17 | 2009-05-20 | 株式会社日立製作所 | Computer system |
US6947930B2 (en) * | 2003-03-21 | 2005-09-20 | Overture Services, Inc. | Systems and methods for interactive search query refinement |
CA2536265C (en) * | 2003-08-21 | 2012-11-13 | Idilia Inc. | System and method for processing a query |
US7146388B2 (en) | 2003-10-07 | 2006-12-05 | International Business Machines Corporation | Method, system, and program for archiving files |
US7617205B2 (en) * | 2005-03-30 | 2009-11-10 | Google Inc. | Estimating confidence for query revision models |
US20050097081A1 (en) * | 2003-10-31 | 2005-05-05 | Hewlett-Packard Development Company, L.P. | Apparatus and methods for compiling digital communications |
US7249251B2 (en) | 2004-01-21 | 2007-07-24 | Emc Corporation | Methods and apparatus for secure modification of a retention period for data in a storage system |
US7912904B2 (en) * | 2004-03-31 | 2011-03-22 | Google Inc. | Email system with conversation-centric user interface |
US7293006B2 (en) * | 2004-04-07 | 2007-11-06 | Integrated Project Solutions Llc | Computer program for storing electronic files and associated attachments in a single searchable database |
US7260568B2 (en) * | 2004-04-15 | 2007-08-21 | Microsoft Corporation | Verifying relevance between keywords and web site contents |
US20050283473A1 (en) * | 2004-06-17 | 2005-12-22 | Armand Rousso | Apparatus, method and system of artificial intelligence for data searching applications |
US20060074980A1 (en) * | 2004-09-29 | 2006-04-06 | Sarkar Pte. Ltd. | System for semantically disambiguating text information |
US20080077570A1 (en) * | 2004-10-25 | 2008-03-27 | Infovell, Inc. | Full Text Query and Search Systems and Method of Use |
WO2006047654A2 (en) * | 2004-10-25 | 2006-05-04 | Yuanhua Tang | Full text query and search systems and methods of use |
US7640488B2 (en) * | 2004-12-04 | 2009-12-29 | International Business Machines Corporation | System, method, and service for using a focused random walk to produce samples on a topic from a collection of hyper-linked pages |
US20060167842A1 (en) * | 2005-01-25 | 2006-07-27 | Microsoft Corporation | System and method for query refinement |
US20060173824A1 (en) * | 2005-02-01 | 2006-08-03 | Metalincs Corporation | Electronic communication analysis and visualization |
JP2008537225A (en) * | 2005-04-11 | 2008-09-11 | テキストディガー,インコーポレイテッド | Search system and method for queries |
WO2006113597A2 (en) * | 2005-04-14 | 2006-10-26 | The Regents Of The University Of California | Method for information retrieval |
US7487146B2 (en) * | 2005-08-03 | 2009-02-03 | Novell, Inc. | System and method of searching for providing dynamic search results with temporary visual display |
US7526478B2 (en) * | 2005-08-03 | 2009-04-28 | Novell, Inc. | System and method of searching for organizing and displaying search results |
US7707146B2 (en) * | 2005-08-03 | 2010-04-27 | Novell, Inc. | System and method of searching for providing clue-based context searching |
US7747639B2 (en) * | 2005-08-24 | 2010-06-29 | Yahoo! Inc. | Alternative search query prediction |
US7844599B2 (en) * | 2005-08-24 | 2010-11-30 | Yahoo! Inc. | Biasing queries to determine suggested queries |
US20070061335A1 (en) | 2005-09-14 | 2007-03-15 | Jorey Ramer | Multimodal search query processing |
US7730081B2 (en) * | 2005-10-18 | 2010-06-01 | Microsoft Corporation | Searching based on messages |
US7650341B1 (en) * | 2005-12-23 | 2010-01-19 | Hewlett-Packard Development Company, L.P. | Data backup/recovery |
WO2007081681A2 (en) * | 2006-01-03 | 2007-07-19 | Textdigger, Inc. | Search system with query refinement and search method |
CN101000610B (en) * | 2006-01-11 | 2010-09-29 | 鸿富锦精密工业(深圳)有限公司 | Scatter storage system and method for file |
US20070198470A1 (en) * | 2006-01-27 | 2007-08-23 | Gordon Freedman | Method of reducing search space complexity using suggested search terms with display of an associated reduction factor |
US7584179B2 (en) * | 2006-01-27 | 2009-09-01 | William Derek Finley | Method of document searching |
US20070266009A1 (en) * | 2006-03-09 | 2007-11-15 | Williams Frank J | Method for searching and retrieving information implementing a conceptual control |
US8725729B2 (en) | 2006-04-03 | 2014-05-13 | Steven G. Lisa | System, methods and applications for embedded internet searching and result display |
JP4787055B2 (en) | 2006-04-12 | 2011-10-05 | 富士通株式会社 | Information processing apparatus with information division recording function |
US8762358B2 (en) * | 2006-04-19 | 2014-06-24 | Google Inc. | Query language determination using query terms and interface language |
US9529903B2 (en) * | 2006-04-26 | 2016-12-27 | The Bureau Of National Affairs, Inc. | System and method for topical document searching |
US8494281B2 (en) | 2006-04-27 | 2013-07-23 | Xerox Corporation | Automated method and system for retrieving documents based on highlighted text from a scanned source |
AU2007280092A1 (en) * | 2006-05-19 | 2008-02-07 | My Virtual Model Inc. | Simulation-assisted search |
US7752243B2 (en) * | 2006-06-06 | 2010-07-06 | University Of Regina | Method and apparatus for construction and use of concept knowledge base |
US8401841B2 (en) * | 2006-08-31 | 2013-03-19 | Orcatec Llc | Retrieval of documents using language models |
US8010534B2 (en) | 2006-08-31 | 2011-08-30 | Orcatec Llc | Identifying related objects using quantum clustering |
- 2006-06-07: US application US11/449,400, patent US8150827B2 (status: Active)
- 2007-06-07: WO application PCT/US2007/013483, patent WO2007146107A2 (status: Application Filing)
- 2008-12-11: GB application GB0822623A, patent GB2457121A (status: not active, Withdrawn)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5717913A (en) * | 1995-01-03 | 1998-02-10 | University Of Central Florida | Method for detecting and extracting text data using database schemas |
US20050144157A1 (en) * | 2003-12-29 | 2005-06-30 | Moody Paul B. | System and method for searching and retrieving related messages |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011072172A1 (en) | 2009-12-09 | 2011-06-16 | Renew Data Corp. | System and method for quickly determining a subset of irrelevant data from large data content |
JP2014109852A (en) * | 2012-11-30 | 2014-06-12 | Ubic:Kk | Document management system and document management method and document management program |
JP2014109871A (en) * | 2012-11-30 | 2014-06-12 | Ubic:Kk | Document management system and document management method and document management program |
Also Published As
Publication number | Publication date |
---|---|
US20070288445A1 (en) | 2007-12-13 |
GB2457121A (en) | 2009-08-05 |
GB0822623D0 (en) | 2009-01-21 |
WO2007146107A3 (en) | 2008-08-14 |
US8150827B2 (en) | 2012-04-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8150827B2 (en) | | Methods for enhancing efficiency and cost effectiveness of first pass review of documents |
US11663254B2 (en) | | System and engine for seeded clustering of news events |
CN110892399B (en) | | System and method for automatically generating summary of subject matter |
US20080189273A1 (en) | | System and method for utilizing advanced search and highlighting techniques for isolating subsets of relevant content data |
US7844595B2 (en) | | Document similarity scoring and ranking method, device and computer program product |
KR101190230B1 (en) | | Phrase identification in an information retrieval system |
US20080147642A1 (en) | | System for discovering data artifacts in an on-line data object |
US8332439B2 (en) | | Automatically generating a hierarchy of terms |
US20080147578A1 (en) | | System for prioritizing search results retrieved in response to a computerized search query |
JP5353173B2 (en) | | Determining the concreteness of a document |
US8407218B2 (en) | | Role based search |
US20050010559A1 (en) | | Methods for information search and citation search |
EP1669896A2 (en) | | A machine learning system for extracting structured records from web pages and other text sources |
US20080147588A1 (en) | | Method for discovering data artifacts in an on-line data object |
US20110145269A1 (en) | | System and method for quickly determining a subset of irrelevant data from large data content |
US20080147641A1 (en) | | Method for prioritizing search results retrieved in response to a computerized search query |
JP5391632B2 (en) | | Determining word and document depth |
WO2011091442A1 (en) | | System and method for optimizing search objects submitted to a data resource |
WO2007113546A1 (en) | | Ranking of entities associated with stored content |
CN110637316A (en) | | System and method for intelligent prospective object recognition using online resources and neural network processing to classify tissue based on published material |
CN109947902B (en) | | Data query method and device and readable medium |
Huang et al. | | Quality-biased ranking of short texts in microblogging services |
US20120130999A1 (en) | | Method and Apparatus for Searching Electronic Documents |
CA2956627A1 (en) | | System and engine for seeded clustering of news events |
Lee et al. | | Reducing noises for recall-oriented patent retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 07795885; Country of ref document: EP; Kind code of ref document: A2 |
| | ENP | Entry into the national phase | Ref document number: 0822623; Country of ref document: GB; Kind code of ref document: A; Free format text: PCT FILING DATE = 20070607 |
| | WWE | Wipo information: entry into national phase | Ref document number: 0822623.5; Country of ref document: GB |
| | WWE | Wipo information: entry into national phase | Ref document number: 5230/KOLNP/2008; Country of ref document: IN |
| | NENP | Non-entry into the national phase | Ref country code: RU |
| | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS EPO FORM 1205A DATED 31.03.2009. |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 07795885; Country of ref document: EP; Kind code of ref document: A2 |