WO2003040875A3 - Systems, methods, and software for classifying documents - Google Patents

Systems, methods, and software for classifying documents

Info

Publication number
WO2003040875A3
Authority
WO
WIPO (PCT)
Prior art keywords
target
text
classes
target classes
input text
Prior art date
Application number
PCT/US2002/035177
Other languages
French (fr)
Other versions
WO2003040875A2 (en)
Original Assignee
West Publishing Company Doing
Al Kofahi Khalid
Priority date
Filing date
Publication date
Application filed by West Publishing Company Doing, Al Kofahi Khalid
Priority to NZ533105A
Priority to DE60231005T
Priority to DK02786640T
Priority to JP2003542441A
Priority to CA2470299A
Priority to EP02786640A
Priority to AU2002350112A
Priority to CN028266501A
Publication of WO2003040875A2
Publication of WO2003040875A3
Priority to AU2009202974A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99942 Manipulating data structure, e.g. compression, compaction, compilation

Abstract

To reduce cost and improve accuracy, the inventors devised systems, methods, and software to aid classification of text, such as headnotes and other documents, to target classes in a target classification system. For example, one system computes composite scores based on: similarity of input text to text assigned to each of the target classes; similarity of non-target classes assigned to the input text and target classes; probability of a target class given a set of one or more non-target classes assigned to the input text; and/or probability of the input text given text assigned to the target classes. The exemplary system then evaluates the composite scores using class-specific decision criteria, such as thresholds, ultimately assigning or recommending assignment of the input text to one or more of the target classes. The exemplary system is particularly suitable for classification systems having thousands of classes.
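The scoring-and-threshold flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the abstract does not say how the four component scores are combined, so the weighted sum, the `ClassModel` structure, the function names, and all numeric values here are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassModel:
    """Hypothetical per-class parameters (not taken from the patent record)."""
    weights: tuple    # one weight per component score, four in this sketch
    threshold: float  # class-specific decision threshold


def composite_score(scores, weights):
    """Combine component scores with a weighted sum (an assumed combination rule)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


def recommend_classes(component_scores, models):
    """Return (class, score) pairs whose composite score meets that class's
    own threshold, sorted by descending score."""
    results = []
    for cls, scores in component_scores.items():
        model = models[cls]
        total = composite_score(scores, model.weights)
        if total >= model.threshold:
            results.append((cls, total))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy inputs. Each tuple holds the four measures named in the abstract, in order:
# text similarity, non-target/target class similarity,
# P(target class | non-target classes), P(input text | class text).
models = {
    "class_A": ClassModel(weights=(0.25, 0.25, 0.25, 0.25), threshold=0.5),
    "class_B": ClassModel(weights=(0.5, 0.0, 0.5, 0.0), threshold=0.75),
}
component_scores = {
    "class_A": (1.0, 0.5, 0.5, 1.0),
    "class_B": (0.5, 1.0, 0.5, 1.0),
}

recommended = recommend_classes(component_scores, models)
print(recommended)
```

With these toy values only `class_A` clears its own threshold. Keeping weights and thresholds per class mirrors the "class-specific decision criteria" wording of the abstract.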
PCT/US2002/035177 2001-11-02 2002-11-01 Systems, methods, and software for classifying documents WO2003040875A2 (en)

Priority Applications (9)

Application Number Priority Date Filing Date Title
NZ533105A NZ533105A (en) 2001-11-02 2002-11-01 Systems, methods, and software for classifying documents
DE60231005T DE60231005D1 (en) 2001-11-02 2002-11-01 SYSTEMS, METHODS, AND SOFTWARE FOR CLASSIFYING DOCUMENTS
DK02786640T DK1464013T3 (en) 2001-11-02 2002-11-01 Document classification systems, methods and software
JP2003542441A JP4342944B2 (en) 2001-11-02 2002-11-01 System, method, and software for classifying documents
CA2470299A CA2470299C (en) 2001-11-02 2002-11-01 Systems, methods, and software for classifying documents
EP02786640A EP1464013B1 (en) 2001-11-02 2002-11-01 Systems, methods, and software for classifying documents
AU2002350112A AU2002350112B8 (en) 2001-11-02 2002-11-01 Systems, methods, and software for classifying documents
CN028266501A CN1701324B (en) 2001-11-02 2002-11-01 Systems, methods, and software for classifying text
AU2009202974A AU2009202974B2 (en) 2001-11-02 2009-07-23 Systems, methods, and software for classifying documents

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US33686201P 2001-11-02 2001-11-02
US60/336,862 2001-11-02

Publications (2)

Publication Number Publication Date
WO2003040875A2 WO2003040875A2 (en) 2003-05-15
WO2003040875A3 WO2003040875A3 (en) 2003-08-07

Family

ID=23317997

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/035177 WO2003040875A2 (en) 2001-11-02 2002-11-01 Systems, methods, and software for classifying documents

Country Status (12)

Country Link
US (3) US7062498B2 (en)
EP (2) EP1464013B1 (en)
JP (3) JP4342944B2 (en)
CN (1) CN1701324B (en)
AT (1) ATE421730T1 (en)
AU (2) AU2002350112B8 (en)
CA (2) CA2470299C (en)
DE (1) DE60231005D1 (en)
DK (1) DK1464013T3 (en)
ES (1) ES2321075T3 (en)
NZ (1) NZ533105A (en)
WO (1) WO2003040875A2 (en)

Families Citing this family (103)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6154757A (en) * 1997-01-29 2000-11-28 Krause; Philip R. Electronic text reading environment enhancement method and apparatus
AU2002303270A1 (en) * 2001-04-04 2002-10-21 West Publishing Company System, method, and software for identifying historically related legal opinions
US7062498B2 (en) * 2001-11-02 2006-06-13 Thomson Legal Regulatory Global Ag Systems, methods, and software for classifying text from judicial opinions and other documents
US7139755B2 (en) 2001-11-06 2006-11-21 Thomson Scientific Inc. Method and apparatus for providing comprehensive search results in response to user queries entered over a computer network
US7356461B1 (en) * 2002-01-14 2008-04-08 Nstein Technologies Inc. Text categorization method and apparatus
US7188107B2 (en) * 2002-03-06 2007-03-06 Infoglide Software Corporation System and method for classification of documents
US8201085B2 (en) * 2007-06-21 2012-06-12 Thomson Reuters Global Resources Method and system for validating references
NZ541580A (en) * 2002-12-30 2009-03-31 Thomson Corp Knowledge-management systems for law firms
US20040133574A1 (en) 2003-01-07 2004-07-08 Science Applications International Corporation Vector space method for secure information sharing
US7725544B2 (en) * 2003-01-24 2010-05-25 Aol Inc. Group based spam classification
US7089241B1 (en) * 2003-01-24 2006-08-08 America Online, Inc. Classifier tuning based on data similarities
US20040193596A1 (en) * 2003-02-21 2004-09-30 Rudy Defelice Multiparameter indexing and searching for documents
US7590695B2 (en) 2003-05-09 2009-09-15 Aol Llc Managing electronic messages
US7218783B2 (en) * 2003-06-13 2007-05-15 Microsoft Corporation Digital ink annotation process and system for recognizing, anchoring and reflowing digital ink annotations
US7739602B2 (en) 2003-06-24 2010-06-15 Aol Inc. System and method for community centric resource sharing based on a publishing subscription model
US7051077B2 (en) * 2003-06-30 2006-05-23 Mx Logic, Inc. Fuzzy logic voting method and system for classifying e-mail using inputs from multiple spam classifiers
US8473532B1 (en) * 2003-08-12 2013-06-25 Louisiana Tech University Research Foundation Method and apparatus for automatic organization for computer files
US20050097120A1 (en) * 2003-10-31 2005-05-05 Fuji Xerox Co., Ltd. Systems and methods for organizing data
US7676739B2 (en) * 2003-11-26 2010-03-09 International Business Machines Corporation Methods and apparatus for knowledge base assisted annotation
EP1704498A2 (en) * 2003-12-31 2006-09-27 Thomson Global Resources Systems, methods, interfaces and software for extending search results beyond initial query-defined boundaries
US9646082B2 (en) * 2003-12-31 2017-05-09 Thomson Reuters Global Resources Systems, methods, and software for identifying relevant legal documents
EP1704499A1 (en) * 2003-12-31 2006-09-27 Thomson Global Resources AG Systems, methods, software and interfaces for integration of case law with legal briefs, litigation documents, and/or other litigation-support documents
US7647321B2 (en) * 2004-04-26 2010-01-12 Google Inc. System and method for filtering electronic messages using business heuristics
US8484295B2 (en) 2004-12-21 2013-07-09 Mcafee, Inc. Subscriber reputation filtering method for analyzing subscriber activity and detecting account misuse
US7680890B1 (en) 2004-06-22 2010-03-16 Wei Lin Fuzzy logic voting method and system for classifying e-mail using inputs from multiple spam classifiers
US7953814B1 (en) 2005-02-28 2011-05-31 Mcafee, Inc. Stopping and remediating outbound messaging abuse
US20080320002A1 (en) * 2004-09-21 2008-12-25 Koninklijke Philips Electronics, N.V. Method of Providing Information
US9015472B1 (en) 2005-03-10 2015-04-21 Mcafee, Inc. Marking electronic messages to indicate human origination
US9160755B2 (en) * 2004-12-21 2015-10-13 Mcafee, Inc. Trusted communication network
US8738708B2 (en) * 2004-12-21 2014-05-27 Mcafee, Inc. Bounce management in a trusted communication network
US8185560B2 (en) * 2005-01-28 2012-05-22 Thomson Reuters Global Resources Systems, methods, software for integration of case law, legal briefs, and litigation documents into law firm workflow
US7499591B2 (en) * 2005-03-25 2009-03-03 Hewlett-Packard Development Company, L.P. Document classifiers and methods for document classification
US9177050B2 (en) 2005-10-04 2015-11-03 Thomson Reuters Global Resources Systems, methods, and interfaces for extending legal search results
US20070078889A1 (en) * 2005-10-04 2007-04-05 Hoskinson Ronald A Method and system for automated knowledge extraction and organization
US9552420B2 (en) * 2005-10-04 2017-01-24 Thomson Reuters Global Resources Feature engineering and user behavior analysis
US7917519B2 (en) * 2005-10-26 2011-03-29 Sizatola, Llc Categorized document bases
US7529748B2 (en) * 2005-11-15 2009-05-05 Ji-Rong Wen Information classification paradigm
CN100419753C (en) * 2005-12-19 2008-09-17 株式会社理光 Method and device for digital data central searching target file according to classified information
US8726144B2 (en) * 2005-12-23 2014-05-13 Xerox Corporation Interactive learning-based document annotation
US7333965B2 (en) * 2006-02-23 2008-02-19 Microsoft Corporation Classifying text in a code editor using multiple classifiers
KR100717401B1 (en) * 2006-03-02 2007-05-11 삼성전자주식회사 Method and apparatus for normalizing voice feature vector by backward cumulative histogram
US7735010B2 (en) * 2006-04-05 2010-06-08 Lexisnexis, A Division Of Reed Elsevier Inc. Citation network viewer and method
EP2033084A4 (en) * 2006-05-23 2012-04-11 David P Gold System and method for organizing, processing and presenting information
JP4910582B2 (en) * 2006-09-12 2012-04-04 ソニー株式会社 Information processing apparatus and method, and program
JP2008070958A (en) * 2006-09-12 2008-03-27 Sony Corp Information processing device and method, and program
US20080071803A1 (en) * 2006-09-15 2008-03-20 Boucher Michael L Methods and systems for real-time citation generation
US7844899B2 (en) * 2007-01-24 2010-11-30 Dakota Legal Software, Inc. Citation processing system with multiple rule set engine
US20080235258A1 (en) * 2007-03-23 2008-09-25 Hyen Vui Chung Method and Apparatus for Processing Extensible Markup Language Security Messages Using Delta Parsing Technology
US9323827B2 (en) * 2007-07-20 2016-04-26 Google Inc. Identifying key terms related to similar passages
DE102007034505A1 (en) * 2007-07-24 2009-01-29 Hella Kgaa Hueck & Co. Method and device for traffic sign recognition
CN100583101C (en) * 2008-06-12 2010-01-20 昆明理工大学 Text categorization feature selection and weight computation method based on field knowledge
US10354229B2 (en) 2008-08-04 2019-07-16 Mcafee, Llc Method and system for centralized contact management
US8352857B2 (en) * 2008-10-27 2013-01-08 Xerox Corporation Methods and apparatuses for intra-document reference identification and resolution
CA2764316C (en) 2009-06-01 2018-02-27 West Services Inc. Improved systems, methods, and interfaces for extending legal search results
US8713018B2 (en) * 2009-07-28 2014-04-29 Fti Consulting, Inc. System and method for displaying relationships between electronically stored information to provide classification suggestions via inclusion
CA3026879A1 (en) 2009-08-24 2011-03-10 Nuix North America, Inc. Generating a reference set for use during document review
US10146864B2 (en) * 2010-02-19 2018-12-04 The Bureau Of National Affairs, Inc. Systems and methods for validation of cited authority
WO2011159843A2 (en) 2010-06-15 2011-12-22 Thomson Reuters (Scientific) Inc. System and method for citation processing, presentation and transport for validating references
US8195458B2 (en) * 2010-08-17 2012-06-05 Xerox Corporation Open class noun classification
CN102033949B (en) * 2010-12-23 2012-02-29 南京财经大学 Correction-based K nearest neighbor text classification method
US9122666B2 (en) 2011-07-07 2015-09-01 Lexisnexis, A Division Of Reed Elsevier Inc. Systems and methods for creating an annotation from a document
US9305082B2 (en) 2011-09-30 2016-04-05 Thomson Reuters Global Resources Systems, methods, and interfaces for analyzing conceptually-related portions of text
WO2013123182A1 (en) * 2012-02-17 2013-08-22 The Trustees Of Columbia University In The City Of New York Computer-implemented systems and methods of performing contract review
US9058308B2 (en) 2012-03-07 2015-06-16 Infosys Limited System and method for identifying text in legal documents for preparation of headnotes
US9201876B1 (en) * 2012-05-29 2015-12-01 Google Inc. Contextual weighting of words in a word grouping
US8955127B1 (en) * 2012-07-24 2015-02-10 Symantec Corporation Systems and methods for detecting illegitimate messages on social networking platforms
CN103577462B (en) * 2012-08-02 2018-10-16 北京百度网讯科技有限公司 A kind of Document Classification Method and device
JP5526209B2 (en) * 2012-10-09 2014-06-18 株式会社Ubic Forensic system, forensic method, and forensic program
JP5823943B2 (en) * 2012-10-10 2015-11-25 株式会社Ubic Forensic system, forensic method, and forensic program
US9083729B1 (en) 2013-01-15 2015-07-14 Symantec Corporation Systems and methods for determining that uniform resource locators are malicious
US9189540B2 (en) * 2013-04-05 2015-11-17 Hewlett-Packard Development Company, L.P. Mobile web-based platform for providing a contextual alignment view of a corpus of documents
US20150026104A1 (en) * 2013-07-17 2015-01-22 Christopher Tambos System and method for email classification
JP2015060581A (en) * 2013-09-20 2015-03-30 株式会社東芝 Keyword extraction device, method and program
CN103500158A (en) * 2013-10-08 2014-01-08 北京百度网讯科技有限公司 Method and device for annotating electronic document
WO2015063784A1 (en) * 2013-10-31 2015-05-07 Hewlett-Packard Development Company, L.P. Classifying document using patterns
US20160048510A1 (en) * 2014-08-14 2016-02-18 Thomson Reuters Global Resources (Trgr) System and method for integration and operation of analytics with strategic linkages
US10255646B2 (en) 2014-08-14 2019-04-09 Thomson Reuters Global Resources (Trgr) System and method for implementation and operation of strategic linkages
US10572877B2 (en) * 2014-10-14 2020-02-25 Jpmorgan Chase Bank, N.A. Identifying potentially risky transactions
US9652627B2 (en) * 2014-10-22 2017-05-16 International Business Machines Corporation Probabilistic surfacing of potentially sensitive identifiers
US20160162576A1 (en) * 2014-12-05 2016-06-09 Lightning Source Inc. Automated content classification/filtering
US20160314184A1 (en) * 2015-04-27 2016-10-27 Google Inc. Classifying documents by cluster
JP5887455B2 (en) * 2015-09-08 2016-03-16 株式会社Ubic Forensic system, forensic method, and forensic program
US9852337B1 (en) * 2015-09-30 2017-12-26 Open Text Corporation Method and system for assessing similarity of documents
US11176145B2 (en) * 2015-10-17 2021-11-16 Ebay Inc. Generating personalized user recommendations using word vectors
CN106874291A (en) * 2015-12-11 2017-06-20 北京国双科技有限公司 The processing method and processing device of text classification
EP3437260B1 (en) * 2016-03-31 2021-09-29 Bitdefender IPR Management Ltd. System and methods for automatic device detection
US11347777B2 (en) * 2016-05-12 2022-05-31 International Business Machines Corporation Identifying key words within a plurality of documents
WO2017210618A1 (en) 2016-06-02 2017-12-07 Fti Consulting, Inc. Analyzing clusters of coded documents
WO2017216627A1 (en) 2016-06-16 2017-12-21 Thomson Reuters Global Resources Unlimited Company Scenario analytics system
US10146758B1 (en) 2016-09-30 2018-12-04 Amazon Technologies, Inc. Distributed moderation and dynamic display of content annotations
US10325409B2 (en) * 2017-06-16 2019-06-18 Microsoft Technology Licensing, Llc Object holographic augmentation
CN107657284A (en) * 2017-10-11 2018-02-02 宁波爱信诺航天信息有限公司 A kind of trade name sorting technique and system based on Semantic Similarity extension
CN110390094B (en) * 2018-04-20 2023-05-23 伊姆西Ip控股有限责任公司 Method, electronic device and computer program product for classifying documents
US11087088B2 (en) * 2018-09-25 2021-08-10 Accenture Global Solutions Limited Automated and optimal encoding of text data features for machine learning models
US11862305B1 (en) 2019-06-05 2024-01-02 Ciitizen, Llc Systems and methods for analyzing patient health records
US11424012B1 (en) * 2019-06-05 2022-08-23 Ciitizen, Llc Sectionalizing clinical documents
US11636117B2 (en) 2019-06-26 2023-04-25 Dallas Limetree, LLC Content selection using psychological factor vectors
US11170271B2 (en) * 2019-06-26 2021-11-09 Dallas Limetree, LLC Method and system for classifying content using scoring for identifying psychological factors employed by consumers to take action
CN110377742A (en) * 2019-07-23 2019-10-25 腾讯科技(深圳)有限公司 Text classification evaluating method, device, readable storage medium storing program for executing and computer equipment
CA3186038A1 (en) * 2020-07-14 2022-01-20 Thomson Reuters Enterprise Centre Gmbh Systems and methods for the automatic categorization of text
US11775592B2 (en) * 2020-08-07 2023-10-03 SECURITI, Inc. System and method for association of data elements within a document
US11941497B2 (en) * 2020-09-30 2024-03-26 Alteryx, Inc. System and method of operationalizing automated feature engineering
US11782957B2 (en) * 2021-04-08 2023-10-10 Grail, Llc Systems and methods for automated classification of a document

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5991755A (en) * 1995-11-29 1999-11-23 Matsushita Electric Industrial Co., Ltd. Document retrieval system for retrieving a necessary document
US6052657A (en) * 1997-09-09 2000-04-18 Dragon Systems, Inc. Text segmentation and identification of topic using language models
WO2000026795A1 (en) * 1998-10-30 2000-05-11 Justsystem Pittsburgh Research Center, Inc. Method for content-based filtering of messages by analyzing term characteristics within a message
WO2000067162A1 (en) * 1999-05-05 2000-11-09 West Publishing Company Document-classification system, method and software

Family Cites Families (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US583120A (en) * 1897-05-25 Soldering machine
US5054093A (en) * 1985-09-12 1991-10-01 Cooper Leon N Parallel, multi-unit, adaptive, nonlinear pattern class separator and identifier
US5157783A (en) 1988-02-26 1992-10-20 Wang Laboratories, Inc. Data base system which maintains project query list, desktop list and status of multiple ongoing research projects
US4961152A (en) * 1988-06-10 1990-10-02 Bolt Beranek And Newman Inc. Adaptive computing system
US5488725A (en) 1991-10-08 1996-01-30 West Publishing Company System of document representation retrieval by successive iterated probability sampling
US5265065A (en) 1991-10-08 1993-11-23 West Publishing Company Method and apparatus for information retrieval from a database by replacing domain specific stemmed phases in a natural language to create a search query
US5383120A (en) * 1992-03-02 1995-01-17 General Electric Company Method for tagging collocations in text
US5438629A (en) * 1992-06-19 1995-08-01 United Parcel Service Of America, Inc. Method and apparatus for input classification using non-spherical neurons
US5497317A (en) 1993-12-28 1996-03-05 Thomson Trading Services, Inc. Device and method for improving the speed and reliability of security trade settlements
US5434932A (en) 1994-07-28 1995-07-18 West Publishing Company Line alignment apparatus and process
WO1996034344A1 (en) * 1995-04-27 1996-10-31 Northrop Grumman Corporation Adaptive filtering neural network classifier
US5778397A (en) * 1995-06-28 1998-07-07 Xerox Corporation Automatic method of generating feature probabilities for automatic extracting summarization
US5918240A (en) * 1995-06-28 1999-06-29 Xerox Corporation Automatic method of extracting summarization using feature probabilities
DE19526264A1 (en) * 1995-07-19 1997-04-10 Daimler Benz Ag Process for creating descriptors for the classification of texts
US5644720A (en) 1995-07-31 1997-07-01 West Publishing Company Interprocess communications interface for managing transaction requests
EP0954854A4 (en) * 1996-11-22 2000-07-19 T Netix Inc Subword-based speaker verification using multiple classifier fusion, with channel, fusion, model, and threshold adaptation
JPH1185797A (en) * 1997-09-01 1999-03-30 Canon Inc Automatic document classification device, learning device, classification device, automatic document classification method, learning method, classification method and storage medium
JP3571231B2 (en) * 1998-10-02 2004-09-29 日本電信電話株式会社 Automatic information classification method and apparatus, and recording medium recording automatic information classification program
JP2000222431A (en) * 1999-02-03 2000-08-11 Mitsubishi Electric Corp Document classifying device
JP2001034622A (en) * 1999-07-19 2001-02-09 Nippon Telegr & Teleph Corp <Ntt> Document sorting method and its device, and recording medium recording document sorting program
NZ516822A (en) * 1999-08-06 2004-05-28 Lexis Nexis System and method for classifying legal concepts using legal topic scheme
SG89289A1 (en) * 1999-08-14 2002-06-18 Kent Ridge Digital Labs Classification by aggregating emerging patterns
US6651058B1 (en) * 1999-11-15 2003-11-18 International Business Machines Corporation System and method of automatic discovery of terms in a document that are relevant to a given target topic
US7565403B2 (en) * 2000-03-16 2009-07-21 Microsoft Corporation Use of a bulk-email filter within a system for classifying messages for urgency or importance
US20020099730A1 (en) * 2000-05-12 2002-07-25 Applied Psychology Research Limited Automatic text classification system
US6751600B1 (en) * 2000-05-30 2004-06-15 Commerce One Operations, Inc. Method for automatic categorization of items
US6782377B2 (en) * 2001-03-30 2004-08-24 International Business Machines Corporation Method for building classifier models for event classes via phased rule induction
US7295965B2 (en) * 2001-06-29 2007-11-13 Honeywell International Inc. Method and apparatus for determining a measure of similarity between natural language sentences
EP1421518A1 (en) * 2001-08-08 2004-05-26 Quiver, Inc. Document categorization engine
US7062498B2 (en) * 2001-11-02 2006-06-13 Thomson Legal Regulatory Global Ag Systems, methods, and software for classifying text from judicial opinions and other documents

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
AL-KOFAHI K ET AL: "Combining multiple classifiers for text categorization", PROCEEDINGS OF THE 2001 ACM CIKM. TENTH INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, PROCEEDINGS OF CIKM'01: INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, ATLANTA, GA, USA, 5-10 NOV. 2001, 2001, New York, NY, USA, ACM, USA, pages 97 - 104, XP002231521, ISBN: 1-58113-436-3 *
DANOWSKI J A: "WORDIJ: A WORD-PAIR APPROACH TO INFORMATION RETRIEVAL", NIST SPECIAL PUBLICATION, GAITHERSBURG, MD, US, 1 March 1993 (1993-03-01), pages 131 - 136, XP000602948, ISSN: 1048-776X *
HATZIVASSILOGLOU V ET AL: "An investigation of linguistic features and clustering algorithms for topical document clustering", SIGIR 2000. 23RD ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, ATHENS, GREECE, 24-28 JULY 2000, vol. 34, SIGIR Forum, 2000, ACM, USA, pages 224 - 231, XP002243266, ISSN: 0163-5840 *
IYER R D ET AL: "Boosting for document routing", PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT. CIKM 2000, PROCEEDINGS OF NINTH INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT (CIKM), MCLEAN, VA, USA, 6-11 NOV. 2000, 2000, New York, NY, USA, ACM, USA, pages 70 - 77, XP002231519, ISBN: 1-58113-320-0 *
KITTLER J ET AL: "ON COMBINING CLASSIFIERS", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE INC. NEW YORK, US, vol. 20, no. 3, 1 March 1998 (1998-03-01), pages 226 - 239, XP000767916, ISSN: 0162-8828 *
LAM L: "Classifier combinations: implementations and theoretical issues", MULTIPLE CLASSIFIER SYSTEMS. FIRST INTERNATIONAL WORKSHOP, MCS 2000. PROCEEDINGS (LECTURE NOTES IN COMPUTER SCIENCE VOL.1857), PROCEEDINGS OF THE FIRST INTERNATIONAL WORKSHOP ON MULTIPLE CLASSIFIER SYSTEMS, CAGLIARI, ITALY, 21-23 JUNE 2000, 2000, Berlin, Germany, Springer-Verlag, Germany, pages 77 - 86, XP002231520, ISBN: 3-540-67704-6 *
LARKEY L S ET AL: "Combining classifiers in text categorization", 19TH ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, ZURICH, SWITZERLAND, 18-22 AUG. 1996, vol. spec. issue., SIGIR Forum, 1996, ACM, USA, pages 289 - 297, XP002231517, ISSN: 0163-5840 *
PAPKA R ET AL: "Document classification using multiword features", PROCEEDINGS OF THE 1998 ACM CIKM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, PROCEEDINGS OF CIKM '98 - 7TH INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, BETHESDA, MD, USA, 3-7 NOV. 1998, 1998, New York, NY, USA, ACM, USA, pages 124 - 131, XP002243267, ISBN: 1-58113-061-9 *
RAGAS H ET AL: "FOUR TEXT CLASSIFICATION ALGORITHMS COMPARED ON A DUTCH CORPUS", SIGIR '98. PROCEEDINGS OF THE 21ST ANNUAL INTERNATIONAL ACM-SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL. MELBOURNE, AUG. 24 - 28, 1998, ANNUAL INTERNATIONAL ACM-SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RET, 1998, pages 369 - 370, XP000867672, ISBN: 1-58113-015-5 *
TUMER K ET AL: "Order statistics combiners for neural classifiers", WCNN '95. WORLD CONGRESS ON NEURAL NETWORKS. 1995 INTERNATIONAL NEURAL NETWORK SOCIETY ANNUAL MEETING, PROCEEDINGS OF THE WORLD CONGRESS ON NEURAL NETWORKS, WASHINGTON, DC, USA, 17-21 JULY 1995, 1995, Mahwah, NJ, USA, Lawrence Erlbaum Associates, USA, pages 31 - 34 vol.1, XP002231518, ISBN: 0-8058-2125-2 *
YANG Y ET AL: "A RE-EXAMINATION OF TEXT CATEGORIZATION METHODS", PROCEEDINGS OF SIGIR'99. 22ND. INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL. BERKELEY, CA, AUG.;1999, ANNUAL INTERNATIONAL ACM-SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, NEW YORK, NY: ACM,, August 1999 (1999-08-01), pages 42 - 49, XP000970711, ISBN: 1-58113-096-1 *

Also Published As

Publication number Publication date
AU2002350112A1 (en) 2003-05-19
JP4342944B2 (en) 2009-10-14
US20060010145A1 (en) 2006-01-12
JP2009163771A (en) 2009-07-23
US20100114911A1 (en) 2010-05-06
DE60231005D1 (en) 2009-03-12
US7062498B2 (en) 2006-06-13
JP5392904B2 (en) 2014-01-22
JP2013178851A (en) 2013-09-09
EP1464013B1 (en) 2009-01-21
DK1464013T3 (en) 2009-05-18
ATE421730T1 (en) 2009-02-15
EP2012240A1 (en) 2009-01-07
CA2737943C (en) 2013-07-02
US7580939B2 (en) 2009-08-25
WO2003040875A2 (en) 2003-05-15
EP1464013A2 (en) 2004-10-06
CN1701324A (en) 2005-11-23
NZ533105A (en) 2006-09-29
CA2737943A1 (en) 2003-05-15
CN1701324B (en) 2011-11-02
CA2470299C (en) 2011-04-26
JP2005508542A (en) 2005-03-31
ES2321075T3 (en) 2009-06-02
AU2009202974A1 (en) 2009-08-13
AU2002350112B8 (en) 2009-04-30
US20030101181A1 (en) 2003-05-29
AU2009202974B2 (en) 2012-07-19
AU2002350112B2 (en) 2009-04-23
CA2470299A1 (en) 2003-05-15

Similar Documents

Publication Publication Date Title
WO2003040875A3 (en) Systems, methods, and software for classifying documents
WO2004086192A3 (en) Systems and methods for interactive search query refinement
WO2003060767A3 (en) System, method and software for automatic hyperlinking of persons’ names in documents to professional directories
WO2002080071A3 (en) Optimized system and method for finding best fares
WO2001082114A3 (en) System for fulfilling an information need
WO2004075029A8 (en) Using distinguishing properties to classify messages
WO2005062210A8 (en) Methods and systems for personalized network searching
WO2003057648A3 (en) Methods and systems for searching and associating information resources such as web pages
AUPR824501A0 (en) Methods and systems (npw003)
WO2004057497A3 (en) Reordered search of media fingerprints
WO2003012684A3 (en) A retrieval system and method based on a similarity and relative diversity
WO2007063328A3 (en) Information retrieval system and method using a bayesian algorithm based on probabilistic similarity scores
WO2004025391A3 (en) System and method of searching data utilizing automatic categorization
AUPR824301A0 (en) Methods and systems (npw001)
MY142877A (en) System and method for determining target failback and target priority for a distributed file system.
AUPR824601A0 (en) Methods and system (npw004)
EP1168199A3 (en) Indexing method and apparatus
WO2003107127A3 (en) System and method for personalized information retrieval based on user expertise
CN105808709A (en) Quick retrieval method and device of face recognition
WO2005106529A3 (en) Relational millimeter-wave interrogating
WO2004097685A3 (en) Distributed search methods, architectures, systems, and software
CN110096703B (en) Data processing method and device for intention recognition, server and client
WO2007025148A3 (en) Method and system for processing ambiguous, multi-term search queries
EP1530195A3 (en) Song search system and song search method
WO2007121105A3 (en) Systems and methods for predicting if a query is a name

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase

Ref document number: 2003542441

Country of ref document: JP

WWE WIPO information: entry into national phase

Ref document number: 533105

Country of ref document: NZ

WWE WIPO information: entry into national phase

Ref document number: 2002786640

Country of ref document: EP

Ref document number: 2002350112

Country of ref document: AU

Ref document number: 742/KOLNP/2004

Country of ref document: IN

Ref document number: 00742/KOLNP/2004

Country of ref document: IN

WWE WIPO information: entry into national phase

Ref document number: 2470299

Country of ref document: CA

WWE WIPO information: entry into national phase

Ref document number: 20028266501

Country of ref document: CN

WWP WIPO information: published in national office

Ref document number: 2002786640

Country of ref document: EP