WO2003012681A1 - Method and system for automatically enhancing semantic resources with a real-time question-answer electronic system - Google Patents
- Publication number
- WO2003012681A1 (PCT/FR2002/002513)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- arguments
- question
- search
- answer
- documents
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/243—Natural language query formulation
Definitions
- The invention relates to a system for the automatic enrichment of semantic resources from a real-time electronic question-answer system.
- An electronic question-answer system is understood to mean a system in which a user asks a question in natural language from a computer tool, such as a microcomputer (PC) connected through a communication network to a system comprising a database server, in order to obtain information constituting the answer to the question.
- The most recent systems incorporate a linguistic analysis to translate the question into a specialized language adapted to information retrieval by a search engine.
- The invention provides a method for the automatic enrichment of semantic resources from a real-time electronic question-answer system.
- The real-time electronic question-answer system is the tool used, in accordance with the invention, to automatically enrich the semantic resources of a semantic network.
- This enrichment system can therefore be used to enrich the semantic resources of a real-time electronic question-answer system and, more generally, of a semantic network.
- The system is capable of subjecting the question posed to a linguistic analysis operation in order to construct a pattern for finding the answer, which in its simplest form reads: argument1 - relation - argument2. This pattern is then sought and extracted (by direct matching) from the documents returned by the search engine, where argument 1 is the answer sought.
- The system integrates an operation of re-formulation of the questions in order to search for the same answer by means of several semantically equivalent formulations.
- The re-formulation of the questions increases the number of documents that can provide an answer.
- The re-formulation of the questions is obtained by learning and makes it possible to automatically obtain a set of formulations semantically equivalent to a first formulation of a relationship between two arguments.
- The invention more particularly relates to a method for enriching semantic resources, mainly characterized in that it consists in using a real-time electronic question-answer system to obtain an automatic enrichment of said resources.
- The method comprises the following phases:
- A learning phase comprising: • a search for one or more re-formulations of the relationship formulated in the pattern of the initial question, based on the arguments known from this question and the answer or answers obtained,
- The search for a re-formulation includes:
- The subject of the invention is also a system for enriching semantic resources, mainly characterized in that it is coupled to a real-time electronic question-answer system to obtain an automatic enrichment of said resources.
- The system includes means for implementing an initialization phase, these means comprising:
- The means of searching for a re-formulation include:
- The reference means, the means of extracting, from these documents, the sentences containing the known arguments, and the means of linguistic analysis of the sentences are implemented by computer program modules or processes capable of carrying out the stated functions, and the search means are implemented by a program called a search engine.
- FIGS. 2A and 2B represent the detailed functional diagrams of the system of FIG. 1.
- FIG. 3 shows the device for implementing the functional blocks of FIGS. 2A and 2B according to one embodiment.
- An electronic system for real-time information search is used.
- This system is coupled to the semantic resource enrichment system through a communication network.
- The communication network can be an intranet giving access to one or more databases which can provide answers to questions asked by a user. It can also be the Internet.
- The search system, whatever it is, is referred to as the basic system, it being understood that it may rely solely on a linguistic analysis operation that extracts at least one answer to a question asked in natural language by a user, using keywords to do so.
- The system is capable of constructing the question in the form of a search pattern: argument 1, relation, argument 2.
- Using the system's search engine, this pattern then makes it possible to extract the answer "X" from the phrase "X killed Kennedy" found in one or more documents returned by the search.
- This basic search system carries the reference 1 in FIG. 1. It is coupled through a communication network R to one or more databases 3.
- The system also includes a device 2 for the automatic enrichment of linguistic resources.
- This device carries out learning operations for re-formulations to automatically obtain a set of formulations semantically equivalent to a first formulation of a relationship between two arguments.
- The question is submitted to a linguistic analysis that makes it possible to build a search pattern for the answer; the pattern, which is then searched for, makes it possible to extract the answer.
- Step by step, the system establishes several relationships which together form a more abstract relationship that can be generalized by a single predicate of the type:
- Processing the question also makes it possible to obtain semantic labels: "Who killed President Kennedy?" yields the "president" label by simple syntactic processing.
- The overall process consists in acquiring new data and new relationships by building on previously acquired data and relationships. For a given statement, the aim is to explore a new category of relations, starting from a known category, by limiting the variable element.
- This technique makes it possible to gradually enrich all of the known relationships of a natural language processing system (for example a semantic network).
- This system allows the automatic creation of a semantic network. This enrichment can then be used for all applications using such a system.
- The system is directly usable for the extraction of information involving the acquired relationships. For example, to extract from a text all of the assassins and the people killed, one can use learned relationships such as "murderer of" and "killed". Second consequence: in the same way, the system makes it possible to acquire formulations in another language by using multilingual data from the Web.
- With the query [X, Kennedy] we get: X assassinated Kennedy, X killed Kennedy. This constitutes a learning system for translation, which makes it possible to provide a translation into another language.
- FIGS. 2A and 2B illustrate in detail the operations implemented during the initialization phase I and during the learning phase II, carried out respectively by the search system 1 and by the enrichment device 2 illustrated in FIG. 1.
- The initialization phase is carried out by a search system which, according to a preferred embodiment, implements operations 10 to 17 of FIG. 2A.
- The question asked relates to triplets of the type Argument1, Relation, Argument2 (step 10).
- A linguistic analysis (step 11) makes it possible to obtain the extraction pattern for Argument1: semantic type of Arg1, structure Relation-Arg2 (step 15).
- The search engine searches for Arg2 close to Relation in the databases to which it has access or across the Web. According to the example taken:
- A second linguistic analysis makes it possible to search these documents for a structure of the Argument1-Relation-Argument2 type (step 16). Knowing Argument 2 and Relation (from the question), the analyzer extracts Argument 1. Step 16 thus makes it possible to extract the answer, Argument 1. For each question asked, the system carries out this first phase to find at least one answer.
- The system then continues with phase II, which allows it to carry out a learning process. Knowing Argument 1 and Argument 2 (step 20; Arg 1 and Arg 2 are known after the initialization phase), the search engine performs a search using the keywords Argument 1 close to Argument 2 (step 21). The search engine finds documents in response and provides them in HTML format (step 22). These documents are filtered to obtain well-formed text and natural-language paragraphs.
- In step 23, a filter is used to extract the sentences containing Argument 1 and Argument 2. A linguistic analysis is performed on these sentences (step 24).
- The system obtains synonyms of the desired relationship, and more generally paraphrases, and stores them in a database.
- The system therefore makes it possible to enrich the resources available for searching on a question asked.
- Element 1, for searching for the first answer, comprises a module 100 for formatting and for calling a linguistic analysis module 110 and a search engine 120.
- The module 100 receives on an input a the question Q in ASCII coming from an HTML interface; this module formats the question according to a predetermined format, in practice in a file usable by the linguistic analysis module 110.
- This module 100 delivers at an output c the question formatted in the desired format.
- This module 100 receives at another input b the data extracted by the analysis module on the typology of the question.
- This data includes keywords and other characteristic data.
- The module 100 delivers the keywords at another output e.
- The other characteristic data take the form of computer files and are stored in the data memory of the linguistic module 110.
- The keywords are transmitted to the search engine 120 so that it can provide the intermediate response as an output.
- This response is a candidate sentence containing the keywords.
- The response processing assembly 2000 includes:
- a module 200 for receiving the intermediate response on an input a, formatting it according to a predetermined format, delivering the formatted response at an output c, and supplying at another output the final response in a format suitable for transmission over the network;
- a linguistic analysis module 110; in practice, this is the same module as the one that extracts the keywords and the other data.
- This module 110 is capable of receiving the formatted intermediate response at an input, verifying this response against the stored characteristic data other than the keywords, extracting the final response after this verification, and outputting the final response.
- FIG. 1 shows a unit 1100 representing the operating program memory of the module, which contains a program that controls all the operations of this module and launches the execution of a linguistic analysis program.
- The learning phase is also executed by this device and enriches the data memory.
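The two phases described above (initialization, steps 10 to 17, and learning, steps 20 to 24) can be illustrated with a minimal sketch. It is not the patented implementation: the linguistic analysis is replaced here by plain regular expressions, the search engine by a hard-coded list, and the names `DOCUMENTS`, `extract_argument1` and `learn_reformulations`, as well as the sample sentences, are assumptions introduced only for illustration.

```python
import re

# Hypothetical stand-in for the documents returned by the search engine.
DOCUMENTS = [
    "Many historians agree that Oswald killed Kennedy in Dallas.",
    "Oswald assassinated Kennedy on 22 November 1963.",
    "Oswald was the murderer of Kennedy, according to the report.",
    "Kennedy visited Berlin in 1963.",
]


def extract_argument1(relation, argument2, documents):
    """Phase I sketch: direct matching of 'X <relation> <argument2>'.

    Assumption: the unknown argument X is a single capitalized word.
    """
    pattern = re.compile(
        r"\b([A-Z]\w+)\s+" + re.escape(relation) + r"\s+"
        + re.escape(argument2) + r"\b"
    )
    answers = []
    for text in documents:
        for match in pattern.finditer(text):
            if match.group(1) not in answers:
                answers.append(match.group(1))
    return answers


def learn_reformulations(argument1, argument2, documents):
    """Phase II sketch: collect the text linking the two known arguments."""
    pattern = re.compile(
        r"\b" + re.escape(argument1) + r"\s+(.+?)\s+"
        + re.escape(argument2) + r"\b"
    )
    relations = []
    for text in documents:
        # Crude sentence split; only sentences containing both arguments match.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            match = pattern.search(sentence)
            if match and match.group(1) not in relations:
                relations.append(match.group(1))
    return relations


# Question "Who killed Kennedy?" gives the pattern X - killed - Kennedy.
answers = extract_argument1("killed", "Kennedy", DOCUMENTS)
print(answers)
# The answer found and the known argument drive the learning phase.
print(learn_reformulations(answers[0], "Kennedy", DOCUMENTS))
```

In this sketch, the strings collected in the second phase play the role of the re-formulations that the device stores in its database; a real system would rely on the linguistic analyzer rather than on the raw text between the two arguments.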
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP02791500A EP1407389A1 (en) | 2001-07-19 | 2002-07-15 | Method and system for automatically enhancing semantic resources with a real-time question-answer electronic system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR01/09684 | 2001-07-19 | ||
FR0109684A FR2827685B1 (en) | 2001-07-19 | 2001-07-19 | METHOD AND SYSTEM FOR AUTOMATICALLY ENRICHING SEMANTIC RESOURCES FROM A REAL-TIME QUESTION-ANSWER ELECTRONIC SYSTEM |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2003012681A1 (en) | 2003-02-13 |
Family
ID=8865718
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR2002/002513 WO2003012681A1 (en) | 2001-07-19 | 2002-07-15 | Method and system for automatically enhancing semantic resources with a real-time question-answer electronic system |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1407389A1 (en) |
FR (1) | FR2827685B1 (en) |
WO (1) | WO2003012681A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0631244A2 (en) * | 1993-06-24 | 1994-12-28 | Xerox Corporation | A method and system of information retrieval |
WO2000057302A1 (en) * | 1999-03-19 | 2000-09-28 | Ask Jeeves, Inc. | Grammar template query system |
WO2001020500A2 (en) * | 1999-09-17 | 2001-03-22 | Sri International | Information retrieval by natural language querying |
US6263335B1 (en) | 1996-02-09 | 2001-07-17 | Textwise Llc | Information extraction system and method using concept-relation-concept (CRC) triples |
-
2001
- 2001-07-19 FR FR0109684A patent/FR2827685B1/en not_active Expired - Fee Related
-
2002
- 2002-07-15 WO PCT/FR2002/002513 patent/WO2003012681A1/en not_active Application Discontinuation
- 2002-07-15 EP EP02791500A patent/EP1407389A1/en not_active Withdrawn
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008004722A1 (en) * | 2006-07-03 | 2008-01-10 | Ismaker Co., Ltd. | System and method for realtime question and answer by communication media transferring interactive data and voice |
US8046342B2 (en) | 2006-07-03 | 2011-10-25 | Ismaker Co., Ltd. | System and method for providing real time answering service by using communication media capable of transmitting and receiving data and voice |
CN112528005A (en) * | 2020-12-25 | 2021-03-19 | 中山大学 | Chinese dialogue knowledge retrieval method based on knowledge retrieval graph and pre-training model |
CN112528005B (en) * | 2020-12-25 | 2022-08-09 | 中山大学 | Chinese dialogue knowledge retrieval method based on knowledge retrieval graph and pre-training model |
Also Published As
Publication number | Publication date |
---|---|
FR2827685A1 (en) | 2003-01-24 |
EP1407389A1 (en) | 2004-04-14 |
FR2827685B1 (en) | 2003-09-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
RU2747425C2 (en) | Real-time answer system to questions from different fields of knowledge | |
US6199067B1 (en) | System and method for generating personalized user profiles and for utilizing the generated user profiles to perform adaptive internet searches | |
US6671681B1 (en) | System and technique for suggesting alternate query expressions based on prior user selections and their query strings | |
EP0996899B1 (en) | Apparatus and methods for an information retrieval system that employs natural language processing of search results to improve overall precision | |
EP1364316A2 (en) | Device for retrieving data from a knowledge-based text | |
US6766320B1 (en) | Search engine with natural language-based robust parsing for user query and relevance feedback learning | |
US20030182391A1 (en) | Internet based personal information manager | |
WO2007136402A1 (en) | Systems and methods for answering user questions | |
Lenz et al. | CBR for document retrieval: The FAllQ project |
Zechner | A literature survey on information extraction and text summarization | |
WO2001090934A1 (en) | Automatic and secure data search method using a data transmission network | |
WO2003012681A1 (en) | Method and system for automatically enhancing semantic resources with a real-time question-answer electronic system | |
US20090234836A1 (en) | Multi-term search result with unsupervised query segmentation method and apparatus | |
US20030204500A1 (en) | Process and apparatus for automatic retrieval from a database and for automatic enhancement of such database | |
KR100225855B1 (en) | Schedule management method | |
CN113987146B (en) | Dedicated intelligent question-answering system of electric power intranet | |
Bruder et al. | GETESS: Constructing a linguistic search index for an Internet search engine | |
De Roeck et al. | The YPA–An Assistant for Classified Directory Enquiries | |
Klein et al. | Distributed knowledge-based parsing for document analysis and understanding | |
Buzikashvili | An exploratory web log study of multitasking | |
Nwe et al. | Replacing same meaning in sentences using natural language understanding | |
CN114722170A (en) | Interactive information pushing method and device based on user intention | |
FR2839168A1 (en) | Information extraction method through Internet, involves transmitting additional documents for specialized processing and formatted information, to server | |
Johansson | Computer forensic text analysis with open source software | |
FR3096157A1 (en) | multidimensional textual content indexing process |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EC EE ES FI GB GD GE GH HR HU ID IL IN IS JP KE KG KP KR LC LK LR LS LT LU LV MA MD MG MN MW MX MZ NO NZ OM PH PL PT RU SD SE SG SI SK SL TJ TM TN TR TZ UA UG US UZ VN YU ZA ZM Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZM ZW AM AZ BY KG KZ RU TJ TM AT BE BG CH CY CZ DK EE ES FI FR GB GR IE IT LU MC PT SE SK TR BF BJ CF CG CI GA GN GQ GW ML MR NE SN TD TG US |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2002791500 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 2002791500 Country of ref document: EP |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
NENP | Non-entry into the national phase |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: JP |