US20090019032A1 - Method and a system for semantic relation extraction
- Publication number
- US20090019032A1 (application US11/979,534)
- Authority
- US
- United States
- Prior art keywords
- relation
- feature
- entities
- semantic
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16Z—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
- G16Z99/00—Subject matter not provided for in other main groups of this subclass
Abstract
The invention provides a method for semantic relation extraction, wherein, on the basis of an annotated training corpus having tokens with associated relational labels, each indicating a relation between the respective token and a selectable key entity, semantic relations between said key entity and other entities are directly extracted from unstructured text using a probabilistic extraction model.
Description
- The invention relates to a method and a system for semantic relation extraction in particular from biomedical data.
- The rapid growth of published literature in many fields of technology such as the biomedical domain renders automated information extraction tools indispensable for researchers to make use of this immense source of knowledge.
- The past decade has seen an unprecedented increase of biomedical data in published literature. Progress in computational and biomedical methods has increased the pace of biomedical research. High-throughput experiments, such as micro-arrays, produce large quantities of high-quality data, which consequently leads to an increase of new findings and results. This development has caused an explosion of scientific literature published in this technical field. The overwhelming amount of textual information makes it necessary to use automated text information extraction tools to efficiently use the enormous amount of knowledge contained in biomedical literature stored in data bases. Text mining applications are provided to transfer unstructured information such as unstructured text into structured form. Some text mining applications can only identify named entities. Possible entities in the biomedical field are genes, diseases, drugs, compounds, proteins etc. More important than identifying entities in an unstructured information data base is the identification of associations and relations between these entities. Relation extraction (RE) is the finding of associations and roles between entities in an unstructured information base such as text phrases. These text phrases are usually, but not necessarily, formed by a sentence.
- The conventional semantic relation extraction methods comprise two consecutive steps. In a first step the entities are identified by means of a named entity recognition (NER). In a second step for each pair of entities a relation type is predicted.
- FIG. 1 shows a flow-chart for explaining a conventional method for semantic relation extraction. In a preprocessing phase features for evaluating text information are defined and an annotated training corpus is generated. The features for evaluating the unstructured text information can be predefined character strings that are typical for a certain entity, such as “CADH”. Another example for a feature might be whether a number can be found in the text. In the preprocessing phase an annotated training corpus is generated by experts in the respective technical field. The training corpus can be formed by sentences annotated by the experts.
- FIG. 2 shows a table as an example for an annotated training corpus used by a conventional extraction method according to the state of the art. In the given example the training corpus consists of only two sentences, i.e. “we found that TP53 is a lung cancer gene” and “smoking is bad for your lungs”. In real systems, the training corpus consists of a plurality of sentences or a plurality of documents or abstracts. Both sentences of the annotated training corpus consist of several words or tokens which are labeled by the experts according to a predefined classification scheme. It can be seen from FIG. 2 that most tokens of the annotated training corpus are labeled as common words c. However, some tokens such as “TP53”, “lung” and “cancer” are labeled differently. The token “TP53” is labeled as a gene. The neighboring tokens “lung” and “cancer” are both labeled as a disease d. Note that in the table of FIG. 2 the word “lung” in the context of sentence 1 is labeled as a disease d because the next word is “cancer”, whereas “lungs” in the other sentence 2 of the training corpus is labeled as a common word c.
- After the feature definition and the generation of the annotated training corpus in the preprocessing phase, a feature set is provided for the annotated training corpus and weights are calculated on the basis of a feature label distribution in a training phase.
- In a further step an input query is input by a user to extract a semantic relation. A possible example is the sentence “Inactivating TP53 mutations were found in 55% of lethal metastatic pancreatic neoplasms”. The input query is tokenized into a sequence of tokens.
- The table of FIG. 3 shows a token sequence consisting of twelve tokens x1 to x12 generated on the basis of the query input by the user. It can be seen from the flow-chart of FIG. 1 that in a conventional method for semantic relation extraction entity detection is performed after tokenization of the query. By means of a Viterbi algorithm the most likely label sequence is calculated. FIG. 3 shows the most likely label sequence for the given example. In the given example two entities are detected, i.e. one gene G and one disease D. Note that the labels y9, y10, y11, y12 are recognized to represent one disease D.
- After completion of the entity detection a second step for relation extraction is performed in the conventional method as shown in the flow-chart of FIG. 1. The relation extraction is for example rule-based.
- FIG. 4 shows a rule-based relation extraction performed by the conventional method. A possible way for a rule-based relation extraction according to the state of the art as shown in FIG. 4 is for the algorithm to check whether the tokens xi which are labeled as common words c include keywords which are indicative of a corresponding relation. In the given example the token x3 “mutations” forms a common word c, but it is also an indicator for a particular relation, in this case genetic variation. After the rule-based relation extraction, the extracted relation is indicated to the user as shown in FIG. 5. The user is informed that there is a relation “genetic variation” between the primary entity “gene TP53” and a second entity, i.e. a disease “lethal metastatic pancreatic neoplasms”.
- As can be seen from the given example, relation extraction in conventional methods is performed in a two-step manner, i.e. first the participating entities are identified and then the relations between the entities are extracted. All pairs of entities are enumerated for a given text phrase and for each pair a prediction is made whether there is a relation or not.
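The conventional two-step procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the label sequences are written by hand rather than produced by a trained tagger, and the keyword table `RELATION_KEYWORDS` and the helper names are hypothetical, not taken from the figures. The second call shows how a labeling error from the first step is carried into the output.

```python
from typing import List, Optional, Tuple

# Hypothetical keyword table; the text only names "mutations" -> genetic variation.
RELATION_KEYWORDS = {"mutations": "genetic variation"}


def group_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Step 1 (after NER): merge runs of equal non-common labels into entities."""
    entities: List[Tuple[str, str]] = []
    run: List[str] = []
    run_label: Optional[str] = None
    for token, label in zip(tokens, labels):
        if label != "C" and label == run_label:
            run.append(token)  # extend the current entity
            continue
        if run:
            entities.append((run_label, " ".join(run)))
        run, run_label = ([token], label) if label != "C" else ([], None)
    if run:
        entities.append((run_label, " ".join(run)))
    return entities


def rule_based_relation(tokens: List[str], labels: List[str]) -> Optional[str]:
    """Step 2: look for a relation keyword among the common-word tokens."""
    for token, label in zip(tokens, labels):
        if label == "C" and token.lower() in RELATION_KEYWORDS:
            return RELATION_KEYWORDS[token.lower()]
    return None


TOKENS = ("Inactivating TP53 mutations were found in 55% of "
          "lethal metastatic pancreatic neoplasms").split()
GOOD = ["C", "G"] + ["C"] * 6 + ["D"] * 4  # hand-written, correct labeling
BAD = ["C", "G"] + ["C"] * 6 + ["G"] * 4   # NER error: disease tagged as gene

print(group_entities(TOKENS, GOOD), rule_based_relation(TOKENS, GOOD))
print(group_entities(TOKENS, BAD), rule_based_relation(TOKENS, BAD))
```

With the mislabeled sequence the second step still reports "genetic variation", but now between two entities that are both typed as genes, which is the kind of error propagation the two-step design is prone to.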
- However, this conventional method for relation extraction as shown in the flow-chart of FIG. 1 has several disadvantages. During calculation of the most likely label sequence by means of a Viterbi algorithm it can occur that the extracted entities are not labeled correctly. The conventional method is very sensitive to errors made during named entity recognition (NER). A disease mislabeled as another entity in the NER phase cannot be taken into account in a gene-disease relation classification phase. For instance, if the tokens x9 to x12 shown in the table of FIG. 3, i.e. “lethal”, “metastatic”, “pancreatic”, “neoplasms”, are mislabeled as genes (G), the error is carried along through the rule-based relation extraction, so that the user receives as an output a genetic variation relation between a gene TP53 and a gene “lethal metastatic pancreatic neoplasms”. A further possible disadvantage of the conventional method for extracting relations is that for training one needs to process all pairs of entities within sentences, which results in a lower number of positive examples and, thus, lower accuracy.
- It is an object of the present invention to provide a method and a system for overcoming the disadvantages of the conventional method for semantic relation extraction as shown in FIG. 1.
- The invention provides a method and a system for semantic relation extraction on the basis of an annotated training corpus having tokens with associated relation labels, each indicating a relation between the respective token and a selectable key entity, wherein semantic relations between the key entity and other entities are directly extracted from unstructured text using a probabilistic extraction model.
- In an embodiment of the system according to the present invention the probabilistic extraction model is a conditional random field (CRF).
- FIG. 1 shows a flow-chart of a conventional method for semantic relation extraction according to the state of the art;
- FIG. 2 shows a table of an example for an annotated training corpus as used by the conventional method for semantic relation extraction shown in the flow-chart of FIG. 1;
- FIG. 3 is a table of a calculated most likely label sequence of a tokenized input query as an intermediate result of the conventional method for semantic relation extraction shown in the flow-chart of FIG. 1;
- FIG. 4 illustrates a rule-based relation extraction step as employed by a conventional method for semantic relation extraction as shown in the flow-chart of FIG. 1;
- FIG. 5 shows the output of a conventional method for semantic relation extraction according to the state of the art for the exemplary input query of FIG. 3 and the exemplary annotated training corpus indicated in FIG. 2;
- FIG. 6 shows a block diagram of a possible embodiment of a system for semantic relation extraction according to the present invention;
- FIG. 7 shows a flow-chart of a possible embodiment of the method for semantic relation extraction according to the present invention;
- FIG. 8 shows a simple flow-chart illustrating the calculation of weighting factors as employed by an embodiment of the method for semantic relation extraction according to the present invention;
- FIG. 9 shows a simple flow-chart illustrating the tokenization of an input query as employed by an embodiment of the method for semantic relation extraction according to the present invention;
- FIG. 10 shows a simple flow-chart indicating the extraction of relations of a key entity as employed by an embodiment of the method for semantic relation extraction according to the present invention;
- FIG. 11 shows an example of an annotated training corpus and a query for illustrating the functionality of an embodiment of the method for semantic relation extraction according to the present invention;
- FIG. 12 shows a table illustrating the functionality of a method and a system for semantic relation extraction according to the present invention;
- FIG. 13 shows a table indicating a calculated most likely label sequence for a tokenized exemplary query as shown in FIG. 11;
- FIG. 14 shows an exemplary output of a result of the method for semantic relation extraction according to an embodiment of the present invention for the given example of FIG. 11.
- FIG. 6 shows a block diagram of a possible embodiment of a semantic relation extraction system 1. It can be seen from FIG. 6 that unstructured text comprising a plurality of documents is stored in a data base 2. The data base 2 is connected either directly or via a network to processing means 3. In other embodiments the processing means 3 are connected to a plurality of different data bases each having a plurality of unstructured documents. In a memory 4, an annotated training corpus is stored. The annotated training corpus comprises a plurality of tokens each having an associated relational label indicating a relation between the respective token and a selectable key entity. An example for an annotated training corpus used by the system according to the present invention is shown in FIG. 11. The processing means 3 can be formed by any processor. The processing means 3 is connected to input means 5 and output means 6. The user can input a query, for instance an input query sentence, by means of the input means 5. For example, the input means 5 can be formed by a keyboard. The output means 6 can be formed by a display. The processing means 3 extracts semantic relations between a key entity and other entities from the unstructured text in the data base 2 on the basis of the annotated training corpus stored in the memory 4. Semantic relations extracted by the processing means 3 can be stored by the processing means 3 in a structured relational database 7.
- FIG. 7 shows a flow-chart of a possible embodiment of the method for semantic relation extraction according to the present invention.
- In a preprocessing phase a feature definition is performed in step S1 and the training corpus is generated in step S2. An example for an annotated training corpus generated in step S2 is shown in FIG. 11.
- During a training phase consisting of steps S3 and S4 as shown in FIG. 7, a feature set for the annotated training corpus is provided and weights are calculated on the basis of a feature-label distribution.
- The features used by the method according to the present invention comprise a set of standard recognition features and additional relation recognition features. The standard recognition features can comprise orthographic features, word shape features, n-gram features, dictionary features or context features.
- Biomedical entities often exhibit characteristic orthography. In many cases, biomedical entities consist of capitalized letters, include some numbers or are composed of combinations of both. Accordingly, orthographic features can help to distinguish various types of biomedical entities. Another recognition feature is a word shape feature.
- Some words belonging to the same class of entities have the same word shape. For instance, disease abbreviations commonly consist of letters only and contain no numbers, whereas for genes/proteins the co-occurrence of numbers and letters is typical.
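As a rough illustration of the orthographic and word shape features just described, the sketch below computes a few boolean orthographic indicators and a collapsed shape string per token. The exact feature set and the shape encoding (`A` for upper case, `a` for lower case, `0` for digits, repeated symbols collapsed) are assumptions for illustration; the text does not fix them.

```python
import re
from typing import Dict


def orthographic_features(token: str) -> Dict[str, bool]:
    """A few illustrative orthographic indicators (assumed, not exhaustive)."""
    has_upper = any(ch.isupper() for ch in token)
    has_digit = any(ch.isdigit() for ch in token)
    return {
        "all_caps": token.isalpha() and token.isupper(),
        "has_digit": has_digit,
        "caps_and_digits": has_upper and has_digit,
    }


def word_shape(token: str) -> str:
    """Map characters to classes and collapse runs, e.g. 'TP53' -> 'A0'."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "0", shape)
    return re.sub(r"(.)\1+", r"\1", shape)  # collapse repeated class symbols


for tok in ["TP53", "BRCA2", "cancer", "COX-2"]:
    print(tok, word_shape(tok), orthographic_features(tok))
```

Under this encoding the gene symbols "TP53" and "BRCA2" share the shape `A0`, while an ordinary word such as "cancer" has the shape `a`.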
- As a further recognition feature, the method according to the present invention uses character n-gram features for 2≤n≤4. This recognition feature helps to recognize informative sub-strings like “ASE” or “HOMEO”, especially for words not seen in training.
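A minimal sketch of character n-gram extraction for 2≤n≤4 might look as follows. Upper-casing the token before extraction is an assumption made here so that "kinase" yields the sub-string "ASE" in the same form as quoted above; the function name is hypothetical.

```python
from typing import Set


def char_ngrams(token: str, n_min: int = 2, n_max: int = 4) -> Set[str]:
    """Return all character n-grams of the token with n_min <= n <= n_max."""
    text = token.upper()  # assumption: case-insensitive n-grams
    return {
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    }


print(sorted(char_ngrams("TP53")))
print("ASE" in char_ngrams("kinase"))
```

Each n-gram would then act as one binary feature that is on for every token containing it.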
- A further group of recognition features are dictionary features. For example, a disease dictionary can be used and is constructed by taking all names and synonyms of concepts covered by the disease branch (C) of the MeSH ontology. Furthermore, as a possible embodiment keyword dictionaries are used for different relation types such as altered expression, genetic variation, regulatory modification and unrelated. For example, a genetic variation dictionary can contain words like “mutation” and “polymorphism”. A dictionary feature is on, if the token matches with at least one keyword in the corresponding dictionary. Note that the presence of a certain keyword in a sentence is indicative, but not imperative for a specific relation. This is handled by the method according to the present invention because of its probabilistic nature.
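The keyword dictionary features can be sketched as a simple lookup. Only the genetic variation keywords "mutation" and "polymorphism" come from the text; the other dictionary entries below are hypothetical placeholders, and prefix matching (so that "mutations" matches "mutation") is an assumption about how a token is matched against a keyword.

```python
from typing import Dict, Set

RELATION_DICTIONARIES: Dict[str, Set[str]] = {
    "genetic_variation": {"mutation", "polymorphism"},          # from the text
    "altered_expression": {"expression"},                       # hypothetical
    "regulatory_modification": {"methylat", "phosphorylat"},    # hypothetical
    "unrelated": {"unrelated"},                                 # hypothetical
}


def dictionary_features(token: str) -> Dict[str, bool]:
    """One binary feature per dictionary: on if the token matches a keyword."""
    word = token.lower()
    return {
        name: any(word.startswith(keyword) for keyword in keywords)
        for name, keywords in RELATION_DICTIONARIES.items()
    }


print(dictionary_features("mutations"))
print(dictionary_features("found"))
```

Because these features only feed weighted evidence into the probabilistic model, a keyword hit raises the score of a relation label without forcing it.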
- A further group of recognition features are context features. These context features consider the properties of preceding or following tokens for a current token xi in order to determine its category. Context features are important for several reasons. Consider the case of nested entities such as “breast cancer 2 protein is expressed . . . ”. In this text phrase one does not want to extract a disease entity. Thus, when determining the correct label y for the token “breast”, it is important that one of the following word features will be “protein”, indicating that “breast” refers to a gene/protein entity and not to a disease. In a possible embodiment a window size is set to three. Context features are not only important in case of nested entities but also for relation extraction.
- In the method and system according to the present invention, further relation recognition features are provided besides the standard recognition features. These additional relation recognition features comprise for example a dictionary window feature, a key entity neighborhood feature, a start window feature and a negation feature.
- For each of the relation type dictionaries mentioned above, i.e. the altered expression dictionary, the genetic variation dictionary, the regulatory modification dictionary and the unrelated dictionary, a dictionary window feature is defined to be on if at least one keyword from the corresponding dictionary matches a word in a window of size N, i.e. at most N/2 tokens before or after the current token. In an embodiment N=20.
- Furthermore, as a key entity neighborhood feature, for each of the relation type dictionaries a feature is defined to be on if at least one keyword matches a word in a window of size M, i.e. at most M/2 tokens before or after the key entity token. In a possible embodiment M=6.
- As a start window feature for each of the relation type dictionaries it is defined that the feature is on if at least one keyword matches a word in the first L tokens of a sentence. In a possible embodiment L=3. With this feature the fact is addressed that for many sentences important properties of a gene-disease-relation are mentioned at the beginning of a sentence.
- A negation feature is defined such that this feature is on, if none of the three above-mentioned relation recognition features matches a dictionary keyword.
- In an embodiment relation type features are based solely on dictionary information. In alternative embodiments, further information is integrated as relation type features such as word shape or n-gram features.
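The four relation recognition features can be sketched together as below. This is an illustrative reading, not a definitive implementation: the window of size N (or M) is interpreted as N/2 (or M/2) tokens on either side, keyword matching is done by prefix, and the function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def _hit(words: Iterable[str], keywords: Set[str]) -> bool:
    """True if any word starts with any keyword (assumed matching rule)."""
    return any(w.lower().startswith(k) for w in words for k in keywords)


def relation_features(tokens: List[str], i: int, key_idx: int,
                      dictionaries: Dict[str, Set[str]],
                      n: int = 20, m: int = 6, l: int = 3) -> Dict[str, bool]:
    """Relation recognition features for token i, given the key entity position."""
    feats: Dict[str, bool] = {}
    for name, kws in dictionaries.items():
        window = tokens[max(0, i - n // 2): i + n // 2 + 1]
        neighborhood = tokens[max(0, key_idx - m // 2): key_idx + m // 2 + 1]
        feats[f"{name}:dict_window"] = _hit(window, kws)
        feats[f"{name}:key_neighborhood"] = _hit(neighborhood, kws)
        feats[f"{name}:start_window"] = _hit(tokens[:l], kws)
    # Negation feature: on only if none of the features above fired.
    feats["negation"] = not any(feats.values())
    return feats


DICTS = {"genetic_variation": {"mutation", "polymorphism"}}
QUERY = ("Inactivating TP53 mutations were found in 55% of "
         "lethal metastatic pancreatic neoplasms").split()
print(relation_features(QUERY, i=8, key_idx=1, dictionaries=DICTS))
```

For the token "lethal" in the example query, all three genetic variation features fire because "mutations" lies within every window, so the negation feature stays off.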
- In step S3 of the flow-chart of FIG. 7 a feature set of different features is provided for the annotated training corpus. For each feature of the feature set a corresponding weight λ is calculated by means of a maximum likelihood algorithm on the basis of a feature label distribution as shown in the flow-chart of FIG. 8. Accordingly, for each feature f a corresponding weighting factor λ is calculated as shown in the table of FIG. 12. A conditional random field CRF is defined as an undirected graphical model represented by a graph with vertices representing random variables and edges representing conditional independence assumptions. The most common graph is a graph which obeys a first order Markov property for each random variable yi. This means that each pair of label variables yi and yi+1 is connected by an edge in the graph G. The model is then said to be a linear chain CRF.
- A conditional probability p of a label or state sequence y for a given input sequence x is defined as:
$$p(y \mid x) = \frac{1}{Z_x} \exp\left( \sum_{i=1}^{N} \sum_{k} \lambda_k \, f_k(y_{i-1}, y_i, x, i) \right)$$
- wherein Zx is a normalization factor, N is the length of the input sequence, fk(yi−1, yi, x, i) is an arbitrary feature function and λk is a calculated weight for a feature function ranging between −∞ and +∞.
- Each feature function fk specifies an association between a token x at a certain position and a label y for that position. Therefore, with each feature f one can express some characteristics of an empirical distribution of training data that should also be true for a model distribution.
- The corresponding feature weight λk specifies whether the association should be favored or disfavored. Higher values of λ indicate that their corresponding label transitions are more likely. In general, a weight λ for each feature f is high if the feature f tends to be on for the correct labeling. The weight λ is negative if the feature tends to be off for the correct labeling and should be around zero if it is uninformative. The weights λ are learned in a possible embodiment from labeled training data of the training corpus by a maximum likelihood estimation (MLE) algorithm.
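- The normalization factor Zx is the sum over all possible state or label sequences y′ in S^N, where S is the set of labels and N is the length of the input sequence:
$$Z_x = \sum_{y' \in S^N} \exp\left( \sum_{i=1}^{N} \sum_{k} \lambda_k \, f_k(y'_{i-1}, y'_i, x, i) \right)$$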
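To make the two formulas concrete, the sketch below evaluates the conditional probability of a label sequence by brute-force enumeration of all label sequences, which is only feasible for toy inputs. The feature functions, their hand-set weights, the label name `O` for tokens outside any entity and the `START` symbol used for the position before the first token are all assumptions for illustration; in the described method the weights would come from maximum likelihood training.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Feature = Callable[[str, str, Sequence[str], int], float]

LABELS = ["O", "KE", "GVD"]  # "O" is a hypothetical label for non-entity tokens
DISEASE = {"lung", "cancer"}

# (weight lambda_k, feature function f_k(y_prev, y_cur, x, i)); weights are made up.
FEATURES: List[Tuple[float, Feature]] = [
    (2.0, lambda p, c, x, i: float(x[i] == "TP53" and c == "KE")),
    (1.5, lambda p, c, x, i: float(x[i] in DISEASE and c == "GVD")),
    (0.3, lambda p, c, x, i: float(p == "GVD" and c == "GVD")),
    (0.5, lambda p, c, x, i: float(c == "O")),
]


def score(y: Sequence[str], x: Sequence[str]) -> float:
    """Sum over positions i and features k of lambda_k * f_k(y_{i-1}, y_i, x, i)."""
    total = 0.0
    for i in range(len(x)):
        prev = y[i - 1] if i > 0 else "START"
        total += sum(w * f(prev, y[i], x, i) for w, f in FEATURES)
    return total


def partition(x: Sequence[str]) -> float:
    """Z_x: sum of exp(score) over all |S|^N label sequences."""
    return sum(math.exp(score(y, x))
               for y in itertools.product(LABELS, repeat=len(x)))


def probability(y: Sequence[str], x: Sequence[str]) -> float:
    return math.exp(score(y, x)) / partition(x)


X = "TP53 is a lung cancer gene".split()
Y = ("KE", "O", "O", "GVD", "GVD", "O")
print(round(probability(Y, X), 4))
```

In practice Zx is computed with the forward algorithm rather than by enumeration; the brute-force version is used here only because it mirrors the definition term by term.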
- After the training phase the user can input a query via the keyboard 5 to perform a semantic relation extraction in the extraction phase as shown in FIG. 7. In a step S5 the user inputs the query Q. The query Q can consist of a sentence, i.e. a sequence of words. The query Q comprises a key entity. As can be seen from the example in FIG. 11, the annotated training corpus employed by the method and system according to the present invention has tokens labeled by the expert as key entities; in the example, the token “TP53” is labeled as a key entity. The user inputs for example a query Q such as “inactivating TP53 mutations were found in 55% of lethal metastatic pancreatic neoplasms” in step S5.
- In a further step S6 the query Q is tokenized, i.e. a token sequence x1, x2, . . . xm is generated as illustrated by FIG. 9. FIG. 13 shows a table with the generated token sequence consisting of twelve tokens x1 to x12 for the given query example.
- As can be seen from the table in FIG. 11, in the annotated training corpus as used by the method according to the present invention some tokens x such as “lung” and “cancer” are labeled with a relation such as “genetic variation disease” (GVD). By comparing the annotated training corpus as used by the method according to the present invention as shown in FIG. 11 with the annotated training corpus used by the conventional method for semantic relation extraction as shown in FIG. 2, it becomes evident that some tokens x such as “lung” or “cancer” in the annotated training corpus according to the present invention are not only labeled as a disease d, but a relation of this token x to the key entity KE is also encoded in the label. In the given example the encoded relation of the tokens “lung” and “cancer” to the key entity, i.e. TP53, is “genetic variation disease” (GVD).
- In a step S7 the token sequence of the input query Q is labeled by means of a Viterbi algorithm to find a most likely label sequence as shown in FIG. 10.
- FIG. 13 shows a most likely label sequence generated by means of a Viterbi algorithm for the token sequence of the given example. By comparing FIG. 3 with FIG. 13 it becomes evident that with the method according to the present invention in step S7 the semantic relations of the key entity KE (in this case TP53) to other entities are directly extracted, i.e. in one single step. On the display 6 the user can directly see the relation between the key entity TP53 and secondary entities. In the given example the user is informed that there is a genetic variation as a relation between the key entity TP53 and the secondary entity “lethal metastatic pancreatic neoplasms”.
- In the present invention the investigated text phrase refers to a key entity KE such as “TP53” so that all other entities in the text phrase stand in some kind of relation to the key entity KE.
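Steps S6 and S7 can be sketched as whitespace tokenization followed by Viterbi decoding over relation labels. The scoring function below is a hand-written stand-in for the trained weighted features, and the label `O` for non-entity tokens, the `START` symbol and the word list `DISEASE_WORDS` are hypothetical; only the labels KE and GVD are taken from the text.

```python
from typing import Callable, Dict, List

LABELS = ["O", "KE", "GVD"]
DISEASE_WORDS = {"lethal", "metastatic", "pancreatic", "neoplasms"}  # hypothetical


def toy_score(prev: str, label: str, tokens: List[str], i: int) -> float:
    """Stand-in for sum_k lambda_k * f_k(prev, label, tokens, i)."""
    tok, s = tokens[i], 0.0
    if label == "KE" and tok == "TP53":
        s += 2.0
    if label == "GVD" and tok in DISEASE_WORDS:
        s += 1.5
    if label == "O" and tok != "TP53" and tok not in DISEASE_WORDS:
        s += 1.0
    if prev == "GVD" and label == "GVD":
        s += 0.3
    return s


def viterbi(tokens: List[str], labels: List[str],
            score_fn: Callable[[str, str, List[str], int], float]) -> List[str]:
    """Return the highest-scoring label sequence under a first-order model."""
    if not tokens:
        return []
    best: List[Dict[str, float]] = [
        {lab: score_fn("START", lab, tokens, 0) for lab in labels}
    ]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for lab in labels:
            prev = max(labels, key=lambda p: best[i - 1][p] + score_fn(p, lab, tokens, i))
            best[i][lab] = best[i - 1][prev] + score_fn(prev, lab, tokens, i)
            back[i][lab] = prev
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])  # follow back-pointers to the start
    return path[::-1]


query = ("Inactivating TP53 mutations were found in 55% of "
         "lethal metastatic pancreatic neoplasms")
tokens = query.split()  # step S6: twelve tokens x1..x12
print(list(zip(tokens, viterbi(tokens, LABELS, toy_score))))  # step S7
```

Because the relation is already encoded in the decoded labels, no separate relation extraction pass is needed after decoding.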
- For example, a biographical text usually gives information about an entity such as “Tony Blair” and all other entities in the text are involved in a certain relation with the entity (for example his family). Thus, with the present invention it is possible to predict a kind of relation holding between the key entity KE and all other secondary entities. With the method and system according to the present invention relation extraction is treated as a sequence labeling task. Accordingly, with the present invention a named entity recognition NER and a relation extraction step are merged together.
- Accordingly, with the method and system according to the present invention the entities' label y encodes a relation to the key entity KE and there is no initial labeling of the named entities.
- Gene RIF-sentences represent a similar style of text in the biomedical domain as biographical text. Gene RIF-sentences describe the function of a gene/protein, the key entity KE, as a concise phrase. As a consequence, gene RIF-sentences are an adequate source for transferring relation extraction to a sequence labeling problem.
- For example, the following gene RIF sentence is linked to a gene COX-2:
- “COX-2 expression is significantly more common in endometrical adenocarcinoma and ovarian serous cystadenocarcinoma, but not in cervical squamous carcinoma, compared with normal tissue.”
- This sentence states three disease relations with COX-2 (the key entity), namely two altered expression relations (expression of COX-2 relates to endometrical adenocarcinoma and ovarian serous cystadenocarcinoma) and one unrelated relation (cervical squamous carcinoma).
- Relation extraction RE is treated by the method according to the present invention as a tagging task such as NER or part-of-speech (POS) tagging. Accordingly, for each secondary entity the method of the present invention predicts the type of relation it has to the key entity KE. Each word in a sentence is regarded as a token x. Each token x is associated with a tag or label y which indicates the type of the token x. In the given example sentence about COX-2, the label “unrelated” is assigned to the tokens “cervical”, “squamous”, “carcinoma”, as they are evidently not related to the key entity gene, whereas the tokens “endometrical”, “adenocarcinoma”, “ovarian”, “serous”, “cystadenocarcinoma” are each labeled as a disease related to the altered expression behaviour of the gene, i.e. “altered expression”. These are the words representing diseases in the sentence. The other tokens x are labeled as not forming part of an entity. Two random variables X and Y are used to denote any input token sequences and their associated label sequences. In the method according to the present invention a correct label sequence y1, y2, . . . , yn is assigned to the given token sequence x1, x2, . . . , xn.
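Once every token carries a relation label, the relations themselves can be read off by grouping consecutive tokens with the same label, as the following sketch shows for the COX-2 sentence. The label names `KE`, `AE`, `UNRELATED` and `O`, the hand-written label sequence and the function name are assumptions for illustration.

```python
import itertools
from typing import List, Tuple

TOKENS = ["COX-2", "expression", "is", "significantly", "more", "common", "in",
          "endometrical", "adenocarcinoma", "and", "ovarian", "serous",
          "cystadenocarcinoma", ",", "but", "not", "in", "cervical", "squamous",
          "carcinoma", ",", "compared", "with", "normal", "tissue", "."]
LABELS = (["KE"] + ["O"] * 6 + ["AE"] * 2 + ["O"] + ["AE"] * 3 + ["O"] * 4
          + ["UNRELATED"] * 3 + ["O"] * 6)


def relations_from_labels(tokens: List[str], labels: List[str],
                          key_label: str = "KE",
                          outside: str = "O") -> List[Tuple[str, str, str]]:
    """Turn a labeled token sequence into (key entity, relation, entity) triples."""
    key_entity = " ".join(t for t, lab in zip(tokens, labels) if lab == key_label)
    triples = []
    for label, group in itertools.groupby(zip(tokens, labels), key=lambda p: p[1]):
        if label in (key_label, outside):
            continue
        triples.append((key_entity, label, " ".join(t for t, _ in group)))
    return triples


for triple in relations_from_labels(TOKENS, LABELS):
    print(triple)
```

A limitation of this simple grouping is that two directly adjacent entities with the same relation label would be merged; a begin/inside style label scheme would avoid that, though the text does not say which scheme is used.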
- The method of relation extraction according to the present invention is based on a one-step probabilistic extraction model, such as a linear chain conditional random field CRF. For example, the method according to the present invention extracts relations between genes and diseases from Gene RIF (Gene Reference Into Function) sentences. Gene RIFs are sentences which refer to a particular gene in the Entrez gene data base and describe its function in a concise phrase. The semantic relations extracted by the method and system according to the present invention can comprise different relations such as “altered expression”, “other genetic variation”, “regulatory modification”, “a general relation” or “an existing relation” between two entities. For example, gene-disease relations can be categorized based on whether a gene is causing a disease state, is a predisposition factor or is just associated with the disease. In an embodiment of the method according to the present invention, the gene-disease relation categories are based on the observed state of a gene or protein, e.g. transcription level or mutation, associated with the disease state. In addition, a class for sentences reporting evidence of no association between a gene state and a disease and a neutral class for sentences giving no specific observed state are provided.
- The “altered expression” level of a gene/protein is reported to be associated with a certain disease or state of a disease. For example “low expression of BRCA-1 was associated with colorectal cancer”.
- As a further semantic relation, the “genetic variation” relates to a mutational event which is reported to be related with a disease. For example, “Inactivating TP53 mutations were found in 55% of lethal metastatic pancreatic neoplasms”.
- A further semantic relation “regulatory modification” states a modification of the gene/protein through methylation or phosphorylation. For example “e-cadherin and P16INK4A are commonly methylated in non-small cell lung cancer”.
- The semantic relation “any” is given when relation between a gene and a disease is reported without any further information regarding the gene's state. For example: “e-cadherin has a role in preventing peritoneal dissemination in gastric cancer”.
- As a further semantic relation, the relation “unrelated” indicates that a sentence gives evidence for an independence between a gene and a certain disease. For example “variations in TP53NBAX alleles are unrelated to the development of pemphigus foliaceus”. In comparison to conventional methods, the method and system according to the present invention achieve high recall, precision and F-score values.
- On a manually annotated data set of gene RIFs, the recall, precision and F-score of the method and system according to the present invention are evaluated. The recall and precision depend on the numbers of true positives TP, false negatives FN and false positives FP as follows:
$$\text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}$$
- A true positive TP is a label sequence for a certain entity which exactly matches the label sequence for this entity from the annotated standard. For example, in the sentence “BRCA2 is mutated in stage II breast cancer” a human annotator labels “stage II breast cancer” as a disease related via genetic variation. Under the assumption that the method and system according to the present invention only recognizes “breast cancer” as a disease entity and categorizes the relation to the gene “BRCA2” as a “genetic variation”, the system is assigned one false negative (FN) for not recognizing the whole sequence as well as one false positive (FP). Since this is a strict matching criterion, in many situations a more lenient criterion of correctness can be used.
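The exact-match counting described above can be sketched as a set comparison between annotated and predicted (relation, entity text) pairs. The representation of an entity as a pair of strings and the use of the balanced F-score F = 2PR/(P+R) are assumptions; the text does not spell out the F-score formula.

```python
from typing import Dict, Set, Tuple

Entity = Tuple[str, str]  # (relation label, entity text)


def evaluate(gold: Set[Entity], predicted: Set[Entity]) -> Dict[str, float]:
    """Exact-match TP/FP/FN counts with recall, precision and balanced F-score."""
    tp = len(gold & predicted)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"TP": tp, "FP": fp, "FN": fn,
            "recall": recall, "precision": precision, "f_score": f_score}


gold = {("genetic variation", "stage II breast cancer")}
predicted = {("genetic variation", "breast cancer")}
print(evaluate(gold, predicted))
```

For the BRCA2 example the partial match counts as one false negative and one false positive, exactly as in the description of the strict criterion.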
- Table 1 shows corpus statistics for an annotated data set of 5.469 gene RIFs.

TABLE 1
          Any    Unrelated   Altered expression   Genetic variation   Regulatory modification   All
Corpus    1396   369         1750                 1695                186                       5369

- Table 2 shows the results of a relation extraction RE as performed using the method and system according to the present invention.

TABLE 2
                          Recall   Precision   F-score
Any                       69.94    79.20       74.28
Unrelated                 56.01    66.93       60.09
Altered expression        73.89    74.92       74.40
Genetic variation         75.99    78.06       77.01
Regulatory modification   61.13    70.50       65.48
Overall                   71.54    76.31       73.84

- Table 2 lists accuracy measures for each of the predefined relation types. For the any, altered expression and genetic variation relations the method and system according to the present invention exceed an F-measure of 74. Averaged over all relation types, the method and system according to the present invention achieve an overall accuracy of 73.84 F-measure for the given data set.
- Table 3 shows a comparison of different methods of semantic relation extraction. The first two models are based on a conventional two-step approach according to the state of the art, consisting of a named entity recognition (NER) step and a successive relation extraction (RE) step. In the first baseline model (dictionary plus rule-based), the NER step is done via a dictionary longest-matching approach, while in the CRF plus rule-based model the NER step is tackled via a disease NER CRF.
TABLE 3
Model | Recall | Precision | F-score |
---|---|---|---|
Dictionary + rule-based | 43.31 | 42.98 | 43.10 |
CRF + rule-based | 67.62 | 71.88 | 69.68 |
Relation CRF | 71.54 | 76.31 | 73.84 |
- As can be seen from Table 3, the method and system according to the present invention clearly outperform the two conventional baseline approaches. The difference between the prior-art two-step approach, which uses a disease CRF tagger plus additional successive rules for RE, and the method according to the present invention is 4.16 points of F-measure. This result indicates that the unified CRF used by the method according to the present invention is able to learn additional patterns from the empirical distribution which are important for inferring the type of relation holding between gene and disease pairs.
- In a possible embodiment, the method and system according to the present invention allow the identification of semantic gene-disease relations based on a probabilistic extraction model. As can be seen from Table 3, the overall performance of the method and system according to the present invention is better than that of conventional methods employing a two-step approach.
- Although the method and system according to the present invention are discussed mostly with respect to biomedical data, it is emphasized that they can be used for semantic relation extraction from any kind of unstructured text.
- Further, the method and system according to the present invention can be used for semantic relation extraction from any unstructured text written in any language and any alphabet. The method and system according to the present invention allow entities and their relations to be detected at the same time. The method and system according to the present invention have a higher performance, i.e. sensitivity and F-score, than conventional methods. The method and system according to the present invention not only allow a relation to be detected but also allow its nature to be characterized, as far as it is mentioned in the unstructured text.
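To illustrate how entities and their relations can be read off at the same time, the following minimal sketch turns a token-level label sequence into relation triples. It assumes that each token carries either a relational label naming its relation to the key entity or a neutral label; the label names `O`, `KEY` and `genetic_variation`, and the function `decode_relations`, are hypothetical and used only for this example.

```python
from itertools import groupby
from typing import List, Tuple

OUTSIDE = "O"  # assumed label for tokens that belong to no related entity
KEY = "KEY"    # assumed label marking the tokens of the key entity itself


def decode_relations(
    tokens: List[str], labels: List[str], key_entity: str
) -> List[Tuple[str, str, str]]:
    """Group consecutive tokens with the same relational label into entities.

    Returns (key entity, relation label, entity text) triples, so the entity
    span and the relation type come from the same label sequence.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    relations = []
    for label, group in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        if label in (OUTSIDE, KEY):
            continue  # neutral tokens and the key entity carry no relation
        entity_text = " ".join(token for token, _ in group)
        relations.append((key_entity, label, entity_text))
    return relations


tokens = "BRCA2 is mutated in stage II breast cancer".split()
labels = [KEY, OUTSIDE, OUTSIDE, OUTSIDE] + ["genetic_variation"] * 4
print(decode_relations(tokens, labels, key_entity="BRCA2"))
# [('BRCA2', 'genetic_variation', 'stage II breast cancer')]
```

Because one label per token encodes both the entity boundary and the relation type, no separate entity recognition step is needed before the relation is assigned in this sketch.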
- In a possible embodiment, the method according to the present invention is performed by a computer program on a computer. In a possible embodiment, this computer program comprises instructions to perform the method and is stored on a data carrier.
Claims (20)
1. A method for semantic relation extraction comprising: extracting directly on the basis of an annotated training corpus having tokens with associated relational labels each indicating a relation between the respective token and a selectable key entity semantic relation between said key entity and other entities from unstructured text using a probabilistic extraction model.
2. The method according to claim 1, wherein the probabilistic extraction model is a conditional random field.
3. The method according to claim 1, wherein weighting factors (λ) for each feature are calculated on the basis of a feature label distribution of said annotated training corpus by means of a maximum likelihood algorithm.
4. The method according to claim 1, wherein a query comprising said key entity is input by a user.
5. The method according to claim 4, wherein the input query is tokenized to generate a token sequence.
6. The method according to claim 5, wherein a most likely label sequence is calculated for the generated token sequence by means of a Viterbi algorithm using said calculated weighting factors.
7. The method according to claim 6, wherein a conditional probability (P) of the label sequence is calculated as follows:
wherein Z_x is a normalization factor,
f_k(y_{i−1}, y_i, x, i) is an arbitrary feature function, λ_k is a calculated weight factor for a feature function ranging between −∞ and +∞.
8. The method according to claim 7, wherein the normalization factor Z_x is calculated as follows:
wherein N is the length of the input sequence.
9. The method according to claim 1, wherein the semantic relations are formed by biomedical relations.
10. The method according to claim 9, wherein the biomedical relations are
an altered expression,
a genetic variation,
a regulatory modification,
a general relation, and
a non-existing relation between two entities.
11. The method according to claim 1, wherein a set of recognition features is provided.
12. The method according to claim 11, wherein the set of recognition features comprises:
orthographic features,
word shape features,
n-gram features,
dictionary features, and
context features.
13. The method according to claim 1, wherein a set of relation recognition features is provided.
14. The method according to claim 13, wherein the set of relation recognition features comprises:
a dictionary window feature,
a key entity neighbourhood feature,
a start window feature, and
a negation feature.
15. The method according to claim 1, wherein the entities are formed by biomedical entities.
16. The method according to claim 15, wherein the entities comprise genes, diseases, drugs, compounds and proteins.
17. A computer program for performing the method for semantic relation extraction according to claim 1 .
18. A data carrier for storing instructions of a computer program which performs the method for semantic relation extraction according to claim 1 .
19. A semantic relation extraction system comprising:
(a) means for storing unstructured text;
(b) means for storing an annotated training corpus having tokens with associated relational labels each indicating a relation between the respective token and a selectable key entity; and
(c) means for extracting semantic relations between the key entity and other entities from said unstructured text on the basis of said training corpus using a probabilistic extraction model.
20. The semantic relation extraction system according to claim 19 , wherein said probabilistic extraction model is a conditional random field.
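Claims 6 to 8 refer to a Viterbi computation of the most likely label sequence and to a conditional probability normalized by Z_x. The following is a minimal sketch, assuming the standard linear-chain conditional random field form P(y|x) = exp(Σ_i Σ_k λ_k f_k(y_{i−1}, y_i, x, i)) / Z_x, where Z_x sums the same exponential over all label sequences of length N. The label set, the three feature functions and their weights are hypothetical and chosen only for illustration; they are not taken from the patent.

```python
import itertools
import math

LABELS = ["O", "DISEASE_REL"]  # hypothetical label set
START = "<START>"              # assumed dummy label preceding the first token
DISEASE_WORDS = {"breast", "cancer"}  # toy dictionary for the example


# Hypothetical feature functions f_k(y_prev, y, x, i).
def f_disease_word(y_prev, y, x, i):
    return 1.0 if y == "DISEASE_REL" and x[i].lower() in DISEASE_WORDS else 0.0


def f_continue(y_prev, y, x, i):
    return 1.0 if y_prev == "DISEASE_REL" and y == "DISEASE_REL" else 0.0


def f_other(y_prev, y, x, i):
    return 1.0 if y == "O" and x[i].lower() not in DISEASE_WORDS else 0.0


# (lambda_k, f_k) pairs; the weights are made up, not learned from data.
FEATURES = [(2.0, f_disease_word), (0.5, f_continue), (1.0, f_other)]


def local_score(y_prev, y, x, i):
    return sum(weight * f(y_prev, y, x, i) for weight, f in FEATURES)


def sequence_score(y, x):
    """Sum over positions i and features k of lambda_k * f_k."""
    prev, total = START, 0.0
    for i, label in enumerate(y):
        total += local_score(prev, label, x, i)
        prev = label
    return total


def partition(x):
    """Z_x by brute-force enumeration; only feasible for toy inputs."""
    return sum(
        math.exp(sequence_score(y, x))
        for y in itertools.product(LABELS, repeat=len(x))
    )


def probability(y, x):
    return math.exp(sequence_score(y, x)) / partition(x)


def viterbi(x):
    """Most likely label sequence by dynamic programming over positions."""
    if not x:
        return []
    # best[i][label] = (score of best path ending in label at i, previous label)
    best = [{lab: (local_score(START, lab, x, 0), None) for lab in LABELS}]
    for i in range(1, len(x)):
        column = {}
        for lab in LABELS:
            prev = max(
                LABELS, key=lambda p: best[i - 1][p][0] + local_score(p, lab, x, i)
            )
            column[lab] = (best[i - 1][prev][0] + local_score(prev, lab, x, i), prev)
        best.append(column)
    last = max(LABELS, key=lambda lab: best[-1][lab][0])
    path = [last]
    for i in range(len(x) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


tokens = "BRCA2 is mutated in breast cancer".split()
labels = viterbi(tokens)
print(labels)
print(probability(labels, tokens))
```

In practice the normalization factor would be computed with the forward algorithm rather than by enumeration; the brute-force version is kept here only because it mirrors the definition directly.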
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP07013828 | 2007-07-13 | ||
EPEP07013828 | 2007-07-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
US20090019032A1 (en) | 2009-01-15 |
Family
ID=40253985
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/979,534 Abandoned US20090019032A1 (en) | 2007-07-13 | 2007-11-05 | Method and a system for semantic relation extraction |
Country Status (1)
Country | Link |
---|---|
US (1) | US20090019032A1 (en) |
Patent Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5963894A (en) * | 1994-06-24 | 1999-10-05 | Microsoft Corporation | Method and system for bootstrapping statistical processing into a rule-based natural language parser |
US6070134A (en) * | 1997-07-31 | 2000-05-30 | Microsoft Corporation | Identifying salient semantic relation paths between two words |
US7194406B2 (en) * | 2000-06-22 | 2007-03-20 | Hapax Limited | Method and system for information extraction |
US20030217335A1 (en) * | 2002-05-17 | 2003-11-20 | Verity, Inc. | System and method for automatically discovering a hierarchy of concepts from a corpus of documents |
US20040093331A1 (en) * | 2002-09-20 | 2004-05-13 | Board Of Regents, University Of Texas System | Computer program products, systems and methods for information discovery and relational analyses |
US20060294037A1 (en) * | 2003-08-06 | 2006-12-28 | Microsoft Corporation | Cost-benefit approach to automatically composing answers to questions by extracting information from large unstructured corpora |
US20070067280A1 (en) * | 2003-12-31 | 2007-03-22 | Agency For Science, Technology And Research | System for recognising and classifying named entities |
US20060053171A1 (en) * | 2004-09-03 | 2006-03-09 | Biowisdom Limited | System and method for curating one or more multi-relational ontologies |
US20060053099A1 (en) * | 2004-09-03 | 2006-03-09 | Biowisdom Limited | System and method for capturing knowledge for integration into one or more multi-relational ontologies |
US20060053151A1 (en) * | 2004-09-03 | 2006-03-09 | Bio Wisdom Limited | Multi-relational ontology structure |
US20060053098A1 (en) * | 2004-09-03 | 2006-03-09 | Bio Wisdom Limited | System and method for creating customized ontologies |
US7505989B2 (en) * | 2004-09-03 | 2009-03-17 | Biowisdom Limited | System and method for creating customized ontologies |
US8280719B2 (en) * | 2005-05-05 | 2012-10-02 | Ramp, Inc. | Methods and systems relating to information extraction |
US20070016863A1 (en) * | 2005-07-08 | 2007-01-18 | Yan Qu | Method and apparatus for extracting and structuring domain terms |
US20070219776A1 (en) * | 2006-03-14 | 2007-09-20 | Microsoft Corporation | Language usage classifier |
US20090192954A1 (en) * | 2006-03-15 | 2009-07-30 | Araicom Research Llc | Semantic Relationship Extraction, Text Categorization and Hypothesis Generation |
US20080010274A1 (en) * | 2006-06-21 | 2008-01-10 | Information Extraction Systems, Inc. | Semantic exploration and discovery |
US7558778B2 (en) * | 2006-06-21 | 2009-07-07 | Information Extraction Systems, Inc. | Semantic exploration and discovery |
US7899666B2 (en) * | 2007-05-04 | 2011-03-01 | Expert System S.P.A. | Method and system for automatically extracting relations between concepts included in text |
US20110270604A1 (en) * | 2010-04-28 | 2011-11-03 | Nec Laboratories America, Inc. | Systems and methods for semi-supervised relationship extraction |
Non-Patent Citations (1)
Title |
---|
Extracting Relations from Unstructured Text, Ryan McDonald, April 15, 2005 * |
Cited By (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7970808B2 (en) * | 2008-05-05 | 2011-06-28 | Microsoft Corporation | Leveraging cross-document context to label entity |
US20090282012A1 (en) * | 2008-05-05 | 2009-11-12 | Microsoft Corporation | Leveraging cross-document context to label entity |
US20110035210A1 (en) * | 2009-08-10 | 2011-02-10 | Benjamin Rosenfeld | Conditional random fields (crf)-based relation extraction system |
US8918348B2 (en) | 2010-04-09 | 2014-12-23 | Microsoft Corporation | Web-scale entity relationship extraction |
US20110251984A1 (en) * | 2010-04-09 | 2011-10-13 | Microsoft Corporation | Web-scale entity relationship extraction |
US8504490B2 (en) * | 2010-04-09 | 2013-08-06 | Microsoft Corporation | Web-scale entity relationship extraction that extracts pattern(s) based on an extracted tuple |
US9317569B2 (en) | 2010-04-09 | 2016-04-19 | Microsoft Technology Licensing, Llc | Displaying search results with edges/entity relationships in regions/quadrants on a display device |
US8290968B2 (en) | 2010-06-28 | 2012-10-16 | International Business Machines Corporation | Hint services for feature/entity extraction and classification |
US8849732B2 (en) | 2010-09-28 | 2014-09-30 | Siemens Aktiengesellschaft | Adaptive remote maintenance of rolling stocks |
US8566273B2 (en) * | 2010-12-15 | 2013-10-22 | Siemens Aktiengesellschaft | Method, system, and computer program for information retrieval in semantic networks |
US20120158639A1 (en) * | 2010-12-15 | 2012-06-21 | Joshua Lamar Moore | Method, system, and computer program for information retrieval in semantic networks |
US20120226715A1 (en) * | 2011-03-04 | 2012-09-06 | Microsoft Corporation | Extensible surface for consuming information extraction services |
US9064004B2 (en) * | 2011-03-04 | 2015-06-23 | Microsoft Technology Licensing, Llc | Extensible surface for consuming information extraction services |
US20120253793A1 (en) * | 2011-04-01 | 2012-10-04 | Rima Ghannam | System for natural language understanding |
US9710458B2 (en) * | 2011-04-01 | 2017-07-18 | Rima Ghannam | System for natural language understanding |
US9110883B2 (en) * | 2011-04-01 | 2015-08-18 | Rima Ghannam | System for natural language understanding |
US20160041967A1 (en) * | 2011-04-01 | 2016-02-11 | Rima Ghannam | System for Natural Language Understanding |
US20130086059A1 (en) * | 2011-10-03 | 2013-04-04 | Nuance Communications, Inc. | Method for Discovering Key Entities and Concepts in Data |
US20130246046A1 (en) * | 2012-03-16 | 2013-09-19 | International Business Machines Corporation | Relation topic construction and its application in semantic relation extraction |
US9037452B2 (en) * | 2012-03-16 | 2015-05-19 | Afrl/Rij | Relation topic construction and its application in semantic relation extraction |
WO2015080561A1 (en) | 2013-11-27 | 2015-06-04 | Mimos Berhad | A method and system for automated relation discovery from texts |
US10643145B2 (en) | 2013-11-27 | 2020-05-05 | Micro Focus Llc | Relationship extraction |
WO2015077942A1 (en) * | 2013-11-27 | 2015-06-04 | Hewlett-Packard Development Company, L.P. | Relationship extraction |
WO2016010245A1 (en) * | 2014-07-14 | 2016-01-21 | Samsung Electronics Co., Ltd. | Method and system for robust tagging of named entities in the presence of source or translation errors |
US10073673B2 (en) | 2014-07-14 | 2018-09-11 | Samsung Electronics Co., Ltd. | Method and system for robust tagging of named entities in the presence of source or translation errors |
US20160085971A1 (en) * | 2014-09-22 | 2016-03-24 | Infosys Limited | System and method for tokenization of data for privacy |
US9953171B2 (en) * | 2014-09-22 | 2018-04-24 | Infosys Limited | System and method for tokenization of data for privacy |
US20160148116A1 (en) * | 2014-11-21 | 2016-05-26 | International Business Machines Corporation | Extraction of semantic relations using distributional relation detection |
US20160148096A1 (en) * | 2014-11-21 | 2016-05-26 | International Business Machines Corporation | Extraction of semantic relations using distributional relation detection |
US9785887B2 (en) * | 2014-11-21 | 2017-10-10 | International Business Machines Corporation | Extraction of semantic relations using distributional relation detection |
US9792549B2 (en) * | 2014-11-21 | 2017-10-17 | International Business Machines Corporation | Extraction of semantic relations using distributional relation detection |
CN104933164A (en) * | 2015-06-26 | 2015-09-23 | 华南理工大学 | Method for extracting relations among named entities in Internet massive data and system thereof |
US9824083B2 (en) * | 2015-07-07 | 2017-11-21 | Rima Ghannam | System for natural language understanding |
US20170011023A1 (en) * | 2015-07-07 | 2017-01-12 | Rima Ghannam | System for Natural Language Understanding |
WO2018005203A1 (en) * | 2016-06-28 | 2018-01-04 | Microsoft Technology Licensing, Llc | Leveraging information available in a corpus for data parsing and predicting |
US10200397B2 (en) | 2016-06-28 | 2019-02-05 | Microsoft Technology Licensing, Llc | Robust matching for identity screening |
CN109416705A (en) * | 2016-06-28 | 2019-03-01 | 微软技术许可有限责任公司 | It parses and predicts for data using information available in corpus |
US10311092B2 (en) | 2016-06-28 | 2019-06-04 | Microsoft Technology Licensing, Llc | Leveraging corporal data for data parsing and predicting |
CN107992597A (en) * | 2017-12-13 | 2018-05-04 | 国网山东省电力公司电力科学研究院 | A kind of text structure method towards electric network fault case |
US10394955B2 (en) | 2017-12-21 | 2019-08-27 | International Business Machines Corporation | Relation extraction from a corpus using an information retrieval based procedure |
CN110931084A (en) * | 2018-08-31 | 2020-03-27 | 国际商业机器公司 | Extraction and normalization of mutant genes from unstructured text for cognitive search and analysis |
US11170031B2 (en) * | 2018-08-31 | 2021-11-09 | International Business Machines Corporation | Extraction and normalization of mutant genes from unstructured text for cognitive search and analytics |
US20200175020A1 (en) * | 2018-11-30 | 2020-06-04 | International Business Machines Corporation | Automated document filtration and prioritization for document searching and access |
US20200175021A1 (en) * | 2018-11-30 | 2020-06-04 | International Business Machines Corporation | Automated document filtration and priority scoring for document searching and access |
US11074262B2 (en) | 2018-11-30 | 2021-07-27 | International Business Machines Corporation | Automated document filtration and prioritization for document searching and access |
US11061913B2 (en) | 2018-11-30 | 2021-07-13 | International Business Machines Corporation | Automated document filtration and priority scoring for document searching and access |
US10949607B2 (en) | 2018-12-10 | 2021-03-16 | International Business Machines Corporation | Automated document filtration with normalized annotation for document searching and access |
CN109791570A (en) * | 2018-12-13 | 2019-05-21 | 香港应用科技研究院有限公司 | Efficiently and accurately name entity recognition method and device |
WO2020118741A1 (en) * | 2018-12-13 | 2020-06-18 | Hong Kong Applied Science and Technology Research Institute Company Limited | Efficient and accurate named entity recognition method and apparatus |
US11068490B2 (en) | 2019-01-04 | 2021-07-20 | International Business Machines Corporation | Automated document filtration with machine learning of annotations for document searching and access |
US20200218719A1 (en) * | 2019-01-04 | 2020-07-09 | International Business Machines Corporation | Automated document filtration with machine learning of annotations for document searching and access |
US11721441B2 (en) | 2019-01-15 | 2023-08-08 | Merative Us L.P. | Determining drug effectiveness ranking for a patient using machine learning |
US10977292B2 (en) | 2019-01-15 | 2021-04-13 | International Business Machines Corporation | Processing documents in content repositories to generate personalized treatment guidelines |
CN110134772A (en) * | 2019-04-18 | 2019-08-16 | 五邑大学 | Medical text Relation extraction method based on pre-training model and fine tuning technology |
CN110348015A (en) * | 2019-07-12 | 2019-10-18 | 北京百奥知信息科技有限公司 | A kind of method of entity in automatic marking medicine text |
CN110781683A (en) * | 2019-11-04 | 2020-02-11 | 河海大学 | Entity relation joint extraction method |
WO2022072346A1 (en) * | 2020-09-29 | 2022-04-07 | Xcures, Inc. | Automated individualized recommendations for medical treatment |
CN113032523A (en) * | 2021-03-22 | 2021-06-25 | 平安科技(深圳)有限公司 | Extraction method and device of triple information, electronic equipment and storage medium |
WO2022198747A1 (en) * | 2021-03-22 | 2022-09-29 | 平安科技(深圳)有限公司 | Triplet information extraction method and apparatus, electronic device and storage medium |
CN114490928A (en) * | 2021-12-31 | 2022-05-13 | 广州探迹科技有限公司 | Implementation method, system, computer equipment and storage medium of semantic search |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: SIEMENS AKTIENGESELLSCHAFT, GERMANY. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: BUNDSCHUS, MARKUS; DEJORI, MATHAEUS; STETTER, MARTIN; AND OTHERS; REEL/FRAME: 020148/0856. Effective date: 20071022 |
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |