WO2009053613A1

WO2009053613A1 - Method and system for annotating multimedia documents

Info

Publication number: WO2009053613A1
Application number: PCT/FR2008/051823
Authority: WO
Inventors: Stéphane Canu; Bruno Grilheres; Stephan Brunessaux
Original assignee: Eads Defence And Security Systems
Priority date: 2007-10-10
Filing date: 2008-10-08
Publication date: 2009-04-30
Also published as: DE112008002713T8; GB2466752A; FR2922338A1; DE112008002713T5; GB201007180D0

Abstract

A method of annotating a plurality of multimedia documents, each multimedia document comprising at least one section, and each section comprising at least one characteristic, comprises the steps of: manual annotation (11) of at least one document of the plurality of documents by assigning each section of said document at least one annotation class; automatic creation (15) of an annotation model defining relationships between the annotation classes and the characteristics of the sections, said automatic creation being carried out by iterative learning based on the selection of relevant sections; and automatic annotation (23) of an unannotated document of the plurality of multimedia documents by applying the annotation model; characterised in that the step of automatic creation comprises the sub-steps of selection, adjustment and suppression.

Description

METHOD AND SYSTEM FOR ANNOTATING MULTIMEDIA DOCUMENTS

The present invention relates to a method and a system for annotating multimedia documents, as well as a computer program product for implementing the method. An annotation system is a system for adding high-level information, called metadata, to a multimedia document, that is, a text, image, audio and/or video document. Annotations come in various granularities and apply to a complete document as well as to any section of a document. They are also varied in nature: for example, annotations may be temporal, spatial, semantic, and so on, and apply either to one document section or, in the case of relation extraction, to multiple sections.

Annotation then enables advanced processing of documents. For example, it allows filtering by annotation, reasoning, or advanced searches over annotations.

The annotation is usually done manually by a person responsible for reading the documents. Manually annotating documents, however, is a particularly time-consuming task. In other cases, the annotation is performed completely automatically, but then the system cannot improve over time, except by changing the version of the annotation engine.

All automatic systems are based on an annotation model that contains the relationships between the annotations and the characteristics of the document, or a part / section of the document. Thus, when a new document is to be annotated, the system looks for characteristics identical to those contained in the model in order to apply the corresponding annotations to the document.

Generally speaking, a semantic annotation platform is based on matching the instances of a domain ontology with the content of a document, generally with a semantic disambiguation step that finds the best ontology instance according to the context of the section to be annotated. Some annotation platforms can learn an annotation model from examples. In this case, the generated models are often opaque to a non-expert user, who cannot easily validate them. It would be desirable to define an annotation method and system that combine the efficiency of automatic systems with the flexibility and versatility of manual systems.

To solve one or more of the aforementioned drawbacks, a method of annotating a plurality of multimedia documents, each multimedia document comprising at least one section, and each section comprising at least one characteristic, comprises the steps of:

• manual annotation of at least one of the plurality of documents by assigning to each section of said document at least one annotation class,

• automatic creation of an annotation model defining relations between the annotation classes and the characteristics of the sections, said automatic creation being performed by iterative learning based on the selection of relevant sections, this step comprising the sub-steps of:

  o selection of the document section that is the most representative of the set of annotated sections and the furthest from the previously selected sections,

  o adjustment of weights associated with the different sections,

  o deletion of sections having weights lower than a predetermined value, or of sections substantially identical to the selected sections,

• automatic annotation of a non-annotated document of the plurality of multimedia documents by application of the annotation model.

Thus, the annotation method advantageously creates a model based on a set of annotations provided by the user. It is therefore understandable that, by a suitable selection of manually annotated documents, the user has a decisive influence on the quality of this model. Particular features or embodiments of this method are:

• the method also includes a step of manual validation of the automatic annotation, followed by an iteration of at least the step of automatic model creation in order to replace the annotation model with one that takes the validated annotations into account, the steps of automatic creation, automatic annotation and manual validation thus forming an iterative loop for improving the annotation model.

• a relation between a graph representative of the annotation classes and a graph representative of the characteristics of the documents is defined as the product of exponential families based on the sections, and a cost function is defined as the log likelihood of at least a part of the sections, so that the automatic creation of the model consists of minimizing the cost function by selecting the most representative sections and adjusting the weights.

• cost minimization involves the following iterative steps:

  o creation of a set of active sections, initialized empty,

  o iteration, as long as the cost function decreases, of the sub-steps of:

    • generation of all possible sections on the annotated documents,

    • calculation, for each selected section, of a gradient of the cost function for a zero weighting,

    • addition to the set of active sections of the section or sections whose gradient is maximum,

    • iteration, as long as the weightings of the sections of the set of active sections change, of the sub-steps of: calculation of the cost function and associated gradients; calculation of the weights of the sections of the set of active sections by a gradient descent method.

Thus, the method advantageously makes it possible to create an iterative loop for improving the annotation model, because the validation and possible correction of the automatically generated annotations make it possible, on the one hand, to provide new input data for the automatic creation of the model and, on the other hand, to assess the quality of the model by the number of corrections to be made.

The method of automatic creation of the model also advantageously limits as much as possible the number of sections used by the model, retaining only those that are most representative of the relations of the model.

It is based on the CRF models described in John Lafferty, Andrew McCallum and Fernando Pereira, "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data", Proceedings of the International Conference on Machine Learning (ICML-2001), 2001.

It is also based on the KCRF models described in John Lafferty, Xiaojin Zhu and Yan Liu, "Kernel conditional random fields: representation and clique selection", ICML '04: Proceedings of the twenty-first international conference on Machine learning.

However, the method advantageously makes it possible to make these models iterative and incremental.

In a second aspect of the invention, a computer program product includes program code instructions recorded on a computer readable medium, for implementing the steps of the preceding method when said program is running on a computer.

In a third aspect of the invention, a system for annotating a plurality of multimedia documents, each multimedia document comprising at least one section, and each section comprising at least one characteristic, comprises:

• user interface means adapted to manually annotate at least one document of the plurality of documents by assigning to each section of said document at least one annotation class,

• means for automatically creating an annotation model defining relations between the annotation classes and the characteristics of the sections, said automatic creation means comprising iterative learning means based on the selection of relevant sections, as well as means for:

  o selection of the document section most representative of all the existing sections and the farthest from the previously selected sections,

  o adjustment of weightings associated with the different sections,

  o deletion of sections having weights lower than a predetermined value or sections substantially identical to the selected sections,

• means for automatically annotating a non-annotated document of the plurality of multimedia documents by applying the annotation model.

Features or particular embodiments of this system are:

• the interface means are further adapted to manually validate the automatic annotation, and the automatic model creation means are adapted to take account of the validated annotations.

The invention will be better understood on reading the description which follows, given solely by way of example, and with reference to the appended figures in which:

FIG. 1 is a schematic view of an annotation system according to one embodiment of the invention; and

FIG. 2 is a flowchart of an annotation method according to one embodiment of the invention.

With reference to FIG. 1, an annotation system comprises a terminal 1 having a man/machine interface 3. This interface 3 is adapted for annotating a document manually. It is conventionally based on a hardware information presentation interface composed, for example, of a screen, and on information input means composed, for example, of a keyboard and a mouse.

This interface 3 allows various elementary operations related to manual document annotation.

It thus makes it possible to cut the document into one or more homogeneous sections on which the annotations will be based. Depending on the type of document, a section corresponds to a variable granularity representing a certain homogeneity. For example, a section is a word in a text or a sequence of images in a video.

The division into sections is either automatic, that is to say performed by the system alone; or manual, and therefore performed by the operator by means of the interface; or semi-automatic, allowing the operator to modify a division prepared by the system. Once the document is divided into sections, the interface 3 allows the user to assign to each section at least one class, or type, of annotation. For example, the user assigns syntactic annotations to text and/or semantic annotations in the form of a class of an ontology. To help with this assignment, the interface includes selection tools. In a relatively simple form, these tools may be mere list forms allowing an annotation to be chosen from a predefined list. In more sophisticated forms, these tools may propose annotations based on a first automatic analysis of the document or section concerned, for example using a pre-existing annotation model.
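The automatic division and the list-based assignment described above can be sketched as follows. This is a minimal illustration, assuming a text document in which each word is one section; the `Section` class, the three characteristics computed per word, and the class list `ALLOWED_CLASSES` are hypothetical names introduced here and do not come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One homogeneous unit of a document (assumed here: a single word)."""
    text: str
    features: dict = field(default_factory=dict)  # the section's characteristics
    classes: set = field(default_factory=set)     # annotation classes assigned so far


def split_into_sections(document: str) -> list:
    """Automatic division: one section per whitespace-separated word."""
    sections = []
    for word in document.split():
        token = word.strip(".,;:!?")
        if not token:
            continue
        features = {
            "lower": token.lower(),
            "capitalized": token[0].isupper(),
            "is_digit": token.isdigit(),
        }
        sections.append(Section(text=token, features=features))
    return sections


def annotate(section: Section, annotation_class: str, allowed: set) -> None:
    """Manual annotation restricted to a predefined list of classes."""
    if annotation_class not in allowed:
        raise ValueError(f"unknown annotation class: {annotation_class}")
    section.classes.add(annotation_class)


# Hypothetical predefined list, standing in for the interface's list form.
ALLOWED_CLASSES = {"Person", "Location", "Other"}

sections = split_into_sections("Alice flew to Paris.")
annotate(sections[0], "Person", ALLOWED_CLASSES)
annotate(sections[3], "Location", ALLOWED_CLASSES)
```

A semi-automatic division would correspond to letting the operator merge or split the entries of `sections` before any class is assigned.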

The human / machine interface 3 thus includes a specialized editor for adding, modifying or deleting annotations to a multimedia document.

The terminal 1 is connected to a learning server 5 by a data link 6. The learning server 5 comprises means 7 for automatically creating an annotation model defining relations between the annotation classes and the characteristics of the sections. The automatic creation means 7 use manually annotated documents from the terminal 1 as input parameters.

The learning server 5 also comprises means 9 for automatic annotation of a non-annotated document by application of the created annotation model.

The operation of the system will now be described in connection with FIG.

In a step 11, a user annotates a multimedia document using the terminal 1 and the adapted man-machine interface 3.

The annotated document is sent at 13 to the learning server 5. The learning server 5 then starts, in step 15, the execution of the automatic model creation means 7. These iteratively perform the following steps:

• selection, in step 17, of the document section that is the most representative of the set and the furthest from any sections already selected. A section being distant from, or near to, another section is understood in terms of a distance in the mathematical sense, defined by a metric on sections. Thus, a near section is a section that has substantially the same, or very similar, characteristics as other sections,

• adjustment, in step 19, of the weightings associated with the various selected sections,

• deletion, in step 21, of the least representative sections of all the documents, or of the sections closest to the selected sections.

These three steps 17, 19 and 21 of selection, adjustment and deletion are iterated until a satisfactory model is obtained, which corresponds, for example, to a minimization of a cost function as explained below.
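The select / adjust / delete cycle of steps 17, 19 and 21 can be illustrated with a small sketch. It rests on several assumptions that are not in the description: sections are represented as numeric feature tuples, the metric is the Euclidean distance, representativeness is taken as the negated mean distance to all sections, and the weight adjustment is replaced by a simple stand-in rule (share of sections nearest to each selected section) rather than the cost minimization described further on.

```python
import math


def distance(a, b):
    """Metric on sections (assumed: Euclidean distance between feature tuples)."""
    return math.dist(a, b)


def representativeness(candidate, sections):
    """Higher when the candidate lies close to many sections (assumed definition)."""
    return -sum(distance(candidate, s) for s in sections) / len(sections)


def select_next(candidates, sections, selected):
    """Step 17: most representative candidate that is also far from earlier picks."""
    def score(c):
        spread = min((distance(c, s) for s in selected), default=0.0)
        return representativeness(c, sections) + spread
    return max(candidates, key=score)


def build_model(sections, n_iter=3, duplicate_radius=0.5, min_weight=0.1):
    candidates = list(sections)
    selected, weights = [], {}
    for _ in range(n_iter):
        if not candidates:
            break
        selected.append(select_next(candidates, sections, selected))
        # Step 19 (stand-in rule): weight = share of sections nearest to each pick.
        counts = {s: 0 for s in selected}
        for sec in sections:
            counts[min(selected, key=lambda s: distance(sec, s))] += 1
        weights = {s: counts[s] / len(sections) for s in selected}
        # Step 21: drop low-weight picks and candidates nearly identical to a pick.
        selected = [s for s in selected if weights[s] >= min_weight]
        weights = {s: weights[s] for s in selected}
        candidates = [c for c in candidates
                      if all(distance(c, s) > duplicate_radius for s in selected)]
    return weights


toy_sections = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
model = build_model(toy_sections)
```

On this toy data the loop keeps one section per cluster: near-duplicates of a selected section are deleted as soon as that section is picked, which is what keeps the model small.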

Once the model has been created, a new document is annotated automatically, in step 23, either at the request of the user or as part of a batch process. In step 25, the automatically annotated document is sent to the terminal 1 so that the user can study, and possibly modify, in step 27, the annotations proposed by the system.

In particular, in the case where the user modifies the annotations, the step 15 of launching the model creation is executed again, adding to the input data of this creation the new document with its annotations as modified by the user.

Thus, by successive iterations alternating the automatic step of model creation, the application of the model to a new document and the correction of the annotations proposed by the model, the model is refined until it reaches a level of quality such that user intervention is no longer necessary.
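The improvement loop formed by steps 15, 23, 25 and 27 can be sketched as below. The model used here is a deliberately trivial stand-in (most frequent class per word), not the kernel-based model of the description, and `review` is a hypothetical callback representing the user's validation at the terminal.

```python
from collections import Counter, defaultdict


def train(annotated_docs):
    """Toy stand-in for model creation (step 15): most frequent class per word."""
    votes = defaultdict(Counter)
    for doc in annotated_docs:
        for word, label in doc:
            votes[word.lower()][label] += 1
    return {w: c.most_common(1)[0][0] for w, c in votes.items()}


def auto_annotate(model, words, default="Other"):
    """Step 23: apply the model to an unannotated document."""
    return [(w, model.get(w.lower(), default)) for w in words]


def improvement_loop(seed_docs, new_docs, review):
    """Alternate model creation, automatic annotation and manual validation."""
    annotated = list(seed_docs)
    corrections_per_doc = []
    for words in new_docs:
        model = train(annotated)                 # step 15
        proposed = auto_annotate(model, words)   # steps 23 and 25
        validated = review(proposed)             # step 27 (user in the loop)
        corrections_per_doc.append(
            sum(1 for p, v in zip(proposed, validated) if p != v))
        annotated.append(validated)              # input for the next iteration
    return train(annotated), corrections_per_doc


# Simulated reviewer who knows the right class of every word (illustration only).
TRUTH = {"alice": "Person", "paris": "Location"}


def simulated_review(proposed):
    return [(w, TRUTH.get(w.lower(), "Other")) for w, _ in proposed]


seed = [[("Alice", "Person"), ("sings", "Other")]]
docs = [["Alice", "visits", "Paris"], ["Paris", "welcomes", "Alice"]]
final_model, corrections = improvement_loop(seed, docs, simulated_review)
```

The list `corrections` plays the role of the quality indicator mentioned in the text: the first new document needs one correction, the second none.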

A model creation, or learning, algorithm particularly well suited to the annotation system described above will now be described.

Mathematically, we consider a model as expressing the conditional law of a label graph Y (annotations) as a function of an observation graph X (the characteristics of multimedia documents).

This conditional law is expressed as the product of exponential families involving kernels or distances between graphs, namely:

The creation of the model then consists in minimizing the log likelihood of a sample while limiting the number of kernels used.

This log likelihood, which appears as a cost function, is written:

[Equation not legible in the source text.]

The selection of the kernels and the adjustment of the weights α_i are done according to the iterative algorithm below. The selected kernels are then those that minimize the gradient:

$$\frac{\partial C}{\partial \alpha_i} = \lambda \sum_k \alpha_k K(x_i, x_k)\,\delta(y_i, y_k) + \sum_k K(x_i, x_k)\,\bigl(P(y_i \mid x_k) - \delta(y_i, y_k)\bigr)$$
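For orientation, one formulation that is consistent with this description can be written out as follows. It is only a sketch under stated assumptions: a single label per section, a kernel expansion over the annotated sections (x_i, y_i) in the spirit of the cited KCRF paper, a normalizing sum over labels, and a quadratic regularization term with coefficient λ. The exact form of the normalization and of the regularizer is assumed, not taken from the text.

```latex
% Assumed conditional law: kernel expansion over annotated sections (x_i, y_i)
P(y \mid x) =
  \frac{\exp\Bigl(\sum_i \alpha_i \, K(x_i, x) \, \delta(y_i, y)\Bigr)}
       {\sum_{y'} \exp\Bigl(\sum_i \alpha_i \, K(x_i, x) \, \delta(y_i, y')\Bigr)}

% Assumed cost: regularized negative log-likelihood of the annotated sample
C(\alpha) = - \sum_k \log P(y_k \mid x_k)
  + \frac{\lambda}{2} \sum_{i,k} \alpha_i \alpha_k \, K(x_i, x_k) \, \delta(y_i, y_k)

% Differentiating C with respect to one weight alpha_i
\frac{\partial C}{\partial \alpha_i} =
  \lambda \sum_k \alpha_k \, K(x_i, x_k) \, \delta(y_i, y_k)
  + \sum_k K(x_i, x_k) \bigl( P(y_i \mid x_k) - \delta(y_i, y_k) \bigr)
```

Under these assumptions, a kernel whose gradient at α_i = 0 is strongly negative is one whose activation would reduce the cost the most, which is why the gradient can serve as a selection criterion.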

The iterative algorithm of selection of the kernels and adjustment of the weights is expressed in pseudo-language:

• generate all the possible kernels on all the annotated documents,

• initialize the set A of active kernels to empty,

• while the cost C continues to decrease:

  o select R kernels at random among the M existing kernels (R may be equal to M, which corresponds to selecting all the kernels),

  o calculate the gradient ∂C/∂α_i at the point α_i = 0 for each of the R selected kernels,

  o add the kernels whose gradient ∂C/∂α_i is maximum to the set A of active kernels,

  o while the weights α_i of the kernels of the set A of active kernels continue to change:

    • calculate the cost C and the gradients ∂C/∂α_i,

    • recalculate the values of the weights α_i for i ∈ A by a gradient descent method, such as, for example, the quasi-Newton method,

  o end while

• end while

It should be noted that the kernels correspond to the sections of annotated documents and thus constitute the annotation model.
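The algorithm above can be sketched in runnable form as follows. This is an illustration under explicit assumptions, not the implementation of the description: sections are numeric feature tuples with one label each, the kernel is Gaussian, the cost is a regularized negative log-likelihood with an assumed coefficient `lam`, "maximum gradient" is read as largest magnitude, random candidates are drawn among the kernels not yet active, and a fixed-step gradient descent is swapped in for the quasi-Newton method.

```python
import math
import random


def rbf(a, b, gamma=1.0):
    """Gaussian kernel between two feature vectors (assumed choice of kernel)."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def probs(alpha, X, Y, labels, x):
    """P(y | x) for every label, computed from the active kernels in `alpha`."""
    scores = {y: 0.0 for y in labels}
    for i, a in alpha.items():
        scores[Y[i]] += a * rbf(X[i], x)
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def cost(alpha, X, Y, labels, lam):
    """Regularized negative log-likelihood of the annotated sections."""
    nll = -sum(math.log(probs(alpha, X, Y, labels, x)[y]) for x, y in zip(X, Y))
    reg = 0.5 * lam * sum(a * b * rbf(X[i], X[j])
                          for i, a in alpha.items()
                          for j, b in alpha.items() if Y[i] == Y[j])
    return nll + reg


def gradient(i, alpha, X, Y, labels, lam):
    """dC/d(alpha_i); alpha_i counts as 0 when kernel i is not active."""
    g = lam * sum(b * rbf(X[i], X[j]) for j, b in alpha.items() if Y[i] == Y[j])
    for x, y in zip(X, Y):
        p = probs(alpha, X, Y, labels, x)[Y[i]]
        g += rbf(X[i], x) * (p - (1.0 if y == Y[i] else 0.0))
    return g


def fit(X, Y, lam=0.1, step=0.2, R=None, tol=1e-6, max_inner=200, seed=0):
    rng = random.Random(seed)
    labels = sorted(set(Y))
    R = len(X) if R is None else R
    alpha = {}                                   # active set A, initially empty
    best = cost(alpha, X, Y, labels, lam)
    while True:                                  # while the cost C decreases
        inactive = [i for i in range(len(X)) if i not in alpha]
        if not inactive:
            break
        sample = rng.sample(inactive, min(R, len(inactive)))
        grads = {i: gradient(i, alpha, X, Y, labels, lam) for i in sample}
        pick = max(grads, key=lambda i: abs(grads[i]))
        trial = dict(alpha)
        trial[pick] = 0.0                        # new kernel enters with weight 0
        for _ in range(max_inner):               # while the weights still change
            g = {i: gradient(i, trial, X, Y, labels, lam) for i in trial}
            for i in trial:
                trial[i] -= step * g[i]          # plain gradient descent step
            if step * max(abs(v) for v in g.values()) < tol:
                break
        new = cost(trial, X, Y, labels, lam)
        if new >= best - tol:                    # the cost stopped decreasing
            break
        alpha, best = trial, new
    return alpha, best


def predict(alpha, X, Y, x):
    p = probs(alpha, X, Y, sorted(set(Y)), x)
    return max(p, key=p.get)


X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]
Y = ["Person"] * 3 + ["Location"] * 3
alpha, final_cost = fit(X, Y)
```

The returned dictionary `alpha` maps the indices of the retained sections to their weights, so the model consists only of the active kernels, in line with the remark that the kernels correspond to sections of annotated documents.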

The invention has been illustrated and described in detail in the drawings and the foregoing description. This must be considered as illustrative and given by way of example, and not as limiting the invention to this description alone. Many alternative embodiments are possible. For example, the distribution between terminal and learning server may correspond merely to a functional distribution, all the functions of the system being implemented on a suitably programmed workstation. It is also understood that one embodiment corresponds to a software implementation of the annotation method, and that a computer program product thus includes instructions such that, when executed on a computer, the annotation method is implemented.

However, for technical reasons such as, for example, a need for speed of execution, the method can also be implemented in hardware form, for example by programming an FPGA (field-programmable gate array), or in a combined hardware-software form according to design rules well known to those skilled in the art. In the claims, the word "comprising" does not exclude other elements and the indefinite article "a" does not exclude a plurality.

Claims

1. Method for annotating a plurality of multimedia documents, each multimedia document comprising at least one section, and each section comprising at least one characteristic, comprising the steps of:

• manual annotation (11) of at least one of the plurality of documents by assigning to each section of said document at least one annotation class,

• automatic creation (15) of an annotation model defining relations between the annotation classes and the characteristics of the sections, said automatic creation being performed by iterative learning based on the selection of relevant sections,

• automatic annotation (23) of a non-annotated document of the plurality of multimedia documents by application of the annotation model,

characterized in that the automatic creation step (15) comprises the sub-steps of:

• selecting (17) the document section that is the most representative of the set of annotated sections and the farthest from the previously selected sections;

• adjusting (19) weights associated with the different sections;

• deleting (21) sections having weights less than a predetermined value or sections substantially identical to the selected sections.

2. Method according to claim 1, characterized in that it further comprises a step of manual validation (27) of the automatic annotation, followed by an iteration of at least the automatic model creation step in order to replace the annotation model to account for validated annotations, the steps of automatic creation, automatic annotation and manual validation thus forming an iterative loop for improving the annotation model.

3. Method according to claim 1, characterized in that a relation between a graph representative of the annotation classes and a graph representative of the characteristics of the documents is defined as the product of exponential families based on the sections, and a cost function is defined as the log likelihood of at least a portion of the sections, so that automatic model creation consists of minimizing the cost function by selecting the most representative sections and adjusting the weights.

4. Method according to claim 3, characterized in that the minimization of the cost comprises the following iterative steps:

• creation of a set of active sections, initialized empty,

• iteration, as long as the cost function decreases, of the sub-steps of:

  o generation of all possible sections on the annotated documents,

5. A computer program product comprising program code instructions recorded on a computer readable medium, for carrying out the steps of the method according to any one of claims 1 to 4 when said program is running on a computer.

6. System for annotating a plurality of multimedia documents, each multimedia document having at least one section, and each section comprising at least one characteristic, comprising:

• means (3) for interfacing with a user, adapted to manually annotate at least one of the plurality of documents by assigning to each section of said document at least one annotation class,

• means (7) for automatically creating an annotation model defining relations between the annotation classes and the characteristics of the sections, said automatic creation means comprising iterative learning means based on the selection of relevant sections,

• means (9) for automatic annotation of a non-annotated document of the plurality of multimedia documents by application of the annotation model,

characterized in that the automatic creation means comprise means for:

• selection of the document section most representative of all the existing sections and the farthest from the previously selected sections,

• adjustment of weights associated with the different sections,

7. System according to claim 6, characterized in that the interface means are further adapted to manually validate the automatic annotation, and in that the automatic model creation means are adapted to take into account validated annotations.

8. System according to claim 6, characterized in that a relation between a graph representative of the annotation classes and a graph representative of the characteristics of the documents is defined as the product of section-based exponential families, and a cost function is defined as the log likelihood of at least some of the sections, so that automatic model creation consists of minimizing the cost function by selecting the most representative sections and adjusting the weights.