WO2006005715A1 - System for interrogating heterogeneous databases and method for interrogation - Google Patents

System for interrogating heterogeneous databases and method for interrogation

Publication number
WO2006005715A1
WO2006005715A1 PCT/EP2005/053248 EP2005053248W WO2006005715A1 WO 2006005715 A1 WO2006005715 A1 WO 2006005715A1 EP 2005053248 W EP2005053248 W EP 2005053248W WO 2006005715 A1 WO2006005715 A1 WO 2006005715A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
interrogation system
interrogation
module
metadomains
Prior art date
Application number
PCT/EP2005/053248
Other languages
French (fr)
Inventor
Christophe-Paul Varoutas
Alain Livartowski
Original Assignee
Institut Curie
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institut Curie
Publication of WO2006005715A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G06F16/24522 Translation of natural language queries to structured queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries

Definitions

  • the present invention relates to a system and a method for interrogating heterogeneous databases.
  • systems for managing heterogeneous databases are known especially in the field of medical files.
  • An advantage of the invention is to make it possible to reduce the total cost of acquisition of a system for interrogating heterogeneous databases by reusing the maximum of existing computer infrastructures, both hardware and software.
  • Another advantage of the invention is to make it possible to offer new possibilities of use to the users of a system for interrogating heterogeneous databases while reducing the development and maintenance cost of the system for interrogating heterogeneous databases. Such a gain is possible in terms of interrogation system management time and of human resources with respect to an information system which is perpetually being upgraded within an institution or a firm.
  • Another advantage of the invention is to allow the use of technologies common to the Unix world and to the Windows world, thereby ensuring easy migration from one world to the other. As a result, it is also possible to produce upgrades based on the software components characteristic of these two worlds, and when these components are upgraded, to provide the benefit thereof to the system for interrogating heterogeneous databases of the invention.
  • Another advantage of the present invention is to allow automated generation of the computer interface between a user of the interrogation system of the invention and the mass of the searchable data, without the user needing to know the structure of the heterogeneous data or any particular language for managing databases.
  • Another advantage of the invention is to make it possible to profit from a computer interface, in particular of "web" type, which is open to the outside, which interfaces with other tools of the intranet, and which exhibits the simplicity of use that is characteristic of web interfaces.
  • Another advantage of the invention is to make it possible to execute interrogations swiftly on sizeable masses of heterogeneous documents and to perform cross checks on these data in a calculation time which is linear with respect to the mass of data. By virtue of this swiftness of execution, the users may reformulate and/or refine their questions as many times as necessary before saving the queries and their results.
  • Another advantage of the invention is to allow the users to share knowledge of pluridisciplinary type searchable in the system for interrogating heterogeneous databases, thereby facilitating the search for computer solutions on the one hand and, on the other hand, the search for solutions to problems characteristic of the business carried out in the firm in which the system for interrogating heterogeneous databases of the invention is installed.
  • the invention relates to a system for interrogating heterogeneous databases of the kind comprising: - a set of databases, all or some of which may be furnished with their own interrogation system,
  • the means for generating an interaction between a user and a plurality of data of the set of databases comprises:
  • the interrogation system comprises a data repository module and a means for connecting to the said data collection at least at copy dates and in that the data repository module comprises: - a relational database part;
  • the invention also relates to a method for interrogating heterogeneous databases, which is characterized by two main phases: a first phase of organization of the data by an administrator and executed on the basis of at least one structural model of the data to be interrogated; a second phase of interrogation of the data by a user and executed on the basis of forms generated as a function of the data organization determined during said first phase of organization of the data.
  • Figure 1 represents a block diagram of a particular embodiment of the interrogation system of the invention;
  • Figure 2 represents a diagram of the data repository module of the embodiment of the interrogation system of Figure 1;
  • Figure 3 represents a diagram of the data structuring unit of the embodiment of the interrogation system of Figure 1;
  • Figure 4 represents a metamodel of data suited to a hospital information system in an exemplary application of the interrogation system of Figure 1;
  • Figure 5 represents a block diagram of the interrogation unit of the embodiment of the interrogation system of Figure 1;
  • Figure 6 represents a part of a means implemented in the data organization module;
  • Figure 7 represents an embodiment of the interrogation system during the execution of an interrogation query by a user.
  • FIG. 1 is a block diagram of a system for interrogating heterogeneous databases according to the invention.
  • the databases 1 pre-exist on the establishment of the interrogation system of the invention. They may also be created or enhanced, updated and undergo any other maintenance operation directly with the aid of the interrogation system of the invention.
  • the database interrogation system of the invention comprises three main modules:
  • a module 2 for depositing data emanating from the collection 1 of databases;
  • a module 3 for organizing the data so as to execute a structuring according to organization entities, as will be explained later;
  • a module 4 for interrogation, which allows users 5 to use the interrogation system by producing at least one interrogation and by receiving in response a set of interrogation results.
  • the databases 1 may be utilized separately from the interrogation system of the invention by the systems for managing databases which already exist in the state of the art or yet others.
  • the advantage of the invention is to provide a means for federating heterogeneous databases whose cooperation is difficult on account of the differences between their data structures and between their interrogation languages.
  • the invention will not demand the disappearance of the systems for managing databases, in particular relational databases, which make it possible locally in one of the databases 1 to maintain, according to local constraints, the content and the structure of the data.
  • the interrogation system of the invention comprises the data repository module 2 which makes it possible to retrieve the data in the guise of local copies, or else in the guise of references such as hypertext links.
  • the data thus deposited are at the disposal of the remainder of the interrogation system.
  • the retrieval of the data may be done during periods of non-use of the collection of data 1, for example outside of working hours in a conventional office organization. As a result, the access of a user of the interrogation system of the invention to data deposited in the data repository module 2 does not interfere with a modification of a database of the collection of data 1 by a producer.
  • the retrieval of the data is, at least partially, performed in the guise of a reference for each accessible unit of data, such as a reference to the address of a document on a web server.
  • an organization module 3 executes a structuring 7 of the data into organization entities.
  • the organization entities are metadomains and domains which reproduce a characteristic diagram of directed acyclic graphs, which diagram will be described later.
  • Such an organization is conducted by an organization Administrator, such as the Administrator Ad, which has at its disposal a connection RL1 to the organization module 3.
  • Such an organization Administrator can access the data, and especially the data copied into the module 2, so that it is able to carry out at least one initial phase of structuring the data.
  • to this end, the Administrator has available an interface IG2 for accessing the data repository module 2 and an interface IG1 for accessing the organization module 3, so that entities for organizing the copied data are produced as will be described later.
  • the Administrator Ad is connected to the two resources IG1 and IG2 by a computer linkage network RL1 which may be autonomous from the information network of the firm especially if the administration function of the organization module 3 is subcontracted to a specialist outside the entity or firm in which the invention is implemented.
  • the data repository module 2 and/or the data organization module 3 also comprises an entity of server-client type, especially of web type, for accessing data stored in directory trees.
  • the Administrator Ad for organizing the data may access a resource for managing queries set up by the users 5 with the aid of another linkage RL2 so as to be in contact with the users 5 of the entity or firm in which the interrogation system of the invention is implemented.
  • Such an organization Administrator Ad then has available a resource making it possible to cooperate with the users during the organization of the data for the fabrication of the organization entities implemented in the organization module 3 and/or a resource for utilizing interrogation statistics so as to modify the organization entities with a view to optimizing the interrogation of the data repository module 2 in terms of time to set up the query and/or to use the caches as will be described later.
  • An organization Administrator therefore constitutes a technical entity which can manage one or more interrogation systems according to the invention, which are independent or interconnected for example by way of a web connection.
  • the interrogation module 4 utilizes a form generator and a search metaengine by using structural elements which correspond to the organization entities of the organization module 3.
  • the search metaengine utilizes several search engines to execute at least one query for interrogating the data of the data repository module 2.
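As an illustration of a metaengine that forwards one query to several search engines, here is a minimal Python sketch. The class `MetaEngine`, the two toy engines and their sample data are hypothetical and are not taken from the patent; the sketch only shows the dispatch idea, not how the engines themselves work.

```python
from typing import Callable, Dict, List

# Hypothetical engine signature: takes a query string, returns document references.
Engine = Callable[[str], List[str]]


class MetaEngine:
    """Forwards a query to every registered search engine."""

    def __init__(self) -> None:
        self._engines: Dict[str, Engine] = {}

    def register(self, name: str, engine: Engine) -> None:
        self._engines[name] = engine

    def search(self, query: str) -> Dict[str, List[str]]:
        # Same query for every engine; results are kept per engine.
        return {name: engine(query) for name, engine in self._engines.items()}


def relational_engine(query: str) -> List[str]:
    # Stand-in for a relational lookup (sample data is invented).
    rows = {"necrosis": ["pathology/12", "pathology/40"]}
    return rows.get(query, [])


def fulltext_engine(query: str) -> List[str]:
    # Stand-in for a full-text search over unstructured reports.
    docs = {"reports/7": "extensive necrosis observed", "reports/9": "normal"}
    return [ref for ref, body in docs.items() if query in body]


meta = MetaEngine()
meta.register("relational", relational_engine)
meta.register("fulltext", fulltext_engine)
results = meta.search("necrosis")
```

Keeping the results separated per engine leaves the collation policy to a later step, which the text assigns to the response generation.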
  • the structural elements of the query forms are metakeys and primary keys which correspond, during their construction 9, to the structuring according to the acyclic tree described hereinabove.
  • the whole collection of users 5 is interconnected by way of a local network and/or of a tele-informatics network, such as the internet network, or else of an intranet network which allows them access 8 to the interrogation system proper.
  • at least one (not represented) of the user machines 5 is dedicated to the administration of the heterogeneous database interrogation system of the invention, and its role will be set forth subsequently in the text.
  • Represented in Figure 2 is a particular embodiment of the interrogation system of Figure 1.
  • the data repository module 2 executes, during the operation 6 (Figure 1), a repository of the data of the collection of data 1. Databases and other data located in the collection of data 1, such as references to referenceable data, are deposited in the data repository module 2.
  • the data repository module 2 is composed of two parts which are respectively:
  • the repository of data is executed as is into a data box consisting of a storage space based on hard disks which may be distributed over a computer network of Internet type or Intranet type.
  • the data are therefore not modified or restructured during repository.
  • relational databases 72 among which may be found in particular the databases created by Oracle (registered trademark), Microsoft SQL Server (registered trademark), Sybase, MySQL, etc. applications;
  • the interrogation system of the invention has available a plurality of computer resources capable of transferring the data. Particularly, in the exemplary application of Figure 2 may be found in succession:
  • Each of these applications or methods of importation makes it possible to copy the data contained in the data system 1 to the data repository module 2.
  • the method used to enable the functioning of the interrogation system of the invention will now be described. It describes at least two phases which may be repeated.
  • the method comprises: a first phase of organization of the data by an administrator and executed on the basis of at least one structural model of the data to be interrogated; a second phase of interrogation of the data by a user and executed on the basis of forms generated as a function of the data organization determined during said first phase of organization of the data.
  • a first step of organizing the data is determined on the basis of at least one structural model of the data to be interrogated so as to associate in a bijective manner at least one set of real data with a query element.
  • a second step is executed in the course of which at least one form is generated with form elements associated with query elements.
  • when a user wishes to conduct an interrogation, he selects a form and produces on its basis at least one query, in particular by selecting and customizing each form element in an arbitrary manner.
  • the user's query is transmitted to a module for analyzing the query as a function of the organization elements so that the interrogation query of the user is decomposed as a function of the query elements.
  • the data organization determined during the first phase is traversed according to a directional scheme which is deduced therefrom, and the references of the responses located in the set of data interrogated are then recorded. Once the responses have been recorded, in a fourth step, the responses are collated so as to be presented to the user in a response form which thereafter generates a final result.
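The two phases can be sketched in Python as follows. This is only a rough illustration under stated assumptions: the data sets, the names `ORGANIZATION`, `generate_form` and `run_query`, and the use of simple predicates as "customized form elements" are all hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, List

# Phase 1 (administrator): each query element is associated one-to-one
# with a set of real data. All names and records are invented.
DATA_SETS: Dict[str, List[dict]] = {
    "consultations": [{"ref": "c1", "patient": 1}, {"ref": "c2", "patient": 2}],
    "radiology": [{"ref": "r1", "patient": 1}],
}
ORGANIZATION: Dict[str, str] = {
    "consultation_report": "consultations",
    "radiology_report": "radiology",
}
# The association must be bijective: no data set is used twice.
assert len(set(ORGANIZATION.values())) == len(ORGANIZATION)


def generate_form(organization: Dict[str, str]) -> List[str]:
    """Phase 2, step 1: the form offers one element per query element."""
    return sorted(organization)


def run_query(filled_form: Dict[str, Callable[[dict], bool]]) -> Dict[str, List[str]]:
    """Decompose the query per element, record the matching references, collate."""
    responses: Dict[str, List[str]] = {}
    for element, predicate in filled_form.items():
        data_set = DATA_SETS[ORGANIZATION[element]]
        responses[element] = [row["ref"] for row in data_set if predicate(row)]
    return responses


form = generate_form(ORGANIZATION)
answers = run_query({name: (lambda row: row["patient"] == 1) for name in form})
```

Only references (`ref`) are recorded during the traversal, mirroring the text's statement that the references of the responses are recorded before collation.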
  • the part of the interrogation system of the invention which is paired up with the user essentially comprises a user interface 10 connected by a bidirectional channel 20, 21 , to a query formulation module 11 connected by a bidirectional link 22 to the organization module 3.
  • the data organization module 3 works according to several levels of organization.
  • the upper level of organization is the data metamodel.
  • Each metamodel consists of metadomains, then of query domains which are set up on the basis of the forms generator 11. Metadomains and domains are linked together by virtue of a data model.
  • the query domains are the basic entities hooked up directly with the data sources located in the data repository module 2.
  • the metamodel is a data structure based upon a directed acyclic graph or tree, composed of edges, of nodes and of leaves which appear in particular in Figures 3 and 4.
  • an edge corresponds to a model
  • a node corresponds to a metadomain
  • a leaf corresponds to a domain.
  • a node is a starting point for navigating over the whole of the tree, but is itself inaccessible from other nodes of a deeper level.
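The correspondence stated above (edge = model, node = metadomain, leaf = domain) can be pictured with a small Python sketch. The `Edge` type, the helper functions and the model names are hypothetical; the leaf `stay_summaries` is invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Edge:
    parent: str  # always a metadomain
    child: str   # a metadomain or a domain
    model: str   # name of the model carried by this edge (hypothetical)


EDGES: List[Edge] = [
    Edge("patient", "hospitalizations", "patient_to_hospitalizations"),
    Edge("patient", "consultation_reports", "patient_to_consultations"),
    Edge("hospitalizations", "stay_summaries", "hospitalizations_to_summaries"),
]


def classify(edges: List[Edge]) -> Dict[str, str]:
    """A vertex with outgoing edges is a metadomain; a leaf is a domain."""
    parents = {e.parent for e in edges}
    names = parents | {e.child for e in edges}
    return {n: ("metadomain" if n in parents else "domain") for n in names}


def roots(edges: List[Edge]) -> Set[str]:
    """Starting metadomains: vertices that are never the child of an edge."""
    return {e.parent for e in edges} - {e.child for e in edges}


kinds = classify(EDGES)
starting_points = roots(EDGES)
```

Storing the graph as an edge list keeps the model attached to the edge itself, which matches the statement that an edge corresponds to a model.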
  • the starting metadomain is constituted by the "patient" metadomain 40 which will be detailed elsewhere.
  • the metadomains are represented by an oval whereas domains are represented by rectangles.
  • the metamodel of the information system is composed of a graph having four levels.
  • the second level comprises three metadomains 42 of hospitalizations, 43 of pathology samplings, 45 of biochemistry samplings.
  • the information system comprises two domains, namely the domain 41 of consultation reports and the domain 44 of radiology reports.
  • the third level comprises a single metadomain and seven domains which are respectively: - the metadomain 46 of visits of the patient to the medical unit;
  • edges of the graph make it possible to associate the metadomain of hospitalizations 42 with the metadomain 46 and the domain 47, the metadomain 43 of pathology samplings with the four domains 49, 51 and 52, the metadomain of biochemistry samplings 45 with the domains 50 and 53.
  • the fourth level of the metamodel of the hospital information system is composed of the three domains linked to the metadomain 46, namely respectively: - the domain 54 of visits;
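To check the four-level reading of Figure 4, the edges named in this passage can be encoded with their reference numerals and traversed breadth-first. This is a sketch under assumptions: the edges from the patient metadomain 40 to the second-level entities are inferred from the level description rather than stated explicitly, and entities that the passage does not number are simply omitted.

```python
from collections import deque
from typing import Dict, List

# Keys are metadomains, values are their children (Figure 4 numerals).
CHILDREN: Dict[int, List[int]] = {
    40: [41, 42, 43, 44, 45],  # assumed: patient points to every level-2 entity
    42: [46, 47],              # hospitalizations
    43: [49, 51, 52],          # pathology samplings (only the numerals listed)
    45: [50, 53],              # biochemistry samplings
    46: [54],                  # visits to the medical unit (only domain 54 is named)
}


def levels(root: int) -> Dict[int, int]:
    """Breadth-first traversal assigning level 1 to the starting metadomain."""
    depth = {root: 1}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in CHILDREN.get(node, []):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


LEVEL = levels(40)
METADOMAINS = sorted(CHILDREN)
THIRD_LEVEL = sorted(n for n, d in LEVEL.items() if d == 3)
```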
  • in the first step of the method of the invention, the administrator determines the organization of the searchable data; he determines the metadomains from among:
  • the interrogation system uses a means for managing metadomains which allows him to generate the metadomains suitable to his institution; - the entities to which the users of the interrogation system would want to refer systematically any query result if the calculation time so permits.
  • the metadomains depending on no other entity are located at the vertex of the directed acyclic graph representative of a metamodel whereas the other metadomains constitute the nodes of the metamodel thereof defined from the first entity.
  • the administrator of the interrogation system of the invention has available in the interrogation system of the invention a means for creating metamodels which is furnished with a means for hierarchizing the metadomains according to whether they are independent of or dependent on other metadomains.
  • the number of independent metadomains is generally less than five whereas most of the activities of one and the same firm are limited to fifteen dependent metadomains.
  • the domain is the basic entity which makes it possible to interrogate data directly in the collection 1 or, indirectly, a collection of data deposited in the data repository module 2.
  • Each domain is recorded in the module 3.
  • the domain in the guise of base entity is constituted by a collection of computer objects and is fully autonomous so that it contains all the resources and information necessary for interrogating a data source. It comprises:
  • a resource of connectivity to the data source such as a relational database interrogation system
  • a visual interface allowing a user to formulate questions on the data.
  • Detailed in a diagram in Figure 6 is the articulation between a domain and a metadomain as well as the architecture of the means disposed in particular in the data organization module 3 for organizing the data under the action of an organization Administrator Ad, then for interrogating the data organized under the action of a query of a user 5.
  • the prior analysis of the data of the collection 1 has been carried out so as to form a metamodel of the collection of data 1 seen by the interrogation system, so that a plurality of metadomains is defined with dependence relations disposed according to a directed acyclic graph recorded in a suitable memory of the data organization module 3.
  • Each node of the directed acyclic graph 90 for which a dependence relation is descendent constitutes a metadomain, whereas a domain is constituted by a terminal node of the directed acyclic graph 90.
  • each node of the tree recorded in the memory 90 of the data organization module 3 can be represented by:
  • metadomain 91 defined by a name "::Name" and a metakey "::Metakey" denoted by a relation in the tree referred to by mk;
  • Each domain 92 is thus, as indicated above, coupled to a resource of connectivity "::Connectivity" 93, a resource of visualization of the connections "::Visualization" 94 and a resource for pointing to the data model "::Model" 95.
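A domain that bundles its own connectivity, visualization and model resources might look like the following Python sketch. The `Domain` class, its `ask` method and the sample rows are hypothetical; the attribute names only echo the "::Connectivity", "::Visualization" and "::Model" labels of the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Row = Tuple[int, str]


@dataclass
class Domain:
    name: str
    connectivity: Callable[[str], List[Row]]   # ::Connectivity, reaches the data source
    visualization: Callable[[List[Row]], str]  # ::Visualization, renders the result
    model: str                                  # ::Model, points to the data model

    def ask(self, question: str) -> str:
        """The domain is autonomous: it fetches and renders on its own."""
        return self.visualization(self.connectivity(question))


# Invented sample data standing in for a data source.
rows: List[Row] = [(1, "consultation of 2004-03-02"), (2, "consultation of 2004-05-17")]

consultations = Domain(
    name="consultation_reports",
    connectivity=lambda question: [r for r in rows if question in r[1]],
    visualization=lambda result: "\n".join(f"{rid}: {text}" for rid, text in result),
    model="patient_to_consultations",
)
answer = consultations.ask("2004-05")
```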
  • a first bijective application 99 for analysis which is traversed when a query element is returned from the query module 4 so as to arrive at the address of a document or of any data unit referenced by the interrogation system by virtue of the domain 97 and the domains management resource 92;
  • a domain corresponds to a relational table or else to a main relational table and to several secondary relational tables for example having relations from 1 to N or from N to 1 (in the case of a thesaurus) with the main table.
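The table layout described above (one main table, secondary tables in 1-to-N or N-to-1 relation, the latter for a thesaurus) can be tried out with Python's built-in `sqlite3` module. The table and column names are invented for the example and are not part of the patent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- N-to-1: many reports share one thesaurus entry.
    CREATE TABLE thesaurus (code TEXT PRIMARY KEY, label TEXT);
    -- Main table of the domain.
    CREATE TABLE report (
        id INTEGER PRIMARY KEY,
        patient_id INTEGER,
        diagnosis_code TEXT REFERENCES thesaurus(code)
    );
    -- 1-to-N: one report owns several images.
    CREATE TABLE report_image (
        id INTEGER PRIMARY KEY,
        report_id INTEGER REFERENCES report(id),
        path TEXT
    );
    INSERT INTO thesaurus VALUES ('D1', 'necrosis');
    INSERT INTO report VALUES (1, 100, 'D1'), (2, 101, 'D1');
    INSERT INTO report_image VALUES (1, 1, 'a.png'), (2, 1, 'b.png');
    """
)

# One query over the whole domain: main table joined to both secondary tables.
rows = conn.execute(
    """
    SELECT r.id, t.label, COUNT(i.id)
    FROM report AS r
    JOIN thesaurus AS t ON t.code = r.diagnosis_code
    LEFT JOIN report_image AS i ON i.report_id = r.id
    GROUP BY r.id, t.label
    ORDER BY r.id
    """
).fetchall()
conn.close()
```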
  • the interrogation system of the invention comprises a means for generating domains, that is to say computer objects comprising:
  • the model is an entity which makes the link between the concrete data of the domains and the various implications of metadomains.
  • a model comprises essentially four components which vary according to the type of model, but whose principle remains similar. The four components are:
  • - pk the collection of primary keys of the domain
  • - mk the corresponding collection of metakeys
  • a means for managing the models makes it possible to create, to update, to manage the aforesaid components of a model in such a way as to configure a bijective relation which makes it possible to point from a metadomain to a domain. Subsequently, in a higher order of organization, a particular model of data is integrated with its pairs of bijective relations.
  • a means for managing the models makes it possible to create, to update, to manage the aforesaid components of a model so as to configure a bijective relation which makes it possible to point from a metadomain to another metadomain in a model, that is to say to configure a particular edge of the directed acyclic graph going from a metadomain (represented by an oval in Figure 4) to another metadomain (represented by an oval in Figure 4).
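A bijective relation between the primary keys pk of a domain and the corresponding metakeys mk can be sketched as a pair of dictionaries that are validated on construction. The class below is a hypothetical illustration: it assumes that each primary key is paired with exactly one metakey and vice versa, and it covers only the pk and mk components, not the other components of a model.

```python
from typing import Dict, Hashable, Iterable, Tuple


class Model:
    """Bijective relation between primary keys (pk) and metakeys (mk)."""

    def __init__(self, pairs: Iterable[Tuple[Hashable, Hashable]]) -> None:
        pairs = list(pairs)
        self.pk_to_mk: Dict[Hashable, Hashable] = dict(pairs)
        self.mk_to_pk: Dict[Hashable, Hashable] = {mk: pk for pk, mk in pairs}
        # A repeated pk or mk collapses a dictionary entry, so the sizes differ.
        if len(self.pk_to_mk) != len(pairs) or len(self.mk_to_pk) != len(pairs):
            raise ValueError("relation is not bijective")

    def to_metakey(self, pk: Hashable) -> Hashable:
        return self.pk_to_mk[pk]

    def to_primary_key(self, mk: Hashable) -> Hashable:
        return self.mk_to_pk[mk]


# Invented keys: report identifiers paired with patient identifiers.
model = Model([(101, "patient-A"), (102, "patient-B")])
```

Holding both directions makes it possible to go from a metadomain down to a domain and back without scanning, which is what a traversal of the graph in either direction needs.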
  • the system for interrogating heterogeneous data of the invention associates several tools which give the user thereof the means for centralizing, organizing and/or interrogating the data disposed in the data repository module 2 or directly in the set of data 1.
  • the database interrogation system of the invention comprises one or more tools, among which are included :
  • a processing unit devised as a man-machine interface 4 comprising a visual interface of web type, with interrogation forms and results presentation arrays, as well as a module for formulating queries (GEN_FORM; Figure 5), which makes it possible to translate the queries of the users, on the one hand, into various computer languages for circulating the queries among the search engines capable of interrogating the resources of the data module 1 or data repository module 2 and, on the other hand, into human language, in particular to produce the responses destined for the user;
  • MM queries metaengine
  • the heterogeneous database interrogation system of the invention is intended to interface in an Intranet allowing a firm, or a group of firms, to carry out all the necessary management operations on the collection of heterogeneous data files contained in the data repository module.
  • the heterogeneous data interrogation system of the invention is intended to be interfaced with the outside world by way of the Internet by means of the management of one or more websites.
  • the data repository module 2 affords several advantages to the system of the invention:
  • each group of copied data preserves its structure defined by its initial conceptual data model which is specific to it without it being necessary to constrain it to a single model as is the case in other heterogeneous data management systems;
  • the maintenance of the data is facilitated since the specialists of the various production systems recognize the conceptual model of their own data in the data box 2 thus allowing maintenance or updating for example by means of the feed system.
  • the data repository module 2 constitutes moreover a portal for accessing the data in so far as the grouping of the data may be real or virtual. Depending on requirements, it is possible either to copy the data or else to maintain them in the production system so as to consult them and extract them at the moment of interrogation.
  • the data organization module 3 moreover constitutes a portal for accessing the data in so far as it contains associated with the domains (92; Figure 6) the resources of connectivity (93; Figure 6) which make it possible to access in an organized manner the data searchable by the system of the invention.
  • a web client is then able to activate the connectivity resources so that the data associated with each domain interrogated directly, for example with the aid of an interface IG1 (Figure 1) by the Administrator Ad or another equivalent user, may be presented on a visualization console.
  • the repository of the data in the data repository module 2 ( Figure 1 ) is not an obligatory measure.
  • the choice of copying the data present in the production system or of carrying out a data repository presents various advantages or drawbacks which are as follows:
  • the heterogeneous database interrogation system of the invention thereafter comprises a means for feeding (M_COPY; Figure 5) the data repository unit or module 2 which may be performed by means of a scheduler (PLAN-REC; Figure 5) at regular intervals, for example every day or every week, in the following manner:
  • the data repository module 2 is based on a relational database interrogation system such as a MySQL system which has the advantage of being widespread while preserving a high degree of performance and reliability.
  • Periodically, according to a periodicity determined preferentially by a copy scheduler PLAN-REC associated with the repository module 2, the data repository unit 2 is connected to the data collection 1 by a tele-informatics network not represented in Figure 1.
  • a network comprises an Intranet type network on the one hand and, as appropriate a network for accessing remote web resources on an Internet network, on the other hand.
  • the basic or original data are recorded in databases constituted in the data collection 1.
  • the original data are copied with the aid of a data copying means M_COPY.
  • the means for copying M_COPY copies the data either physically in the guise of a physical copy into a physical copy memory REC_PHY, or in the guise of pointers or hypertext links in a virtual copy zone REC_VIR.
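The distinction between a physical copy and a virtual copy (a pointer or hypertext link) can be pictured with the small Python sketch below. The `Repository` class, its method names and the example URL are hypothetical; only the names `rec_phy`, `rec_vir` and `m_copy` echo the labels used in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Repository:
    rec_phy: Dict[str, bytes] = field(default_factory=dict)  # physical copies
    rec_vir: Dict[str, str] = field(default_factory=dict)    # pointers / hypertext links

    def m_copy(self, key: str, data: Optional[bytes] = None,
               link: Optional[str] = None) -> None:
        """Deposit either the data itself or a reference to it, never both."""
        if (data is None) == (link is None):
            raise ValueError("give exactly one of data or link")
        if data is not None:
            self.rec_phy[key] = data
        else:
            self.rec_vir[key] = link

    def locate(self, key: str) -> str:
        if key in self.rec_phy:
            return "physical"
        if key in self.rec_vir:
            return "virtual"
        raise KeyError(key)


repo = Repository()
repo.m_copy("report-1", data=b"scanned report")
repo.m_copy("report-2", link="http://intranet.example/reports/2")
```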
  • the operation of copying, of feeding, or of depositing, of data utilizes a plurality of computer resources which are described elsewhere.
  • a user (not represented) activates a forms generator GEN_FORM which activates a metaengine MM described elsewhere which then activates the data repository module 2.
  • the forms generator GEN_FORM interrogates the virtual copying space REC_VIR and/or the physical copying space REC_PHY of the data repository module 2 as well as a cache memory of queries CR which contains the previous stored searches.
  • a query form is transmitted by way of the structuring unit 3 which will be described elsewhere so that, if the physical data are requested, they are retransmitted directly to a responses generator GEN_REP of the processing unit 4.
  • the data repository module 2 produces a connection to the data collection 1 or else to a website determined by the pointer addressed in the virtual copy memory REC_VIR and the data fetched are then forwarded, via the data organization module 3, directly from the data collection 1 to the responses generator GEN_REP of the processing unit 4.
  • the response to the query produced by the forms generator GEN_FORM is then compiled with the earlier responses obtained and recorded in the queries cache memory CR, and the final response is made accessible to the user by means of the response generator GEN_REP in the guise of a visualization screen, of a web page, or else of a printed representation as is known in the state of the art.
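One simple way to picture a cache of previous searches is memoization keyed on the query content, as in the Python sketch below. This is an assumption about how such a cache could behave, not the patent's mechanism: the `QueryCache` class, the key normalization and the `execute` stand-in are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Query = Dict[str, str]


class QueryCache:
    """Stores the responses of previous searches, keyed by the query content."""

    def __init__(self, execute: Callable[[Query], List[str]]) -> None:
        self._execute = execute
        self._store: Dict[Tuple[Tuple[str, str], ...], List[str]] = {}
        self.hits = 0

    def run(self, query: Query) -> List[str]:
        # Sorting the items makes the key independent of field order.
        key = tuple(sorted(query.items()))
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = self._execute(query)
        return self._store[key]


calls: List[Query] = []


def execute(query: Query) -> List[str]:
    # Stand-in for a real interrogation of the data repository.
    calls.append(query)
    return [f"doc-for-{query['term']}"]


cr = QueryCache(execute)
first = cr.run({"domain": "pathology", "term": "necrosis"})
second = cr.run({"term": "necrosis", "domain": "pathology"})
```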
  • the organization module 3 comprises a means for organizing and a means for arranging the data contained in the data repository module 2 or directly in the data collection 1 ;
  • the organization module 3 comprises a search engine for executing queries on all the data regardless of their type, be it text, structured or unstructured data;
  • the organization module 3 also comprises a means for generating visual interfaces intended to allow the user to formulate queries and to utilize their response;
  • the organization module 3 comprises a tool for creating links between the data and external applications, in particular statistical tools or other systems for navigation or post-processing of data.
  • the data organization means or structuring unit 3 comprises a means for generating, above the pre-existing structure of the data, an upper organization layer.
  • Such an organization layer is of object type, and it is structurally separated from the subjacent relational layer.
  • the means for creating the organization layer comprises a means for executing the grouping of one or more tables of queries into query domains as well as a means for creating multiple links between the query domains and their grouping into entities called metadomains.
  • a query domain is an entity of logical grouping of data corresponding to one or more tables disposed in the memories REC_PHY and REC_VIR of the data repository module 2.
  • the means for generating a query domain comprises a means for grouping data, for the explicit realization of its data and a means for annotating the data.
  • a domain groups together, in addition to the data proper, metadata which describe the nature of the information:
  • a domain administrator utilizes a computer connected by the linkage RL2 (Figure 1) to the tele-informatics network, in the manner of the Administrator Ad (Figure 1). Such an administrator assists the users connected to the tele-informatics network in defining the domains, using a visual administration interface IG1 which helps to classify and generate the domains as a function of the recommendations of the users.
  • a domain administration means which may be managed by the domain administrator comprises:
  • domains and the types of objects that a domain comprises (resources of connectivity, visualization; model, primary key);
  • the generator for assisting with the creation of domains comprises a means for carrying out a grouping of the data and of the domains, which grouping is identical to that of the production systems.
  • the data have a different meaning, or in the limit a different semantics, depending on their provenance, according to the domain specified.
  • the term “necrosis” does not signify the same thing when it is used clinically, in pathology or in radiology. It follows that the term “necrosis” steers the search as a function of the domain which was specified to establish the query.
  • An advantage of the present invention is to make it possible to create domains while the systems for managing relational databases are heterogeneous, local or remote in the collection of data 1.
  • the data organization module 3 comprises means for interrelating data present invariably in the original collection of data 1 or in the data repository module 2.
  • the data organization module 3 also comprises a means for managing metadomains.
  • a metadomain within the sense of the invention is an object which makes it possible to link entities which are located in the data repository module 2 and which exhibit one and the same semantics while stemming from various data production systems. These entities, which may originate from diverse database fields, are apt to be found in different query domains.
  • the data entities which serve to compose a metadomain receive a co-relation beyond their heterogeneous relocation or their heterogeneous origin.
  • the metadomains management means comprises a first means for identifying common entities and declaring them, in the data repository module 2 or in the collection of data 1, as semantic entities intended to receive a co-relation.
  • the first means for identifying common entities comprises a means for inputting a common entity and a means for declaring a co-relation between several common entities inputted, by the common entity input means, into a memory means provided for this purpose in the data repository module 2.
  • the metadomains management means thereafter comprises a second means for individually identifying in each of the domains, which may have been designated in the domain management means, the entity having the semantics defining the co-relation regardless of its assigned name or the data typing.
  • each patient is identified by a unique file number.
  • the various database production systems comprise tables each of which comprises a column containing the entity characteristic of the unique file number.
  • the second means of the means for managing metadomains makes it possible to individually identify the domains relevant to the same semantics.
  • the second means for individually identifying domains makes it possible to declare a semantic unit, here the "patient", and thereafter to declare the key making it possible to identify a patient: the "file number". Thereafter, the interrogation system of the invention makes it possible to work in interrogation mode, as was described elsewhere.
  • the second means for managing metadomains traverses the various domains already defined so as to retrieve the corresponding entities, in the present case "Numfile", "N_file” etc. and they are declared as having the same semantics as the newly defined metakey.
  • the semantic unit "patient" thus groups together several domains managed by the structuring unit 3 regarding the adopted criterion of the "file number".
  • the data organization module 3 comprises moreover a means for managing metakeys which makes it possible in particular to produce metadomains by grouping together several query domains.
  • the metakey is constituted by "N_file” whereas the metadomain is constituted by "patient”.
  • the concept of metakey used by the present invention obeys a certain number of rules which are now described.
  • the metakey takes into account the whole collection of relational tables relevant to the co-relation with which it is associated and it is not necessary for its field name to be identical nor for its data typing to be the same.
  • the metadomains are produced on the basis of a public-domain database-independent interface and may group together relational tables of heterogeneous relational database management systems, such as Oracle, Access, etc. (registered trade marks).
  • the metakeys management means cooperates with a means for creating the links.
  • the metakey generated by the metakeys management means is a foreign key creating relations both between the local and remote data resources, by means of Intranet type or Internet type connections.
  • the metakey and its management means cooperate with a means for creating multiple relations without physical modification of the structure of the data contained in the data repository module 2, whereas all this information is stored in the object layer, which is located above the relational physical layer.
  • the metadomain associated with a metakey generated by the means for managing metakeys of the invention is constituted as an "object-oriented" structure making it possible to create links between relational entities of the heterogeneous relational database interrogation system, installed on heterogeneous platforms, remote or local.
  • the metadomains management means of the invention thus makes it possible to enrich the relational model by adding thereto the benefits of the "object" model.
  • the system of the invention in a particular embodiment, therefore comprises a means for constructing networks whose input is connected to the metadomains management means of the invention and to the associated means for managing metakeys, to construct a complex network of relations and of contexts between relational entities.
  • the hospital information system makes it possible to monitor the medical treatment of patients who are afflicted with tumours and who are hospitalized in care units.
  • the users of the interrogation system of the invention, who are members of the care staff utilize the aforesaid means to define various entities in the guise of metakeys grouping together the same data in several ways as various metadomains.
  • Patient specification or identification data make it possible to establish a metadomain characteristic of the patient, because numerous data pertaining thereto may be interrelated, this having been previously described as a co-relation.
  • the data specifying a tumour of a patient makes it possible to establish a characteristic metadomain by knowing that the patient may exhibit several different tumours and that it is not always easy to link the successive data or elements to a tumour, especially when the patient exhibits several tumours, synchronous or successive.
  • the characteristic data of a sample or biopsy performed on a patient make it possible to generate a metadomain.
  • Data describing a hospitalization, taken as a particular care episode make it possible to generate a new metadomain.
  • the interrogation system of the invention comprises a means for calculating a relational entity defining a metadomain associated with a metakey characteristic of a care episode.
  • a means for managing metadomains constitutes a means for managing virtual metadomains.
  • the interrogation system of the invention comprises means for managing various metadomains:
  • such a management means makes it possible to track and group together the relations between entities pertaining to the patient, a biological sampling or hospitalization;
  • such a management means cooperates with a calculation means , in particular a calculation means able to effect a consolidation of data acquired successively over time according to sequences making it possible to normalize the numerical data with the aid of an operation of calculation as a function of programming criteria determined by the user of the interrogation system of the invention.
  • the "hospitalization" metadomain is defined by a hospitalization number, which serves as metakey, to which may be attached: - administrative information: acts and mode of entry, hospitalization unit, etc.;
  • the embodiment described herein below uses software resources constituted by classes of objects which are distributed into two categories: - a first class of objects related to the structural organization of the data;
  • the metamodel is constituted by an object of class EPI::metamodel for referencing an object according to the semantics of the C++ computer language or else of the Java computer language.
  • this object class contains another public-domain object class 'DAG', the container of this public object inheriting all the basic functionalities for managing a directed graph.
  • the vertex, the nodes and the leaves of the metamodel represented by the acyclic directed graph generated contain simply the identifiers of the metadomains or domains to which they make reference.
  • the means for generating a metamodel comprises a means for managing metadomains and a means for calling an object of class EPI::MetaDomain which contains all the metadata related to the metadomain embodied, such as for example the explicit name of the metakey.
  • the means for generating a metamodel comprises a means for managing metadomains of nodes and for calling an object of class EPI::Model which contains the data model, a particular data structure modelling the bijective relation between its own metakeys and that of the parent metadomains as has been explained elsewhere.
  • the means for generating a metamodel comprises a means for managing domains and for calling a plurality of object classes which are respectively:
  • the concerned object class uses a public-domain database-independent interface object class which ensures connectivity compatible with a certain number of relational database management systems such as Oracle, Sybase, etc.,
  • an object of class EPI::Index::* for interrogating fields, in the data deposited in the copying unit 2, that the SQL system may not interrogate effectively; these object classes will be discussed later on;
  • an object of class EPI::Form which comprises:
  • the relational table contains fields whose content may not be interrogated effectively with an SQL type engine.
  • the collection of classes EPI::Index::* comprises a means for creating specialized indices and means for effectively interrogating a field that is not searchable with the aid of an SQL engine.
  • a class EPI::Index::Regex comprises means for indexing and for interrogating a field containing text by using a regular expression (Regex) language.
  • the means for interrogating a field containing text and contained in the class EPI::Index::Regex cooperates with a cache memory system for supporting sizeable increases in load.
  • the means for interrogating a field containing text can perform searches for phrases or for textual patterns in relation in particular with the definition of one or more domains, as has already been specified.
  • the second class of objects makes it possible to embody a means for generating a visual interface as has been described elsewhere.
  • the means for generating a visual interface of this particular embodiment comprises an object of class EPI::Form which executes or implements a means for generating a query form making it possible to interrogate a consistent collection of data, structured by the data structuring unit 3.
  • the object of class EPI::Form comprises a means for calling or generating a plurality of form elements, each generated by an object of class EPI::FormElement which is responsible for the interrogation and/or for the displaying of a specific field of the domain.
  • Each object of class EPI::FormElement comprises: - a means for interrogating one or more types of relational data (types including numerical, date, textual field, etc.);
  • - a means for interrogating the specialized indices created by the interrogation system for the interrogation of a field, for example a textual index searchable with the index object class EPI::Index::Regex; - a means for graphically representing a form element on a visual interface, such as an "html" page;
  • FIG. 7 is a form of the interrogation system of the invention when it is utilized by a user 5 ( Figure 1 ).
  • the user 5 utilizes a visual interface from a computer station connected to the computer network of the interrogation system.
  • the visual interface 100 addresses a form generator which is fed from a memory 105 of form elements which was previously built on the one hand during the configuration of the interrogation system and on the other hand during a phase of administration with the aid of the organization module 3, by the Administrator Ad ( Figure 1 ) on the basis of the model of the data adopted by the administrator.
  • the query built with the aid of the forms generator 101 is then addressed to the metaengine 102 as has been described previously and the built query is addressed to a set of caches 104 and/or to a set of query engines 103 as has previously been set forth with the aid of Figure 5.
  • the query is then utilized by the organization module 3, through its User part 105, so that the organization of the data as a directed acyclic graph of organization entities is applied to the query: the nodes of the graph are traversed according to the bijective mappings, first by query descent from the metadomains to the domains, then by backtracking of the responses from the domains to the metadomains, until a response is offered as a list of references to documents or other data units held either in the data repository module 2 or in the data set 1.
  • the response is then made available both in the set of caches 104 and at the level of the visual interface 100.
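
The query descent and response backtracking described above can be illustrated with a minimal Python sketch. It is not the EPI classes of the described embodiment: the names `Domain`, `MetaDomain`, `search` and `query`, and the sample rows, are assumptions introduced purely for illustration. The sketch shows a "patient" metadomain whose metakey is carried by differently named and differently typed fields ("Numfile", "N_file") in two domains, with responses combined on the metakey value.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical query domain: rows from one source plus its local key field."""
    name: str
    key_field: str  # local name of the metakey, e.g. "Numfile" or "N_file"
    rows: list = field(default_factory=list)

    def search(self, predicate):
        # Return the metakey values of matching rows, normalized to str
        # because the data typing may differ from one domain to another.
        return {str(row[self.key_field]) for row in self.rows if predicate(row)}


@dataclass
class MetaDomain:
    """Hypothetical metadomain: domains sharing one semantics (here the patient)."""
    name: str
    metakey: str
    domains: dict = field(default_factory=dict)

    def query(self, criteria):
        # Descent: each criterion is sent to the domain it concerns.
        # Backtracking: the metakey values returned by the domains are intersected.
        results = [self.domains[d].search(p) for d, p in criteria.items()]
        return set.intersection(*results) if results else set()


clinical = Domain("clinical", "Numfile", [
    {"Numfile": 1001, "note": "necrosis observed"},
    {"Numfile": 1002, "note": "no abnormality"},
])
pathology = Domain("pathology", "N_file", [
    {"N_file": "1001", "finding": "tumour necrosis"},
    {"N_file": "1003", "finding": "necrosis"},
])
patient = MetaDomain("patient", "file number",
                     {"clinical": clinical, "pathology": pathology})

hits = patient.query({
    "clinical": lambda r: "necrosis" in r["note"],
    "pathology": lambda r: "necrosis" in r["finding"],
})
print(hits)  # {'1001'}
```

Intersecting on the normalized metakey value is one simple way to co-relate responses; the source text does not specify how responses are merged, so this choice is an assumption.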

Abstract

The present invention relates to a system and a method for interrogating heterogeneous databases. According to the invention, the original set (1) of heterogeneous databases is decoupled from the interrogation system by means of: • a data organizer module (3) for associating the data by reference to a plurality of domains and a plurality of metadomains; • a module (4) for interrogation and for producing response data which comprises a search metaengine for managing query forms and producing response data.

Description

"System for interrogating heterogeneous databases and method for interrogation"
The present invention relates to a system and a method for interrogating heterogeneous databases. In the state of the art, systems for managing heterogeneous databases are known, especially in the field of medical files. Reference may be made in particular to patent WO-A1-03/065251, which allows the exchange of information between heterogeneous databases by means of the manipulation of a semantic equivalence operator. In this state of the art, semantic equivalences are sought on concepts described in the data of two distinct bases. This equivalencing operation makes it possible to work on queries at various levels of granularity so that a query across several databases can reconstruct a consistency between records of data of different types.
Moreover, the management of very large databases, and of very large heterogeneous databases has been an important topic of development for several years and several publications mention these developments. In document US-A-2002/0156756, in the name of Stanley, is described a method and a structure of data of molecular objects which make it possible to manage heterogeneous data environments. In this document, the described software application makes it possible to unify the presentation to the user of various data. Described is a collection of procedures which make it possible to carry out:
- management of the state of the activity and of the content in terms of persistent data;
- a functional integration and an exchange of data between the databases, applications and the various components and interfaces.
Such a solution requires that the whole collection of data be recomposed, while the invention is aimed precisely at processing existing data and preserving systems thereof for managing a base of preexisting data, which systems are not directly modified by the invention.
In document US-A-2003/176929, in the name of Gardner, is described a user interface which comprises a source collection of data and a processing section which allows a user to select a processing which depends on results of searching among the data in such a way as to produce a processing result. In this state of the art, the heterogeneity of the data is a factor which reduces its capacity and the effectiveness of search processing.
In document US-A-6,009,422, in the name of Cicarelli, is described a method and a system for executing a semantic translation or a search translation which uses a generalized search language in a plurality of heterogeneous databases. In that document, the effectiveness depends on the generalized grammar used to translate queries into phrases of the generalized search language; the invention circumvents this dependence.
In document US-A-5,652,403, in the name of Lin, is described a method and a system for rewriting databases without reducing the capacity to utilize the database during the rewriting thereof. However, this document leads to a modification of the heterogeneous databases to allow their unified interrogation. The advantage of the present invention is to make it possible to preserve the original databases in the state in which they are and at the same time to allow their modification without interfering with the users' interrogation capabilities.
Such a state of the art only very partially addresses the problem of the utilization of heterogeneous databases. Specifically, databases of large size lead to the formulation of a considerable number of queries or even of classes of different queries. The system for interrogating heterogeneous databases, presented in document US-A-5,652,403, does not allow flexible interrogation by people who do not know the structure of the databases used.
The present invention also affords an effective solution to this problem of the state of the art. An advantage of the invention is to make it possible to reduce the total cost of acquisition of a system for interrogating heterogeneous databases by reusing the maximum of existing computer infrastructures, both hardware and software.
Another advantage of the invention is to make it possible to offer new possibilities of use to the users of a system for interrogating heterogeneous databases while reducing the development and maintenance cost of the system for interrogating heterogeneous databases. Such a gain is possible in terms of interrogation system management time and of human resources with respect to an information system which is perpetually being upgraded within an institution or a firm.
Another advantage of the invention is to allow the use of technologies common to the Unix world and to the Windows world, thereby ensuring easy migration from one world to the other. As a result, it is also possible to produce upgrades based on the software components characteristic of these two worlds, and when these components are upgraded, to provide the benefit thereof to the system for interrogating heterogeneous databases of the invention.
Another advantage of the present invention is to allow automated generation of the computer interface between a user of the interrogation system of the invention and the mass of the searchable data, without the user needing to know the structure of the heterogeneous data or any particular language for managing databases.
Another advantage of the invention is to make it possible to profit from a computer interface, in particular of "web" type, which is open to the outside, which interfaces with other tools of the intranet, and which exhibits the simplicity of use that is characteristic of web interfaces.
Another advantage of the invention is to make it possible to execute interrogations swiftly on sizeable masses of heterogeneous documents and to perform cross checks on these data in a calculation time which is linear with respect to the mass of data. By virtue of this swiftness of execution, the users may reformulate and/or refine their questions as many times as necessary before saving the queries and their results.
Another advantage of the invention is to allow the users to share knowledge of pluridisciplinary type searchable in the system for interrogating heterogeneous databases, thereby facilitating the search for computer solutions on the one hand and, on the other hand, the search for solutions to problems characteristic of the business carried out in the firm in which the system for interrogating heterogeneous databases of the invention is installed.
To solve the aforesaid problems and afford the stated advantages, the invention relates to a system for interrogating heterogeneous databases of the kind comprising:
- a set of databases, all or some of which may be furnished with their own interrogation system,
- a means for generating an interaction between a user and a plurality of data of the set of databases.
According to the invention, the means for generating an interaction between a user and a plurality of data of the set of databases comprises:
• a data organizer module for associating the data by reference to a plurality of organization entities;
• a module for interrogation and for producing response data which comprises a means for generating query forms and producing response data.
According to an aspect of the invention, the interrogation system comprises a data repository module and a means for connecting to the said data collection at least at copy dates, and the data repository module comprises:
- a relational database part;
- a treelike data part.
Finally the invention also relates to a method for interrogating heterogeneous databases, which is characterized by two main phases: a first phase of organization of the data by an administrator and executed on the basis of at least one structural model of the data to be interrogated; a second phase of interrogation of the data by a user and executed on the basis of forms generated as a function of the data organization determined during said first phase of organization of the data.
Other advantages and characteristics of the present invention will be better understood with the aid of the description and the appended figures among which:
- Figure 1 represents a block diagram in a particular embodiment of the interrogation system of the invention;
- Figure 2 represents a diagram of the data repository module of the embodiment of the interrogation system of Figure 1 ;
- Figure 3 represents a diagram of the data structuring unit of the embodiment of the interrogation system of Figure 1 ;
- Figure 4 represents a metamodel of data suited to a hospital information system in an exemplary application of the interrogation system of Figure 1 ;
- Figure 5 represents a block diagram of the interrogation unit of the embodiment of the interrogation system of Figure 1 ;
- Figure 6 represents a part of a means implemented in the data organization module;
- Figure 7 represents an embodiment of the interrogation system represented during the execution of an interrogation query by a user.
Represented in Figure 1 is a block diagram of a system for interrogating heterogeneous databases according to the invention. In Figure 1 , the databases 1 pre-exist on the establishment of the interrogation system of the invention. They may also be created or enhanced, updated and undergo any other maintenance operation directly with the aid of the interrogation system of the invention.
The database interrogation system of the invention comprises three main modules:
- a module 2 for depositing data emanating from the collection 1 of databases;
- a module 3 for organizing the data so as to execute a structuring according to organization entities as will be explained later;
- a module 4 for interrogation which allows users 5 to use the interrogation system by producing at least one interrogation and by receiving in response a set of interrogation results.
According to an essential aspect of the invention, the databases 1 may be utilized separately from the interrogation system of the invention by the systems for managing databases which already exist in the state of the art or yet others. In particular, the advantage of the invention is to provide a means for federating heterogeneous databases whose cooperation is difficult on account of the differences between the various different data structures, and between the various interrogation languages. However, the invention will not demand the disappearance of the systems for managing databases, in particular relational databases, which make it possible locally in one of the databases 1 to maintain, according to local constraints, the content and the structure of the data.
In order to allow utilization which does not interfere with the functioning of such independent management systems, the interrogation system of the invention comprises the data repository module 2, which makes it possible to retrieve the data in the guise of local copies, or else in the guise of references such as hypertext links. The data thus deposited are at the disposal of the remainder of the interrogation system. The retrieval of the data may be done during periods of non use of the collection of data 1, for example outside of the working hours in a conventional office organization. As a result, the access of a user of the interrogation system of the invention to data deposited in the data repository module 2 does not interfere with a modification of a database of the collection of data 1 by a producer. In an embodiment, the retrieval of the data is, at least partially, performed in the guise of a reference for each accessible unit of data, such as a reference to the address of a document on a web server.
When the data are deposited in the data copying or repository module 2, an organization module 3 executes a structuring 7 of the data into organization entities. In an embodiment of the invention, the organization entities are metadomains and domains which reproduce a characteristic diagram of directed acyclic graphs, which diagram will be described later. Such an organization is conducted by an organization Administrator, such as the Administrator Ad, who has at his disposal a connection RL1 to the organization module 3. Such an organization Administrator can access the data, and especially the data copied into the module 2, so as to be able to carry out at least one initial phase of structuring the data.
To this end, it has available an interface IG2 for accessing the data repository module 2 and interface IG1 for accessing the organization module 3 so that entities for organizing the data copied are produced as will be described later. The Administrator Ad is connected to the two resources IG1 and IG2 by a computer linkage network RL1 which may be autonomous from the information network of the firm especially if the administration function of the organization module 3 is subcontracted to a specialist outside the entity or firm in which the invention is implemented.
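
The organization of metadomains and domains as a directed acyclic graph can be sketched as follows, assuming Python and its standard `graphlib` module. The node names and the edges between them are hypothetical (the actual metamodel is the one of Figure 4, which is not reproduced here), and the helper names `is_acyclic` and `descent_order` are introduced only for this illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical metamodel: each node identifier maps to the identifiers of its
# children. The nodes hold only identifiers, not the data themselves.
metamodel = {
    "patient": ["tumour", "hospitalization"],
    "tumour": ["sample"],
    "hospitalization": ["sample"],
    "sample": [],
}


def is_acyclic(graph):
    """Return True when the graph contains no cycle."""
    try:
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


def descent_order(graph):
    """Order the nodes so that every parent precedes its children."""
    # TopologicalSorter reads each mapping as node -> predecessors, so feeding
    # it node -> children yields children first; reversing gives parents first.
    order = list(TopologicalSorter(graph).static_order())
    order.reverse()
    return order


print(is_acyclic(metamodel))
print(descent_order(metamodel))
```

A parents-first order of this kind is one way an administrator tool could check a proposed structuring before it is used for interrogation; a graph with a cycle would be rejected by `is_acyclic`.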
According to an aspect of the invention, the data repository module 2 and/or the data organization module 3 also comprises an entity of server-client type, especially of web type, for accessing data stored in directory trees.
In another embodiment, the Administrator Ad for organizing the data may access a resource for managing queries set up by the users 5 with the aid of another linkage RL2 so as to be in contact with the users 5 of the entity or firm in which the interrogation system of the invention is implemented. Such an organization Administrator Ad then has available a resource making it possible to cooperate with the users during the organization of the data for the fabrication of the organization entities implemented in the organization module 3 and/or a resource for utilizing interrogation statistics so as to modify the organization entities with a view to optimizing the interrogation of the data repository module 2 in terms of time to set up the query and/or to use the caches as will be described later. An organization Administrator therefore constitutes a technical entity which can manage one or more interrogation systems according to the invention, which are independent or interconnected for example by way of a web connection.
Once the data have been deposited, the access by a user 5 to the data deposited in the module 2 is done by way of the interrogation module 4, which utilizes a form generator and a search metaengine by using structural elements which correspond to the organization entities of the organization module 3. The search metaengine, as will be described later, utilizes several search engines to execute at least one query for interrogating the data of the data repository module 2. When the organization entities are metadomains and domains, the structural elements of the query forms are metakeys and primary keys which correspond, during their construction 9, to the structuring according to the directed acyclic graph described hereinabove. In an embodiment, the whole collection of users 5 is interconnected by way of a local network and/or of a tele-informatics network, such as the Internet, or else of an intranet which allows them access 8 to the interrogation system proper. In particular, at least one (not represented) of the user machines 5 is dedicated to the administration of the heterogeneous database interrogation system of the invention, and its role will be set forth subsequently in the text.
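
As an illustration of a metaengine that relies on several search engines, the following Python sketch combines an SQL engine (here an in-memory SQLite database) with a regular-expression engine for free text, and caches the combined answer. The table layout, the report texts, the class name `MetaEngine` and the function names are all assumptions made for this example; the source text does not specify them.

```python
import re
import sqlite3


def sql_engine(conn, min_age):
    """Structured search: file numbers of patients at least min_age years old."""
    rows = conn.execute("SELECT n_file FROM patients WHERE age >= ?", (min_age,))
    return {str(row[0]) for row in rows}


def regex_engine(reports, pattern):
    """Textual search: file numbers whose report text matches the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return {key for key, text in reports.items() if rx.search(text)}


class MetaEngine:
    """Hypothetical metaengine: runs several engines and caches the result."""

    def __init__(self):
        self.cache = {}
        self.misses = 0

    def run(self, cache_key, *engine_calls):
        if cache_key not in self.cache:
            self.misses += 1
            partial_results = [call() for call in engine_calls]
            # Keep only the file numbers returned by every engine.
            self.cache[cache_key] = set.intersection(*partial_results)
        return self.cache[cache_key]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (n_file INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany("INSERT INTO patients VALUES (?, ?)",
                 [(1001, 62), (1002, 45), (1003, 70)])
reports = {
    "1001": "Extensive tumour necrosis.",
    "1002": "Necrosis of the margin.",
    "1003": "Unremarkable.",
}

engine = MetaEngine()
calls = (lambda: sql_engine(conn, 60),
         lambda: regex_engine(reports, r"\bnecrosis\b"))
first = engine.run("age>=60 & necrosis", *calls)
second = engine.run("age>=60 & necrosis", *calls)  # served from the cache
print(first)  # {'1001'}
```

The second call returns the cached set without running either engine again, which is the kind of behaviour a cache would be expected to provide for repeated or refined queries.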
Represented in Figure 2 is a particular embodiment of the interrogation system of Figure 1.
In Figure 2, the same elements as those of Figure 1 bear the same reference numbers. The data repository module 2 executes, during the operation 6 (Figure 1 ) a repository of the data of the collection of data 1. Databases and other data, such as references to referenceable data of the data collection 1 , and disposed in the collection of data sets 1 , are deposited in the data repository and organization module 2. In a particular embodiment, the data repository module 2 is composed of two parts which are respectively:
- a relational database part 80;
- a treelike data part 81.
In a particular embodiment, the relational database part 80 of the repository module 2 dialogues with a relational database interrogation system of SQL type implemented in a module 84 belonging to the data organization unit 3. The treelike data part 81 dialogues with a file tree server system using a web server technology such as "Apache", "IIS", etc., and implemented in a module 85 belonging to the data organization unit 3. Within the framework of the invention, the treelike data are data or data units structured as a tree as is known in the state of the art.
According to the invention, the repository of data is executed as is into a data box consisting of a storage space based on hard disks which may be distributed over a computer network of Internet type or Intranet type. The data are therefore not modified or restructured during repository.
Continuing with the description of Figure 2, it may be seen that the collection of original data 1 comprises four different types of databases which are in particular:
- documentary databases 70 among which may be found in particular databases created by the Lotus Notes (registered trademark) application;
- relational databases 72 among which may be found in particular the databases created by Oracle (registered trademark), Microsoft SQL Server (registered trademark), Sybase, MySQL, etc. applications;
- databases consisting of websites located on Intranet networks or on an Internet network 74; and
- data 79 recorded in file trees, such as office documents, which are located for example on storage volumes shared or hooked up by a tele-informatics network to which the various machines of the users of the interrogation system of the invention are connected.
To carry out the repository or the feeding of the data box 80 and/or 81 of the data repository module 2, the interrogation system of the invention has available a plurality of computer resources capable of transferring the data. In particular, the exemplary application of Figure 2 comprises in succession:
- an application of "middleware" type such as Notrix (registered trademark), which makes it possible to connect the documentary databases 70 to the relational part 80 of the data box 2;
- a middleware application such as Datastage (registered trademark), which makes it possible to connect the relational databases 72 to the relational part 84 of the data box 2;
- a method for importing websites 75 which makes it possible to transfer the websites 74 into the relational part 80 of the data box 2;
- a method for importing websites in treelike form into the treelike part 81 of the data box 2;
- a method for importing treelike data into the treelike part 81 of the data box 2;
- a method of importing treelike data of web type from the treelike part 81 of the data box 2 to the relational part 80 of the same data box 2;
- a method 83 of importing treelike data of textual type from the treelike part 81 of the data box 2 to the relational part 80 of the data box 2.
Each of these applications or methods of importation makes it possible to copy the data contained in the data system 1 to the data repository module 2.
The reproduction of the structure of the initial data in the data collection 1 ensures, with the aid of the relational database interrogation system 84, useful and effective utilization of the data thus structured. There is therefore no loss of information in the data box, nor of data which does not succeed in finding its place in a unified data model.
Moreover, the identity of structure between the production system and the interrogation system, which identity will be studied later, facilitates the comprehension and hence the formulation of the queries.
On account of the copying into the data repository module 2, the feeding of data is decoupled from the organization of the collection of data in a conceptual model. Now, a unified model, invented after the production of the data of a determined production system, functions correctly only in theory. In practice, it is necessary to perform alterations which mobilize at least one administrator of each database of the collection of databases 1.
Moreover, the split between data repository, on the one hand, and organization of the data, on the other hand, leads to referring the interrogation task to an independent tool, which does not intervene during the data feed phase. It is thus possible, and this is essential in the case of a firm or a group of firms having numerous heterogeneous data management systems, to modify one or other of the data models during their own utilization, without the function of interrogation of the whole collection of data 1 being disrupted by such a modification.
In another embodiment, it is possible not to have a data repository module 2, but conversely to work directly on the data collection 1.
The method used to enable the functioning of the interrogation system of the invention will now be described. It comprises at least two phases which may be repeated:
- a first phase of organization of the data by an administrator, executed on the basis of at least one structural model of the data to be interrogated;
- a second phase of interrogation of the data by a user, executed on the basis of forms generated as a function of the data organization determined during said first phase of organization of the data.
In the first phase of organization, in a first step, the organization of the data is determined on the basis of at least one structural model of the data to be interrogated so as to associate in a bijective manner at least one set of real data with a query element. Next, a second step is executed in the course of which at least one form is generated with form elements associated with query elements.
In the second phase, when he wishes to conduct an interrogation, a user, in a first step, selects a form and produces on its basis at least one query, in particular by selecting and customizing each form element in an arbitrary manner. Next, in a second step, the user's query is transmitted to a module for analyzing the query as a function of the organization elements, so that the interrogation query of the user is decomposed as a function of the query elements. In a third step, the data organization determined during the first phase is traversed according to a directional scheme which is deduced therefrom, and the references of the responses located in the set of data interrogated are then recorded. Once the responses have been recorded, in a fourth step, the responses are collated so as to be produced to the user in a response form which thereafter generates a final result.
Referring now to Figure 3, the functioning of a particular embodiment of the data organization module 3 will now be described. The part of the interrogation system of the invention which is paired up with the user essentially comprises a user interface 10 connected by a bidirectional channel 20, 21 , to a query formulation module 11 connected by a bidirectional link 22 to the organization module 3. The data organization module 3 works according to several levels of organization. The upper level of organization is the data metamodel. Each metamodel consists of metadomains, then of query domains which are set up on the basis of the forms generator 11. Metadomains and domains are linked together by virtue of a data model. The query domains are the basic entities hooked up directly with the data sources located in the data repository module 2.
The metamodel is a data structure based upon a directed acyclic graph or tree, composed of edges, of nodes and of leaves which appear in particular in Figures 3 and 4. In the metamodel, an edge corresponds to a model, a node corresponds to a metadomain and a leaf corresponds to a domain. Each metamodel implemented in the interrogation system of the invention, and each time it is implemented in a particular application of the invention, is a particular class of directed acyclic graph of type "DAG" which contains at least one vertex. A node is a starting point for navigating over the whole of the tree, but is itself inaccessible from other nodes of a deeper level.
Referring now to Figure 4, an implementation of a simplified metamodel applied to a hospital medical information system will be described.
The starting metadomain is constituted by the "patient" metadomain 40 which will be detailed elsewhere. In Figure 4, the metadomains are represented by ovals whereas the domains are represented by rectangles. The metamodel of the information system is composed of a graph having four levels. After the starting level, the second level comprises three metadomains: 42 of hospitalizations, 43 of pathology samplings and 45 of biochemistry samplings. In the second level, the information system also comprises two domains, namely the domain 41 of consultation reports and the domain 44 of radiology reports. The third level comprises a single metadomain and seven domains which are respectively:
- the metadomain 46 of visits of the patient to the medical unit;
- the domains 47 of hospitalization reports, of microscope slides 48, of tumour library 49, of diagnosis 55 and of flow cytometry reports 51 , of pathology reports 52 and haematology library 53.
The edges of the graph make it possible to associate the metadomain of hospitalizations 42 with the metadomain 46 and the domain 47, the metadomain 43 of pathology samplings with the four domains 49, 51 and 52, the metadomain of biochemistry samplings 45 with the domains 50 and 53.
The fourth level of the metamodel of the hospital information system is composed of the three domains linked to the metadomain 46, namely respectively:
- the domain 54 of visits;
- the domain 55 of diagnoses;
- the domain 56 of acts.
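By way of illustration only, the metamodel of Figure 4 may be sketched as a mapping from each metadomain to its direct descendants, using the reference numbers of the description. The sketch below is an assumption about one possible representation, not the recording format of the data organization module 3. The edges leaving the metadomains 42, 43, 45 and 46 are those stated above; the edges leaving the "patient" metadomain 40 are assumed from the placement of the entities 41 to 45 on the second level; the domain 48 is left out because no edge to it is stated. The function names are hypothetical.

```python
from collections import deque

# Edges of the simplified hospital metamodel, keyed by reference number.
# The edges leaving 42, 43, 45 and 46 are those listed in the description;
# the edges leaving 40 are an assumption (see the paragraph above).
EDGES = {
    40: [41, 42, 43, 44, 45],
    42: [46, 47],
    43: [49, 51, 52],
    45: [50, 53],
    46: [54, 55, 56],
}


def all_nodes(edges):
    """Every node that appears as a source or as a target of an edge."""
    nodes = set(edges)
    for children in edges.values():
        nodes.update(children)
    return nodes


def metadomains(edges):
    """Nodes having at least one descendant."""
    return {node for node, children in edges.items() if children}


def domains(edges):
    """Terminal nodes (leaves) of the graph."""
    return all_nodes(edges) - metadomains(edges)


def is_acyclic(edges):
    """Kahn's algorithm: the graph is acyclic if every node can be removed."""
    indegree = {node: 0 for node in all_nodes(edges)}
    for children in edges.values():
        for child in children:
            indegree[child] += 1
    queue = deque(node for node, degree in indegree.items() if degree == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in edges.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == len(indegree)


def levels(edges, start):
    """Breadth-first level of each node reachable from the starting metadomain."""
    level = {start: 1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in level:
                level[child] = level[node] + 1
                queue.append(child)
    return level
```

With these assumed edges, `metadomains(EDGES)` yields the five ovals 40, 42, 43, 45 and 46, and `levels(EDGES, 40)` places the domains 54, 55 and 56 on the fourth level, in agreement with the description.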
When the administrator, during the first phase, first step of the method of the invention, determines the organization of the searchable data, he determines the metadomains from among:
- the most important entities that the firm which would utilize the interrogation system according to the invention manipulates during the activity thereof. For example, in a bank, essential entities are the customer, the bank account, the branches, etc. In a manufacturing plant, important entities are the products made, the suppliers, the customers, etc. In an anti-cancer centre, important entities are the patient, the hospitalization, the tumour to be cured, etc. The administrator of the interrogation system of the invention uses a means for managing metadomains which allows him to generate the metadomains suitable to his institution;
- the entities to which the users of the interrogation system would want to refer systematically any query result if the calculation time so permits.
It is possible to distinguish between two types of organization entities according to whether or not they depend on another essential entity. The metadomains depending on no other entity are located at the vertex of the directed acyclic graph representative of a metamodel, whereas the other metadomains constitute the nodes of the metamodel thereof defined from the first entity. In the same way, the administrator of the interrogation system of the invention has available a means for creating metamodels which is furnished with a means for hierarchizing the metadomains according to whether they are independent of or dependent on other metadomains. In most applications of the interrogation system of the invention, it has been found that the number of independent metadomains is generally less than five, whereas most of the activities of one and the same firm are limited to fifteen dependent metadomains.
In the interrogation system of the invention, the domain is the base entity which makes it possible to interrogate data directly in the collection 1 or indirectly in a collection of data deposited in the data repository module 2. Each domain is recorded in the module 3. The domain in the guise of base entity is constituted by a collection of computer objects and is fully autonomous, so that it contains all the resources and information necessary for interrogating a data source. It comprises:
- a resource of connectivity to the data source such as a relational database interrogation system;
- all the metadata relating to the data source, the characteristics of the data, etc.;
- a visual interface allowing a user to formulate questions on the data.
Detailed in a diagram in Figure 6 is the articulation between a domain and a metadomain as well as the architecture of the means disposed in particular in the data organization module 3 for organizing the data under the action of an organization Administrator Ad, then for interrogating the data organized under the action of a query of a user 5. According to the invention, the prior analysis of the data of the collection 1 has been carried out so as to form a metamodel of the collection of data 1 seen by the interrogation system, so that a plurality of metadomains is defined with dependence relations disposed according to a directed acyclic graph recorded in a suitable memory of the data organization module 3.
Each node of the directed acyclic graph 90 for which a dependence relation is descendent constitutes a metadomain, whereas a domain is constituted by a terminal node of the directed acyclic graph 90.
As a result, each node of the tree recorded in the memory 90 of the data organization module 3 can be represented by:
- a metadomain 91 defined by a name "::Name" and a metakey "::Metakey" denoted by a relation in the tree referred to by mk;
- a domain 92 defined by a name "::Name" and a primary key "::Primary_key" denoted by a relation in the tree referred to by pk.
Each domain 92 is thus, as indicated above, coupled to a resource of connectivity "::Connectivity" 93, a resource of visualization of the connections "::Visualization" 94 and a resource for pointing to the data model "::Model" 95.
A metadomain analyzed by the resource 91 of the data organization module 3 therefore makes it possible to point at each data unit of the set of real data 1 or of its copy in the data repository module 2 as a point of an intermediate node 96 of the directed acyclic graph 90. The point is referred to by the metakey mk. Likewise, a terminal node 97 of the directed acyclic graph 90 refers to a data unit of the set of real data 1 or of its copy in the data repository module 2 as a point pointed at by the primary key pk. Each point of the node representative of a metadomain 96 is connected by the resource of connectivity 93 by:
- a first bijective application 99 for analysis which is traversed when a query element is returned from the query module 4 so as to arrive at the address of a document or of any data unit referenced by the interrogation system by virtue of the domain 97 and the domains management resource 92;
- a second bijective application 98 of response which makes it possible to upload the reference of the data unit responding to the query part during the construction of the response.
In a practical embodiment, a domain corresponds to a relational table or else to a main relational table and to several secondary relational tables for example having relations from 1 to N or from N to 1 (in the case of a thesaurus) with the main table. The interrogation system of the invention comprises a means for generating domains, that is to say computer objects comprising:
- a resource of connectivity to the data source;
- a collection of metadata;
- a visual interface of questions on the data associated with the domain.
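The three constituents of a domain listed above may be pictured with the following minimal sketch, which assumes a domain backed by a single relational table. An in-memory SQLite database stands in for the relational database interrogation system; the class `Domain`, its fields, the table `radiology_reports` and its rows are hypothetical and serve only as illustration.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical domain object backed by one relational table."""
    name: str
    connection: sqlite3.Connection        # resource of connectivity
    table: str
    primary_key: str
    metadata: dict = field(default_factory=dict)  # column name -> description

    def form_fields(self):
        """Stand-in for the visual interface: the columns a form may offer."""
        return sorted(self.metadata)

    def interrogate(self, column, value):
        """Return the primary keys of the rows whose column equals value."""
        # Only columns declared in the metadata may be queried; identifiers
        # cannot be bound as SQL parameters, so they are checked here.
        if column not in self.metadata:
            raise KeyError(column)
        sql = (f"SELECT {self.primary_key} FROM {self.table} "
               f"WHERE {column} = ? ORDER BY {self.primary_key}")
        return [row[0] for row in self.connection.execute(sql, (value,))]


# Toy data source (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE radiology_reports "
             "(report_id INTEGER PRIMARY KEY, n_file TEXT, modality TEXT)")
conn.executemany("INSERT INTO radiology_reports VALUES (?, ?, ?)",
                 [(1, "F1", "CT"), (2, "F2", "MRI"), (3, "F1", "MRI")])

radiology = Domain(
    name="radiology reports",
    connection=conn,
    table="radiology_reports",
    primary_key="report_id",
    metadata={"n_file": "patient file number", "modality": "imaging modality"},
)
```

In this sketch the metadata serve two purposes: they describe the columns offered to the user and they restrict the columns on which a query may be formulated.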
In the interrogation system of the invention, the model is an entity which makes the link between the concrete data of the domains and the various implications of metadomains. A model comprises essentially four components which vary according to the type of model, but whose principle remains similar. The four components are:
- pk: the collection of primary keys of the domain;
- mk: the corresponding collection of metakeys;
- pk_mk: the collection of links going from the primary keys to the corresponding metakeys;
- mk_pk: the collection of links going from the associated metakeys to the primary keys.
In a practical mode of embodiment of the interrogation system of the invention, a means for managing the models makes it possible to create, to update, to manage the aforesaid components of a model in such a way as to configure a bijective relation which makes it possible to point from a metadomain to a domain. Subsequently, in a higher order of organization, a particular model of data is integrated with its pairs of bijective relations.
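The four components of a model may be sketched as follows. This is a minimal illustration under stated assumptions: the function `build_model` and the key values are hypothetical, and the strict one-to-one check simply takes the word "bijective" of the description at face value.

```python
def build_model(links):
    """Build the four components of a model from (primary key, metakey) pairs.

    Raises ValueError if the pairs do not form a one-to-one relation.
    """
    pk_mk = {}  # links going from the primary keys to the metakeys
    mk_pk = {}  # links going from the metakeys to the primary keys
    for pk, mk in links:
        if pk in pk_mk or mk in mk_pk:
            raise ValueError(f"relation is not one-to-one at ({pk!r}, {mk!r})")
        pk_mk[pk] = mk
        mk_pk[mk] = pk
    return {"pk": set(pk_mk), "mk": set(mk_pk), "pk_mk": pk_mk, "mk_pk": mk_pk}


# Invented keys: rows of a domain linked to the file numbers of a metadomain.
model = build_model([("row-1", "F001"), ("row-2", "F002"), ("row-3", "F003")])
```

Keeping both `pk_mk` and `mk_pk` allows the relation to be followed in either direction without a search, at the cost of storing each link twice.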
The same scheme is applied to configure the edge of the directed acyclic graph going from a father metadomain to a son metadomain. To this end, in a practical embodiment of the interrogation system of the invention, a means for managing the models makes it possible to create, to update, to manage the aforesaid components of a model so as to configure a bijective relation which makes it possible to point from a metadomain to another metadomain in a model, that is to say to configure a particular edge of the directed acyclic graph going from a metadomain (represented by an oval in Figure 4) to another metadomain (represented by an oval in Figure 4).
The system for interrogating heterogeneous data of the invention associates several tools which give the user thereof the means for centralizing, organizing and/or interrogating the data disposed in the data repository module 2 or directly in the set of data 1. For this purpose, the database interrogation system of the invention comprises one or more tools, among which are included:
- a processing unit devised as a man-machine interface 4 comprising a visual interface of web type, with interrogation forms and results presentation arrays, as well as a module for formulating queries (GEN_FORM; Figure 5), which makes it possible to translate the queries of the users, on the one hand, into various computer languages for circulating the queries among the search engines capable of interrogating the resources of the data module 1 or data repository module 2 and, on the other hand, into human language, in particular to produce the responses destined for the user;
- a queries metaengine (MM; Figure 5) capable of calling several types of search engines, in particular relational search engine and textual search engine, and of performing set-theory operations on the results originating from these various engines;
- a cache memory (CR; Figure 5) which brings together the questions already posed to the system as well as their respective responses, so that the questions already formulated may be answered more quickly when they occur in a new form, produced in particular by the aforesaid queries engine;
- a tool for post-processing the results (GEN-REP; Figure 5);
- an architecture offering tools for connecting external applications, in particular visual interfaces, tools serving to produce and to maintain the data of the information system 1.
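The interplay of the queries metaengine and of the cache memory listed above may be pictured with the following sketch. It is an assumption about one possible arrangement and not the implementation of the modules MM and CR: the class `MetaEngine`, the two stand-in engines and the toy data are hypothetical, and each engine is reduced to a function returning a set of file references.

```python
# Toy data standing in for a relational source and a textual source.
ROWS = [{"file": "F1", "age": 61}, {"file": "F2", "age": 45}, {"file": "F3", "age": 70}]
REPORTS = {"F1": "extensive necrosis observed", "F2": "no necrosis", "F3": "normal tissue"}


def relational_engine(min_age):
    """Stand-in relational search: files of patients aged min_age or more."""
    return {row["file"] for row in ROWS if row["age"] >= min_age}


def textual_engine(word):
    """Stand-in textual search: files whose report contains the word."""
    return {ref for ref, text in REPORTS.items() if word in text}


class MetaEngine:
    def __init__(self, engines):
        self.engines = engines   # engine name -> callable(criterion) -> set
        self.cache = {}          # (engine name, criterion) -> frozenset
        self.engine_calls = 0    # number of questions actually sent to an engine

    def run(self, engine_name, criterion):
        """Answer one question, reusing the cached response when there is one."""
        key = (engine_name, criterion)
        if key not in self.cache:
            self.engine_calls += 1
            self.cache[key] = frozenset(self.engines[engine_name](criterion))
        return self.cache[key]

    def all_of(self, *parts):
        """Intersection of the results; expects at least one (engine, criterion)."""
        results = [self.run(name, criterion) for name, criterion in parts]
        return frozenset.intersection(*results)

    def any_of(self, *parts):
        """Union of the results of the given (engine, criterion) pairs."""
        return frozenset().union(*(self.run(name, c) for name, c in parts))


mm = MetaEngine({"relational": relational_engine, "textual": textual_engine})
both = mm.all_of(("relational", 60), ("textual", "necrosis"))
either = mm.any_of(("relational", 60), ("textual", "necrosis"))
```

Here the second combination reuses the two cached responses, so `mm.engine_calls` remains at 2 after both calls. The textual stand-in matches substrings only, so the report "no necrosis" also matches the word "necrosis".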
As a result, furnished with these tools, the heterogeneous database interrogation system of the invention is intended to interface in an Intranet allowing a firm, or a group of firms, to carry out all the necessary management operations on the collection of heterogeneous data files contained in the data repository module. In the same way, the heterogeneous data interrogation system of the invention is intended to be interfaced with the outside world by way of the Internet by means of the management of one or more websites. The data repository module 2 affords several advantages to the system of the invention:
- each group of copied data preserves its structure defined by its initial conceptual data model which is specific to it without it being necessary to constrain it to a single model as is the case in other heterogeneous data management systems;
- all the data are located in the data box 2 without loss of information.
- the identity of structure between the production system which established the data set 1 and the interrogation system 4 constituted by the interrogation system of the invention facilitates the comprehension of the structure.
The maintenance of the data is facilitated since the specialists of the various production systems recognize the conceptual model of their own data in the data box 2 thus allowing maintenance or updating for example by means of the feed system.
In an embodiment, the data repository module 2 constitutes moreover a portal for accessing the data in so far as the grouping of the data may be real or virtual. Depending on requirements, it is possible either to copy the data or else to maintain them in the production system so as to consult them and extract them at the moment of interrogation.
In another embodiment, the data organization module 3 moreover constitutes a portal for accessing the data in so far as it contains, associated with the domains (92; Figure 6), the resources of connectivity (93; Figure 6) which make it possible to access in an organized manner the data searchable by the system of the invention. A web client is then apt for activating the connectivity resources so that the data associated with each domain interrogated directly, for example with the aid of an interface IG1 (Figure 1) by the Administrator Ad or another equivalent user, may be presented on a visualization console.
As has already been set forth, the repository of the data in the data repository module 2 (Figure 1) is not an obligatory measure. However, the choice of copying the data present in the production system or of carrying out a data repository presents various advantages or drawbacks which are as follows:
- when the data are copied physically into a physical copy part (REC_PHY; Figure 5), it is possible to gain performance and avoid overloading the production system. One thus avoids inconveniencing the users of the data collection 1 in their daily work. The overloads of the tele-informatics network of the data production systems 1 are reduced on account of the fact that the interrogations pass through the system of the invention and that therefore this task is no longer assigned primarily to them. It is also possible to interrogate the database management systems which are not well suited to complex queries, such as for example documentary data management systems. Finally, production systems naturally being continuously modified, it is possible to carry out transformations of data during feeding or within the collection of data 1;
- when the data of the production systems are accessed directly without any copying, by virtue of the copying of pointers such as hypertext links into a virtual copying space in the data repository module 2 (REC_VIR; Figure 5), they are accessible in real time, thereby allowing a saving of storage space in the data repository module 2, a saving in the duration of the feeding of the database 2 and a limiting of the burdens of scheduling of work time and of bandwidth on the tele-informatics network which makes it possible to link the data collection 1 to the system of the invention.
The heterogeneous database interrogation system of the invention comprises in a particular embodiment means allowing the user to choose between:
- optimizing the performance in the queries; and
- favouring access to the information in real time;
according to objectives that the user assigns during the formulation of a query and/or configuration of an interrogation session. It is noted that, in this solution, the data are grouped either physically (REC_PHY; Figure 5) or else virtually (REC_VIR), but while still preserving their design scheme or initial model.
The heterogeneous database interrogation system of the invention thereafter comprises a means for feeding (M_COPY; Figure 5) the data repository unit or module 2 which may be performed by means of a scheduler (PLAN-REC; Figure 5) at regular intervals, for example every day or every week, in the following manner:
- the coded data, maintained in the data collection 1 and originating from software applications (not represented) which produce coded databases, in particular such as a Lotus Notes application, are copied with the aid of a tool for migrating coded data to various relational database management systems, be it in physical form (in REC_PHY) or virtual form (in REC_VIR);
- the static text data are copied from a web server by virtue of an integrated web robot by means of copying M_COPY (Figure 5) of the data repository module 2;
- the strongly structured data which are intended to be repeated in several types of databases, such as for example generic data on patients (sex, address, date of birth, possible date of death, etc.), are grouped together on a generic data server (identity and movements server) which constitutes its own relational database interrogation system; they are not copied physically into REC_PHY (Figure 5), but are accessible directly, by writing to the virtual copy part REC_VIR (Figure 5), to benefit from real time, in particular at the updates level;
- documentary resources are copied from websites, using an Intranet resource or an Internet resource (parts of M_COPY) by making use of the aforesaid web robot.
The data repository module 2 is based on a relational database interrogation system such as a MySQL system, which has the advantage of being widespread while preserving a high degree of performance and reliability.
In Figure 5 are elements making it possible to describe the functioning of the interrogation system of the invention during two different modes of functioning:
- a first mode of feeding the data repository module 2 with fresh data;
- a second mode of utilizing the data deposited in the data repository module 2 by using the processing unit 4.
Periodically, according to a periodicity determined preferentially by a copy scheduler PLAN-REC associated with the repository module 2, the data repository unit 2 is connected to the data collection 1 by a tele-informatics network not represented in Figure 1. Such a network comprises an Intranet type network on the one hand and, as appropriate, a network for accessing remote web resources on an Internet network, on the other hand. The basic or original data are recorded in databases constituted in the data collection 1.
In the first mode or feed mode, the original data are copied with the aid of a data copying means M_COPY. Depending on the status of the recorded data 1, which status is determined by a configuration means (not represented), the means for copying M_COPY copies the data either physically in the guise of a physical copy into a physical copy memory REC_PHY, or in the guise of pointers or hypertext links in a virtual copy zone REC_VIR. The operation of copying, of feeding, or of depositing, of data utilizes a plurality of computer resources which are described elsewhere.
In the second mode or utilization mode, a user (not represented) activates a forms generator GEN_FORM which activates a metaengine MM described elsewhere which then activates the data repository module 2. Moreover, depending on the type of data interrogated, the forms generator GEN_FORM interrogates the virtual copying space REC_VIR and/or the physical copying space REC_PHY of the data repository module 2 as well as a cache memory of queries CR which contains the previous stored searches. A query form is transmitted by way of the structuring unit 3 which will be described elsewhere so that, if the physical data are requested, they are retransmitted directly to a responses generator GEN_REP of the processing unit 4. If the data requested are in the virtual copy zone REC_VIR, the data repository module 2 produces a connection to the data collection 1 or else to a website determined by the pointer addressed in the virtual copy memory REC_VIR, and the data fetched are then forwarded, via the data organization module 3, directly from the data collection 1 to the responses generator GEN_REP of the processing unit 4.
The response to the query produced by the forms generator GEN_FORM is then compiled with the earlier responses obtained and recorded in the queries cache memory CR, and the final response is made accessible to the user by means of the response generator GEN_REP in the guise of a visualization screen, of a web page, or else of a printed representation as is known in the state of the art.
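The difference between the physical copy and the virtual copy described above may be pictured with the following sketch. It is a simplification under stated assumptions: the class `Repository` and its methods are hypothetical, a plain dictionary stands in for the production systems of the data collection 1, and a pointer is reduced to the key of the original record.

```python
class Repository:
    """Hypothetical data box with a physical part and a virtual part."""

    def __init__(self, production):
        self.production = production  # stand-in for the data collection 1
        self.rec_phy = {}             # physical copies made at feed time
        self.rec_vir = {}             # pointers to the original records

    def feed(self, key, physical):
        """Copy the record itself, or only a pointer to it."""
        if physical:
            self.rec_phy[key] = self.production[key]
        else:
            self.rec_vir[key] = key

    def fetch(self, key):
        """Serve a physical copy directly; follow a pointer otherwise."""
        if key in self.rec_phy:
            return self.rec_phy[key]
        return self.production[self.rec_vir[key]]


production = {"report": "report v1", "identity": "identity v1"}
box = Repository(production)
box.feed("report", physical=True)     # copied, as into REC_PHY
box.feed("identity", physical=False)  # referenced, as in REC_VIR

# The production system changes after the feed.
production["report"] = "report v2"
production["identity"] = "identity v2"

stale = box.fetch("report")      # still the value copied at feed time
fresh = box.fetch("identity")    # read from the production system on demand
```

The physical copy answers without touching the production system but reflects the state at the last feed, whereas the virtual copy is always current but calls on the production system at each interrogation.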
An embodiment of the data organization module 3 will now be described succinctly. The aim of such a unit is to make it possible to easily fetch information and, for this purpose, the organization module 3 makes it possible to carry out the following operations:
- the organization module 3 comprises a means for organizing and a means for arranging the data contained in the data repository module 2 or directly in the data collection 1;
- the organization module 3 comprises a search engine for executing queries on all the data regardless of their type, be it text, structured or unstructured data;
- the organization module 3 also comprises a means for generating visual interfaces intended to allow the user to formulate queries and to utilize their response;
- finally, the organization module 3 comprises a tool for creating links between the data and external applications, in particular statistical tools or other systems for navigation or post-processing of data.
The data organization means or structuring unit 3 comprises a means for generating, above the pre-existing structure of the data, an upper organization layer. Such an organization layer is of object type, and it is structurally separated from the subjacent relational layer. To generate such an object layer, the means for creating the organization layer comprises a means for executing the grouping of one or more tables of queries into query domains as well as a means for creating multiple links between the query domains and their grouping into entities called metadomains.
A query domain is an entity of logical grouping of data corresponding to one or more tables disposed in the memories REC_PHY and REC_VIR of the data repository module 2. The means for generating a query domain comprises a means for grouping data, for the explicit realization of its data and a means for annotating the data. Thus a domain groups together, in addition to the data proper, metadata which describe the nature of the information:
- non explicit,
- description,
- type of data,
- date of validity, etc.
A domain administrator utilizes a computer connected (RL2; Figure 1) to the tele-informatics network, in the manner of the Administrator Ad (Figure 1). Such an administrator assists the users connected to the tele-informatics network in defining the domains, by means of a visual administration interface IG1 which helps to classify and generate the domains as a function of the recommendations of the users. A domain administration means, which may be managed by the domain administrator, comprises:
- a means for defining the domains,
- a means for producing the layout of the domains,
- means for defining and managing a plurality of organization entities comprising:
* domains and the types of objects that a domain comprises (resources of connectivity, visualization; model, primary key);
* metadomains;
* metamodels;
* form elements.
In a preferred embodiment, the generator for assisting with the creation of domains comprises a means for carrying out a grouping of the data and of the domains, which grouping is identical to that of the production systems. This approach has several advantages:
- the user retrieves the information emanating from his own production system with an identical layout;
- the user profits from the thinking carried out during the design of the production system on the grouping of the data;
- the data have a different meaning, or even a different semantics, depending on their provenance, according to the domain specified. For example in oncology, the term "necrosis" does not signify the same thing when it is used clinically, in pathology or in radiology. It follows that the term "necrosis" steers the search as a function of the domain which was specified to establish the query.
An advantage of the present invention is to make it possible to create domains while the systems for managing relational databases are heterogeneous, local or remote in the collection of data 1. The data organization module 3 comprises means for interrelating data present invariably in the original collection of data 1 or in the data repository module 2.
The data organization module 3 also comprises a means for managing metadomains. A metadomain within the sense of the invention is an object which makes it possible to link entities which are located in the data repository module 2 and which exhibit one and the same semantics while stemming from various data production systems. These entities, which may originate from diverse database fields, are apt to be found in different query domains. To be associated in a metadomain, the data entities which serve to compose a metadomain receive a co-relation beyond their heterogeneous location or their heterogeneous origin. To embody the metadomains management means, the latter comprises a first means for identifying common entities and declaring them, in the data repository module 2 or in the collection of data 1, in the guise of semantic entities intended to receive a co-relation. The first means for identifying common entities comprises a means for inputting a common entity and a means for declaring a co-relation between several common entities inputted, by the common entity input means, into a memory means provided for this purpose in the data repository module 2.
The metadomains management means further comprises a second means for individually identifying, in each of the domains which may have been designated in the domain management means, the entity having the semantics defining the co-relation, regardless of its assigned name or its data typing. In the case of the application of the interrogation system of the invention to the processing of the files of various patients treated in a hospital, it is considered that each patient is identified by a unique file number. The various database production systems comprise tables each of which comprises a column containing the entity characteristic of the unique file number. To obtain a co-relation of the data recorded in the various databases, such as clinical data and radiology images, tumours stored in the tumour library and data of the identity and movements server, it is possible to use or to construct a co-relation, owing to the very fact that all of these data relate to the same patient or the same list of patients. However, the entity which stores the unique file number bears a different name in each of the data production systems. For example, in a first database interrogation system, the "file number" entity may be labelled "Numfile" whereas in a second database interrogation system, the same entity may be labelled "N_file". The problem is multiplied with more than two database management systems. Moreover, the data typing is rarely identical: it may be an integer, a character string, a character string of defined length, etc. The values will thus be copied into the data repository module 2 in heterogeneous forms which cannot be accessed in a unified manner. Moreover, depending on whether certain data are copied physically into the physical copy memory or virtually into the virtual copy memory, the mode of access to them differs.
To solve problems of this kind, the second means of the means for managing metadomains makes it possible to individually identify the domains relevant to the same semantics. In this instance, the second means for individually identifying domains makes it possible to declare a semantic unit, here the "patient", and thereafter to declare the key making it possible to identify a patient, the "file number". Thereafter, when the interrogation system of the invention works in interrogation mode, as was described elsewhere, the second means for managing metadomains traverses the various domains already defined so as to retrieve the corresponding entities, in the present case "Numfile", "N_file", etc., and they are declared as having the same semantics as the newly defined metakey. The semantic unit "patient" thus groups together several domains managed by the structuring unit 4 regarding the adopted criterion of the "file number". The data organization module 3 moreover comprises a means for managing metakeys which makes it possible in particular to produce metadomains by grouping together several query domains. In the aforesaid example, the metakey is constituted by "N_file" whereas the metadomain is constituted by "patient". The concept of metakey used by the present invention obeys a certain number of rules which are now described.
- The metakey takes into account the whole collection of relational tables relevant to the co-relation with which it is associated and it is not necessary for its field name to be identical nor for its data typing to be the same.
- in a particular embodiment, the metadomains are produced on the basis of a public-domain database-independent interface and may group together relational tables of heterogeneous relational database management systems, such as Oracle, Access, etc. (registered trade marks). Moreover, the metakeys management means cooperates with a means for creating the links.
- The metakey generated by the metakeys management means is a foreign key creating relations both between the local and remote data resources, by means of Intranet type or Internet type connections.
The metakey and its management means cooperate with a means for creating multiple relations without physical modification of the structure of the data contained in the data repository module 2, whereas all this information is stored in the object layer, which is located above the relational physical layer.
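By way of illustration only, the metakey mechanism described above can be sketched as follows. This is a minimal sketch and not the implementation of the invention: the class `Metakey`, the function `co_relate`, the normalization functions and the sample rows are all hypothetical. It shows one semantic key ("file number") bound to differently named and differently typed fields ("Numfile", "N_file"), and rows being related in an upper layer without any modification of the underlying tables.

```python
from dataclasses import dataclass, field


@dataclass
class Metakey:
    """Object-layer link: one semantic key mapped onto differently named fields."""
    name: str
    # domain name -> (field name in that domain, function normalizing the raw value)
    bindings: dict = field(default_factory=dict)

    def bind(self, domain, field_name, normalize=str):
        self.bindings[domain] = (field_name, normalize)

    def key_of(self, domain, row):
        field_name, normalize = self.bindings[domain]
        return normalize(row[field_name])


def co_relate(metakey, tables):
    """Group rows of several domains by normalized metakey value.

    The rows themselves are left untouched: the relation lives only in the
    returned structure, not in the tables.
    """
    grouped = {}
    for domain, rows in tables.items():
        for row in rows:
            key = metakey.key_of(domain, row)
            grouped.setdefault(key, {}).setdefault(domain, []).append(row)
    return grouped


# Hypothetical sample data: same patient, different field names and typings.
clinical = [{"Numfile": 1042, "diagnosis": "carcinoma"}]
radiology = [{"N_file": "0001042", "image": "scan_17"}]

file_number = Metakey("file number")
file_number.bind("clinical", "Numfile", normalize=lambda v: str(int(v)))
file_number.bind("radiology", "N_file", normalize=lambda v: str(int(v)))

patient = co_relate(file_number, {"clinical": clinical, "radiology": radiology})
```

In this sketch the normalization function is what absorbs the difference in data typing (an integer on one side, a zero-padded character string on the other).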
As a result, the metadomain associated with a metakey generated by the means for managing metakeys of the invention is constituted as an "object oriented" structure making it possible to create links between relational entities of the heterogeneous relational database interrogation system, installed on heterogeneous platforms, remote or local. The metadomains management means of the invention thus makes it possible to enrich the relational model by adding thereto the benefits of the "object" model. The system of the invention, in a particular embodiment, therefore comprises a means for constructing networks whose input is connected to the metadomains management means of the invention and to the associated means for managing metakeys, to construct a complex network of relations and of contexts between relational entities.
An application of the aforesaid concepts and of their management means for a particular hospital information system will now be described. The hospital information system makes it possible to monitor the medical treatment of patients who are afflicted with tumours and who are hospitalized in care units. The users of the interrogation system of the invention, who are members of the care staff utilize the aforesaid means to define various entities in the guise of metakeys grouping together the same data in several ways as various metadomains.
Patient specification or identification data make it possible to establish a metadomain characteristic of the patient, because numerous data pertaining thereto may be interrelated, this having been previously described as a co-relation. The data specifying a tumour of a patient makes it possible to establish a characteristic metadomain by knowing that the patient may exhibit several different tumours and that it is not always easy to link the successive data or elements to a tumour, especially when the patient exhibits several tumours, synchronous or successive. The characteristic data of a sample or biopsy performed on a patient make it possible to generate a metadomain. Data describing a hospitalization, taken as a particular care episode, make it possible to generate a new metadomain. More generally, data descriptive of a care episode make it possible to generate a metadomain if one considers the phase of diagnostic consultations, the treatment phase, the consultations and acts entering into the framework of monitoring, check-up, then the treatment of a recurrence. The particular feature of this metadomain is that its metakey is not an entity fed directly by the information system. However, in such a case, the interrogation system of the invention comprises a means for calculating a relational entity defining a metadomain associated with a metakey characteristic of a care episode. Such a means for managing metadomains constitutes a means for managing virtual metadomains.
Data characteristic of a phase of the illness, if the following phases are considered:
- phase of local illnesses,
- phase of local recurrence,
- phase of metastatic illness,
- palliative phase, also make it possible to generate another metadomain.
More generally, the interrogation system of the invention comprises means for managing various metadomains:
- a means for managing real metadomains: such a management means makes it possible to track and group together the relations between entities pertaining to the patient, a biological sampling or hospitalization;
- a means for managing virtual metadomains: such a management means cooperates with a calculation means , in particular a calculation means able to effect a consolidation of data acquired successively over time according to sequences making it possible to normalize the numerical data with the aid of an operation of calculation as a function of programming criteria determined by the user of the interrogation system of the invention.
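As a hedged illustration of a virtual metadomain, the following sketch computes a metakey that is not fed directly by the information system, here a care episode number. The grouping rule (a new episode starts when two successive events are separated by more than a user-chosen number of days), the function name `assign_care_episodes` and the sample events are assumptions made for the example; the text does not specify the calculation.

```python
from datetime import date, timedelta


def assign_care_episodes(events, max_gap_days=30):
    """Return (episode_number, event) pairs for one patient.

    Events are consolidated in chronological order; a new episode begins
    whenever the gap to the previous event exceeds max_gap_days
    (a criterion assumed here to be set by the user).
    """
    ordered = sorted(events, key=lambda e: e["date"])
    result = []
    episode = 0
    previous = None
    for event in ordered:
        if previous is None or event["date"] - previous > timedelta(days=max_gap_days):
            episode += 1
        result.append((episode, event))
        previous = event["date"]
    return result


# Hypothetical events for a single patient.
events = [
    {"date": date(2004, 1, 5), "act": "diagnostic consultation"},
    {"date": date(2004, 1, 20), "act": "surgery"},
    {"date": date(2004, 6, 1), "act": "check-up"},
]
episodes = assign_care_episodes(events)
```

The computed episode number then plays the role of the metakey of the virtual metadomain, to which the individual events are attached.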
By way of example, the "hospitalization" metadomain is defined by a hospitalization number, which serves as metakey, to which may be attached:
- administrative information: acts and mode of entry, hospitalization unit, etc.;
- statistical information on the patient: name, age, sex;
- diagnostic coded medical information and acts in respect of problems of compatibility with national insurance management and indemnity systems;
- calculated information: duration of stay, cost of stay, etc.;
- textual information: with each stay, a hospitalization report is drafted as free text, possibly in a structured manner;
- information on the production of an act and on the consumption of medication, transfusion, etc.;
- information on the care administered, gathered into a computerized care file, such as contained in a smart health card.
A particular embodiment of the data structuring unit or organization module 3 of the interrogation system represented in Figure 1 will now be described.
The embodiment described herein below uses software resources constituted by classes of objects which are distributed into two categories:
- a first class of objects related to the structural organization of the data;
- a second class of objects associated with the creation of the visual interfaces.
The first class of objects makes it possible to embody a means for generating a metamodel as has been described elsewhere. In this particular embodiment, the metamodel is constituted by an object of class EPI::metamodel, the object being referenced according to the semantics of the C++ computer language or else of the Java computer language. This object class contains another, public-domain object class, DAG, the container inheriting from this public class all the basic functionalities for managing a directed acyclic graph. During the creation of an object of metamodel type using the object of class EPI::metamodel of the invention, the top vertex, the nodes and the leaves of the metamodel represented by the directed acyclic graph generated contain simply the identifiers of the metadomains or domains to which they make reference. To embody a metadomain, the means for generating a metamodel comprises a means for managing metadomains and a means for calling an object of class EPI::MetaDomain which contains all the metadata related to the metadomain embodied, such as for example the explicit name of the metakey. To embody a node metadomain, the means for generating a metamodel comprises a means for managing node metadomains and for calling an object of class EPI::Model which contains the data model, a particular data structure modelling the bijective relation between its own metakeys and those of the parent metadomains, as has been explained elsewhere.
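The idea of a metamodel held as a directed acyclic graph whose vertices contain only identifiers can be sketched as follows. This is an illustrative sketch under stated assumptions, not the EPI::metamodel class itself: the Python class `Metamodel`, its methods and the identifiers used in the example are hypothetical.

```python
class Metamodel:
    """Directed acyclic graph whose nodes hold only identifiers of
    metadomains or domains (the metadata live in separate objects)."""

    def __init__(self):
        self.children = {}  # identifier -> list of child identifiers

    def add_node(self, identifier):
        self.children.setdefault(identifier, [])

    def _reachable(self, start, target):
        # Depth-first search: is target reachable from start?
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.children.get(node, []))
        return False

    def add_edge(self, parent, child):
        self.add_node(parent)
        self.add_node(child)
        # Refuse any edge that would close a cycle, so the graph stays acyclic.
        if self._reachable(child, parent):
            raise ValueError(f"edge {parent} -> {child} would create a cycle")
        self.children[parent].append(child)

    def leaves(self):
        return sorted(n for n, kids in self.children.items() if not kids)


# Hypothetical identifiers: two metadomains and two domains.
model = Metamodel()
model.add_edge("patient", "hospitalization")
model.add_edge("hospitalization", "hospital_reports")
model.add_edge("patient", "radiology_images")
```

Here the leaves stand for domains and the inner vertices for metadomains; an attempt to add an edge from "hospitalization" back to "patient" is rejected.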
To embody a domain, the means for generating a metamodel comprises a means for managing domains and for calling a plurality of object classes which are respectively:
- an object of class EPI::Domain which contains all the metadata related to the domain, such as for example the explicit names of all the fields, their typing, etc.;
- an object of class EPI::DBase which ensures the connectivity with the relational database management systems to the relational table or tables to which the domain makes reference. In a particular embodiment, the concerned object class uses a public-domain database-independent interface object class which ensures connectivity compatible with a certain number of relational database management systems such as Oracle, Sybase, etc.,
- an object of class EPI::Model which contains a particular data structure modelling the injections and surjections between the primary keys and the metakeys of the parent metadomain;
- at least one object of class EPI::Index::* for interrogating those fields of the data deposited in the copying unit 2 that the SQL system cannot interrogate effectively; these object classes are discussed later on;
- an object of class EPI::Form which comprises:
- - a means for automatically generating a visual interface allowing a user to formulate a query to interrogate the fields of the domain;
- - a means for analyzing the query formulated by the user and transcoding it into a language readable by an engine implemented in the interrogation unit 4;
- an object of class EPI::DomainStats, which comprises:
- - a means for generating statistics on the data of the domain;
- - a means for storing the statistics generated;
- - a means for managing a dictionary of data of statistics generated and stored in the object class.
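The three means grouped in the statistics object (generation, storage, dictionary) can be pictured with the short sketch below. It is only an assumption about what such an object might hold: the class name `DomainStats` is reused for readability, but the statistic names, the `generate` method and the sample rows are hypothetical.

```python
from collections import Counter


class DomainStats:
    """Generates statistics on a domain, stores them, and keeps a
    dictionary describing each stored statistic."""

    def __init__(self):
        self.stored = {}      # statistic name -> value
        self.dictionary = {}  # statistic name -> human-readable description

    def generate(self, rows, field_name):
        values = [row[field_name] for row in rows if row.get(field_name) is not None]
        distribution = f"{field_name}.distribution"
        filled = f"{field_name}.filled"
        self.stored[distribution] = Counter(values)
        self.dictionary[distribution] = f"number of rows per distinct value of {field_name}"
        self.stored[filled] = len(values)
        self.dictionary[filled] = f"number of rows where {field_name} is filled in"
        return self.stored[distribution]


# Hypothetical rows of a domain.
rows = [{"sex": "F"}, {"sex": "M"}, {"sex": "F"}, {"sex": None}]
stats = DomainStats()
sex_distribution = stats.generate(rows, "sex")
```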
The objects of class EPI::Index::* will now be described in more detail. In a certain number of query situations, the relational table contains fields whose content cannot be interrogated effectively with an SQL type engine. To allow the interrogation of these fields, the collection of classes EPI::Index::* comprises a means for creating specialized indices and means for effectively interrogating a field which is not searchable with the aid of an SQL engine. A domain generated by the means for generating domains may comprise one or more objects of this type.
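One possible specialized index of this kind is a regular-expression search over a free-text field, combined with a cache so that repeated patterns are not re-evaluated. The sketch below is an assumption-laden illustration: the class `RegexIndex`, its constructor arguments and the sample reports are hypothetical, and a plain scan of the stored texts stands in for a real index structure.

```python
import re
from functools import lru_cache


class RegexIndex:
    """Searches a text field with regular expressions and caches the results."""

    def __init__(self, rows, key_field, text_field):
        # Keep only what the search needs: primary key -> text.
        self._texts = {row[key_field]: row[text_field] for row in rows}
        # Per-instance cache keyed by the pattern string.
        self._cached_search = lru_cache(maxsize=256)(self._search)

    def _search(self, pattern):
        compiled = re.compile(pattern, re.IGNORECASE)
        return tuple(sorted(k for k, text in self._texts.items() if compiled.search(text)))

    def search(self, pattern):
        """Return the primary keys of the rows whose text matches pattern."""
        return self._cached_search(pattern)

    def cache_hits(self):
        return self._cached_search.cache_info().hits


# Hypothetical free-text reports.
reports = [
    {"id": 1, "report": "Extensive necrosis observed"},
    {"id": 2, "report": "No abnormality"},
    {"id": 3, "report": "Areas of necroses"},
]
index = RegexIndex(reports, key_field="id", text_field="report")
first = index.search(r"necros[ie]s")
second = index.search(r"necros[ie]s")  # served from the cache
```

The cache is what would let such a component absorb repeated queries under load; its size (256 patterns here) is an arbitrary choice for the example.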
Among the objects of class "index", EPI::Index::*, a class EPI::Index::Regex comprises means for indexing and for interrogating a field containing text by using a regular expression (Regex) language. In a particular embodiment, the means for interrogating a field containing text contained in the class EPI::Index::Regex cooperates with a cache memory system so as to withstand sizeable increases in load. As a result, by using the cache memory, the means for interrogating a field containing text can perform searches for phrases or for textual patterns in relation in particular with the definition of one or more domains, as has already been specified.
The second class of objects makes it possible to embody a means for generating a visual interface as has been described elsewhere. The means for generating a visual interface of this particular embodiment comprises an object of class EPI::Form which executes or implements a means for generating a query form making it possible to interrogate a consistent collection of data, structured by the data structuring unit 3. To implement a means for generating a query form, the object of class EPI::Form comprises a means for calling or generating a plurality of form elements, each generated by an object of class EPI::FormElement which is responsible for the interrogation and/or for the displaying of a specific field of the domain. The objects of this class are not specific to a particular field, but to a data typing and to a type of interrogation or to a particular interrogation class. As a result, the objects of class EPI::FormElement are independent of the data.
Each object of class EPI::FormElement comprises:
- a means for interrogating one or more types of relational data (types including numerical, date, textual field, etc.);
- a means for directly interrogating the data in SQL language;
- a means for interrogating the specialized indices created by the interrogation system for the interrogation of a field, for example a textual index searchable with the index object class EPI::Index::Regex;
- a means for graphically representing a form element on a visual interface, such as an "html" page;
- a means for interacting with the user, who fills in the form element so as to formulate his question, for example using objects called in "html" tags inserted into an "html" page serving as receptacle for the form;
- a means for analyzing the question represented by the user in the guise of an "html" page and for generating the corresponding subquery in an engine language as well as in natural language.
Represented in Figure 7 is a form of the interrogation system of the invention when it is utilized by a user 5 (Figure 1). The user 5 utilizes a visual interface from a computer station connected to the computer network of the interrogation system. The visual interface 100 addresses a forms generator 101 which is fed from a memory 105 of form elements which was previously built on the one hand during the configuration of the interrogation system and on the other hand during a phase of administration with the aid of the organization module 3, by the Administrator Ad (Figure 1), on the basis of the model of the data adopted by the administrator.
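A form element of the kind fed to the forms generator, tied to a data typing and a type of interrogation rather than to a particular field, can be sketched as follows. This is a minimal sketch, assuming a numeric typing and a range interrogation; the class `NumericRangeElement`, the naming of the "html" inputs and the parameterized SQL fragment used as "engine language" are assumptions, not the EPI::FormElement interface.

```python
from html import escape


class NumericRangeElement:
    """Form element specific to a typing (numeric) and an interrogation
    type (range); the field it is applied to is only a parameter."""

    def __init__(self, field_name, label):
        # field_name is assumed to come from the administrator's
        # configuration, never from the end user.
        self.field_name = field_name
        self.label = label

    def render(self):
        """Graphical representation of the element as an html fragment."""
        name = escape(self.field_name, quote=True)
        return (
            f"<label>{escape(self.label)} "
            f'<input name="{name}_min" type="number"> to '
            f'<input name="{name}_max" type="number"></label>'
        )

    def subquery(self, submitted):
        """Analyze the submitted form values and return the subquery as
        (engine-language fragment, parameters, natural-language text)."""
        low = float(submitted[f"{self.field_name}_min"])
        high = float(submitted[f"{self.field_name}_max"])
        sql = f"{self.field_name} BETWEEN ? AND ?"
        text = f"{self.label} between {low:g} and {high:g}"
        return sql, (low, high), text


# Hypothetical use on an "age" field.
element = NumericRangeElement("age", "Age")
markup = element.render()
sql, params, text = element.subquery({"age_min": "40", "age_max": "65"})
```

The same class could be attached to any numeric field of any domain, which is the sense in which the element is independent of the data.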
The query built with the aid of the forms generator 101 is then addressed to the metaengine 102 as has been described previously, and the built query is addressed to a set of caches 104 and/or to a set of query engines 103 as has previously been set forth with the aid of Figure 5. The query is then utilized by the organization module 3, through its User part 105, so that the organization of the data in the guise of a directed acyclic graph, effected by the organization into organization entities, is applied to the query in the guise of traversals of the nodes of the tree. These traversals follow the bijective applications: first a descent of the query from the metadomains to the domains, then a backtracking of the responses from the domains to the metadomains, until a response is offered in the guise of a list of references of documents or of other data units which are referenced either in the data repository module 2 or in the data set 1. The response is then made available both in the set of caches 104 and at the level of the visual interface 100.
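The descent of the query and the backtracking of the responses can be illustrated with the following sketch. It rests on strong simplifying assumptions that are not in the text: every domain maps its rows directly onto the metakey of the root metadomain, subqueries are plain Python predicates, and the responses of sibling branches are combined by intersection. The function `descend_and_backtrack` and the sample data are hypothetical.

```python
def descend_and_backtrack(metamodel, node, subqueries, domains):
    """Return the set of metakey values under node satisfying the subqueries,
    or None when no subquery constrains that branch."""
    if node in domains:  # leaf of the graph: a domain holding rows
        predicate = subqueries.get(node)
        if predicate is None:
            return None
        domain = domains[node]
        # Backtracking step: map each matching row onto the parent metakey.
        return {domain["to_metakey"](row) for row in domain["rows"] if predicate(row)}
    result = None
    for child in metamodel.get(node, []):  # descent step
        child_keys = descend_and_backtrack(metamodel, child, subqueries, domains)
        if child_keys is not None:
            result = child_keys if result is None else result & child_keys
    return result


# Hypothetical metamodel: one metadomain grouping two domains.
metamodel = {"patient": ["clinical", "radiology"]}
domains = {
    "clinical": {
        "rows": [{"Numfile": 1, "age": 50}, {"Numfile": 2, "age": 70}],
        "to_metakey": lambda row: str(row["Numfile"]),
    },
    "radiology": {
        "rows": [{"N_file": "1", "finding": "necrosis"}, {"N_file": "2", "finding": "none"}],
        "to_metakey": lambda row: row["N_file"],
    },
}
subqueries = {
    "clinical": lambda row: row["age"] > 40,
    "radiology": lambda row: row["finding"] == "necrosis",
}
answer = descend_and_backtrack(metamodel, "patient", subqueries, domains)
```

The returned set of metakey values corresponds to the list of references offered as the response; translating keys between intermediate metadomains is deliberately left out of the sketch.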

Claims

1. System for interrogating heterogeneous databases of the kind comprising:
- a set (1) of databases, all or some of which having their own interrogation system,
- a means for generating an interaction between a user and a plurality of data of the set of databases,
characterized in that the means for generating an interaction between a user and a plurality of data of the set (1) of databases comprises:
• a data organizer module (3) for associating the data by reference to a plurality of organization entities;
• a module (4) for interrogation and for producing response data which comprises a means for generating query forms and producing response data.
2. Interrogation system according to Claim 1, characterized in that it comprises a data repository module (2) and a means of connection (6) to the said data collection (1) at least at copy dates and in that the data repository module (2) comprises:
- a relational database part (80);
- a tree-like data part (81).
3. Interrogation system according to Claim 2, characterized in that the relational database part (80) of the repository module (2) comprises a dialog means with a relational database interrogation system of SQL type implemented in a module (84) belonging to the data organization unit (3).
4. Interrogation system according to Claim 2, characterized in that the data part structured as a tree (81) comprises a dialog means with a treelike interrogation system implemented in a module (85) belonging to the data organization module (3).
5. Interrogation system according to any one of Claims 2 to characterized in that it is provided with a plurality of computer resources capable of transferring the data of the collection of original data (1) to the data repository module (2), among which are:
- a middleware application (71), which makes it possible to connect the documentary databases (70) to the relational part (80) of the data repository module (2);
- a middleware application (73), which makes it possible to connect the relational databases (72) to the relational part (80) of the data repository module (2);
- a method for importing websites (75) which makes it possible to transfer the websites (74) into the relational part (80) of the data repository module (2);
- a method for importing the arborescent structure of websites (76) into the treelike part (81) of the data repository module (2);
- a method (78) for importing treelike data (79) into the treelike part (81) of the data repository module (2);
- a method (82) for importing treelike data of web type from the treelike part (81) of the data repository module (2) to the relational part (80) of the data repository module (2);
- a method (83) for importing treelike data of textual type from the treelike part (81) of the data repository module (2) to the relational part (80) of the data repository module (2).
6. Interrogation system according to Claim 1, characterized in that the association by reference to a plurality of organization entities of the data organization module (3) is executed by a method of management of at least one data metamodel, consisting of metadomains, then of query domains which are established on the basis of the forms generator (11), in the guise of a data structure based on a directed acyclic graph of "DAG" type.
7. Interrogation system according to Claim 6, characterized in that it comprises a means for managing the metadomains.
8. Interrogation system according to Claim 7, characterized in that it comprises a means for creating metamodels for hierarchizing the metadomains depending on whether they are independent of or dependent on other metadomains.
9. Interrogation system according to Claim 8, characterized in that a means for creating metamodels comprises resources and information necessary for interrogating a data source, among which are:
- a resource for connectivity (93) to the data source as a relational database interrogation system;
- a resource for pointing (95) to a predetermined model;
- a visual interface (94) allowing a user to formulate questions on the data.
10. Interrogation system according to Claim 9, characterized in that it comprises a means for generating domains comprising,
- a resource for connectivity to the data source;
- a collection of metadata;
- a visual interface of questions on the data associated with the domain.
11. Interrogation system according to one of Claims 9 or 10, characterized in that it comprises a means for managing the models for creating, updating, managing the components of a model in such a way as to configure injections and surjections which make it possible to point from a metadomain to a domain in a model.
12. Interrogation system according to any one of Claims 9 to 11, characterized in that it comprises one or more tools, among which are:
- a processing unit devised as a man-machine interface (4) comprising a visual interface of web type, with interrogation forms and tables for presentation of the results, as well as a module for formulating queries (GEN_FORM; Figure 5);
- a query metaengine (MM; Figure 5) capable of calling several types of search engines, in particular relational search and textual search engines, and merging the results originating from these various engines;
- a cache memory (CR; Figure 5) which brings together the questions already presented to the system as well as their respective responses, so that the already presented questions may be compiled more rapidly when they occur in a new form, produced especially by the aforesaid query engine;
- a tool for post-processing the results (GEN_REP; Figure 5);
- tools for connecting external applications.
13. Interrogation system according to Claim 12, characterized in that it comprises a means for interfacing in an intranet.
14. Interrogation system according to Claim 12, characterized in that it comprises a means for interfacing with the outside world by way of the internet by means of the management of one or more websites.
15. Interrogation system according to Claim 1 or 6, characterized in that the data organization module (3) comprises a means for generating, above the pre-existing structure of the data (1 or 2), an upper organization layer, of the object type, structurally separated from the subjacent relational layer.
16. Interrogation system according to Claim 15, characterized in that the means for generating an upper organization layer comprises a means for executing the grouping together of several query tables into query domains as well as a means for creating multiple links between the query domains and their grouping into entities called metadomains.
17. Interrogation system according to Claim 6, characterized in that it comprises a means for administering domains, which comprises at least:
- a means for defining the domains,
- a means for arranging the layout of the domains,
- means for defining and managing a plurality of organization entities comprising:
* domains and the types of objects that a domain includes (connectivity resources, visualization, model, primary key);
* metadomains;
* metamodels;
* form elements.
18. Interrogation system according to Claim 6, characterized in that the structuring unit (3) also comprises a means for managing metadomains which comprises a first means of common entity identification and declaration thereof in the data repository module (2) and/or the data set (1) in the guise of semantic entity intended to receive a co-relation.
19. Interrogation system according to Claim 18, characterized in that the first means of identification of common entities comprises a means of common entity entry and a means of declaration of co-relation between several common entities input into a memory means in the data repository module by the means of common entity entry.
20. Interrogation system according to Claim 18, characterized in that the means for managing metadomains comprises a second means for individually identifying, in each of the domains that would have been designated in the domain management means, the entity having the semantics defining the co-relation regardless of its assigned name or the data typing.
21. Interrogation system according to Claim 20, characterized in that the second means of the management means of metadomains makes it possible to individually identify the domains relevant to the same semantics by declaring a semantic unit, then its key.
22. Interrogation system according to Claim 6, characterized in that the data organization module (3) comprises a management means of metakeys for effecting metadomains by grouping together several query domains.
23. Interrogation system according to Claim 22, characterized in that each metakey and its management means cooperate with a means for creating multiple relations without physical modification of the structure of the data contained in the data repository module (2).
24. Interrogation system according to Claim 22, characterized in that it comprises a means for constructing networks whose input is connected to the management means of metadomains and to the associated management means of metakeys, so as to construct a complex network of relations and of contexts between relational entities.
25. Interrogation system according to any one of Claims 6 to 24, characterized in that it comprises a means for calculating a relational entity defining a metadomain associated with a metakey so as to constitute a means for managing virtual metadomains.
26. Interrogation system according to Claim 25, characterized in that it comprises means for managing various metadomains:
- a means for managing real metadomains for tracking and grouping the relations between entities;
- a means for managing virtual metadomains to cooperate with a calculation means, in particular a calculation means able to effect a consolidation of data acquired successively over time according to sequences making it possible to normalize the numerical data with the aid of an operation of calculation as a function of programming criteria determined by the user.
27. Interrogation system according to any one of Claims 6 to 26, characterized in that the data organization module (3) cooperates with software resources comprising:
- a first objects class of structural organization of the data;
- a second objects class for creating visual interfaces.
28. Interrogation system according to the preceding claim, characterized in that it comprises a means for generating a metamodel on the basis of an object of class EPI::metamodel which contains another object of class DAG by inheriting therefrom all the base functionalities for managing a directed acyclic graph of "DAG" type.
29. Interrogation system according to the preceding claim, characterized in that the means for generating a metamodel comprise a means for managing metadomains and a means for calling an object of class EPI::Metadomain which contains all the metadata related to the metadomain produced, such as the explicit name of a metakey.
30. Interrogation system according to the preceding claim, characterized in that, in order to produce a node metadomain, the means for generating a metamodel comprises a means for managing node metadomains and for calling an object of class EPI::Model which contains the data model, a particular data structure modelling the bijective relation between its own metakeys and that of the parent metadomains.
31. Interrogation system according to the preceding claim, characterized in that in order to produce a domain, the means for generating a metamodel comprise a means for managing domains and for calling at least one object taken from among a plurality of object classes comprising:
- an object of class EPI::Domain which contains metadata related to the domain;
- an object of class EPI::Dbase which ensures the connectivity with the relational database management systems and/or to the relational table or tables to which the domain refers;
- an object of class EPI::Model which contains a particular data structure modelling the injections and surjections between the primary keys and the metakeys of the parent metadomain;
- at least one object of class EPI::Index::* for indexing fields in the data deposited in the data repository module (2) and/or the data set (1) with a view to their interrogation by a predetermined engine;
- an object of class EPI::Form which comprises a means for automatically generating a visual interface for formulating a query for interrogating the fields of the domain; a means for analyzing the query formulated and transcoding it into a language readable by an engine implemented in the interrogation unit (4);
- an object of class EPI::DomainStats, which comprises: a means for generating statistics on the data of the domain; a means for storing the statistics generated; a means for managing a dictionary of data of statistics generated and stored in the object class.
32. Interrogation system according to the preceding claim, characterized in that in order to allow the interrogation of these fields, the collection of classes EPI::Index::* comprises a means for creating specialized indices and means for interrogating in an effective manner a field not searchable with the aid of an SQL engine.
33. Interrogation system according to the preceding claim, characterized in that it comprises a class EPI::Index::Regex which comprises means for indexing, and for interrogating a field containing text by using a language for calculating regular expressions Regex.
34. Interrogation system according to the preceding claim, characterized in that the means for interrogating a field containing text and contained in the class EPI::Index::Regex cooperate with a cache memory system in particular for performing searches for phrases or textual patterns in relation in particular to the definition of one or more domains.
36. Interrogation system according to the preceding claim, characterized in that the object of class EPI::Form comprises a means for calling or generating a plurality of form elements of a specific field of the domain generated by an object of class EPI::FormElement for executing the interrogation and/or the display of a specific field of the domain.
37. Interrogation system according to the preceding claim, characterized in that each object of class EPI::FormElement comprises:
- a means for interrogating one or more types of relational data (types such as numerical, date, textual field, etc.);
- a means for directly interrogating the data in SQL language;
- a means for interrogating the specialized indices created by the interrogation system for the interrogation of a field, such as a textual index searchable with the object of index class EPI::Index::Regex;
- a means for graphically representing a form element on a visual interface, such as a page in the "html" format;
- a means for interacting with the user which fills in the form element so as to formulate his question, for example by using objects called in "html" tags inserted into an "html" page serving as receptacle for the form;
- a means for analyzing the question represented by the user in the guise of an "html" page and for generating the corresponding sub-query in an engine language.
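The form-element behaviour listed in claim 37 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the applicant's code: the FormElement dataclass, its to_html and to_subquery methods, the build_where helper, the field names, and the choice of comparison operators are all hypothetical. Each element renders itself as an "html" input and turns the user's answer into a parameterized SQL sub-query.

```python
from dataclasses import dataclass


@dataclass
class FormElement:
    """Hypothetical form element for one field of a domain."""
    field: str   # column name in the domain (assumed)
    kind: str    # "number", "date" or "text"
    label: str

    def to_html(self):
        # Graphical representation of the element on an html page.
        input_type = {"number": "number", "date": "date"}.get(self.kind, "text")
        return (f'<label>{self.label} '
                f'<input type="{input_type}" name="{self.field}"></label>')

    def to_subquery(self, submitted):
        # Analyze the submitted answer and emit (sql_fragment, parameters),
        # or None when the user left the element empty.
        value = submitted.get(self.field, "").strip()
        if not value:
            return None
        if self.kind == "number":
            return f"{self.field} = ?", [float(value)]
        if self.kind == "date":
            return f"{self.field} >= ?", [value]
        return f"{self.field} LIKE ?", [f"%{value}%"]


def build_where(elements, submitted):
    # Combine the sub-queries of all filled-in elements with AND.
    parts = [p for p in (e.to_subquery(submitted) for e in elements) if p]
    clause = " AND ".join(sql for sql, _ in parts)
    params = [v for _, values in parts for v in values]
    return clause, params


elements = [
    FormElement("age", "number", "Age"),
    FormElement("diagnosis", "text", "Diagnosis"),
    FormElement("visit_date", "date", "Visit on or after"),
]
submitted = {"age": "54", "diagnosis": "carcinoma", "visit_date": ""}
clause, params = build_where(elements, submitted)
```

User-supplied values travel as bound parameters rather than being spliced into the SQL text; the field names come from the element definitions, which in this sketch are assumed to be fixed by the administrator and not by the user.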
38. Method for interrogating heterogeneous databases for implementing an interrogation system according to any one of the preceding claims, which is characterized by two main phases: a first phase of organization of the data by an administrator and executed on the basis of at least one structural model of the data to be interrogated; a second phase of interrogation of the data by a user and executed on the basis of forms generated as a function of the data organization determined during said first phase of organization of the data.
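The two phases of claim 38 can be sketched as follows, assuming a single SQLite table as the data source. The DOMAIN dictionary, the generate_form and run_query functions, and the sample rows are hypothetical and serve only to show the division of roles: the administrator describes the data once, and the user's form and query are derived from that description.

```python
import sqlite3

# Phase 1 (administrator): describe the structure of the data to interrogate.
DOMAIN = {
    "table": "patients",
    "fields": {"age": "INTEGER", "diagnosis": "TEXT"},
}


def generate_form(domain):
    """Derive the list of queryable fields from the organization phase."""
    return sorted(domain["fields"])


def run_query(conn, domain, answers):
    """Phase 2 (user): answers maps a form field to the exact value sought."""
    allowed = set(domain["fields"])
    # Only fields declared by the administrator may appear in the query.
    used = [f for f in sorted(answers) if f in allowed]
    where = " AND ".join(f"{f} = ?" for f in used) or "1 = 1"
    sql = f"SELECT rowid FROM {domain['table']} WHERE {where} ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, [answers[f] for f in used])]


# Invented sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (age INTEGER, diagnosis TEXT)")
conn.executemany("INSERT INTO patients VALUES (?, ?)",
                 [(54, "carcinoma"), (61, "lymphoma"), (54, "lymphoma")])

form_fields = generate_form(DOMAIN)
hits = run_query(conn, DOMAIN, {"age": 54, "diagnosis": "lymphoma"})
```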
PCT/EP2005/053248 2004-07-09 2005-07-07 System for interrogating heterogeneous databases and method for interrogation WO2006005715A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR0451482A FR2872940B1 (en) 2004-07-09 2004-07-09 HETEROGENEOUS DATABASE INTERROGATION SYSTEM AND INTERROGATION METHOD
FR0451482 2004-07-09

Publications (1)

Publication Number Publication Date
WO2006005715A1 (en) 2006-01-19

Family

ID=34950796

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2005/053248 WO2006005715A1 (en) 2004-07-09 2005-07-07 System for interrogating heterogeneous databases and method for interrogation

Country Status (2)

Country Link
FR (1) FR2872940B1 (en)
WO (1) WO2006005715A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002021259A1 (en) * 2000-09-08 2002-03-14 The Regents Of The University Of California Data source integration system and method
US20040019429A1 (en) * 2001-11-21 2004-01-29 Marie Coffin Methods and systems for analyzing complex biological systems

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHRISTINE PARENT ET AL: "Issues and Approaches of Database Integration", COMMUNICATIONS OF THE ASSOCIATION FOR COMPUTING MACHINERY, ASSOCIATION FOR COMPUTING MACHINERY. NEW YORK, US, vol. 41, no. 5, 1998, pages 166 - 178, XP002157038, ISSN: 0001-0782 *
WIDOM J: "Research problems in data warehousing", IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, IEEE, NEW YORK, NY, US, December 1995 (1995-12-01), pages 25 - 30, XP002306922, ISSN: 0018-9472 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310429A (en) * 2020-03-16 2020-06-19 青岛百洋智能科技股份有限公司 Method and system for realizing customizable forms
CN111310429B (en) * 2020-03-16 2023-08-18 青岛百洋智能科技股份有限公司 Method and system for realizing customizable form
CN114756629A (en) * 2022-06-16 2022-07-15 之江实验室 Multi-source heterogeneous data interaction analysis engine and method based on SQL

Also Published As

Publication number Publication date
FR2872940A1 (en) 2006-01-13
FR2872940B1 (en) 2010-07-30

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

122 Ep: pct application non-entry in european phase