WO2009078729A1 - Procédé d'amélioration d'efficacité de moteur de recherche - Google Patents

Procédé d'amélioration d'efficacité de moteur de recherche

Info

Publication number
WO2009078729A1
WO2009078729A1 (application PCT/NO2008/000425)
Authority
WO
WIPO (PCT)
Prior art keywords
search
index
query
keyword
search engine
Prior art date
Application number
PCT/NO2008/000425
Other languages
English (en)
Inventor
Johannes Gehrke
Robbert Vanrenesse
Fred Schneider
Original Assignee
Fast Search & Transfer As
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from NO20080836A external-priority patent/NO327318B1/no
Application filed by Fast Search & Transfer As filed Critical Fast Search & Transfer As
Publication of WO2009078729A1 publication Critical patent/WO2009078729A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures

Definitions

  • the present invention concerns a method for improving search engine efficiency with respect to accessing, searching and retrieving information in the form of documents stored in document or content repositories, wherein an indexing subsystem of the search engine crawls the stored documents and generates an index thereof, wherein applying a user search query to the index shall return a result set of at least some query-matching documents to the user, and wherein the search engine comprises an array of search nodes hosted on one or more servers.
  • the invention discloses how to build a new framework for index distribution on a search engine, and even more particularly on an enterprise search engine.
  • the search engine needs to maintain high availability and high throughput even during hardware failures.
  • search engines use sophisticated methods for distributing their indices across a possibly large cluster of hosts.
  • fig. 1 shows a block diagram of a search engine as will be known to persons skilled in the art, its most important subsystems, and its interfaces respectively to a content domain, i.e. the repository of documents that may be subjected to a search, and a client domain comprising all users posing search queries to the search engine for retrieval of query-matching documents from the content domain.
  • the search engine 100 of the present invention comprises various subsystems 101-107.
  • the search engine can access document or content repositories located in a content domain or space wherefrom content can either actively be pushed into the search engine, or using a data connector be pulled into the search engine.
  • Typical repositories include databases, sources made available via ETL (Extract-Transform-Load) tools such as Informatica, any XML-formatted repository, files from file servers, files from web servers, document management systems, content management systems, email systems, communication systems, collaboration systems, and rich media such as audio, images and video. Retrieved documents are submitted to the search engine 100 via a content API
  • The submitted content is then typically analyzed in a content analysis stage 103, also termed a content preprocessing subsystem, in order to prepare the content for improved search and discovery operations.
  • the output of this content analysis stage 103 is an XML representation of the input document.
  • the output of the content analysis is used to feed the core search engine 101.
  • the core search engine 101 can typically be deployed across a farm of servers in a distributed manner in order to allow for large sets of documents and high query loads to be processed.
  • the core search engine 101 accepts user requests and produces lists of matching documents.
  • the document ordering is usually determined according to a relevance model that measures the likely importance of a given document relative to the query.
  • the core search engine 101 can produce additional metadata about the result set, such as summary information for document attributes.
  • the core search engine 101 in itself comprises further subsystems, namely an indexing subsystem 101a for crawling and indexing content documents and a search subsystem 101b for carrying out search and retrieval proper.
  • the output of the content analysis stage 103 can be fed into an optional alert engine 104.
  • the alert engine 104 will have stored a set of queries and can determine which queries would have been satisfied by the given document input.
  • a search engine can be accessed from many different clients or applications which typically can be mobile and computer-based client applications. Other clients include PDAs and game devices. These clients, located in a client space or domain, submit requests to a search engine query or client API 107.
  • the search engine 100 will typically possess a further subsystem in the form of a query analysis stage 105 to analyze and refine the query in order to construct a derived query that can extract more meaningful information.
  • the output from the core search engine 101 is typically further analyzed in another subsystem, namely a result analysis stage 106 in order to produce information or visualizations that are used by the clients.
  • Both stages 105 and 106 are connected between the core search engine 101 and the client API 107, and in case the alert engine 104 is present, it is connected in parallel to the core search engine 101 and between the content analysis stage 103 and the query and result analysis stages 105 and 106.
  • a first set of nodes comprises dispatch nodes Nα, a second set of nodes search nodes Nβ, and a third set of nodes indexing nodes Nγ.
  • the search nodes Nβ are grouped in columns which via the network are connected in parallel between the dispatch nodes Nα and an indexing node Nγ.
  • the dispatch nodes Nα are adapted for processing search queries and search answers
  • the search nodes Nβ are adapted to contain search software
  • the indexing nodes Nγ are adapted for generating indexes I for the search software.
  • acquisition nodes Nδ are provided in a fourth set of nodes and adapted for processing the search answers, thus relieving the dispatch nodes of this task.
  • the two-dimensional scaling takes place respectively with a scaling of the data volume and a scaling of the search engine performance through a respective adaptation of the architecture.
  • this scalable search engine architecture is shown in fig. 2, illustrating the principle of two-dimensional scaling.
  • An important benefit of this architecture is that the query response time is essentially independent of catalogue size, as each query is executed in parallel on all search nodes Np.
  • the architecture is inherently fault-tolerant such that faults in individual nodes will not result in a system breakdown, only in a temporary reduction of the performance.
  • although the architecture shown in fig. 2 provides multilevel data and functional parallelism such that large volumes of data can be searched efficiently and very fast by a large number of users simultaneously, it is encumbered with certain drawbacks and hence is far from optimal. This is due to the fact that the row and column architecture is based on a mechanical and rigid partition scheme, which does not take account of modalities in the keyword distribution and the user behaviour, as expressed by frequency distributions of search terms or keywords, and access patterns.
  • US Patent No. 7,293,016 B1 discloses how to arrange indexed documents in an index according to a static ranking and partitioned according to that ranking.
  • the index partition is scanned progressively, starting with the partition containing those documents with the highest static rank, in order to locate documents containing a search word, and a score is computed based on the present set of documents located thus far in the search and on the basis of the range of static ranks of the next partition to be scanned.
  • the next partition is scanned to locate the documents containing a search word when the calculated score is above a target score. A search can be stopped when no more relevant results will be found in the next partition.
  • 2008/033943 A1 (Richards et al., assigned to BEA Systems, Inc.) concerns a distributed search system with a central queue of document-based records, wherein a group of nodes is assigned to different partitions, indexes for a group of documents are stored in each partition, and the nodes in the same partition independently process document-based records from the central queue in order to construct the indexes.
  • the present invention does not take specific ranking algorithms into account since it is assumed that the user always wants all query results.
  • these ideas can be extended in a straightforward manner to some of the recently developed ranking algorithms [RPB06, AM06, LLQ+07] and algorithms for novel query models [CPD06, LT1T07, ZS07, DEFS06, TKT06, JRMG06, YJ06, KCMK06].
  • Algorithms for finding the best matching query results when combining matching functions have also been the focus of much research [PZSD96, Fag99, MYL02].
  • Another object of the present invention is to provide a method that significantly enhances the performance of a search engine.
  • Another object of the present invention is to configure the index of a search engine, and specifically an enterprise search engine, on the basis of recognizing that keywords and documents will differ both with regard to intrinsic as well as extrinsic properties, for instance such as given by modalities in search and access patterns.
  • fig. 1 shows a simplified block diagram of a search engine, as known in the art and discussed hereinabove
  • fig. 2 a diagram of a scalable search engine architecture, as used for the prior art AllTheWeb search service and discussed hereinabove
  • fig. 3 the concept of a mapping function
  • fig. 4 the concept of host assignment
  • fig. 5 the concept of mapping functions for rows and columns
  • fig. 6 the concept of a classification of keywords.
  • Each document d is a list of keywords, and is identified by a unique identifier called a URL.
  • An occurrence is a tuple (κ, u) which indicates that the document associated with the URL u contains the keyword κ.
  • a document record is a tuple (u, date) that indicates that the document associated with the URL u was created at a given date.
  • an occurrence contains other data, for example the position of the keyword in the document or data that are useful for determining the ranking of the document in the output of a query.
  • a document has other associated metadata besides the document record, for example an access control list. Neither of these issues is important for the aspects of the index which are the focus of the following discussion.
  • the index of a search engine consists of sets of occurrences and a set of document records. There is one set of occurrences for each keyword κ, hereinafter called the posting set of keyword κ.
  • the posting set of keyword κ contains all occurrences of keyword κ, and it contains only occurrences of keyword κ.
  • posting sets are presumed to be ordered in a fixed order (for example, lexicographically by URL), and the ordered posting set of a keyword κ will be referred to as the posting list PL(κ) of keyword κ in the following disclosure.
  • the set of document records contains one document record for each document, and it only contains document records.
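  • As a purely illustrative sketch of these index structures (toy data and hypothetical variable names, not taken from the patent), the following Python fragment builds the posting lists, ordered lexicographically by URL, and the set of document records for a small document collection:

        from collections import defaultdict

        # Toy document collection: URL -> (creation date, list of keywords).
        documents = {
            "http://example.com/a": ("2008-01-01", ["search", "engine", "index"]),
            "http://example.com/b": ("2008-02-01", ["index", "partition"]),
        }

        # One document record (u, date) per document.
        document_records = {(u, date) for u, (date, _) in documents.items()}

        # Posting set of each keyword: all occurrences (kappa, u) of that keyword.
        posting_sets = defaultdict(set)
        for u, (_, keywords) in documents.items():
            for kappa in keywords:
                posting_sets[kappa].add((kappa, u))

        # Posting list PL(kappa): the posting set in a fixed order (here, by URL).
        def PL(kappa):
            return sorted(posting_sets[kappa], key=lambda occ: occ[1])

        print(PL("index"))  # occurrences of "index" in both documents, ordered by URL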
  • the present invention adopts a model for a query in which a user would like to find every document that contains all the keywords in the query.
  • a query workload W is a function that associates with each query q an arrival rate λ_W(q). From a query workload one can compute the arrival rate λ_W(κ) of each keyword κ by summing over all the queries that contain κ, formally λ_W(κ) = Σ_{q: κ ∈ q} λ_W(q).
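  • A minimal sketch, with made-up numbers, of how λ_W(κ) can be aggregated from a query workload, assuming a query is represented as a set of keywords and the workload as a mapping from queries to arrival rates:

        # Query workload W: each query (a frozenset of keywords) has an arrival rate, e.g. in queries/second.
        workload = {
            frozenset({"search", "engine"}): 5.0,
            frozenset({"index"}): 2.0,
            frozenset({"index", "partition"}): 1.0,
        }

        def arrival_rate(kappa, workload):
            # lambda_W(kappa) = sum of lambda_W(q) over all queries q that contain kappa.
            return sum(rate for q, rate in workload.items() if kappa in q)

        print(arrival_rate("index", workload))  # 3.0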
  • QueryResult(q) denotes the set of all documents that contain every keyword in q. There are more sophisticated ways of defining QueryResult(q); for example, the user may only want to see a subset of QueryResult(q), and also may want to see this subset in ranked order.
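  • For the basic query model above, QueryResult(q) is simply the set of documents whose URLs appear in the posting list of every keyword of q; a small sketch using a toy posting-list stand-in (the names are illustrative only, and ranking is ignored):

        # A tiny stand-in for the posting-list accessor PL(kappa) described above.
        toy_pl = {
            "index": [("index", "http://example.com/a"), ("index", "http://example.com/b")],
            "partition": [("partition", "http://example.com/b")],
        }

        def query_result(q, PL):
            # q: a set of keywords; PL(kappa): the posting list of kappa as (keyword, url) occurrences.
            # Returns the URLs of all documents that contain every keyword in q.
            url_sets = [{u for _, u in PL(kappa)} for kappa in q]
            return set.intersection(*url_sets) if url_sets else set()

        print(query_result({"index", "partition"}, toy_pl.get))
        # {'http://example.com/b'}: the only document containing both keywords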
  • Each host h is assumed to be capable of an associated overall performance that allows it to retrieve buc(h) units of storage within latencyBound milliseconds; this number is an aggregated unit that incorporates CPU speed, the amount of main memory available, and the latency and transfer rate of the disk of the host. Further, in the following, all hosts are assumed to have identical performance, and thus the dependency of buc(h) on h can be dropped and reference just be made to buc as the number of units that any host can retrieve within latencyBound milliseconds.
  • The framework or architecture as realized according to the method of the present invention encompasses three aspects, viz. partitioning, replication and host assignment, as set out below.
  • For each keyword, its posting list is partitioned into one or more components. This partitioning of the posting lists into components is done in order to be able to distribute the posting lists across multiple hosts such that all components can be retrieved in parallel.
  • each of its components is replicated a certain number of times resulting in several component-replicas for each component.
  • Component-replicas are created for several reasons.
  • the first reason for replication is fault-tolerance; in case a host that stores a component fails, the component can be read from another host.
  • the second reason for replication is improved performance, because queries can retrieve a component from any one of the hosts on which the component is replicated and thus the load can be balanced.
  • each component-replica of a posting list is assigned to a host, but with the assignment subject to the restriction that no two component-replicas of the same component and the same partition are assigned to the same host.
  • the host assignment enables the location of components to be optimized globally across keywords. One could for example co-locate components of keywords that appear commonly together in queries to reduce the cost of query processing.
  • if the right numPartitions(κ) components are combined, then they together comprise PL(κ); for any component CO_j(κ) one can find numReplicas(κ) identical component-replicas.
  • if keyword κ has arrival rate λ_W(κ) and one uniformly balances the load between the numReplicas(κ) component-replicas, then the arrival rate for this keyword at each of the component-replicas will be λ_W(κ) / numReplicas(κ).
  • For the third part, select a function hostAssign(κ, i, j) that takes as input a keyword κ, a replica number i and a component number j, and returns the host that stores component-replica i of component j of the posting list PL(κ). Note that two identical component-replicas (that are replicas of each other) must be mapped to different hosts.
  • Formally, hostAssign(κ, i₁, j) ≠ hostAssign(κ, i₂, j) must hold for j ∈ {1, ..., numPartitions(κ)} and i₁, i₂ ∈ {1, ..., numReplicas(κ)} with i₁ ≠ i₂.
  • Figures 3 and 4 show an exemplary instantiation of the framework according to the present invention for a keyword κ with a posting list with eight occurrences: A, B, C, D, E, F, G, and H.
  • numPartitions(κ) = 4, i.e. the eight occurrences are partitioned into four components of two occurrences each.
  • Five hosts h₁, h₂, h₃, h₄, and h₅ are given.
  • the function hostAssign(κ, 1, 2) = h₁,
  • hostAssign(κ, 2, 2) = h₂, and
  • hostAssign(κ, 3, 1) = h₅.
  • The functions numPartitions(·), numReplicas(·) and hostAssign(κ, i, j) together shall be called a search engine index configuration.
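  • The three functions can be collected into a small data structure; the sketch below (hypothetical class and parameter names, not taken from the patent) stores a configuration and checks the constraint that replicas of the same component are assigned to distinct hosts, for a keyword with four components, each replicated three times, on hosts h1...h5 (an assumed layout, not the exact assignment of figures 3 and 4):

        class IndexConfiguration:
            def __init__(self, num_partitions, num_replicas, host_assign):
                # num_partitions(kappa) and num_replicas(kappa) return ints,
                # host_assign(kappa, i, j) returns the host of replica i of component j.
                self.num_partitions = num_partitions
                self.num_replicas = num_replicas
                self.host_assign = host_assign

            def check(self, kappa):
                # No two replicas of the same component may share a host.
                for j in range(1, self.num_partitions(kappa) + 1):
                    hosts = [self.host_assign(kappa, i, j)
                             for i in range(1, self.num_replicas(kappa) + 1)]
                    assert len(hosts) == len(set(hosts)), "replica clash in component %d" % j

        cfg = IndexConfiguration(
            num_partitions=lambda kappa: 4,
            num_replicas=lambda kappa: 3,
            host_assign=lambda kappa, i, j: "h%d" % ((j + (i - 1) * 2 - 1) % 5 + 1),
        )
        cfg.check("somekeyword")  # passes: the three replicas of each component sit on different hosts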
  • Processing a query q involves three steps: 1. For each keyword κ ∈ q, identify a set of hosts such that the union of the component-replicas stored at those hosts comprises PL(κ). If numReplicas(κ) > 1, then there is more than one such set, and one can choose between different sets based on other characteristics, for example the load of a host.
  • The function hostAssign(κ, i, j) encodes for each keyword κ the set of hosts where all the component-replicas of the posting list of κ are stored.
  • 2. Each host involved in processing query q retrieves all its local component-replicas for all keywords involved in the query.
  • 3. Each host will first intersect the local component-replicas of all the keywords. Then the results of the local intersections are processed further to complete the computation of QueryResult(q).
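  • A highly simplified, single-process sketch of these steps; the data layout and helper names are assumptions for illustration, and (unlike the description above) the per-host pieces are merged centrally before intersecting rather than intersected locally:

        def process_query(q, components_on_host, hosts_for_keyword):
            # q: list of keywords.
            # components_on_host[h][kappa]: locally stored components of PL(kappa), each a set of URLs.
            # hosts_for_keyword(kappa): a set of hosts whose components together comprise PL(kappa).
            # Step 1: choose, for every keyword, a host set covering its whole posting list.
            chosen = {kappa: hosts_for_keyword(kappa) for kappa in q}

            # Step 2: every chosen host contributes its local component-replicas.
            per_keyword_urls = {kappa: set() for kappa in q}
            for kappa, hosts in chosen.items():
                for h in hosts:
                    for component in components_on_host[h].get(kappa, []):
                        per_keyword_urls[kappa] |= component

            # Step 3: combine the retrieved posting lists into QueryResult(q).
            sets = list(per_keyword_urls.values())
            return set.intersection(*sets) if sets else set()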
  • Based on this definition of QueryResult(q), the problem of index design can be defined as follows. A set of hosts that have associated storage space DiskSize and performance buc is given. Also given is a set of keywords with posting lists PL(κ₁), ..., PL(κ_m) that have sizes |PL(κ₁)|, ..., |PL(κ_m)|, as well as a query workload W.
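  • The formal objective is not reproduced in this excerpt; a plausible reading of the given quantities is that a configuration is feasible when every component can be read within latencyBound (i.e. its size does not exceed buc) and the component-replicas assigned to any host fit within DiskSize. A hedged sketch of such a feasibility check, with both constraints being assumptions made for illustration:

        def configuration_feasible(component_sizes, assignment, buc, disk_size):
            # component_sizes[(kappa, j)]: size, in storage units, of component j of PL(kappa).
            # assignment: iterable of (kappa, i, j, host) tuples, one per component-replica.
            # Latency: a single host must be able to read any component within latencyBound.
            if any(size > buc for size in component_sizes.values()):
                return False
            # Storage: the component-replicas assigned to a host must fit on its disk.
            used = {}
            for kappa, i, j, host in assignment:
                used[host] = used.get(host, 0) + component_sizes[(kappa, j)]
            return all(total <= disk_size for total in used.values())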
  • the AllTheWeb Rows and Columns architecture (in homage to the AllTheWeb search system as described in the introduction hereinabove) is a trivial instantiation of the framework, cf. fig. 5 which renders the mapping functions for host assignment.
  • In this architecture there is a matrix of hosts consisting of r rows and c columns, with one host at each row-column intersection.
  • the postings of any keyword are approximately evenly partitioned into c components. Each component is then replicated within the column, one component-replica for each row, resulting in r component-replicas.
  • To reconstruct the posting list of a keyword one host from each column needs to be accessed, but it is not necessary to select these hosts all from the same row, and this flexibility simplifies query load balancing between hosts and improves fault tolerance.
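  • Expressed in the framework's terms, this instantiation amounts to numPartitions(κ) = c, numReplicas(κ) = r, and a hostAssign that places replica i of component j on the host in row i and column j, independently of the keyword; a small sketch (the host naming is an assumption):

        R, C = 3, 4  # r rows and c columns of hosts

        def num_partitions(kappa):
            return C  # every posting list is split into c components

        def num_replicas(kappa):
            return R  # every component is replicated once per row

        def host_assign(kappa, i, j):
            # replica i of component j lives on the host in row i, column j,
            # regardless of the keyword kappa
            return "host[%d][%d]" % (i, j)

        # Reconstructing PL(kappa) touches one host per column; any row (or mix of rows) works.
        print([host_assign("foo", 2, j) for j in range(1, C + 1)])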
  • occLoc((κ₁, u)) = occLoc((κ₂, u)) for any two keywords κ₁, κ₂ and a URL u, i.e. the function occLoc((κ, u)) is independent of the keyword κ.
  • AllTheWeb Rows and Columns has several disadvantages. Firstly, the number of hosts accessed for a keyword κ is independent of the length of its posting list; c hosts must always be accessed, even for keywords with very short posting lists. Secondly, AllTheWeb Rows and Columns does not take keyword popularity in the query workload into account; every component is replicated r times even if the associated keyword is accessed only quite infrequently. Thirdly, changes in the physical setup for AllTheWeb Rows and Columns are constrained to additions of hosts in multiples of c or r at once, resulting in an additional row or an additional column in the architecture.
  • In the second instantiation of the framework, Fully Adaptive Rows and Columns, each component is sized such that it can be read within the query latency requirement from a single host. Note that for a keyword having a very short posting list, one (or very few) components are created, whereas for keywords having very long posting lists, many components are created.
  • Next it must be decided how many component-replicas should be created for a keyword κ. Recall that component-replicas are created for fault tolerance and in order to distribute the query workload across hosts. To tolerate f unavailable hosts, numReplicas(κ) > f is enforced. To balance the query workload, posting lists of popular keywords (in the query workload) are replicated more often than posting lists of rare keywords, so the number of replicas is made proportional to the arrival rate of the keyword in the workload.
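  • A sketch of how numPartitions and numReplicas might be derived in this instantiation from the quantities defined above (posting-list size, buc, the number f of host failures to tolerate, and the keyword arrival rate); the proportionality constant for replication is an assumed tuning parameter, not a value given in the patent:

        import math

        def fully_adaptive_sizing(pl_size, buc, f, arrival_rate, replicas_per_unit_rate=1.0):
            # pl_size: |PL(kappa)| in storage units; buc: units readable within latencyBound;
            # f: number of unavailable hosts to tolerate; arrival_rate: lambda_W(kappa).
            # Each component must be readable from a single host within the latency bound.
            num_partitions = max(1, math.ceil(pl_size / buc))
            # More than f replicas for fault tolerance, and more replicas for popular keywords.
            num_replicas = max(f + 1, math.ceil(arrival_rate * replicas_per_unit_rate))
            return num_partitions, num_replicas

        print(fully_adaptive_sizing(pl_size=25000, buc=10000, f=1, arrival_rate=4.0))
        # (3, 4): three components, each replicated four times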
  • a third instantiation of the framework as realized by the method of the present invention is a special case of Fully Adaptive Rows and Columns that results in much simpler (and cheaper) query processing.
  • As in AllTheWeb Rows and Columns, it is assumed that r × c hosts are arranged in the usual matrix of hosts.
  • each keyword is classified along two axes.
  • the first axis is the size of the posting list, where keywords are partitioned into short and long keywords based on the size of their posting lists.
  • the second axis is the arrival rate of the keywords in the query workload, where keywords are partitioned into popular and unpopular keywords based on their arrival rate. This results in four different classes of keywords:
  • SP Short popular
  • SU Short unpopular
  • LP Long popular
  • LU Long unpopular
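  • A minimal sketch of this two-axis classification, using assumed thresholds for short versus long and for unpopular versus popular (the excerpt does not fix concrete threshold values):

        def classify(pl_size, arrival_rate, size_threshold=10000, rate_threshold=1.0):
            # Returns 'SP', 'SU', 'LP' or 'LU' for a keyword with the given
            # posting-list size and query-workload arrival rate.
            length = "S" if pl_size <= size_threshold else "L"
            popularity = "P" if arrival_rate >= rate_threshold else "U"
            return length + popularity

        print(classify(500, 3.0))        # 'SP'
        print(classify(2000000, 0.1))    # 'LU'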
  • the posting list of an SP keyword κ is not partitioned, and r component-replicas of its posting list are created to distribute its arrival rate across hosts.
  • That is, numPartitions(κ) = 1 and numReplicas(κ) = r.
  • Let rowHash be as defined and disclosed in connection with the discussion of Fully Adaptive Rows and Columns hereinabove:
  • rowHash is a function from K × {1, ..., r} to {1, ..., r} such that rowHash(κ, i₁) ≠ rowHash(κ, i₂) for i₁, i₂ ∈ {1, ..., r} with i₁ ≠ i₂.
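  • One possible realization (an assumption, not the patent's own definition) of a rowHash with the stated property, i.e. mapping the replica numbers of a fixed keyword to pairwise distinct rows:

        def row_hash(kappa, i, r):
            # Map replica i (1-based) of keyword kappa to a row in {1, ..., r};
            # for a fixed kappa, distinct i in {1, ..., r} land on distinct rows.
            base = hash(kappa) % r
            return (base + i - 1) % r + 1

        r = 4
        print(sorted(row_hash("index", i, r) for i in range(1, r + 1)))  # [1, 2, 3, 4]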
  • the posting list for any SU keyword just resides on a single host. Assuming without loss of generality that
  • the posting list of κ₁ is partitioned, but not the posting list of κ₂.
  • LP: one must partition the posting list of κ₂ on-the-fly and send it to all the hosts in the row where κ₁ is stored.
  • LP: created according to AllTheWeb Rows and Columns, and thus query processing can proceed as in AllTheWeb Rows and Columns.
  • LP, SP: partition the posting list of κ₁ and then send it to the hosts in one of the rows.
  • SP, SP: send the posting list of κ₂ to one of the hosts where the posting list of κ₁ resides, or vice versa.
  • applying the method of the present invention allows an extension of the search system framework that permits each keyword to have more than a single row-and-column instance. This shall be described immediately below.
  • This extension can be characterized by associating sets of functions from the resulting framework with each keyword applying the method of the invention; for example, a keyword κ could have two sets of functions {numPartitions₁(κ), numReplicas₁(κ)} and {numPartitions₂(κ), numReplicas₂(κ)}.
  • the number of sets could be keyword-dependent. This greatly increases the possible choices for query processing.
  • this extension shall not be introduced formally herein since it is conceptually straightforward.
  • the method of the present invention realizes a framework for distributing the index of a search engine across several hosts in a computing cluster.
  • the framework as disclosed distinguishes three orthogonal mechanisms for distributing a search index: Index partitioning, index replication, and assignment of replicas to hosts. Instantiations of these mechanisms yield different ways of distributing the index of a search engine, including popular methods from the literature and novel methods that by far outperform the prior art in terms of resource usage and performance while achieving the same level of fault tolerance.
  • the method of the present invention for the first time recognizes that different keywords and different documents in a search engine might have different properties (such as length or frequency of access).
  • the framework realized by applying the method of the present invention creates a configuration of the index of a search engine according to these properties.
  • the framework also serves to outline how to process queries for the space of configurations made possible by its realizations.
  • ODISSEA: A peer-to-peer architecture for scalable web search and information retrieval. In Vassilis Christophides and Juliana Freire, editors, WebDB, pages 67-72, 2003.
  • [TD04] Chunqiang Tang and Sandhya Dwarkadas. Hybrid global-local indexing for efficient peer-to-peer information retrieval. In NSDI, pages 211-224. USENIX, 2004.
  • [TGM93] A. Tomasic and H. Garcia-Molina. Performance of inverted indices in shared-nothing distributed text document information retrieval systems [selected best paper]. In PDIS '93, pages 8-17, Los Alamitos, CA, USA, January 1993. IEEE Computer Society Press.
  • [TKT06] Taro Tezuka, Takeshi Kurashima, and Katsumi Tanaka. Toward tighter integration of web search with a geographic information system. In Les Carr, David De Roure, Arun Iyengar, Carole A. Goble, and Michael Dahlin, editors, WWW, pages 277-286. ACM, 2006.

Abstract

The invention relates to a method for improving the efficiency of a search engine in terms of accessing, searching and retrieving information in the form of documents stored in document or content repositories, the search engine comprising an array of search nodes hosted on one or more servers. An index of the stored documents is created. The search engine processes a user search query and returns a result set of query-matching documents. The index of the search engine is configured on the basis of one or more document properties, and is partitioned, replicated and distributed over the array of search nodes. Search queries are processed on the basis of the distributed index. The method provides a framework for distributing the index of a search engine across several hosts in a computing cluster, resting on three orthogonal mechanisms for index distribution, namely index partitioning, index replication and the assignment of replicas to hosts. In this way, different ways of configuring the index of a search engine are obtained, allowing far better resource usage and performance in combination with any desired level of fault tolerance.
PCT/NO2008/000425 2007-12-14 2008-12-01 Procédé d'amélioration d'efficacité de moteur de recherche WO2009078729A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US1370507P 2007-12-14 2007-12-14
US61/013,705 2007-12-14
NO20080836 2008-02-15
NO20080836A NO327318B1 (no) 2008-02-15 2008-02-15 Fremgangsmåte for å forbedre effektiviteten til en søkemotor

Publications (1)

Publication Number Publication Date
WO2009078729A1 true WO2009078729A1 (fr) 2009-06-25

Family

ID=40795718

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NO2008/000425 WO2009078729A1 (fr) 2007-12-14 2008-12-01 Procédé d'amélioration d'efficacité de moteur de recherche

Country Status (1)

Country Link
WO (1) WO2009078729A1 (fr)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6182063B1 (en) * 1995-07-07 2001-01-30 Sun Microsystems, Inc. Method and apparatus for cascaded indexing and retrieval
WO2000068834A1 (fr) * 1999-05-10 2000-11-16 Fast Search & Transfer Asa Moteur de recherche dote d'une architecture parallele, bidimensionnelle, echelonnable de facon lineaire
US6507837B1 (en) * 2000-06-08 2003-01-14 Hyperphrase Technologies, Llc Tiered and content based database searching
US20050102270A1 (en) * 2003-11-10 2005-05-12 Risvik Knut M. Search engine with hierarchically stored indices

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112835905A (zh) * 2021-02-05 2021-05-25 上海达梦数据库有限公司 一种数组类型列的索引方法、装置、设备以及存储介质

Similar Documents

Publication Publication Date Title
US8799264B2 (en) Method for improving search engine efficiency
Koloniari et al. Peer-to-peer management of XML data: issues and research challenges
US8938459B2 (en) System and method for distributed index searching of electronic content
Tang et al. pSearch: Information retrieval in structured overlays
Suel et al. ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval.
US20180276250A1 (en) Distributed Image Search
Bender et al. Improving collection selection with overlap awareness in p2p search engines
Yu et al. Effective keyword-based selection of relational databases
Deng et al. Scalable column concept determination for web tables using large knowledge bases
US20150039629A1 (en) Method for storing and searching tagged content items in a distributed system
Luo et al. Storing and indexing massive RDF datasets
Kulkarni et al. Shard ranking and cutoff estimation for topically partitioned collections
Luu et al. Alvis peers: a scalable full-text peer-to-peer retrieval engine
Koren et al. Searching and navigating petabyte-scale file systems based on facets
WO2009078729A1 (fr) Procédé d'amélioration d'efficacité de moteur de recherche
Spence et al. Location based placement of whole distributed systems
KR102049420B1 (ko) 분산 데이터베이스에서의 복제본이 존재하는 데이터에 대한 질의 병렬화 방법
Pham et al. Building a Library Search Infrastructure with Elasticsearch
Ren et al. haps: Supporting effective and efficient full-text p2p search with peer dynamics
Li et al. Query-driven frequent Co-occurring term computation over relational data using MapReduce
Paik et al. WS-CatalogNet: building peer-to-peer e-catalog
Mass et al. KMV-peer: a robust and adaptive peer-selection algorithm
Markova et al. Distributed Data Addressed in Natural Language
Chung et al. Cross-organisation dataspace (COD)-architecture and implementation
Shen et al. Hilbertchord: A p2p framework for service resources management

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08862247

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08862247

Country of ref document: EP

Kind code of ref document: A1