WO2003081462A3 - Selective updating of index in a search engine

Selective updating of index in a search engine

Info

Publication number
WO2003081462A3
Authority
WO
WIPO (PCT)
Prior art keywords
pages
index
selective updating
search engine
leaf
Prior art date
Application number
PCT/GB2003/001121
Other languages
French (fr)
Other versions
WO2003081462A2 (en)
Inventor
Barry David Otley Adams
Original Assignee
Magus Res Ltd
Barry David Otley Adams
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2002-03-20
Filing date
2003-03-18
Publication date
2003-11-06
Application filed by Magus Res Ltd, Barry David Otley Adams
Priority to AU2003212535A
Publication of WO2003081462A2
Publication of WO2003081462A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results

Abstract

To save computational resources, a search engine is configured to perform selective updating of its index rather than full re-indexing. Selective updating operates on a previous index by classifying each indexed page as either a leaf page or a branch page. Branch pages are those which include links to other pages deeper in the website, while leaf pages do not include such links. The selective updating procedure updates branch pages and new leaf pages more regularly than existing leaf pages.
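The selection rule in the abstract can be sketched in Python as follows. This is an illustrative sketch only, not the method claimed in the patent: the `Page` record, the use of URL path depth to decide what counts as "deeper in the website", the `cycle` counter, and the `leaf_interval` parameter are all assumptions introduced here. URLs that are absent from the previous index are treated as new pages and are always selected.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Page:
    """Hypothetical record for one entry of the previous index."""
    url: str
    links: list[str] = field(default_factory=list)


def depth(url: str) -> int:
    """Count non-empty path segments (assumed proxy for depth in a site)."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def is_branch(page: Page) -> bool:
    """Branch page: has at least one same-host link to a deeper path."""
    host = urlparse(page.url).netloc
    own_depth = depth(page.url)
    return any(
        urlparse(link).netloc == host and depth(link) > own_depth
        for link in page.links
    )


def pages_to_update(
    previous_index: dict[str, Page],
    discovered_urls: set[str],
    cycle: int,
    leaf_interval: int = 4,
) -> list[str]:
    """Select URLs to re-index in this update cycle.

    Branch pages and newly discovered pages are selected every cycle;
    existing leaf pages only on every `leaf_interval`-th cycle.
    """
    refresh_leaves = cycle % leaf_interval == 0
    selected = {
        url
        for url, page in previous_index.items()
        if is_branch(page) or refresh_leaves
    }
    # Anything not yet in the index is new and is fetched straight away.
    selected |= {u for u in discovered_urls if u not in previous_index}
    return sorted(selected)
```

With a small `leaf_interval` the behaviour approaches full re-indexing; a larger value saves more fetches at the cost of staler leaf pages. The right trade-off depends on how often leaf content changes, which the abstract does not specify.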

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003212535A AU2003212535A1 (en) 2002-03-20 2003-03-18 Selective updating of index in a search engine

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0206626.4 2002-03-20
GB0206626A GB2386712A (en) 2002-03-20 2002-03-20 Selective updating of an index of webpages by a search engine

Publications (2)

Publication Number Publication Date
WO2003081462A2 (en) 2003-10-02
WO2003081462A3 (en) 2003-11-06

Family

ID=9933397

Family Applications (1)

Application Number Priority Date Filing Date Title
PCT/GB2003/001121 WO2003081462A2 (en) 2002-03-20 2003-03-18 Selective updating of index in a search engine

Country Status (3)

Country Link
AU (1) AU2003212535A1 (en)
GB (1) GB2386712A (en)
WO (1) WO2003081462A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2403559A (en) * 2003-07-02 2005-01-05 Sony Uk Ltd Index updating system employing self organising maps
RU2733482C2 (en) 2018-11-16 2020-10-01 Общество С Ограниченной Ответственностью "Яндекс" Method and system for updating search index database

Citations (1)

Publication number Priority date Publication date Assignee Title
US5748954A (en) * 1995-06-05 1998-05-05 Carnegie Mellon University Method for searching a queued and ranked constructed catalog of files stored on a network

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US5920859A (en) * 1997-02-05 1999-07-06 Idd Enterprises, L.P. Hypertext document retrieval system and method
US5864863A (en) * 1996-08-09 1999-01-26 Digital Equipment Corporation Method for parsing, indexing and searching world-wide-web pages
JPH1115851A (en) * 1997-06-27 1999-01-22 Hitachi Inf Syst Ltd Www page link control system and recording medium recording control processing program for the system
US6192375B1 (en) * 1998-07-09 2001-02-20 Intel Corporation Method and apparatus for managing files in a storage medium

Non-Patent Citations (5)

Title
BRIN S ET AL: "The anatomy of a large-scale hypertextual Web search engine", COMPUTER NETWORKS AND ISDN SYSTEMS, NORTH HOLLAND PUBLISHING. AMSTERDAM, NL, vol. 30, 1998, pages 107 - 117, XP002089959, ISSN: 0169-7552 *
CHO J ET AL: "EFFICIENT CRAWLING THROUGH URL ORDERING", INTERNET ARTICLE, 1998, pages 1 - 20, XP002253091, Retrieved from the Internet <URL:http://citeseer.nj.nec.com/cache/papers/cs/694/http:zSzzSzwww-db.stanford.eduzSzpubzSzpaperszSzefficient-crawling.pdf/cho98efficient.pdf> [retrieved on 20030828] *
CRIMMINS F: "Focused Crawling Review", INTERNET ARTICLE, 10 September 2001 (2001-09-10), XP002252828, Retrieved from the Internet <URL:http://funnelback.com/focused-crawler-review.html> [retrieved on 20030828] *
CRIMMINS F: "Web Crawler Review", INTERNET ARTICLE, 10 September 2001 (2001-09-10), XP002252829, Retrieved from the Internet <URL:http://dev.funnelback.com/crawler-review.html> [retrieved on 20030828] *
PIROLLI P ET AL: "Silk from a Sow's Ear: Extracting Usable Structures from the Web", INTERNET ARTICLE, 11 July 1996 (1996-07-11), XP002128179, Retrieved from the Internet <URL:http://www.acm.org/sigchi/chi96/proceedings/papers/Pirolli_2/pp2.html> [retrieved on 20000119] *

Also Published As

Publication number Publication date
GB0206626D0 (en) 2002-05-01
AU2003212535A8 (en) 2003-10-08
WO2003081462A2 (en) 2003-10-02
AU2003212535A1 (en) 2003-10-08
GB2386712A (en) 2003-09-24

Similar Documents

Publication Publication Date Title
WO2005060684A3 (en) Method and system for obtaining solutions to contradictional problems from a semantically indexed database
WO2007005463A3 (en) Collections of linked databases
ATE446547T1 (en) TOPIC-SPECIFIC SEARCH ENGINE
WO2007038301A3 (en) System and method for responding to a user query
WO2007087561A3 (en) System for searching
WO2005066847A3 (en) Systems and methods for improving search quality
WO2005074410A3 (en) System and method for indexing electronic text
WO2005013046A3 (en) Ranking search results using conversion data
GB0506628D0 (en) Trie search engines and ternary CAM used as pre-classifier
EP1182590A3 (en) Method, system, and program for gathering indexable metadata on content at a data repository
WO2007087379A3 (en) Data access using multilevel selectors and contextual assistance
WO2003098479A3 (en) Managing search expressions in a database system
DE69912410D1 (en) FAST STRING SEARCHING AND INDICATING
WO2005017667A3 (en) Performance prediction system with query mining
WO2006005001A3 (en) Method and system for automated intelligent electronic advertising
WO2007076269A3 (en) Multi-segment string search
WO2007033338A3 (en) Networked information indexing and search apparatus and method
WO2008011029A3 (en) Method and system for creating a concept-object database
WO2008051750A3 (en) Associating geographic-related information with objects
WO2005070019A3 (en) Contextual searching
WO2006133087A3 (en) Method and system for discovering antenna line devices
WO2005045632A3 (en) Utilizing cookies by a search engine robot for document retrieval
WO2007032834A3 (en) Source code file search
Fedderke The structure of growth in the South African economy: factor accumulation and total factor productivity growth 1970-97
WO2003081462A3 (en) Selective updating of index in a search engine

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP