CN1402156A - Web site information extracting system and method - Google Patents
- Publication number
- CN1402156A (application CN 01123635)
- Authority
- CN
- China
- Prior art keywords
- extraction
- document
- word
- web
- search
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
A system and method for extracting Web site information, used to browse and filter page data on the World Wide Web. The system comprises a search device that searches web page data and outputs the result to a search page document, a data extraction device that extracts content from that document to create an extraction document, and a storage device that stores the search condition, the filter condition, and the search page document. The system can display together the search data obtained from different sites.
Description
(1) Technical field
The present invention relates to a text retrieval system and method, and in particular to a system and method for retrieving web page text on the World Wide Web.
(2) Background art
Nowadays, owing to the development of the Internet, information is transmitted and shared ever more quickly and conveniently. A user only needs to connect through the Internet to the World Wide Web, which is formed by Web sites around the globe, to use the data or information on it. At present, users often rely on a search device (search engine) or a web page text retrieval system to search for or retrieve the data they need on the World Wide Web.
Please refer to Fig. 1, which shows a schematic flow of the method by which a traditional search device searches the World Wide Web. First, the user enters the keyword or topic to be searched into the search device, which then connects to the World Wide Web and begins retrieval. The search device then lists for the user the web page addresses (URLs) that match the entered keyword or topic, and the user connects to those URLs to browse their content. Although this traditional method is simple, it has the following shortcomings:
(1) Although the search device has retrieved the URLs related to the keyword, the user must still connect to the web page at each URL to see its content. Moreover, a web page often contains data the user does not need, which is very inconvenient: the user may have to run a further full-text search within the page to find the needed data.
(2) The user cannot compare the web page data of the URLs retrieved by the search device against one another. For example, if the user is searching for the price of a product, the result retrieved by the search device in Fig. 1 does not let the user tell which Web site offers the lowest price.
(3) Summary of the invention
Therefore, the object of the present invention is to provide a Web site information extracting system and method. With the system and method of the present invention, the user can retrieve the needed data from the World Wide Web, and the system displays all of the retrieved data together, so that the user can browse the retrieval results from different web pages.
According to the object of the present invention, a Web site information extracting system is proposed. The system is connected to the World Wide Web through the Internet and is used to browse and filter web page data on the World Wide Web. The Web site information extracting system comprises at least a search device, a data extraction device and a storage device. The search device is connected to the World Wide Web through the Internet; it searches web page data on the World Wide Web according to a Web search condition set by the user and outputs the search result to a search page document. The data extraction device receives the search page document and, according to a home page filter condition set by the user, extracts content from the search page document to form an extraction document. The storage device stores the Web search condition, the home page filter condition, the search page document and the extraction document.
The data extraction device further comprises a hurdle extraction unit, a tag delete unit and a paragraph extraction unit. The hurdle extraction unit extracts the designated column data from the search page document (a "hurdle" here is a column, or field, of the page). The tag delete unit deletes all web display control marks (tags) in the search page document. The paragraph extraction unit deletes or retains a whole paragraph of the search page document, and can also delete user-specified text to be deleted from the search page document.
According to the object of the present invention, a site information extracting method is also proposed, by which a user browses and filters web page data on the World Wide Web. In this method, the user first sets a Web search condition and a home page filter condition. Web page data on the World Wide Web is then searched according to the Web search condition, and the search result is output to a search page document. Next, content is extracted from the search page document according to the home page filter condition to form an extraction document.
The step of extracting content from the search page document according to the home page filter condition to form the extraction document further comprises: deleting or retaining the data in the search page document between an extraction paragraph banner word and an extraction paragraph end word; extracting the data in the search page document between an extraction hurdle banner word and an extraction hurdle end word; and deleting all web display control marks in the search page document.
To make the above objects, features and advantages of the present invention more readily apparent, a preferred embodiment is described in detail below with reference to the accompanying drawings.
(4) Description of drawings
Fig. 1 is a schematic flow diagram of the method by which a traditional search device searches the World Wide Web.
Fig. 2 is a system structure diagram of a Web site information extracting system according to a preferred embodiment of the present invention.
Fig. 3 is a system block diagram of the Web site information extracting system 201 in Fig. 2.
Fig. 4 is a system block diagram of the data extraction device 303 in Fig. 3.
Fig. 5 is a schematic flow diagram of how the Web site information extracting system 201 in Fig. 2 extracts site information.
Fig. 6 is a flow chart of the site information extracting method of the Web site information extracting system 201 in Fig. 2.
Fig. 7 is a schematic diagram of the setting interface of the data extraction setup unit 401 for paragraph extraction.
Fig. 8 is a schematic diagram of the setting interface of the data extraction setup unit 401 for hurdle extraction.
Fig. 9 is a schematic diagram of the setting interface of the data extraction setup unit 401 for tag delete.
Fig. 10 is a flow chart of the substeps of step 605 in Fig. 6.
(5) Embodiment
Please refer to Fig. 2, which shows the system structure of a Web site information extracting system according to a preferred embodiment of the present invention. In Fig. 2, the Web site information extracting system 201 is connected to the World Wide Web 205 through the Internet 203. The World Wide Web 205 comprises a plurality of Web sites 207. The Web site information extracting system 201 lets the user browse and search the web pages of each Web site 207 on the World Wide Web 205, filters out unneeded data, and extracts the web page data and column data that the user needs.
Next, please refer to Fig. 3, which shows the system block diagram of the Web site information extracting system 201 in Fig. 2. As shown in Fig. 3, the system 201 comprises a search device 301, a data extraction device 303, a storage device 305, a search device setting device 307 and a monitor 309. The search device setting device 307 lets the user set the Web search condition, which the search device 301 uses to judge which Web sites' pages need to be searched and which pages need not be retrieved. The search device 301 is connected to the World Wide Web 205 via the Internet 203 and searches for and extracts the web page data in each Web site 207 that meets the Web search condition. The search device 301 outputs this search result to a search page document, which is stored in the storage device 305.
At this point the search page document is raw web page data, containing web display control marks (tags) and data the user does not need. The data extraction device 303 extracts from the search page document, according to the home page filter condition set by the user, the data content or columns that the user needs, and stores them as an extraction document. The monitor 309 displays the content of the extraction document. The storage device 305 stores the Web search condition, the home page filter condition, the search page document and the extraction document.
Next, please refer to Fig. 4, which shows the system block diagram of the data extraction device 303 in Fig. 3. As shown in Fig. 4, the data extraction device 303 comprises a data extraction setup unit 401, a hurdle (column) extraction unit 403, a tag delete unit 405 and a paragraph extraction unit 407. The data extraction setup unit 401 lets the user set the home page filter condition, which can include an extraction hurdle banner word (the word that marks where a column starts), an extraction hurdle end word, an extraction paragraph banner word, an extraction paragraph end word and a text to be deleted.
In addition, the data extraction setup unit 401 lets the user flexibly set the execution order of the hurdle extraction unit 403, the tag delete unit 405 and the paragraph extraction unit 407, so that the data the user needs can be extracted smoothly.
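One reason a user-settable execution order matters can be seen in a small sketch. It assumes, purely for illustration, that the banner and end words are HTML tags (`<td>` and `</td>`); the names `delete_tags`, `between`, `STEPS` and `run` are hypothetical and do not come from the patent. If tags are deleted first, a tag-based banner word can no longer be found.

```python
import re

def delete_tags(text):
    """Remove anything that looks like a <...> tag (simplified)."""
    return re.sub(r"<[^>]+>", "", text)

def between(text, start_word, end_word):
    """Return the text between the first start_word and the next end_word."""
    start = text.find(start_word)
    if start == -1:
        return ""
    begin = start + len(start_word)
    end = text.find(end_word, begin)
    return "" if end == -1 else text[begin:end]

# Hypothetical step table: each step maps a document string to a new string.
STEPS = {
    "column": lambda t: between(t, "<td>", "</td>"),
    "tags": delete_tags,
}

def run(text, order):
    """Apply the named steps in the order chosen by the user."""
    for name in order:
        text = STEPS[name](text)
    return text

page = "<tr><td><b>199</b></td></tr>"
print(repr(run(page, ["column", "tags"])))  # column extraction first, then tag delete
print(repr(run(page, ["tags", "column"])))  # tag delete first: the markers are gone
```

With the first order the price survives; with the second the result is empty, because `<td>` has already been removed before the column step looks for it.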
Please refer to Fig. 5, which shows a schematic flow of how the Web site information extracting system 201 in Fig. 2 extracts site information. Suppose, for example, that the user wants to retrieve the price of a product, a PDA, from related Web sites. First, the search device 301 retrieves from each Web site 207, according to the Web search condition, a search page document whose content is the raw web page data. The data extraction device 303 then extracts the data the user needs from the search page document and stores it as an extraction document. As shown in Fig. 5, the user can see the goods and prices of each related Web site directly in the extraction document, without connecting to each site's address to view its content.
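The Fig. 5 example can be sketched as follows. The two sample pages, their URLs, the marker strings and the helper names are all invented for illustration. The patent does not specify the layout of the extraction document, so a list of (site, product, price) rows is assumed here.

```python
import re

def between(text, start_word, end_word):
    """Return the text between the first start_word and the next end_word."""
    start = text.find(start_word)
    if start == -1:
        return ""
    begin = start + len(start_word)
    end = text.find(end_word, begin)
    return "" if end == -1 else text[begin:end]

def strip_tags(text):
    """Remove <...> tags (simplified) and surrounding whitespace."""
    return re.sub(r"<[^>]+>", "", text).strip()

def build_extraction_document(search_page_document, name_markers, price_markers):
    """Turn raw pages (url -> html) into one row per site."""
    rows = []
    for url, html in search_page_document.items():
        name = strip_tags(between(html, *name_markers))
        price = strip_tags(between(html, *price_markers))
        rows.append((url, name, price))
    return rows

# Hypothetical raw pages, standing in for the search page document.
pages = {
    "http://shop-a.example/pda":
        "<h1>Shop A</h1><div id='n'><b>PDA X</b></div><div id='p'>$199</div>",
    "http://shop-b.example/pda":
        "<p>ads</p><div id='n'>PDA X</div><div id='p'><i>$189</i></div>",
}
rows = build_extraction_document(
    pages, ("<div id='n'>", "</div>"), ("<div id='p'>", "</div>"))
for url, name, price in rows:
    print(f"{url:28} {name:8} {price}")
```

Listing the rows together is what lets the user compare prices across sites without opening each URL.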
Next, please refer to Fig. 6, which shows the flow chart of the site information extracting method of the Web site information extracting system 201 in Fig. 2. In step 601, the user sets the Web search condition in the search device setting device 307 and the home page filter condition in the data extraction setup unit 401. The Web search condition comprises at least the following settings:
(1) Search address: the user sets at least one address for the search device 301 to connect to and search.
(2) Full-text search condition: the user sets at least one search keyword, by which the search device 301 judges whether to extract the content of the web page at an address.
(3) Address search condition: the user can optionally set a special word; if the search device 301 judges that an address contains this special word, it decides to extract the content of that web page.
(4) Search URL path: the user can optionally set a path keyword; the search device 301 judges whether an address contains this path keyword in order to decide whether to continue searching the subdirectories of that address.
(5) Account and password: the user can optionally set an account and a password; when an address requires an account and password to be viewed, the search device 301 logs in with the account and password preset by the user.
(6) Search depth: the user can optionally set the depth to which a Web site is searched.
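The six settings above could be held in a single record, as in the following minimal sketch. The class name, field names and the two helper functions are hypothetical. The patent does not say how the keyword test (2) and the special-word test (3) combine, so the sketch assumes that either one is sufficient to extract a page.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class WebSearchCondition:
    start_urls: List[str]                    # (1) at least one address to search
    keywords: List[str]                      # (2) at least one full-text keyword
    url_special_word: Optional[str] = None   # (3) optional special word in the address
    path_keyword: Optional[str] = None       # (4) optional path keyword
    account: Optional[str] = None            # (5) optional login account
    password: Optional[str] = None           # (5) optional login password
    max_depth: Optional[int] = None          # (6) optional search depth

    def __post_init__(self):
        if not self.start_urls:
            raise ValueError("at least one search address is required")
        if not self.keywords:
            raise ValueError("at least one search keyword is required")

def should_extract(cond, url, page_text):
    """Decide whether a page's content is extracted (assumed: (3) OR (2))."""
    if cond.url_special_word and cond.url_special_word in url:
        return True
    return any(keyword in page_text for keyword in cond.keywords)

def should_descend(cond, url, depth):
    """Decide whether to keep searching below this address."""
    if cond.max_depth is not None and depth >= cond.max_depth:
        return False
    if cond.path_keyword is not None and cond.path_keyword not in url:
        return False
    return True

cond = WebSearchCondition(
    start_urls=["http://shop.example/"],
    keywords=["PDA"],
    path_keyword="/products",
    max_depth=2,
)
print(should_extract(cond, "http://shop.example/products/1", "PDA X, $199"))
print(should_descend(cond, "http://shop.example/news", 0))
```

Login handling for setting (5) is left out of the sketch, since it depends on how each site authenticates.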
In addition, the user uses the data extraction setup unit 401 to set whether the hurdle extraction unit 403, the tag delete unit 405 and the paragraph extraction unit 407 are executed, and in what order. In this embodiment, the execution order paragraph extraction unit 407, hurdle extraction unit 403, tag delete unit 405 is used as an example, but the present invention is not limited to it. Please also refer to Fig. 7, which shows the setting interface of the data extraction setup unit 401 for paragraph extraction. With the drop-down menu 701 in Fig. 7, the user selects the paragraph extraction, hurdle extraction or tag delete option, and thereby sets the execution order of the paragraph extraction unit 407, the hurdle extraction unit 403 and the tag delete unit 405. As shown in Fig. 7, the user uses the drop-down menu 701 to set the data extraction device 303 to execute the paragraph extraction unit 407 first, and can choose whether the paragraph extraction unit 407 performs paragraph extraction or word string extraction:
(1) Paragraph extraction: the user sets the extraction paragraph banner word and the extraction paragraph end word, and uses the first option 703 to set whether the text between them is deleted or retained. The user can also use the second option 705 to set whether the selected paragraph includes the extraction paragraph banner word and the extraction paragraph end word themselves.
(2) Word string extraction: the user enters the text to be deleted.
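The two operations of the paragraph extraction unit 407 can be sketched as plain string functions. This is a minimal illustration, not the patented implementation: the function and parameter names are hypothetical, only the first banner/end pair in the text is handled, and the text is returned unchanged when either word is missing. The `keep` flag plays the role of option 703 and `include_markers` that of option 705.

```python
def extract_paragraph(text, banner_word, end_word, keep=True, include_markers=False):
    """Keep or delete the span between banner_word and end_word."""
    start = text.find(banner_word)
    if start == -1:
        return text
    end = text.find(end_word, start + len(banner_word))
    if end == -1:
        return text
    if include_markers:
        lo, hi = start, end + len(end_word)      # span covers both marker words
    else:
        lo, hi = start + len(banner_word), end   # span is strictly between them
    if keep:
        return text[lo:hi]
    return text[:lo] + text[hi:]

def delete_string(text, unwanted):
    """Word string operation: remove every occurrence of the text to be deleted."""
    return text.replace(unwanted, "")

page = "header<!--begin-->PDA $199<!--end-->footer"
print(extract_paragraph(page, "<!--begin-->", "<!--end-->"))
print(extract_paragraph(page, "<!--begin-->", "<!--end-->", keep=False, include_markers=True))
print(delete_string("PDA (ad) $199", "(ad) "))
```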
Next, please refer to Fig. 8, which shows the setting interface of the data extraction setup unit 401 for hurdle extraction. In Fig. 8, the user selects hurdle extraction so that the data extraction device 303 executes the hurdle extraction unit 403 next in sequence. The user can enter one or more pairs of extraction hurdle banner word and extraction hurdle end word, so that the hurdle extraction unit 403 extracts the column data lying between each extraction hurdle banner word and its extraction hurdle end word.
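Hurdle (column) extraction with several banner/end pairs might look like the sketch below. The function name and the sample markup are hypothetical, and the sketch assumes that every occurrence of a pair in the page yields one value for that column, which the patent does not state explicitly.

```python
def extract_columns(text, marker_pairs):
    """For each (banner_word, end_word) pair, collect every value found between them."""
    columns = []
    for banner_word, end_word in marker_pairs:
        if not banner_word or not end_word:
            raise ValueError("banner and end words must be non-empty")
        values = []
        pos = 0
        while True:
            start = text.find(banner_word, pos)
            if start == -1:
                break
            begin = start + len(banner_word)
            end = text.find(end_word, begin)
            if end == -1:
                break
            values.append(text[begin:end])
            pos = end + len(end_word)   # continue after this match
        columns.append(values)
    return columns

page = ("<td class='name'>PDA A</td><td class='price'>199</td>"
        "<td class='name'>PDA B</td><td class='price'>249</td>")
pairs = [("<td class='name'>", "</td>"), ("<td class='price'>", "</td>")]
names, prices = extract_columns(page, pairs)
print(list(zip(names, prices)))
```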
Please refer to Fig. 9, which shows the setting interface of the data extraction setup unit 401 for tag delete. In Fig. 9, the user selects tag delete so that the data extraction device 303 executes the tag delete unit 405 as the third step. Here the user can also choose whether blank lines are deleted.
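Tag delete with the optional blank-line removal can be approximated with a regular expression, as sketched here. The pattern `<[^>]+>` is a deliberate simplification chosen for this illustration: it does not handle a `>` inside an attribute value, nor script or style content, and the function name is hypothetical.

```python
import re

_TAG = re.compile(r"<[^>]+>")

def delete_tags(text, drop_blank_lines=False):
    """Remove <...> display control tags; optionally drop lines left blank."""
    stripped = _TAG.sub("", text)
    if drop_blank_lines:
        lines = [line for line in stripped.splitlines() if line.strip()]
        stripped = "\n".join(lines)
    return stripped

html = "<p>PDA</p>\n<br>\n<b>$199</b>"
print(repr(delete_tags(html)))
print(repr(delete_tags(html, drop_blank_lines=True)))
```

For pages with irregular markup, a real HTML parser would be a more robust choice than a pattern match.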
Next, in step 603 shown in Fig. 6, the search device 301 searches the web page data of each Web site 207 on the World Wide Web 205 according to the settings in the Web search condition, extracts the web page data that meets the Web search condition, and outputs it to the search page document. Step 605 is then performed.
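Step 603 can be pictured as a depth-limited breadth-first traversal. In the sketch below, `fetch` is a hypothetical callable that returns a page's text and its outgoing links, so the example runs against an in-memory stand-in for the Web instead of the network. Representing the search page document as a dictionary from URL to raw text is also an assumption of this sketch.

```python
from collections import deque

def crawl(fetch, start_urls, keywords, max_depth):
    """Visit pages breadth-first up to max_depth; keep those containing a keyword."""
    start_urls = list(dict.fromkeys(start_urls))   # drop duplicate start addresses
    seen = set(start_urls)
    queue = deque((url, 0) for url in start_urls)
    search_page_document = {}
    while queue:
        url, depth = queue.popleft()
        text, links = fetch(url)
        if any(keyword in text for keyword in keywords):
            search_page_document[url] = text       # raw page, tags and all
        if depth < max_depth:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return search_page_document

# Hypothetical in-memory "Web": url -> (raw text, outgoing links).
WEB = {
    "http://a.example/": ("<h1>PDA sale</h1>", ["http://a.example/p1", "http://a.example/about"]),
    "http://a.example/p1": ("<b>PDA X</b> 199", ["http://a.example/p1/deep"]),
    "http://a.example/about": ("About us", []),
    "http://a.example/p1/deep": ("PDA accessories", []),
}
result = crawl(WEB.__getitem__, ["http://a.example/"], ["PDA"], max_depth=1)
print(sorted(result))
```

The page two levels down is never fetched because the depth limit stops the traversal at depth 1.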
In step 605, the data extraction device 303 extracts content from the search page document according to the home page filter condition that has been set, and forms the extraction document. The detailed substeps of this step are shown in Fig. 10. In step 1001, the paragraph extraction unit 407 deletes or retains the data in the search page document between the extraction paragraph banner word and the extraction paragraph end word, or deletes the text to be deleted that the user has set.
Next, in step 1003, the hurdle extraction unit 403 extracts the data in the search page document between the extraction hurdle banner word and the extraction hurdle end word. Step 1005 is then performed, in which the tag delete unit 405 deletes all web display control marks in the search page document. In step 607, the monitor 309 displays the content of the extraction document to the user. This completes the site information extracting method of the present invention.
In the above description, the order paragraph extraction, hurdle extraction, tag delete is used as an example of the sequence in which the data extraction device 303 extracts content from the search page document to form the extraction document, but the present invention is not limited to it. The user can set the order freely so that the appropriate data is extracted.
Through the setting steps described above, the Web site information extracting system and method disclosed in the above embodiment replace the heavy manual workload of searching for, extracting and organizing data. For target data that has been locked for extraction, the settings of the extraction flow can also keep that data up to date, so the system keeps data current more efficiently than a general search device. In addition, the present invention has the following advantages:
(1) The Web site information extracting system of the present invention extracts and displays the data the user wants to retrieve and filters out unneeded data, which makes reading very convenient and saves the user the time of searching again.
(2) The Web site information extracting system of the present invention displays side by side all the data in each Web site 207 on the World Wide Web 205 that meets the user's needs, so that the user can easily compare how the data of different web pages relate and differ.
In summary, although the present invention is disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. A person of ordinary skill in the art may make various modifications without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims.
Claims (15)
1. A Web site information extracting system, connected to the World Wide Web through the Internet and used to browse and filter web page data on the World Wide Web, the Web site information extracting system comprising at least:
a search device, connected to the World Wide Web via the Internet, which searches web page data on the World Wide Web according to a Web search condition set by a user and outputs the search result to a search page document;
a data extraction device, which receives the search page document and, according to a home page filter condition set by the user, extracts content from the search page document to form an extraction document; and
a storage device, which stores the Web search condition, the home page filter condition, the search page document and the extraction document.
2. The system as claimed in claim 1, further comprising a monitor that displays the content of the extraction document.
3. The system as claimed in claim 1, wherein the home page filter condition further comprises an extraction hurdle banner word, an extraction hurdle end word, an extraction paragraph banner word, an extraction paragraph end word and a text to be deleted, and the data extraction device further comprises:
a hurdle extraction unit, which extracts the data in the search page document between the extraction hurdle banner word and the extraction hurdle end word;
a tag delete unit, which deletes all web display control marks (tags) in the search page document; and
a paragraph extraction unit, which deletes or retains the data in the search page document between the extraction paragraph banner word and the extraction paragraph end word, and which can also delete the text to be deleted from the search page document.
4. The system as claimed in claim 3, wherein the data extraction device further comprises a data extraction setup unit with which the user sets the home page filter condition.
5. The system as claimed in claim 1, further comprising a search device setting device with which the user sets the Web search condition.
6. A site information extracting method by which a user browses and filters web page data on the World Wide Web, the site information extracting method comprising:
the user setting a Web search condition and a home page filter condition;
searching web page data on the World Wide Web according to the Web search condition, and outputting the search result to a search page document; and
extracting content from the search page document according to the home page filter condition to form an extraction document.
7. The method as claimed in claim 6, further comprising:
displaying the content of the extraction document.
8. The method as claimed in claim 6, wherein the home page filter condition further comprises an extraction hurdle banner word, an extraction hurdle end word, an extraction paragraph banner word and an extraction paragraph end word, and the step of extracting content from the search page document according to the home page filter condition to form an extraction document further comprises:
deleting or retaining the data in the search page document between the extraction paragraph banner word and the extraction paragraph end word;
extracting the data in the search page document between the extraction hurdle banner word and the extraction hurdle end word; and
deleting all web display control marks in the search page document.
9. The method as claimed in claim 6, wherein the home page filter condition further comprises an extraction hurdle banner word, an extraction hurdle end word and a text to be deleted, and the step of extracting content from the search page document according to the home page filter condition to form an extraction document further comprises:
deleting the text to be deleted from the search page document;
extracting the data in the search page document between the extraction hurdle banner word and the extraction hurdle end word; and
deleting all web display control marks in the search page document.
10. The method as claimed in claim 6, wherein the home page filter condition further comprises an extraction hurdle banner word and an extraction hurdle end word, and the step of extracting content from the search page document according to the home page filter condition to form an extraction document further comprises:
extracting the data in the search page document between the extraction hurdle banner word and the extraction hurdle end word; and
deleting all web display control marks in the search page document.
11. A computer-readable recording medium comprising a program for carrying out a site information extracting method by which a user browses and filters web page data on the World Wide Web, the site information extracting method comprising:
the user setting a Web search condition and a home page filter condition;
searching web page data on the World Wide Web according to the Web search condition, and outputting the search result to a search page document; and
extracting content from the search page document according to the home page filter condition to form an extraction document.
12. The computer-readable recording medium as claimed in claim 11, wherein the method further comprises:
displaying the content of the extraction document.
13. The computer-readable recording medium as claimed in claim 11, wherein the home page filter condition further comprises an extraction hurdle banner word, an extraction hurdle end word, an extraction paragraph banner word and an extraction paragraph end word, and the step of extracting content from the search page document according to the home page filter condition to form an extraction document further comprises:
deleting or retaining the data in the search page document between the extraction paragraph banner word and the extraction paragraph end word;
extracting the data in the search page document between the extraction hurdle banner word and the extraction hurdle end word; and
deleting all web display control marks in the search page document.
14. The computer-readable recording medium as claimed in claim 11, wherein the home page filter condition further comprises an extraction hurdle banner word, an extraction hurdle end word and a text to be deleted, and the step of extracting content from the search page document according to the home page filter condition to form an extraction document further comprises:
deleting the text to be deleted from the search page document;
extracting the data in the search page document between the extraction hurdle banner word and the extraction hurdle end word; and
deleting all web display control marks in the search page document.
15. The computer-readable recording medium as claimed in claim 11, wherein the home page filter condition further comprises an extraction hurdle banner word and an extraction hurdle end word, and the step of extracting content from the search page document according to the home page filter condition to form an extraction document further comprises:
extracting the data in the search page document between the extraction hurdle banner word and the extraction hurdle end word; and
deleting all web display control marks in the search page document.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 01123635 CN1402156A (en) | 2001-08-22 | 2001-08-22 | Web site information extracting system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 01123635 CN1402156A (en) | 2001-08-22 | 2001-08-22 | Web site information extracting system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1402156A true CN1402156A (en) | 2003-03-12 |
Family
ID=4665196
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 01123635 Pending CN1402156A (en) | 2001-08-22 | 2001-08-22 | Web site information extracting system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1402156A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100432996C (en) * | 2004-12-07 | 2008-11-12 | 国际商业机器公司 | System, method and program for extracting web page core content based on web page layout |
CN100444174C (en) * | 2006-09-25 | 2008-12-17 | 北京中搜在线软件有限公司 | Method for picking-up, and aggregating micro content of web page, and automatic updating system |
CN100458797C (en) * | 2007-06-20 | 2009-02-04 | 精实万维软件(北京)有限公司 | Process for ordering network advertisement |
CN100543741C (en) * | 2006-02-10 | 2009-09-23 | 鸿富锦精密工业(深圳)有限公司 | The system and method for automatic download and filtering web page |
CN101997915A (en) * | 2010-10-29 | 2011-03-30 | 中国电信股份有限公司 | Deep packet detection device, webpage data processing method, and webpage data acquisition method and system |
CN101409634B (en) * | 2007-10-10 | 2011-04-13 | 中国科学院自动化研究所 | Quantitative analysis tools and method for internet news influence based on information retrieval |
CN101310277B (en) * | 2005-11-15 | 2011-10-05 | 皇家飞利浦电子股份有限公司 | Method of obtaining a representation of a text and system |
CN101470731B (en) * | 2007-12-26 | 2012-06-20 | 中国科学院自动化研究所 | Personalized web page filtering method |
CN101751438B (en) * | 2008-12-17 | 2012-08-22 | 中国科学院自动化研究所 | Theme webpage filter system for driving self-adaption semantics |
CN101127038B (en) * | 2006-08-18 | 2012-09-19 | 鸿富锦精密工业(深圳)有限公司 | System and method for downloading website static web page |
CN102857885A (en) * | 2012-08-17 | 2013-01-02 | 东莞宇龙通信科技有限公司 | Method and communication terminal for sharing information |
CN104065504A (en) * | 2013-03-22 | 2014-09-24 | 腾讯科技(深圳)有限公司 | Information processing method and device |
CN107169076A (en) * | 2017-05-10 | 2017-09-15 | 北京京东尚科信息技术有限公司 | Method, system and the computer-readable recording medium cleaned for 2-D data |
-
2001
- 2001-08-22 CN CN 01123635 patent/CN1402156A/en active Pending
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100432996C (en) * | 2004-12-07 | 2008-11-12 | 国际商业机器公司 | System, method and program for extracting web page core content based on web page layout |
CN101310277B (en) * | 2005-11-15 | 2011-10-05 | 皇家飞利浦电子股份有限公司 | Method of obtaining a representation of a text and system |
CN100543741C (en) * | 2006-02-10 | 2009-09-23 | 鸿富锦精密工业(深圳)有限公司 | The system and method for automatic download and filtering web page |
CN101127038B (en) * | 2006-08-18 | 2012-09-19 | 鸿富锦精密工业(深圳)有限公司 | System and method for downloading website static web page |
CN100444174C (en) * | 2006-09-25 | 2008-12-17 | 北京中搜在线软件有限公司 | Method for picking-up, and aggregating micro content of web page, and automatic updating system |
CN100458797C (en) * | 2007-06-20 | 2009-02-04 | 精实万维软件(北京)有限公司 | Process for ordering network advertisement |
CN101409634B (en) * | 2007-10-10 | 2011-04-13 | 中国科学院自动化研究所 | Quantitative analysis tools and method for internet news influence based on information retrieval |
CN101470731B (en) * | 2007-12-26 | 2012-06-20 | 中国科学院自动化研究所 | Personalized web page filtering method |
CN101751438B (en) * | 2008-12-17 | 2012-08-22 | 中国科学院自动化研究所 | Theme webpage filter system for driving self-adaption semantics |
CN101997915A (en) * | 2010-10-29 | 2011-03-30 | 中国电信股份有限公司 | Deep packet detection device, webpage data processing method, and webpage data acquisition method and system |
CN101997915B (en) * | 2010-10-29 | 2014-01-08 | 中国电信股份有限公司 | Deep packet detection device, webpage data processing method, and webpage data acquisition method and system |
CN102857885A (en) * | 2012-08-17 | 2013-01-02 | 东莞宇龙通信科技有限公司 | Method and communication terminal for sharing information |
CN104065504A (en) * | 2013-03-22 | 2014-09-24 | 腾讯科技(深圳)有限公司 | Information processing method and device |
CN104065504B (en) * | 2013-03-22 | 2019-04-12 | 腾讯科技(深圳)有限公司 | The processing method and processing device of information |
CN107169076A (en) * | 2017-05-10 | 2017-09-15 | 北京京东尚科信息技术有限公司 | Method, system and the computer-readable recording medium cleaned for 2-D data |
CN107169076B (en) * | 2017-05-10 | 2020-06-05 | 北京京东尚科信息技术有限公司 | Method, system and computer readable storage medium for two-dimensional data cleansing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8082266B2 (en) | Index for data retrieval and data structuring | |
US8359295B2 (en) | User interface for navigating a keyword space | |
US7660808B2 (en) | Automatically indexing a collection of files of a selected type | |
CN1317661C (en) | System and method for facilitating internet search by providing web document layout image | |
US9367637B2 (en) | System and method for searching a bookmark and tag database for relevant bookmarks | |
CN1402156A (en) | Web site information extracting system and method | |
CN1522418A (en) | Predictive caching and highlighting of web pages | |
JP2009500719A (en) | Query search by image (query-by-imagesearch) and search system | |
WO2008098502A1 (en) | Method and device for creating index as well as method and system for retrieving | |
US6694302B2 (en) | System, method and article of manufacture for personal catalog and knowledge management | |
WO2011145922A1 (en) | Method and system for compiling a unique sample code for specific web content | |
CN101310277B (en) | Method of obtaining a representation of a text and system | |
JP2009026249A (en) | Browsing-history-editing terminal, program, and its method | |
Klein et al. | Evaluating methods to rediscover missing web pages from the web infrastructure | |
KR100671077B1 (en) | Server, Method and System for Providing Information Search Service by Using Sheaf of Pages | |
CN101599069A (en) | The searching method of electronic document and system | |
CN103853777A (en) | Method and device for accessing websites through keywords | |
JP2008191982A (en) | Retrieval result output device | |
US20080208831A1 (en) | Controlling search indexing | |
US20090313558A1 (en) | Semantic Image Collection Visualization | |
CN105243073A (en) | Bookmark access method and device and terminal | |
CN103853730B (en) | Control the method and system of network linking shortcut classification | |
TWI238333B (en) | Website information capturing system and method | |
JP4510041B2 (en) | Document search system and program | |
CN1838123A (en) | Information search method and system based on fixed keyword |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C06 | Publication | ||
PB01 | Publication | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |