Method and system for incremental web crawling

Number of patents in Portfolio can not be more than 2000

United States of America Patent

PATENT NO 6631369
SERIAL NO

09345040

Stats

ATTORNEY / AGENT: (SPONSORED)

Importance

Loading Importance Indicators... loading....

Abstract

See full text

A Web crawler creates an index of documents in a document store on a computer network. In an initial crawl, the crawler creates a first full index for the document store. The first full crawl is based on a set of predefined 'seed' URLs and crawl restrictions, and involves recursively retrieving each folder/document directly or indirectly linked to the seed URLs. In the process of creating the first full index, the crawler creates a History Table containing a list of URLs for each folder and document found in the first full crawl. The History Table also includes a local commit time (LCT) for each document and a deleted documents count (DDC) and LCT or maximum LCT (MLCT) for each folder (this assumes that the store supports a folder hierarchy and the MLCT, LCT and DDC properties). Thereafter, in an incremental crawl, the crawler determines, for each folder, (1) whether the DDC for that folder has changed and (2) whether the MLCT is more recent than the corresponding value in the History Table. If the DDC has changed, the crawler obtains a full list of items (URLs) in that folder, and compares the list with the URLs in the History Table to identify the deleted documents. The deleted documents are then deleted from the History Table and index. If the MLCT is more recent, the crawler queries the document store for the URLs of linked documents having a LCT more recent than the MLCT in the History Table for the folder. The History Table and index are then updated accordingly to reflect the changes to the document store.

Loading the Abstract Image... loading....

First Claim

See full text

Family

Loading Family data... loading....

Patent Owner(s)

  • MICROSOFT TECHNOLOGY LICENSING, LLC

International Classification(s)

  • [Classification Symbol]
  • [Patents Count]

Inventor(s)

Inventor Name Address # of filed Patents Total Citations
Meyerzon, Dmitriy Bellevue, WA 112 3584
Sanu, Sankrant Redmond, WA 6 1115
Shoroff, Srikanth Issaquah, WA 30 1469
Terek, F Soner Bellevue, WA 16 876

Cited Art Landscape

Load Citation

Patent Citation Ranking

Forward Cite Landscape

Load Citation