System and method for efficient representation of data set addresses in a web crawler

Number of patents in Portfolio can not be more than 2000

United States of America Patent

PATENT NO 6301614
SERIAL NO

09433008

Stats

ATTORNEY / AGENT: (SPONSORED)

Importance

Loading Importance Indicators... loading....

Abstract

See full text

A web crawler stores fixed length representations of document addresses in first and second caches and a disk file. When the web crawler downloads a document from a host computer, it identifies URL's (document addresses) in the downloaded document. Each identified URL is converted into a fixed size numerical representation. The numerical representation is systematically compared to numerical representations in the caches and disk file. If the representation is not found in the caches and disk file, the document corresponding to the representation is scheduled for downloading, and the representation is stored in the second cache. If the representation is not found in the caches but is found in the disk file, the representation is added to the first cache. When the second cache is full, it is merged with the disk file and the second cache is reset to an initial state. When the first cache is full, one or more representations are evicted in accordance with an eviction policy. The representations include a prefix that is a function of a host component of the corresponding URL's, and the representations are stored in the disk file in sorted order. When the web crawler searches for a representation in the disk file, an index of the disk file is searched to identify a single block of the disk file, and only that single block of the disk file is searched for the representation.

Loading the Abstract Image... loading....

First Claim

See full text

Family

Loading Family data... loading....

Patent Owner(s)

Patent OwnerAddress
R2 SOLUTIONS LLC6136 FRISCO SQUARE BLVD SUITE 400 FRISCO TX 75034

International Classification(s)

  • [Classification Symbol]
  • [Patents Count]

Inventor(s)

Inventor Name Address # of filed Patents Total Citations
Heydon, Clark Allan San Francisco, CA 13 711
Najork, Marc Alexander Palo Alto, CA 27 1043

Cited Art Landscape

Load Citation

Patent Citation Ranking

Forward Cite Landscape

Load Citation