Method for clustering closely resembling data objects

United States of America Patent

PATENT NO 6119124
SERIAL NO 09048653

Abstract

A computer-implemented method determines the resemblance of data objects such as Web pages. Each data object is partitioned into a sequence of tokens. The tokens are grouped into overlapping sets to form shingles. Each shingle is represented by a unique identification element encoded as a fingerprint. To generate a sketch of each data object, a plurality of pseudo-random permutations of the set of all fingerprints is applied to the set of fingerprints associated with the data object, and the minimum element of each resulting image is selected. The sketches characterize the resemblance of the data objects. The sketches can be further partitioned into a plurality of groups. Each group is fingerprinted to form a feature. Data objects that share more than a certain number of features are estimated to be nearly identical.
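
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the whitespace tokenizer, the shingle width, the BLAKE2 hash standing in for the fingerprinting scheme, the linear hash functions standing in for pseudo-random permutations, and all function names are assumptions made for illustration only.

```python
import hashlib
import random

PRIME = (1 << 61) - 1  # modulus for the stand-in "permutations" (assumption)

def tokenize(text):
    # Assumption: tokens are lowercase whitespace-separated words.
    return text.lower().split()

def shingles(tokens, w=4):
    # Overlapping windows of w consecutive tokens (w is an illustrative value).
    if len(tokens) < w:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}

def fingerprint(data):
    # Stand-in fingerprint: 64-bit BLAKE2b digest of the bytes.
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def make_permutations(k, seed=0):
    # k linear maps f -> (a*f + b) mod PRIME, approximating random permutations.
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]

def sketch(text, perms, w=4):
    # For each permutation, keep the minimum image over the shingle fingerprints.
    fps = {fingerprint(" ".join(s).encode()) for s in shingles(tokenize(text), w)}
    if not fps:
        raise ValueError("cannot sketch an empty data object")
    return [min((a * f + b) % PRIME for f in fps) for a, b in perms]

def resemblance(sk_a, sk_b):
    # Fraction of positions where the two sketches agree.
    return sum(x == y for x, y in zip(sk_a, sk_b)) / len(sk_a)

def features(sk, group_size):
    # Partition the sketch into consecutive groups and fingerprint each group.
    return [fingerprint(repr(tuple(sk[i:i + group_size])).encode())
            for i in range(0, len(sk), group_size)]

def shared_features(feat_a, feat_b):
    # Count groups whose fingerprints match at the same position.
    return sum(x == y for x, y in zip(feat_a, feat_b))
```

Under these assumptions, two data objects would be flagged as nearly identical when shared_features(...) exceeds a chosen threshold; the threshold, the sketch length, and the group size are tuning parameters that the abstract does not specify.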

Patent Owner(s)

Patent Owner      Address
R2 SOLUTIONS LLC  6136 FRISCO SQUARE BLVD SUITE 400 FRISCO TX 75034

Inventor(s)

Inventor Name        Address             # of Filed Patents   Total Citations
Broder, Andrei Z     Menlo Park, CA      38                   2734
Glassman, Steven C   Mountain View, CA   8                    819
Manasse, Mark S      San Francisco, CA   17                   1051
Nelson, Charles G    Palo Alto, CA       7                    733
Zweig, Geoffrey G    Oakland, CA         22                   1207
