Supercomputing environment for duplicate detection on web-scale data

United States of America Patent

PATENT NO 7389310
SERIAL NO 12045406

Abstract

A scale-out supercomputing environment includes a plurality of interconnected nodes arranged in a three-dimensional cubic grid and configured to perform a method of duplicate detection. The method includes at least computing a fingerprint of at least one document in the supercomputing environment to generate data packets from the at least one document and to generate a fixed size tuple of information from the at least one document, distributing the data packets to each node of the plurality of nodes to ensure all elements of the fixed size tuple fit into memory of the plurality of nodes, applying localized detection techniques to data packets on each node of the plurality of nodes to remove data packet duplicates, redistributing the data packets to each node of the plurality of nodes based on the document fingerprint, and performing a global merge of results of the localized detection techniques.
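The steps named in the abstract (fingerprint, distribute, local detection, redistribute by fingerprint, global merge) can be sketched on a single machine as follows. This is a minimal illustration, not the patented implementation: the SHA-1 based fingerprint, the (fingerprint, document id) tuple layout, the round-robin initial distribution, the modulo routing rule, and all function names are assumptions made for the example, and the "nodes" are simply coordinates in a small simulated cubic grid.

```python
import hashlib
from collections import defaultdict
from itertools import product


def fingerprint(text: str) -> int:
    """64-bit fingerprint of lowercased, whitespace-normalized text (assumed scheme)."""
    normalized = " ".join(text.lower().split())
    digest = hashlib.sha1(normalized.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def make_grid(side: int):
    """Coordinates of the nodes in a side x side x side cubic grid."""
    return list(product(range(side), repeat=3))


def local_dedup(packets):
    """Keep the first packet seen for each fingerprint on a single node."""
    seen = {}
    for fp, doc_id in packets:
        seen.setdefault(fp, doc_id)
    return list(seen.items())


def detect_duplicates(docs, side=2):
    """Return the sorted ids of documents that survive duplicate removal."""
    nodes = make_grid(side)

    # 1. Fingerprint each document into a fixed-size tuple (fingerprint, id).
    packets = [(fingerprint(text), doc_id) for doc_id, text in docs.items()]

    # 2. Distribute packets round-robin so each node holds a bounded share.
    shards = defaultdict(list)
    for i, packet in enumerate(packets):
        shards[nodes[i % len(nodes)]].append(packet)

    # 3. Localized detection: drop duplicates that happen to share a node.
    shards = {node: local_dedup(held) for node, held in shards.items()}

    # 4. Redistribute by fingerprint so equal fingerprints meet on one node.
    routed = defaultdict(list)
    for node in nodes:
        for fp, doc_id in shards.get(node, []):
            routed[nodes[fp % len(nodes)]].append((fp, doc_id))

    # 5. Deduplicate on each owning node, then merge the results globally.
    unique = []
    for node in nodes:
        unique.extend(doc_id for _, doc_id in local_dedup(routed[node]))
    return sorted(unique)
```

Calling detect_duplicates on a dictionary that maps document ids to text returns the ids kept after both passes. Routing by fingerprint in step 4 is what makes the final merge cheap in this sketch: once every copy of a fingerprint lands on the same node, no cross-node comparison is needed.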

Patent Owner(s)

  • INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor(s)

Inventor Name      Address        # of filed Patents   Total Citations
Bhagwan, Varun     San Jose, CA   78                   957
Desai, Rajesh M    San Jose, CA   47                   265
Gruhl, Daniel F    San Jose, CA   49                   535
