DEDUPLICATION BY PHRASE SUBSTITUTION WITHIN CHUNKS OF SUBSTANTIALLY SIMILAR CONTENT

United States of America Patent

SERIAL NO: 14817296

Abstract

A method, system, and computer program product for phrase substitution within chunks of substantially similar content. The method includes: retrieving from content files a first and a second content chunk which are identical above a predetermined threshold; identifying a candidate for substitution, wherein the candidate for substitution is a string of characters in the second content chunk that is not identical to a corresponding string of characters in the first content chunk; comparing the candidate for substitution with a synonym database to find a match, wherein the synonym database provides a plurality of synonym suggestions to convert the candidate for substitution in the first content chunk and the second content chunk to an identical string of characters; replacing the candidate for substitution with a reference to the identical string of characters; and storing a single copy of the identical string of characters in a common repository.
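
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation; every concrete choice here is an assumption made for illustration: word-level tokenization, `difflib.SequenceMatcher` as the similarity measure, the 0.8 threshold, the tiny `SYNONYMS` table standing in for the synonym database, and a SHA-256 digest used as the reference into the common repository. For brevity the sketch stores the whole normalized chunk once and returns a reference to it, rather than referencing only the substituted phrase.

```python
import hashlib
from difflib import SequenceMatcher

# Hypothetical synonym database: each phrase maps to one canonical form.
SYNONYMS = {"begin": "start", "commence": "start", "start": "start"}


def canonical(phrase):
    """Return the canonical synonym for a phrase, or None if it is unknown."""
    return SYNONYMS.get(phrase.lower())


def deduplicate(first, second, repository, threshold=0.8):
    """Try to reduce two similar chunks to one shared copy.

    Returns the repository key (the reference) on success, or None when the
    chunks are too dissimilar or a difference has no common synonym.
    """
    a, b = first.split(), second.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    if matcher.ratio() < threshold:
        return None  # not "identical above a predetermined threshold"

    out_a, out_b = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        seg_a, seg_b = a[i1:i2], b[j1:j2]
        if tag == "replace":
            # Candidate for substitution: a span that differs between chunks.
            common = canonical(" ".join(seg_a))
            if common is not None and common == canonical(" ".join(seg_b)):
                seg_a = seg_b = [common]
        out_a.extend(seg_a)
        out_b.extend(seg_b)

    text_a, text_b = " ".join(out_a), " ".join(out_b)
    if text_a != text_b:
        return None  # some difference could not be reconciled

    # Store a single copy; both content files would keep only this reference.
    key = hashlib.sha256(text_a.encode("utf-8")).hexdigest()
    repository.setdefault(key, text_a)
    return key
```

For example, calling `deduplicate("Press the button to begin the installation", "Press the button to commence the installation", repo)` with an empty dictionary `repo` leaves one stored string in `repo`, in which both differing words have been replaced by the shared synonym "start".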

Patent Owner(s)

Patent Owner                                  Address
INTERNATIONAL BUSINESS MACHINES CORPORATION   NEW ORCHARD ROAD ARMONK NY 10504

Inventor(s)

Inventor Name         Address                      # of filed Patents   Total Citations
Acharya, Alka A       Pune, IN                     1                    20
Allen, Jr., Lloyd W   RESEARCH TRIANGLE PARK, US   29                   370
Jenkins, Jana H       DURHAM, US                   215                  1060
Samuel, Abigail       KARNATAKA, IN                1                    20
