Method and system for normalizing dirty text in a document

United States of America Patent

PATENT NO 7003725
APP PUB NO 20030014448A1
SERIAL NO 09905610

Abstract

A method and system for normalizing dirty text in a document. The present invention creates a thesaurus that evolves over time as new document collections are analyzed. This thesaurus, which is used by an editor, contains standard terms and phrases together with their known variations. Documents are run through this editor, and misspelled words or phrases, joined words, and ad hoc abbreviations are replaced with standard terms from the thesaurus. The present invention also enables normalization of documents in cases where a list of standard terms must be inferred from the corpus of the document. The normalizer facilitates data mining applications that cannot function properly with dirty text, resulting in more accurate analysis of documents. Over time, as the thesaurus evolves and collects more words and phrases, the process of generating the thesaurus will become more automated.
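
The replacement step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the thesaurus entries, the function names (`build_lookup`, `normalize`, `add_variant`), and the word-by-word matching are all assumptions made for illustration, and multi-word variants and inference of standard terms from the corpus are not covered.

```python
import re

# Hypothetical thesaurus: each standard term maps to its known dirty variants
# (misspellings, joined words, ad hoc abbreviations). Entries are invented.
THESAURUS = {
    "printer": ["pritner", "prntr"],
    "hard disk": ["harddisk"],
    "customer": ["cust", "custmer"],
}


def build_lookup(thesaurus):
    """Invert the thesaurus into a variant -> standard term mapping."""
    lookup = {}
    for standard, variants in thesaurus.items():
        for variant in variants:
            lookup[variant.lower()] = standard
    return lookup


def normalize(text, thesaurus):
    """Replace every known single-word variant with its standard term."""
    lookup = build_lookup(thesaurus)

    def replace(match):
        word = match.group(0)
        # Unknown words are left untouched.
        return lookup.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", replace, text)


def add_variant(thesaurus, standard, variant):
    """Grow the thesaurus when a new variant is found in a later collection."""
    thesaurus.setdefault(standard, []).append(variant)


if __name__ == "__main__":
    print(normalize("Cust reports pritner and harddisk failure", THESAURUS))
```

Keeping the thesaurus as a plain mapping from standard term to variants means new variants found in later document collections can be appended without touching the replacement logic, which loosely mirrors the evolving thesaurus the abstract describes.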

Patent Owner(s)

Patent Owner: MICRO FOCUS LLC
Address: 4555 GREAT AMERICA PARKWAY, SANTA CLARA, CA 95054

Inventor(s)

Inventor Name        Address         # of Filed Patents   Total Citations
Castellanos, Maria   Sunnyvale, CA   5                    167
Stinger, James R     Palo Alto, CA   8                    258