Confusion matrix based method and system for correcting misrecognized words appearing in documents generated by an optical character recognition technique

Number of patents in Portfolio can not be more than 2000

United States of America Patent

PATENT NO 6154579
SERIAL NO

08909199

Stats

ATTORNEY / AGENT: (SPONSORED)

Importance

Loading Importance Indicators... loading....

Abstract

See full text

A method and apparatus for correcting misrecognized words appearing in electronic documents that have been generated by scanning an original document in accordance with an optical character recognition ('OCR') technique. If an incorrect word is found in the electronic document, the present invention generates at least one reference word and selects the reference word that is the most likely correct replacement for the incorrect word. This selection is accomplished by performing a probabilistic determination that assigns to each reference word a replacement word recognition probability. The probabilistic determination is carried out on the basis of a pre-stored confusion matrix that stores a plurality of probability values. The confusion matrix is used to associate each character of recognized word in the electronic document with a corresponding character of a word in the original document on the basis of these probability values.

Loading the Abstract Image... loading....

First Claim

See full text

Family

Loading Family data... loading....

Patent Owner(s)

  • AT&T CORP.

International Classification(s)

  • [Classification Symbol]
  • [Patents Count]

Inventor(s)

Inventor Name Address # of filed Patents Total Citations
Goldberg, Randy G Princeton, NJ 58 2757

Cited Art Landscape

Load Citation

Patent Citation Ranking

Forward Cite Landscape

Load Citation