Merging three optical character recognition outputs for improved precision using a minimum edit distance function

United States of America Patent

PATENT NO 5459739
SERIAL NO 07853550

Abstract

Three OCR systems are employed for text conversion, and the results generated by the three are merged using an edit distance algorithm to estimate a correct common text ancestor. To make the process computationally feasible for large strings, such as pages of documentation with 3,000 characters, the method is executed in two stages. In the first stage, each page is treated as a string of lines, and where the outputs differ, the edit distance between the lines on a page is used to find the optimal alignment of the lines. Where a choice must be made among three non-null lines, the procedure is then invoked on those three lines, using the edit distance between the characters on a line to find the optimal alignment. The number of computations the procedure requires is further reduced by corner-cutting, which heuristically determines an upper bound on the edit distance and limits calculations to those that do not exceed the upper bound.
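
The bounded computation described in the abstract can be illustrated with a minimal sketch. It assumes unit insertion, deletion, and substitution costs and a simple doubling rule for the upper bound; the abstract does not state the patent's actual cost function or bound heuristic, so both are assumptions, and the function names are hypothetical. The same functions accept either two strings (character stage) or two lists of lines (line stage).

```python
def bounded_edit_distance(a, b, bound):
    """Return the edit distance between sequences a and b,
    or None if it exceeds `bound` (assumed unit costs).

    Only cells within `bound` of the main diagonal are evaluated;
    cells farther out cannot lie on a path of cost <= bound.
    """
    n, m = len(a), len(b)
    if abs(n - m) > bound:
        return None
    inf = bound + 1  # stands in for "over the bound"
    prev = [j if j <= bound else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        lo = max(0, i - bound)
        hi = min(m, i + bound)
        if lo == 0:
            cur[0] = i  # delete the first i items of a
        for j in range(max(1, lo), hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitute
                         prev[j] + 1,         # delete from a
                         cur[j - 1] + 1,      # insert into a
                         inf)
        prev = cur
    return prev[m] if prev[m] <= bound else None


def edit_distance(a, b, start_bound=1):
    """Exact edit distance found by retrying with a doubled bound.

    The doubling rule is an assumed heuristic, not the patent's.
    """
    bound = max(1, start_bound, abs(len(a) - len(b)))
    while True:
        d = bounded_edit_distance(a, b, bound)
        if d is not None:
            return d
        bound *= 2


# Character stage: two OCR readings of one line.
line_distance = edit_distance("kitten", "sitting")
# Line stage: two OCR readings of one page, as lists of lines.
page_distance = edit_distance(["a", "b"], ["a", "c"])
```

When the two readings are nearly identical, which is the expected case for OCR output of the same page, a small bound succeeds on the first try and only a narrow band of the table is filled.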

Patent Owner(s)

Patent Owner: OCLC ONLINE COMPUTER LIBRARY CENTER INCORPORATED A CORPORATION OF OH
Address: DUBLIN OH

Inventor(s)

Inventor Name | Address | # of filed Patents | Total Citations
Handley, John C | Penfield, NY | 80 | 1457
Hickey, Thomas B | Columbus, OH | 5 | 214
