Automatic language identification system for multilingual optical character recognition

United States of America Patent

PATENT NO: 6047251
SERIAL NO: 08929788

Abstract

The disclosed invention utilizes a dictionary-based approach to identify languages within different zones in a multi-lingual document. As a first step, a document image is segmented into various zones, regions and word tokens, using suitable geometric properties. Within each zone, the word tokens are compared to dictionaries associated with various candidate languages, and the language that exhibits the highest confidence factor is initially identified as the language of the zone. Subsequently, each zone is further split into regions. The language for each region is then identified, using the confidence factors for the words of that region. For any language determination having a low confidence value, the previously determined language of the zone is employed to assist the identification process.
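The flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent text: the toy dictionaries, the confidence factor (the fraction of a region's words found in a language's dictionary), the 0.5 threshold, and the function names. Image segmentation is skipped entirely; a zone is assumed to arrive as a list of regions, each a list of word tokens. The abstract says only that the zone language "assists" low-confidence decisions, so the plain fallback used below is a simplification.

```python
# Hypothetical toy dictionaries; a real system would use full word lists.
DICTIONARIES = {
    "en": {"the", "language", "of", "each", "zone", "is", "and"},
    "fr": {"la", "langue", "de", "chaque", "zone", "est", "et"},
}


def language_confidences(words, dictionaries):
    """Return, per language, the fraction of words found in its dictionary."""
    if not words:
        return {lang: 0.0 for lang in dictionaries}
    lowered = [w.lower() for w in words]
    return {
        lang: sum(w in vocab for w in lowered) / len(lowered)
        for lang, vocab in dictionaries.items()
    }


def best_language(confidences):
    """Return the (language, confidence) pair with the highest confidence."""
    return max(confidences.items(), key=lambda item: item[1])


def identify_zone(regions, dictionaries, threshold=0.5):
    """Identify the zone language first, then the language of each region.

    A region whose best confidence is below `threshold` (an assumed cutoff)
    falls back to the language already chosen for the whole zone.
    """
    zone_words = [word for region in regions for word in region]
    zone_lang, _ = best_language(language_confidences(zone_words, dictionaries))

    region_langs = []
    for region in regions:
        lang, conf = best_language(language_confidences(region, dictionaries))
        region_langs.append(lang if conf >= threshold else zone_lang)
    return zone_lang, region_langs


# One zone with three regions: English, French, and unrecognized tokens.
example_zone = [
    ["the", "language", "of", "each", "zone"],
    ["la", "langue", "est"],
    ["xyz", "qqq"],
]
zone_lang, region_langs = identify_zone(example_zone, DICTIONARIES)
print(zone_lang, region_langs)
```

In this sketch the third region matches no dictionary, so its confidence is zero and it inherits the zone-level language, which mirrors the low-confidence handling the abstract describes.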

Patent Owner(s)

Patent Owner | Address
NUANCE COMMUNICATIONS INC | 1 WAYSIDE ROAD BURLINGTON MA 01803

Inventor(s)

Inventor Name | Address | # of filed Patents | Total Citations
Bokser, Mindy R | Los Gatos, CA | 9 | 722
Choy, Kenneth Chan | Los Gatos, CA | 1 | 74
Kanungo, Tapas | San Jose, CA | 30 | 443
Pon, Leonard K | Los Gatos, CA | 1 | 74
Yang, Jun | Los Gatos, CA | 792 | 8454
