Identifying language and character set of data representing text

United States of America Patent

PATENT NO 6157905
SERIAL NO 08987565

Abstract

The present invention provides a facility for identifying the unknown language of text represented by a series of data values in accordance with a character set that associates character glyphs with particular data values. The facility first generates a characterization that characterizes the series of data values in terms of the occurrence of particular data values in the series of data values. For each of a plurality of languages, the facility then retrieves a model that models the language in terms of the statistical occurrence of particular data values in representative samples of text in that language. The facility then compares the retrieved models to the generated characterization of the series of data values, and identifies as the distinguished language the language whose model compares most favorably to the generated characterization of the series of data values.
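
The flow described in the abstract (characterize the input, retrieve per-language models, compare, pick the best match) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the function names (`characterize`, `build_model`, `similarity`, `identify_language`) are hypothetical, the characterization is a simple relative frequency of single byte values, and "compares most favorably" is interpreted as the highest cosine similarity.

```python
import math
from collections import Counter


def characterize(data: bytes) -> dict[int, float]:
    """Relative frequency of each byte value occurring in the data.

    Assumption: single byte values are the unit of characterization.
    """
    total = len(data)
    if total == 0:
        return {}
    return {value: count / total for value, count in Counter(data).items()}


def build_model(samples: list[bytes]) -> dict[int, float]:
    """Model a language from representative sample text (hypothetical helper)."""
    return characterize(b"".join(samples))


def similarity(a: dict[int, float], b: dict[int, float]) -> float:
    """Cosine similarity between two frequency profiles (an assumed metric)."""
    dot = sum(weight * b.get(value, 0.0) for value, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_language(data: bytes, models: dict[str, dict[int, float]]) -> str:
    """Return the key of the model that best matches the input's profile."""
    profile = characterize(data)
    return max(models, key=lambda name: similarity(profile, models[name]))


# Toy models keyed by (language, character set); the sample text is illustrative only.
models = {
    "en/ascii": build_model([b"the quick brown fox jumps over the lazy dog"]),
    "ru/koi8-r": build_model(
        ["съешь этих мягких французских булок".encode("koi8-r")]
    ),
}

print(identify_language(b"the lazy dog", models))
print(identify_language("мягких булок".encode("koi8-r"), models))
```

Because the models are built over raw data values rather than decoded characters, a model in this sketch is tied to both a language and the character set used to encode its samples, which is why the toy keys name both.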

Patent Owner(s)

Patent Owner | Address
MICROSOFT TECHNOLOGY LICENSING LLC | ONE MICROSOFT WAY REDMOND WA 98052

Inventor(s)

Inventor Name | Address | # of Filed Patents | Total Citations
Powell, Robert David | Issaquah, WA | 1 | 101
