Natural language determination using correlation between common words

United States of America Patent

PATENT NO 6023670
SERIAL NO 08769842

Abstract

The language in which a computer document is written is identified. A plurality of words from the document are compared to words in a word list associated with a candidate language. The words in the word list are a selection of the most frequently used words in the candidate language. A count of matches between words in the document and words in the word list is accumulated for each word in the word list to produce a sample count. The sample count is correlated to a reference count for the candidate language to produce a correlation score for the candidate language. The language of the document is identified based on the correlation score. Generally, there are a plurality of candidate languages. Thus, the comparing, accumulating, correlating and identifying processes are practiced for each language. The language of the document is identified as the candidate language whose reference count generates the highest correlation score.
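
The steps in the abstract (compare, accumulate, correlate, identify) can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say which correlation measure is used, so Pearson correlation is assumed here. The word lists and reference counts in `REFERENCE_COUNTS` are made-up illustrative values, and the function names `sample_counts`, `pearson`, and `identify_language` are hypothetical.

```python
import math
import re

# Hypothetical reference counts for a few frequent words per candidate
# language. The numbers are illustrative only, not measured frequencies.
REFERENCE_COUNTS = {
    "english": {"the": 60, "of": 30, "and": 28, "to": 26, "in": 20},
    "german": {"der": 40, "die": 38, "und": 30, "in": 20, "den": 12},
    "spanish": {"de": 50, "la": 35, "que": 30, "el": 28, "en": 25},
}


def sample_counts(text, word_list):
    """Count how often each word of word_list occurs in text."""
    counts = dict.fromkeys(word_list, 0)
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        if word in counts:
            counts[word] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumed measure)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # No variation (for example, no matches at all): treat as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def identify_language(text, references=REFERENCE_COUNTS):
    """Return (best_language, scores) for the given document text."""
    scores = {}
    for language, reference in references.items():
        words = list(reference)
        sample = sample_counts(text, words)
        scores[language] = pearson(
            [sample[w] for w in words],
            [reference[w] for w in words],
        )
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    doc = ("The cat sat in the garden and the dog went to the end "
           "of the road and the edge of the field.")
    language, scores = identify_language(doc)
    print(language, scores)
```

Scoring each candidate language independently and taking the maximum mirrors the abstract's final step. A practical system would presumably use much longer word lists and reference counts derived from a large corpus.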

Patent Owner(s)

Patent Owner: INTERNATIONAL BUSINESS MACHINES CORPORATION
Address: NEW ORCHARD ROAD, ARMONK, NY 10504

Inventor(s)

Inventor Name                Address      # of Filed Patents   Total Citations
Martino, Michael John        Austin, TX   9                    1043
Paulsen, Jr Robert Charles   Austin, TX   8                    1034
