Identifying a property of a document

Number of patents in Portfolio can not be more than 2000

United States of America Patent

PATENT NO 8380488
SERIAL NO

11737603

Stats

ATTORNEY / AGENT: (SPONSORED)

Importance

Loading Importance Indicators... loading....

Abstract

See full text

Methods, systems and apparatus, including computer program products, for identifying properties of an electronic document. In one aspect, a sequence of bytes representing text in a document is received. A plurality of byte-n-grams are identified from the bytes. For multiple encodings, a respective likelihood of each byte-n-gram occurring in each of the respective multiple encodings is identified. A respective encoding score for each of the multiple encodings is determined. A most likely encoding of the document is identified based on a highest encoding score among the encoding scores. In another aspect, a sequence of characters, having an encoding, are identified in a document. The sequence is segmented into features, each corresponding to two or more characters. A respective score for each of multiple languages is determined based on the features and a respective language model. A language of the document is identified based on the scores.

Loading the Abstract Image... loading....

First Claim

See full text

Family

Loading Family data... loading....

Patent Owner(s)

Patent OwnerAddress
GOOGLE INC1600 AMPHITHEATRE PARKWAY MOUNTAIN VIEW CA 94043

International Classification(s)

  • [Classification Symbol]
  • [Patents Count]

Inventor(s)

Inventor Name Address # of filed Patents Total Citations
Liu, Xin San Jose, US 657 8645
Yang, Stewart Sunnyvale, US 6 76

Cited Art Landscape

Load Citation

Patent Citation Ranking

Forward Cite Landscape

Load Citation