Software and method for recognizing similarity of documents written in different languages based on a quantitative measure of similarity

United States of America Patent

PATENT NO 6519557
SERIAL NO 09588250

Abstract

A system for identifying different-language versions of the same structured-format document (e.g., an HTML web page) detects the language of each of two candidate documents and, if necessary, translates one or both into a preferred language. It then parses the two candidate documents and builds two hierarchical data structures, one per document. The data structures are used to compare the hierarchical structure of the two documents and to access text portions in congruent positions in the two documents. A fuzzy measure of similarity is then obtained for a set of text portions occupying congruent positions in the two documents; this induces a measure of the similarity of the two documents, which is compared to a fuzzy threshold.
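
The comparison steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes both documents are already in the same language (language detection and translation are omitted), that the markup is well nested and has no unclosed void tags such as a bare `<br>`, and it swaps in `difflib.SequenceMatcher` as a stand-in for the fuzzy text measure and a plain numeric cutoff for the fuzzy threshold. The names `Node`, `TreeBuilder`, `document_similarity`, and `is_same_document`, and the 0.8 cutoff, are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class Node:
    """One element of the document tree: a tag, its own text, its children."""

    def __init__(self, tag):
        self.tag = tag
        self.text = ""
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple tag tree (assumes well-nested markup, no void tags)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].text += data.strip()


def parse(html):
    """Parse an HTML string into a Node tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def same_structure(a, b):
    """True if both trees have the same tags in the same positions."""
    if a.tag != b.tag or len(a.children) != len(b.children):
        return False
    return all(same_structure(x, y) for x, y in zip(a.children, b.children))


def congruent_texts(a, b):
    """Yield pairs of text portions occupying the same position in both trees."""
    if a.text or b.text:
        yield a.text, b.text
    for x, y in zip(a.children, b.children):
        yield from congruent_texts(x, y)


def document_similarity(html_a, html_b):
    """Average text similarity over congruent positions; 0.0 if structures differ."""
    tree_a, tree_b = parse(html_a), parse(html_b)
    if not same_structure(tree_a, tree_b):
        return 0.0
    scores = [
        SequenceMatcher(None, s.lower(), t.lower()).ratio()
        for s, t in congruent_texts(tree_a, tree_b)
    ]
    return sum(scores) / len(scores) if scores else 1.0


def is_same_document(html_a, html_b, threshold=0.8):
    """Assumed crisp cutoff standing in for the abstract's fuzzy threshold."""
    return document_similarity(html_a, html_b) >= threshold
```

In this sketch a structural mismatch short-circuits to a score of 0.0, which is stricter than a graded comparison; a closer approximation of a fuzzy approach would also score partial structural agreement.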

Patent Owner(s)

  • NUANCE COMMUNICATIONS, INC.

Inventor(s)

Inventor Name          Address             # of filed Patents   Total Citations
Emens, Michael L       San Jose, CA        13                   2692
Kraft, Reiner          Gilroy, CA          138                  12123
Yim, Peter Chi-Shing   San Francisco, CA   11                   938