Automatic segmentation of texts comprising chunks without separators

United States of America Patent

PATENT NO 7536296
APP PUB NO 20070118356A1
SERIAL NO 10556940

Abstract

Syntagms of a text whose individual elements are written without separators are segmented into chunks, each chunk being a string of at least one individual element, such as an ideogram of the Mandarin Chinese language. A lexicon is defined that includes a set of strings, each string having at least one of the individual elements. The syntagm being segmented is searched in order, element by element, by looking up in the lexicon the strings corresponding to any of the chunks. When a search succeeds, the chunk located is stored with an associated cost. A check is made as to whether the chunk located was already present in the lexicon. If it was, the cost associated with it is reduced. A plurality of candidate segmentation sequences is thus generated, each corresponding to a respective segmentation pattern and having an associated accrued cost. The candidate sequence with the lowest associated cost is selected as the final result of segmentation.
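
The selection of the lowest-cost candidate described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `BASE_COST`, `LEXICON_DISCOUNT`, `chunk_cost`, `candidates` and `segment` are hypothetical, the cost values are arbitrary, and the sketch assumes that any single element may stand as a chunk on its own when no longer lexicon string matches. The cost reduction is modeled, as an assumption, as a fixed discount applied to chunks found in the lexicon.

```python
from typing import Iterator, List, Set, Tuple

# Assumed values for illustration only; the abstract does not give numbers.
BASE_COST = 2.0         # cost stored with any chunk
LEXICON_DISCOUNT = 1.0  # reduction applied when the chunk is in the lexicon


def chunk_cost(chunk: str, lexicon: Set[str]) -> float:
    """Return the base cost, reduced if the chunk is present in the lexicon."""
    cost = BASE_COST
    if chunk in lexicon:
        cost -= LEXICON_DISCOUNT
    return cost


def candidates(
    syntagm: str, lexicon: Set[str], max_len: int
) -> Iterator[Tuple[List[str], float]]:
    """Yield every candidate segmentation together with its accrued cost."""
    if not syntagm:
        yield [], 0.0
        return
    for length in range(1, min(max_len, len(syntagm)) + 1):
        chunk = syntagm[:length]
        # Assumption: a single element is always allowed as a fallback chunk,
        # while longer chunks must match a lexicon string.
        if length > 1 and chunk not in lexicon:
            continue
        cost = chunk_cost(chunk, lexicon)
        for rest, rest_cost in candidates(syntagm[length:], lexicon, max_len):
            yield [chunk] + rest, cost + rest_cost


def segment(syntagm: str, lexicon: Set[str]) -> Tuple[List[str], float]:
    """Select the candidate sequence with the lowest accrued cost."""
    max_len = max((len(s) for s in lexicon), default=1)
    return min(candidates(syntagm, lexicon, max_len), key=lambda c: c[1])


if __name__ == "__main__":
    lexicon = {"北京", "大学", "北京大学", "生"}
    print(segment("北京大学生", lexicon))  # (['北京大学', '生'], 2.0)
```

Enumerating every candidate mirrors the wording of the abstract but grows exponentially with the length of the syntagm; for longer inputs, a dynamic-programming pass over element positions would find the same minimum without materializing all sequences.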

Patent Owner(s)

Patent Owner: CERENCE OPERATING COMPANY
Address: ONE BURLINGTON WOODS DRIVE SUITE 301A BURLINGTON MA 01803

Inventor(s)

Inventor Name: Badino, Leonardo
Address: Turin, IT
# of filed Patents: 3
Total Citations: 290
