Text categorization based on co-classification learning from multilingual corpora

United States of America Patent

PATENT NO 8438009
APP PUB NO 20110098999A1
SERIAL NO 12909389

Abstract

The present document describes a method and a system for generating classifiers from multilingual corpora including subsets of content-equivalent documents written in different languages. When the documents are translations of each other, their classifications must be substantially the same. Embodiments of the invention utilize this similarity in order to enhance the accuracy of the classification in one language based on the classification results in the other language, and vice versa. A system in accordance with the present embodiments implements a method which comprises generating a first classifier from a first subset of the corpora in a first language; generating a second classifier from a second subset of the corpora in a second language; and re-training each of the classifiers on its respective subset based on the classification results of the other classifier, until a training cost between the classification results produced by subsequent iterations reaches a local minimum.
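
The alternating re-training loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the classifier type, the form of the training cost, or the stopping tolerance, so everything concrete here is an assumption made for illustration: logistic-regression classifiers, a cost equal to the log-loss of both classifiers plus a squared-disagreement penalty weighted by a hypothetical parameter `lam`, and toy feature vectors standing in for two language views of the same four documents. The names `fit_view`, `cost`, and `co_train` are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, X):
    # Probability of the positive class for each document in one language view.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x))) for x in X]

def fit_view(w, X, y, other_p, lam, lr=0.2, steps=50):
    # Gradient descent on: log-loss to the labels
    # plus lam * squared disagreement with the other view's predictions.
    n = len(X)
    for _ in range(steps):
        p = predict(w, X)
        grad = [0.0] * len(w)
        for x, pi, yi, qi in zip(X, p, y, other_p):
            # Derivative of the per-document cost with respect to the logit.
            g = (pi - yi) + lam * 2.0 * (pi - qi) * pi * (1.0 - pi)
            for j, xj in enumerate(x):
                grad[j] += g * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def cost(p1, p2, y, lam):
    # Assumed training cost: supervised loss of both views plus disagreement.
    eps = 1e-12
    def log_loss(t, p):
        return -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    total = sum(log_loss(t, a) + log_loss(t, b) + lam * (a - b) ** 2
                for t, a, b in zip(y, p1, p2))
    return total / len(y)

def co_train(X1, X2, y, lam=1.0, tol=1e-6, max_rounds=100):
    # Step 1: one classifier per language, trained independently (lam = 0).
    w1 = fit_view([0.0] * len(X1[0]), X1, y, y, 0.0)
    w2 = fit_view([0.0] * len(X2[0]), X2, y, y, 0.0)
    history = [cost(predict(w1, X1), predict(w2, X2), y, lam)]
    # Step 2: re-train each classifier against the other's current outputs
    # until the cost stops changing between successive rounds.
    for _ in range(max_rounds):
        w1 = fit_view(w1, X1, y, predict(w2, X2), lam)
        w2 = fit_view(w2, X2, y, predict(w1, X1), lam)
        history.append(cost(predict(w1, X1), predict(w2, X2), y, lam))
        if abs(history[-2] - history[-1]) < tol:
            break
    return w1, w2, history

# Toy data (made up): a bias term plus two term counts per document,
# one row per document, same row order in both language views.
X1 = [[1, 2, 0], [1, 1, 0], [1, 0, 2], [1, 0, 1]]  # first-language view
X2 = [[1, 1, 0], [1, 2, 0], [1, 0, 1], [1, 0, 2]]  # second-language view
y = [1, 1, 0, 0]

w1, w2, history = co_train(X1, X2, y)
print("rounds:", len(history) - 1, "final cost:", round(history[-1], 4))
```

Because each update only lowers the shared cost while the other classifier is held fixed, the loop is a form of block coordinate descent; it stops at a point where neither classifier can improve on its own, which is in general only a local minimum.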

Patent Owner(s)

Patent Owner: NATIONAL RESEARCH COUNCIL OF CANADA
Address: IP PORTFOLIO MANAGEMENT M-55 ROOM 29 1200 MONTREAL RD OTTAWA ONTARIO K1A 0R6

Inventor(s)

Inventor Name     Address         # of Filed Patents    Total Citations
Amini, Massih     Gatineau, CA    1                     31
Goutte, Cyril     Toronto, CA     14                    853