Systems and methods for determining the topic structure of a portion of text

United States of America Patent

PATENT NO 7130837
APP PUB NO 20030182631A1
SERIAL NO 10103053

Abstract

Systems and methods for determining the topic structure of a text document use a Probabilistic Latent Semantic Analysis (PLSA) model and select segmentation points based on similarity values between pairs of adjacent text blocks. PLSA provides a single framework for both text segmentation and topic identification. It also gives an improved representation of the sparse information in a text block, such as a sentence or a sequence of sentences. The topic characterization of each text segment is derived from PLSA parameters that relate words to 'topics' (the latent variables of the PLSA model) and 'topics' to text segments. A system executing the method exhibits significant performance improvement. Once determined, the topic structure of a document may be employed for document retrieval and/or document summarization.
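
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the topic-word probabilities in `TOPICS` are hypothetical values standing in for a trained PLSA model, and the choice of cosine similarity and a fixed `threshold` for selecting segmentation points is an assumption, since the abstract does not name a similarity measure or a selection rule. Each block's topic mixture is estimated by EM with the word-given-topic parameters held fixed, and adjacent blocks are compared through their smoothed word distributions.

```python
import math
from collections import Counter

# Hypothetical P(w | z) for two latent topics (assumed, not from the patent).
TOPICS = [
    {"cat": 0.5, "dog": 0.5},
    {"stock": 0.5, "bond": 0.5},
]

def fold_in(counts, p_w_given_z, iters=50):
    """Estimate P(z | block) by EM while keeping P(w | z) fixed."""
    n_topics = len(p_w_given_z)
    p_z = [1.0 / n_topics] * n_topics
    for _ in range(iters):
        new = [0.0] * n_topics
        for word, count in counts.items():
            # E-step: posterior over topics for this word in this block.
            joint = [p_z[z] * p_w_given_z[z].get(word, 0.0) for z in range(n_topics)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # word unknown to every topic; skip it
            for z in range(n_topics):
                new[z] += count * joint[z] / norm
        total = sum(new)
        if total == 0.0:
            break  # no usable words; keep the current estimate
        # M-step: renormalize expected counts into a topic mixture.
        p_z = [value / total for value in new]
    return p_z

def smoothed_word_dist(p_z, p_w_given_z, vocab):
    """P(w | block) = sum over z of P(w | z) * P(z | block)."""
    return [sum(p_z[z] * p_w_given_z[z].get(w, 0.0) for z in range(len(p_z)))
            for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def segment(blocks, p_w_given_z, threshold=0.5):
    """Return adjacent-block similarities and the indices where a new segment starts."""
    vocab = sorted({w for topic in p_w_given_z for w in topic})
    dists = [smoothed_word_dist(fold_in(Counter(b), p_w_given_z), p_w_given_z, vocab)
             for b in blocks]
    sims = [cosine(dists[i], dists[i + 1]) for i in range(len(dists) - 1)]
    boundaries = [i + 1 for i, s in enumerate(sims) if s < threshold]
    return sims, boundaries

# One-word blocks share no vocabulary, yet the topic model links "cat" with "dog".
sims, boundaries = segment([["cat"], ["dog"], ["stock"], ["bond"]], TOPICS)
print(boundaries)  # [2]
```

In this toy input the raw word counts of neighbouring blocks never overlap, so a plain count-based cosine would be zero everywhere. The smoothed distributions are what make the first two blocks look alike, which is the sense in which a latent-topic representation helps with sparse text blocks.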

Patent Owner(s)

  • XEROX CORPORATION

Inventor(s)

Inventor Name            Address          # of Filed Patents   Total Citations
Brants, Thorsten H       Palo Alto, CA    8                    387
Chen, Francine R         Menlo Park, CA   43                   3554
Tsochantaridis, Ioannis  Providence, RI   5                    169
