METHOD OF IDENTIFYING DOCUMENTS WITH SIMILAR PROPERTIES UTILIZING PRINCIPAL COMPONENT ANALYSIS

United States of America Patent

APP PUB NO 20080281581A1
SERIAL NO 12116735

Abstract

The present invention generally provides methods and systems for characterizing texts, for example, for identifying textual documents by language, topic, author, or other attributes. In some embodiments, a method of the invention can include creating an n-gram frequency spectrum for a document under analysis, preferably selecting a subset of the n-gram frequency spectrum, transforming the n-gram frequency spectrum into principal component space, and identifying one or more attributes of the document according to its similarity to (or distinction from) reference documents in the principal component space.
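The steps named in the abstract can be sketched in a few lines of Python. This is an illustration only, not the patented implementation: the choice of character bigrams, the subset rule (the 40 most frequent n-grams across the reference set), the power-iteration PCA, the Euclidean nearest-neighbor comparison, and all function names and sample sentences are assumptions made for the example.

```python
import math
import random
from collections import Counter

def ngram_spectrum(text, n=2):
    """Relative frequency of every character n-gram in `text`."""
    text = text.lower()
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def principal_components(rows, k, iters=200, seed=0):
    """Mean vector and top-k principal axes (power iteration with deflation)."""
    dim = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    data = [[x - m for x, m in zip(r, mean)] for r in rows]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(dim)]
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
        for _ in range(iters):
            scores = [dot(r, v) for r in data]  # X v
            w = [sum(s * r[j] for s, r in zip(scores, data)) for j in range(dim)]  # X^T (X v)
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # no variance left along any direction
                break
            v = [x / norm for x in w]
        comps.append(v)
        # Deflate: remove the found axis from every row before the next pass.
        deflated = []
        for r in data:
            s = dot(r, v)
            deflated.append([x - s * vj for x, vj in zip(r, v)])
        data = deflated
    return mean, comps

def project(row, mean, comps):
    """Coordinates of one spectrum vector in principal component space."""
    centered = [x - m for x, m in zip(row, mean)]
    return [dot(centered, c) for c in comps]

def build_model(references, n=2, top=40, k=2):
    """`references` is a list of (label, text) pairs with known attributes."""
    spectra = [ngram_spectrum(text, n) for _, text in references]
    totals = Counter()
    for s in spectra:
        totals.update(s)
    vocab = [g for g, _ in totals.most_common(top)]  # assumed subset rule
    rows = [[s.get(g, 0.0) for g in vocab] for s in spectra]
    mean, comps = principal_components(rows, k)
    points = [(label, project(r, mean, comps)) for (label, _), r in zip(references, rows)]
    return n, vocab, mean, comps, points

def identify(text, model):
    """Label of the reference document nearest to `text` in PC space."""
    n, vocab, mean, comps, points = model
    s = ngram_spectrum(text, n)
    p = project([s.get(g, 0.0) for g in vocab], mean, comps)
    return min(points, key=lambda lp: math.dist(p, lp[1]))[0]

# Hypothetical reference set: two English and two French sentences.
references = [
    ("en", "the quick brown fox jumps over the lazy dog and then the dog chases the fox"),
    ("en", "she said that the weather in the north is often colder than in the south"),
    ("fr", "le renard brun saute par dessus le chien paresseux et le chien poursuit le renard"),
    ("fr", "elle a dit que le temps dans le nord est souvent plus froid que dans le sud"),
]
model = build_model(references)
print(identify("the cat and the dog are in the garden", model))
```

With so few and such short reference texts the result is only suggestive; a realistic reference set would contain many longer documents per language, topic, or author, and the number of retained components would be tuned accordingly.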

Patent Owner(s)

Patent Owner    Address
SPARTA INC      24 HARTWELL AVENUE, LEXINGTON MA 02173

Inventor(s)

Inventor Name           Address         # of Filed Patents   Total Citations
Henshaw, Philip D       Carlisle, MA    12                   243
Trepagnier, Pierre C    Medford, MA     8                    175
