Method and system for classifying semi-structured documents

United States of America Patent

Patent No.: 6606620
Serial No.: 09624616

Abstract

A classifier for semi-structured documents, and an associated method, dynamically and accurately classify documents that have an implicit or explicit schema by exploiting the term-frequency and term-distribution information inherent in each document. The system uses a structured vector model that groups like terms together and segregates dissimilar terms based on their frequency and distribution within the sub-vectors of the structured vector, thus achieving context sensitivity. The class of a document is assigned by mathematically comparing the similarity of the terms in its structured vector to those of the various class models. The classifier of the present invention is capable of both learning and testing. In the learning phase, the classifier builds a model for each class from the composite information gleaned from numerous training documents. Specifically, it develops a structured vector model for each training document; then, within a given class of documents, it adds and then normalizes the occurrences of terms.
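The learning and testing phases described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not specify the tokenization, the normalization, or the similarity measure, so the sketch assumes whitespace tokenization, normalization to relative frequencies, and cosine similarity summed over sub-vectors. The function names (`structured_vector`, `train`, `classify`) and the field names `title` and `body` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def structured_vector(doc):
    """Map each field (sub-vector) of a document to its term counts.

    Assumption: a document is a dict of field name -> text, and terms are
    lowercase whitespace-separated tokens.
    """
    return {field: Counter(text.lower().split()) for field, text in doc.items()}


def train(labeled_docs):
    """Learning phase: add term occurrences per class and field, then normalize.

    Assumption: normalization means dividing by the total term count of the
    field, giving relative frequencies.
    """
    totals = defaultdict(lambda: defaultdict(Counter))
    for label, doc in labeled_docs:
        for field, counts in structured_vector(doc).items():
            totals[label][field].update(counts)

    models = {}
    for label, fields in totals.items():
        models[label] = {}
        for field, counts in fields.items():
            n = sum(counts.values())
            if n:  # skip fields that contained no terms
                models[label][field] = {t: c / n for t, c in counts.items()}
    return models


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(doc, models):
    """Testing phase: pick the class whose model is most similar to the document.

    Terms are compared only within the same field, so a term in the title
    never matches the same term in the body (context sensitivity).
    """
    vec = structured_vector(doc)
    scores = {
        label: sum(cosine(counts, model.get(field, {})) for field, counts in vec.items())
        for label, model in models.items()
    }
    return max(scores, key=scores.get)


training = [
    ("resume", {"title": "software engineer", "body": "python java experience"}),
    ("recipe", {"title": "apple pie", "body": "flour sugar apple bake"}),
]
models = train(training)
label = classify({"title": "java engineer", "body": "apple experience"}, models)
```

Because each field is scored separately, the word "apple" in the test document's body is compared only against body terms of each class, not against the recipe's title.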

Patent Owner(s)

Patent Owner: GOOGLE LLC
Address: 1600 AMPHITHEATRE PARKWAY, MOUNTAIN VIEW, CA 94043

Inventor(s)

Inventor Name              Address        # of Filed Patents   Total Citations
Sundaresan, Neelakantan    San Jose, CA   428                  10718
Yi, Jeonghee               San Jose, CA   20                   927
