Systems and methods for electronic document genre classification using document grammars


United States of America Patent

PATENT NO 7734636
SERIAL NO 11094415


Abstract

A system for classifying a genre of an electronic document may include a network processor configured to receive an electronic document and convert the electronic document to rich text format (RTF). The processor may be configured to parse the RTF document into lines of text ordered from top to bottom and left to right and assign tokens to each line of text based on content of the line and to line separators based on space between blocks of lines. The network processor may be configured to sequence the tokens, parse the tokenized document with a number of pre-defined document grammars, determine a probability for each genre corresponding to the electronic document, and classify the electronic document as the genre with the highest probability.
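The tokenize, parse, and score flow described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the token names, the two toy genres, the use of weighted regular expressions over the token sequence as a stand-in for real document grammars, and the smoothed normalization used to turn scores into probabilities. The sketch also starts from plain text with blank lines as block separators, and skips the RTF conversion and spatial line ordering.

```python
import re

# Hypothetical line-level token rules (not from the patent); first match wins.
TOKEN_RULES = [
    ("DATE", re.compile(r"^\s*(\d{1,2}/\d{1,2}/\d{2,4}|[A-Z][a-z]+ \d{1,2}, \d{4})\s*$")),
    ("SALUTATION", re.compile(r"^\s*Dear\b", re.IGNORECASE)),
    ("CLOSING", re.compile(r"^\s*(Sincerely|Regards|Best regards)\b", re.IGNORECASE)),
    ("TOFROM", re.compile(r"^\s*(To|From|Subject|Re)\s*:", re.IGNORECASE)),
]

# Hypothetical stand-in for document grammars: each genre has weighted
# patterns that are matched against the space-joined token sequence.
GENRE_GRAMMARS = {
    "letter": [
        (re.compile(r"^DATE\b"), 1.0),
        (re.compile(r"\bSALUTATION\b"), 2.0),
        (re.compile(r"\bCLOSING\b"), 2.0),
    ],
    "memo": [
        (re.compile(r"^(TOFROM )+"), 3.0),
        (re.compile(r"\bTOFROM SEP TEXT\b"), 1.0),
    ],
}


def tokenize(text):
    """Assign one token per non-blank line, plus SEP between blocks of lines."""
    tokens = []
    pending_gap = False
    for line in text.splitlines():
        if not line.strip():
            pending_gap = True  # blank line(s) mark space between blocks
            continue
        if tokens and pending_gap:
            tokens.append("SEP")
        pending_gap = False
        for name, pattern in TOKEN_RULES:
            if pattern.search(line):
                tokens.append(name)
                break
        else:
            tokens.append("TEXT")  # default token when no rule matches
    return tokens


def genre_probabilities(tokens, smoothing=0.1):
    """Score every genre against the token sequence and normalize to sum to 1."""
    sequence = " ".join(tokens)
    scores = {
        genre: smoothing + sum(w for pattern, w in rules if pattern.search(sequence))
        for genre, rules in GENRE_GRAMMARS.items()
    }
    total = sum(scores.values())
    return {genre: score / total for genre, score in scores.items()}


def classify(text):
    """Return the genre with the highest probability, and all probabilities."""
    probs = genre_probabilities(tokenize(text))
    return max(probs, key=probs.get), probs
```

Calling `classify` on a short text that opens with a date line, a "Dear ..." line, and ends with "Sincerely," returns `"letter"` as the top genre under these toy rules. A real system would replace the regular expressions with proper grammars and a principled probability model.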

Patent Owner(s)

Patent Owner | Address
XEROX CORPORATION | 45 GLOVER AVENUE P O BOX 4505 NORWALK CT 06856-4505

Inventor(s)

Inventor Name | Address | # of Filed Patents | Total Citations
Handley, John C | Fairport, US | 80 | 1457
