Table of contents extraction with improved robustness

United States of America Patent

PATENT NO 7743327
APP PUB NO 20070196015A1
SERIAL NO 11360963

Abstract

In a method for identifying a table of contents in a document (10), text fragments are extracted (12) from the document. There are identified (20, 30, 34, 38): (i) a substantially contiguous group of text fragments as table of content entries and (ii) a different group of text fragments as linked text fragments linked with corresponding table of content entries. During the identifying, a number of text fragments that are candidates for identification as linked text fragments is reduced based on at least one reduction criterion (130). The identified table of contents entries and linked text fragments (110) are validated based on at least one validation criterion (162) related to distribution of the linked text fragments.
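
The steps named in the abstract (extract fragments, find a contiguous run of entries with linked fragments, reduce the candidate set, validate by distribution) can be illustrated with a minimal sketch. This is not the patented implementation: the function names `normalize` and `find_toc`, the exact-match linking rule, the length-based reduction criterion (`max_len`), and the spread-based validation criterion (`min_spread`) are all assumptions chosen for illustration.

```python
import re
from typing import List, Tuple


def normalize(text: str) -> str:
    """Lowercase, drop leader dots plus trailing page number, squeeze spaces."""
    text = re.sub(r"\s*\.{2,}\s*\d*\s*$", "", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def find_toc(fragments: List[str], max_len: int = 80,
             min_entries: int = 2, min_spread: float = 0.5) -> List[Tuple[int, int]]:
    """Return (entry_index, linked_index) pairs, or [] if nothing validates."""
    n = len(fragments)
    keys = [normalize(f) for f in fragments]
    # Assumed reduction criterion: only short, non-empty fragments may be
    # link targets, which shrinks the candidate set before matching.
    candidates = [i for i in range(n) if keys[i] and len(fragments[i]) <= max_len]

    best: List[Tuple[int, int]] = []
    for start in range(n):
        links: List[Tuple[int, int]] = []
        end, last_target = start, -1
        # Grow a contiguous run of entries; stop before the first linked
        # fragment so the two groups stay disjoint.
        while end < n and (not links or end < links[0][1]):
            target = next((j for j in candidates
                           if j > max(end, last_target) and keys[j] == keys[end]),
                          None)
            if target is None:
                break
            links.append((end, target))
            last_target = target
            end += 1
        if len(links) < min_entries:
            continue
        # Assumed validation criterion: linked fragments must be spread over
        # a large enough share of the text that follows the entries.
        spread = (links[-1][1] - links[0][1] + 1) / (n - end)
        if spread >= min_spread and len(links) > len(best):
            best = links
    return best


doc = [
    "My Report",
    "Introduction ..... 1",
    "Methods ..... 3",
    "Results ..... 7",
    "Introduction",
    "Some opening text about the problem.",
    "Methods",
    "We describe the approach here.",
    "Results",
    "The findings are summarized.",
]
print(find_toc(doc))
```

In this sketch the linked fragments must appear in the same order as the entries, and a run is rejected when its linked fragments cluster in a small part of the remaining text. A real system would likely use fuzzy text similarity rather than exact matching.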

Patent Owner(s)

Patent Owner | Address
XEROX CORPORATION | 45 GLOVER AVENUE, P O BOX 4505, NORWALK, CT 06856-4505

Inventor(s)

Inventor Name | Address | # of Filed Patents | Total Citations
Déjean, Hervé | Grenoble, FR | 24 | 277
Meunier, Jean-Luc | St. Nazaire les Eymes, FR | 62 | 1831
