System and method of automatic wrapper grammar generation

Number of patents in Portfolio can not be more than 2000

United States of America Patent

PATENT NO 6792576
SERIAL NO

09361496

Stats

ATTORNEY / AGENT: (SPONSORED)

Importance

Loading Importance Indicators... loading....

Abstract

See full text

A method for generating a wrapper grammar for a file having a structure of a particular format includes providing at least one sample file of the particular format, where the particular format comprises a plurality of string tokens. Each sample file includes a plurality of tokens (data strings) which may be actual data from the document, an HTML tag or some other grammatical separator. The sample file of the particular format is then processed by annotating attributable tokens with a user-defined attribute, such as Author, Title, etc. from a set of attributes to form an annotated sample set. The annotated sample set is then evaluated to determine if wrapper grammar generation is possible, and if it is possible, a wrapper grammar for the files having a structure of the particular format is generated. Preferably, the annotated sample set is evaluated by determining if all attributes in the annotated sample set are distinguishable from one another.

Loading the Abstract Image... loading....

First Claim

See full text

Family

Loading Family data... loading....

Patent Owner(s)

Patent OwnerAddress
XEROX CORPORATION45 GLOVER AVENUE P O BOX 4505 NORWALK CT 06856-4505

International Classification(s)

  • [Classification Symbol]
  • [Patents Count]

Inventor(s)

Inventor Name Address # of filed Patents Total Citations
Chidlovskii, Boris Meylan, FR 73 3408

Cited Art Landscape

Load Citation

Patent Citation Ranking

Forward Cite Landscape

Load Citation