Visual and interactive wrapper generation, automated information extraction from Web pages, and translation into XML

United States of America Patent

PATENT NO 7581170
APP PUB NO 20050022115A1
SERIAL NO 10479039

Abstract

A method and a system for information extraction from Web pages formatted with markup languages such as HTML [8]. A method and system for interactively and visually describing information patterns of interest based on visualized sample Web pages [5,6,16-29]. A method and data structure for representing and storing these patterns [1]. A method and system for extracting information corresponding to a set of previously defined patterns from Web pages [2], and a method for transforming the extracted data into XML, are described. Each pattern is defined via the (interactive) specification of one or more filters. Two or more filters for the same pattern contribute disjunctively to the pattern definition [3]; that is, an actual pattern describes the set of all targets specified by any of its filters. A method and system for extracting relevant elements from Web pages by interpreting and executing a previously defined wrapper program of the above form on an input Web page [9-14] and producing as output the extracted elements represented in a suitable data structure. A method and system for automatically translating said output into XML format by exploiting the hierarchical structure of the patterns and by using pattern names as XML tags are described.
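
The ideas in the abstract (patterns built from filters that combine disjunctively, a pattern hierarchy, and pattern names reused as XML tags) can be illustrated with a minimal Python sketch. It is not the patented wrapper language or its implementation: the names `Pattern`, `by_class`, `extract`, and `to_xml`, the sample page, and the `document` root tag are all hypothetical, and the sketch assumes the input page is well-formed enough to be parsed as XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: a filter maps a context element to the targets it selects.
Filter = Callable[[ET.Element], List[ET.Element]]


@dataclass
class Pattern:
    name: str                      # reused as the XML tag of each extracted instance
    filters: List[Filter]          # filters contribute disjunctively
    children: List["Pattern"] = field(default_factory=list)

    def match(self, context: ET.Element) -> List[ET.Element]:
        """Return every target selected by ANY filter (first-seen order, no duplicates)."""
        seen: List[ET.Element] = []
        for flt in self.filters:
            for element in flt(context):
                if not any(element is s for s in seen):
                    seen.append(element)
        return seen


def by_class(tag: str, cls: str) -> Filter:
    """Illustrative filter: elements named `tag` whose class attribute equals `cls`."""
    return lambda ctx: [e for e in ctx.iter(tag) if e.get("class") == cls]


def extract(pattern: Pattern, context: ET.Element) -> List[ET.Element]:
    """Build one output element per matched target, recursing into child patterns."""
    out = []
    for target in pattern.match(context):
        node = ET.Element(pattern.name)
        if pattern.children:
            for child in pattern.children:
                node.extend(extract(child, target))
        else:
            # Leaf pattern: keep the text content of the matched element.
            node.text = "".join(target.itertext()).strip()
        out.append(node)
    return out


def to_xml(page_source: str, root_pattern: Pattern, root_tag: str = "document") -> str:
    """Translate the extraction result into XML, mirroring the pattern hierarchy."""
    root = ET.Element(root_tag)
    root.extend(extract(root_pattern, ET.fromstring(page_source)))
    return ET.tostring(root, encoding="unicode")


# Made-up sample page; assumed well-formed so ElementTree can parse it.
PAGE = """<html><body><table>
<tr class="item"><td class="title">Lamp</td><td class="price">12 EUR</td></tr>
<tr class="offer"><td class="title">Desk</td><td class="price">80 EUR</td></tr>
</table></body></html>"""

# The "record" pattern has two filters; a row matching either one is a target.
record = Pattern(
    "record",
    filters=[by_class("tr", "item"), by_class("tr", "offer")],
    children=[
        Pattern("title", [by_class("td", "title")]),
        Pattern("price", [by_class("td", "price")]),
    ],
)

result = to_xml(PAGE, record)
print(result)
```

In this sketch, both table rows end up as `record` elements even though each row satisfies only one of the two filters, which is the disjunctive behaviour the abstract describes; the nesting of `title` and `price` inside `record` follows the pattern hierarchy.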

Patent Owner(s)

  • LIXTO SOFTWARE GMBH

Inventor(s)

Inventor Name        Address     # of Filed Patents  Total Citations
Baumgartner, Robert  Vienna, AT  19                  140
Gottlob, Georg       Vienna, AT  8                   459
Herzog, Marcus       Vienna, AT  1                   320
Flesca, Sergio       Rende, IT   1                   320
