Wrapper induction by hierarchical data analysis

United States of America Patent

PATENT NO 6606625
SERIAL NO 09587528

Abstract

An inductive algorithm, denominated STALKER, generates high-accuracy extraction rules based on user-labeled training examples. With the tremendous amount of information that becomes available on the Web on a daily basis, the ability to quickly develop information agents has become a crucial problem. A vital component of any Web-based information agent is a set of wrappers that can extract the relevant data from semistructured information sources. The novel approach to wrapper induction provided herein is based on the idea of hierarchical information extraction, which turns the hard problem of extracting data from an arbitrarily complex document into a series of easier extraction tasks. Labeling the training data represents the major bottleneck in using wrapper induction techniques, and experimental results show that STALKER performs significantly better than other approaches: on the one hand, STALKER requires up to two orders of magnitude fewer examples than other algorithms, while on the other hand it can handle information sources that could not be wrapped by prior techniques. STALKER uses an embedded catalog formalism to parse the information source and render a predictable structure from which information may be extracted or by which such information extraction may be facilitated.
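
The hierarchical idea in the abstract can be illustrated with a minimal sketch. It only applies hand-written extraction rules along a small catalog tree; it does not learn rules from labeled examples, which is the part STALKER itself addresses. The abstract does not give a rule syntax, so the landmark-style rule used here (skip past a sequence of marker strings, then stop before an end marker) is an assumption, and the names Rule, Node, extract, PAGE and CATALOG are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Rule:
    """Hypothetical landmark rule (an assumption, not the patent's syntax):
    skip past each start landmark in order, then stop before the end landmark."""
    start: List[str]
    end: str

    def locate(self, text: str, pos: int = 0) -> Optional[Tuple[int, int]]:
        for landmark in self.start:
            i = text.find(landmark, pos)
            if i < 0:
                return None
            pos = i + len(landmark)
        j = text.find(self.end, pos)
        return None if j < 0 else (pos, j)


@dataclass
class Node:
    """One node of a catalog-like tree: a leaf yields a string,
    a node with children yields a dict, a repeated node yields a list."""
    name: str
    rule: Rule
    children: List["Node"] = field(default_factory=list)
    repeated: bool = False


def _content(node: Node, text: str):
    # Children only ever see the parent's extracted substring,
    # which is what makes each sub-task easier than the whole page.
    if not node.children:
        return text.strip()
    return {child.name: extract(child, text) for child in node.children}


def extract(node: Node, text: str):
    if not node.repeated:
        span = node.rule.locate(text)
        return None if span is None else _content(node, text[span[0]:span[1]])
    items, pos = [], 0
    while (span := node.rule.locate(text, pos)) is not None:
        items.append(_content(node, text[span[0]:span[1]]))
        pos = span[1] + len(node.rule.end)  # resume after the end landmark
    return items


# Made-up page and catalog, for illustration only.
PAGE = ("<h1>Cafe Uno</h1><ul>"
        "<li><b>12 Main St</b> Phone: <i>555-0100</i></li>"
        "<li><b>9 Oak Ave</b> Phone: <i>555-0199</i></li>"
        "</ul>")

CATALOG = [
    Node("name", Rule(["<h1>"], "</h1>")),
    Node("locations", Rule(["<li>"], "</li>"), repeated=True, children=[
        Node("address", Rule(["<b>"], "</b>")),
        Node("phone", Rule(["Phone:", "<i>"], "</i>")),
    ]),
]

record = {node.name: extract(node, PAGE) for node in CATALOG}
print(record)
```

In this sketch the phone rule never has to reason about the page as a whole: it runs only inside one list item that the parent rule has already isolated. That is the sense in which a hard extraction problem becomes a series of easier ones.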

Patent Owner(s)

Patent Owner: IMPORT IO GLOBAL INC
Address: 20 S SANTA CRUZ AVE #102, LOS GATOS, CA 95030

Inventor(s)

Inventor Name       Address           # of Filed Patents   Total Citations
Knoblock, Craig A   El Segundo, CA    12                   877
Minton, Steven      El Segundo, CA    4                    313
Muslea, Ion         Culver City, CA   2                    250
