Learning characteristics for extraction of information from web pages

United States of America Patent

Patent No.: 8438080
Serial No.: 12790551

Abstract

A learning module of an information retrieval system is configured to automatically learn the distinctive characteristics that different web sites use when presenting data variables of interest. The learned information can then be used to identify data variables of interest on arbitrary web pages of those web sites. In one embodiment, the learning process is guided by feeds provided by the web sites that list values for data variables of interest, and by web pages also provided by the web sites. The values in the feeds enable the learning module to identify candidate portions of the web pages that may represent a data variable of interest. Weights are computed for different values of various properties of the candidate portions, aggregated over all the analyzed pages, and used to identify one of the candidate portions as the best candidate.
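The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the node layout (a dict with a `text` key plus property keys such as `tag` and `css_class`), the substring match against the feed value, the equal split of one unit of weight among a page's candidates, and the function names `learn_weights` and `best_candidate` are all assumptions made for illustration.

```python
from collections import defaultdict


def learn_weights(training_pairs):
    """Aggregate weights for (property, value) pairs over all training pages.

    training_pairs: iterable of (feed_value, nodes). Each node is a dict with
    a 'text' key plus arbitrary property keys (hypothetical layout).
    """
    weights = defaultdict(float)
    for feed_value, nodes in training_pairs:
        # Candidate portions: nodes whose text contains the value from the feed.
        candidates = [n for n in nodes if feed_value in n["text"]]
        if not candidates:
            continue
        # Assumption: each page contributes one unit, split among its candidates.
        share = 1.0 / len(candidates)
        for node in candidates:
            for prop, value in node.items():
                if prop != "text":
                    weights[(prop, value)] += share
    return dict(weights)


def best_candidate(nodes, weights):
    """Return the node whose property values carry the highest total weight."""
    def score(node):
        return sum(weights.get((p, v), 0.0)
                   for p, v in node.items() if p != "text")
    return max(nodes, key=score) if nodes else None


# Two training pages, each paired with the price listed in the site's feed.
training = [
    ("19.99", [
        {"text": "Price: $19.99", "tag": "span", "css_class": "price"},
        {"text": "Was $19.99 last week", "tag": "div", "css_class": "promo"},
        {"text": "Free shipping", "tag": "div", "css_class": "promo"},
    ]),
    ("5.49", [
        {"text": "$5.49", "tag": "span", "css_class": "price"},
        {"text": "Rated 4 stars", "tag": "div", "css_class": "rating"},
    ]),
]
weights = learn_weights(training)

# A page with no feed entry: pick the portion that best matches what was learned.
new_page = [
    {"text": "Ships in 2 days", "tag": "div", "css_class": "promo"},
    {"text": "$42.00", "tag": "span", "css_class": "price"},
]
best = best_candidate(new_page, weights)
print(best["text"])
```

In this toy data the `price` class appears among the candidates on both training pages, so it accumulates more weight than the `promo` class, which matches the feed value on only one page and shares that page's weight with another candidate.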

Patent Owner(s)

Patent Owner | Address
GOOGLE LLC | 1600 AMPHITHEATRE PARKWAY MOUNTAIN VIEW CA 94043

Inventor(s)

Inventor Name | Address | # of Filed Patents | Total Citations
Goodrow, Cristos | Mountain View, US | 2 | 9
Xiao, Fei | Mountain View, US | 73 | 536
