Extracting data content items using template matching

United States of America Patent

PATENT NO 7765236
APP PUB NO 20090063500A1
SERIAL NO 11848987

Abstract

Systems and methods for extracting data content items from a web page are provided. A template is created by labeling data content items of interest associated with a web page and generating a template Document Object Model (DOM) tree based on the labeled web page. DOM trees are also generated for additional web pages that contain data content items for which extraction may be desired. These DOM trees are compared to the template DOM tree to determine the alignment between them. The aligned data content items may then be extracted from the additional web pages and indexed, as desired. Labeling the data content items of interest before generating a template DOM tree allows the desired data content items to be specified and more accurately extracted from related and/or similarly structured web pages.
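
The abstract describes template-based extraction only at a high level. The following is a minimal sketch of the idea under several assumptions, and it is not the method claimed in the patent:

- Labeled items in the template page are marked with a hypothetical `data-label` attribute.
- DOM trees are built with Python's standard `html.parser` module.
- Alignment uses Simple Tree Matching, a standard dynamic-programming tree-alignment algorithm substituted here for illustration.
- `parse_dom`, `simple_tree_match`, `extract`, `TEMPLATE` and `PAGE` are illustrative names only.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Void elements never receive a closing tag, so they are never pushed on the stack.
VOID_TAGS = set("area base br col embed hr img input link meta source track wbr".split())

@dataclass(eq=False)  # eq=False keeps identity-based hashing and equality
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    content: list = field(default_factory=list)  # str and Node items in document order

    @property
    def children(self):
        return [c for c in self.content if isinstance(c, Node)]

    def full_text(self):
        # Whitespace-normalized text of this node and all of its descendants.
        parts = [c if isinstance(c, str) else c.full_text() for c in self.content]
        return " ".join(" ".join(parts).split())

class DomBuilder(HTMLParser):  # builds a minimal DOM tree with the stdlib parser
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].content.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].content.append(data)

def parse_dom(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def simple_tree_match(a, b, memo=None):
    """Simple Tree Matching: return (score, pairs) for the maximum top-down
    matching of trees a and b; pairs lists the matched (a_node, b_node)."""
    memo = {} if memo is None else memo
    key = (id(a), id(b))
    if key in memo:
        return memo[key]
    if a.tag != b.tag:
        memo[key] = (0, [])
        return memo[key]
    ac, bc = a.children, b.children
    m, n = len(ac), len(bc)
    w = [[simple_tree_match(x, y, memo) for y in bc] for x in ac]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(dp[i][j - 1], dp[i - 1][j],
                           dp[i - 1][j - 1] + w[i - 1][j - 1][0])
    # Trace back through the table to recover which child subtrees were matched.
    matched, i, j = [], m, n
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j]:
            i -= 1
        elif dp[i][j] == dp[i][j - 1]:
            j -= 1
        else:
            matched.append(w[i - 1][j - 1][1])
            i, j = i - 1, j - 1
    pairs = [(a, b)]
    for child_pairs in reversed(matched):  # restore document order
        pairs.extend(child_pairs)
    memo[key] = (dp[m][n] + 1, pairs)
    return memo[key]

def extract(template_html, page_html):
    """Map each data-label in the template to the text of the aligned page node."""
    _, pairs = simple_tree_match(parse_dom(template_html), parse_dom(page_html))
    return {t.attrs["data-label"]: p.full_text()
            for t, p in pairs if t.attrs.get("data-label")}

TEMPLATE = ('<html><body><div class="item"><h1 data-label="title">Sample title</h1>'
            '<span data-label="price">$0.00</span></div></body></html>')
PAGE = ('<html><body><div class="ad">Sponsored</div><div class="item">'
        '<h1>Blue Widget</h1><span>$19.99</span></div></body></html>')

if __name__ == "__main__":
    print(extract(TEMPLATE, PAGE))  # {'title': 'Blue Widget', 'price': '$19.99'}
```

Simple Tree Matching pairs nodes only when their tags agree along the same top-down path. As a result, the extra `ad` block on the new page is skipped, and the labeled `h1` and `span` align with their counterparts. A production extractor would likely also weigh attributes, text similarity and repeated records, all of which this sketch ignores.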

Patent Owner(s)

Patent Owner                         Address
MICROSOFT TECHNOLOGY LICENSING LLC   ONE MICROSOFT WAY, REDMOND, WA 98052

Inventor(s)

Inventor Name     Address           Filed Patents   Total Citations
Gao, Hong         Seattle, US       139             748
Li, Yi            Issaquah, US      917             5421
Qian, Richard     Sammamish, US     1               15
Tan, Lei          Bellevue, US      27              542
Zhai, Yanhong     Redmond, US       1               80
