US Patent No: 7,765,236

Extracting data content items using template matching

Stats

ALSO PUBLISHED AS: 20090063500

Abstract

Systems and methods for extracting data content items from a web page are provided. A template is created by labeling data content items of interest associated with a web page and generating a template Document Object Model (DOM) tree based on the labeled web page. DOM trees are also generated for additional web pages that contain data content items for which extraction may be desired. These DOM trees are compared to the template DOM tree to determine alignment therebetween. The aligned data content items may then be extracted from the additional web pages and indexed, as desired. Labeling the data content items of interest prior to generating a template DOM tree allows for the desired data content items to be specified and more accurately extracted from related and/or similarly structured web pages.
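The workflow described in the abstract (label a page, build a template DOM tree, align other pages against it, extract the aligned items) can be illustrated with a minimal sketch. This is not the patented method: it swaps in exact tag-path matching for whatever alignment the patent actually claims, and it assumes well-formed HTML in which every element has an explicit closing tag. The `data-label` attribute, the `Node` class, and the functions `build_dom`, `node_path`, `follow`, and `extract` are all hypothetical names introduced here for illustration.

```python
from html.parser import HTMLParser


class Node:
    """A very small DOM node (hypothetical helper, not a real DOM API)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes every start tag has a matching end tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None and self.current.tag == tag:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_dom(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def iter_nodes(node):
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def node_path(node):
    """Path from the root as (tag, index among same-tag siblings) steps."""
    path = []
    while node.parent is not None:
        same = [c for c in node.parent.children if c.tag == node.tag]
        path.append((node.tag, next(i for i, c in enumerate(same) if c is node)))
        node = node.parent
    return tuple(reversed(path))


def follow(root, path):
    """Walk the same path in another tree; None if the trees do not align."""
    node = root
    for tag, index in path:
        same = [c for c in node.children if c.tag == tag]
        if index >= len(same):
            return None
        node = same[index]
    return node


def extract(template_html, page_html):
    """Extract, from page_html, the items labeled in template_html."""
    template = build_dom(template_html)
    # Assumption: labels are carried by a made-up data-label attribute.
    labeled = {n.attrs["data-label"]: node_path(n)
               for n in iter_nodes(template) if "data-label" in n.attrs}
    page = build_dom(page_html)
    result = {}
    for label, path in labeled.items():
        node = follow(page, path)
        result[label] = node.text if node is not None else None
    return result


TEMPLATE = ('<html><body><div><h1 data-label="title">Widget A</h1>'
            '<span data-label="price">$10</span></div></body></html>')
PAGE = '<html><body><div><h1>Widget B</h1><span>$25</span></div></body></html>'

print(extract(TEMPLATE, PAGE))
```

Exact path matching only works when the additional pages share the template's structure closely; a page with an extra wrapper element would fail to align, which is presumably why a more tolerant tree comparison is desirable in practice.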

Patent Owner(s)

Patent Owner  Address  Total Patents
MICROSOFT CORPORATION  REDMOND, WA  24226

Inventor(s)

Inventor Name  Address  # of Filed Patents  Total Citations
Gao, Hong  Belle Mead, NJ  32  67
Li, Yi  Sunnyvale, CA  358  868
Qian, Richard  -  1  1
Tan, Lei  Shenzhen, CN  6  11
Zhai, Yanhong  -  2  3

Cited Art

Patent Info (Count) # Cites Year
 
INTERNATIONAL BUSINESS MACHINES CORPORATION (2)
6,778,703 Form recognition using reference areas 19 2000
7,174,327 Generating one or more XML documents from a relational database using XPath data model 27 2002
 
NOKIA CORPORATION (2)
7,072,984 System and method for accessing customized information over the internet using a browser for a plurality of electronic devices 83 2001
2004/0049,737 System and method for displaying information content with selective horizontal scrolling 71 2002
 
CLAIRVOYANCE CORPORATION (1)
2009/0198,714 Document processing and management approach for reflecting changes in one representation of a document to another representation 1 2005
 
DIVINE TECHNOLOGY VENTURES (1)
6,538,673 Method for extracting digests, reformatting, and automatic monitoring of structured online documents based on visual programming of document tree navigation and transformation 71 2000
 
DOCUMENT ANALYTIC TECHNOLOGIES, LLC (1)
2004/0102,958 Computer-based system and method for generating, classifying, searching, and analyzing standardized text templates and deviations from standardized text templates 8 2003
 
JUSTSYSTEMS CORPORATION (1)
2008/0195,626 Data Processing Device, Document Processing Device, Data Relay Device, Data Processing Method, and Data Relay Method 2 2005
 
METAMOJI CORPORATION (1)
2009/0070,295 DOCUMENT PROCESSING DEVICE AND DOCUMENT PROCESSING METHOD 3 2006
 
MICROSOFT CORPORATION (1)
2008/0010,056 Aligning hierarchal and sequential document trees to identify parallel data 3 2006
 
ORACLE INTERNATIONAL CORPORATION (1)
2006/0242,563 Optimizing XSLT based on input XML document structure description and translating XSLT into equivalent XQuery expressions 15 2005
 
SONY ELECTRONICS INC. (1)
7,176,921 Graphical rewriting system for multimedia descriptions 2 2001
 
TEXAS INSTRUMENTS INCORPORATED (1)
2006/0265,712 Methods for supporting intra-document parallelism in XSLT processing on devices with multiple processors 2 2005
 
WHILOM PROCESSING, L.L.C. (1)
6,772,165 Electronic document processing system and method for merging source documents on a node-by-node basis to generate a target document 104 2002
 
OTHER [CHECK PATENT PROFILE FOR ASSIGNMENT INFORMATION] (2)
6,810,414 System and methods for easy-to-use periodic network data capture engine with automatic target data location, extraction and storage 110 2000
2009/0043,777 Methods and apparatus for enabling use of web content on various types of devices 1 2007

Forward Cites

Patent Info (Count) # Cites Year
 
YAHOO! INC. (1)
8,010,544 Inverted indices in information extraction to improve records extracted per annotation 0 2008

Maintenance Fees

Fee  Large Entity Fee  Small Entity Fee  Micro Entity Fee  Due Date
3.5 Year Payment  $1,600.00  $800.00  $400.00  Jan 27, 2014
7.5 Year Payment  $3,600.00  $1,800.00  $900.00  Jan 27, 2018
11.5 Year Payment  $7,400.00  $3,700.00  $1,850.00  Jan 27, 2022

Fee  Large Entity Fee  Small Entity Fee  Micro Entity Fee
Surcharge - 3.5 year - Late payment within 6 months  $160.00  $80.00  $40.00
Surcharge - 7.5 year - Late payment within 6 months  $160.00  $80.00  $40.00
Surcharge - 11.5 year - Late payment within 6 months  $160.00  $80.00  $40.00
Surcharge after expiration - Late payment is unavoidable  $700.00  $350.00  $175.00
Surcharge after expiration - Late payment is unintentional  $1,640.00  $820.00  $410.00