Computer method and apparatus for extracting data from web pages

United States of America Patent

SERIAL NO

11436370

Abstract

Computer method and apparatus for extracting information from a Web page is disclosed. The invention apparatus is formed of an extractor coupled to receive Web pages from a source. The extractor uses natural language processing to extract desired information from the Web page. A storage subsystem receives from the extractor the extracted desired information and stores the extracted desired information in a database. The invention method for extracting data from a Web page includes the computer implemented steps of (i) using natural language processing, finding possible formal names on a given Web page, (ii) using pattern matching, searching the given Web page for formal names not found by the natural language processing, and (iii) refining a combined set of the found formal names to produce a working set of people and organization names extracted from the given Web page. The refining includes determining aliases of respective people and organization names, so as to effectively reduce duplicate names.
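
The abstract names three extraction steps (NLP name finding, pattern matching for missed names, alias-based refinement) plus a storage subsystem. The Python sketch below walks through that flow under assumptions that are not taken from the patent. A crude capitalized-word heuristic stands in for the natural language processing. One honorific regex stands in for the pattern matching. Aliases are resolved by a simple trailing-token rule, and an in-memory SQLite table plays the storage subsystem. Every function name, the table layout, the sample page, and the URL are hypothetical.

```python
import re
import sqlite3
from typing import Dict, List, Set, Tuple

# Step (i) stand-in (hypothetical): runs of two or more capitalized words,
# hyphenated surnames allowed, are treated as candidate formal names. A real
# extractor would use a natural language processing / named-entity tagger.
WORD = r"[A-Z][a-z]+(?:-[A-Z][a-z]+)*"
CAPITALIZED_RUN = re.compile(rf"\b{WORD}(?:\s+{WORD})+\b")

# Step (ii) stand-in (hypothetical): a single surname after an honorific,
# which the two-word heuristic above cannot see.
TITLED_NAME = re.compile(rf"\b(?:Mr|Ms|Mrs|Dr)\.\s+({WORD})")

# Trailing tokens ignored when comparing names ("Acme Inc" ~ "Acme").
CORPORATE_SUFFIXES = {"Inc", "Corp", "Ltd"}


def find_names_nlp(text: str) -> Set[str]:
    """Step (i): candidate formal names from the NLP stand-in."""
    return {m.group(0) for m in CAPITALIZED_RUN.finditer(text)}


def find_names_by_pattern(text: str, already_found: Set[str]) -> Set[str]:
    """Step (ii): names matched by the pattern but missed by step (i)."""
    return {m.group(1) for m in TITLED_NAME.finditer(text)} - already_found


def alias_key(name: str) -> Tuple[str, ...]:
    """Tokens used for alias comparison, minus trailing corporate suffixes."""
    tokens = name.split()
    while len(tokens) > 1 and tokens[-1] in CORPORATE_SUFFIXES:
        tokens.pop()
    return tuple(tokens)


def refine(names: Set[str]) -> Dict[str, List[str]]:
    """Step (iii): fold aliases into one canonical name per entity.

    Longer names are visited first and become canonical. A shorter name is
    recorded as an alias when its key is a trailing run of exactly one
    canonical key (e.g. "Doe" -> "Jane Doe"); ambiguous names stay separate.
    """
    canonical: Dict[str, List[str]] = {}
    for name in sorted(names, key=lambda n: (-len(n.split()), n)):
        key = alias_key(name)
        matches = [c for c in canonical if alias_key(c)[-len(key):] == key]
        if len(matches) == 1:
            canonical[matches[0]].append(name)
        else:
            canonical[name] = []
    return canonical


def extract(text: str) -> Dict[str, List[str]]:
    """Run steps (i) to (iii) on the text of one Web page."""
    nlp_names = find_names_nlp(text)
    pattern_names = find_names_by_pattern(text, nlp_names)
    return refine(nlp_names | pattern_names)


def store(conn: sqlite3.Connection, url: str,
          refined: Dict[str, List[str]]) -> None:
    """Storage-subsystem stand-in: one row per (page, canonical name, alias).

    The canonical name is also stored as its own alias row.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS names (url TEXT, name TEXT, alias TEXT)")
    rows = [(url, name, alias)
            for name, aliases in refined.items()
            for alias in [name] + aliases]
    conn.executemany("INSERT INTO names VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    # Hypothetical sample page text and URL.
    page = ("Jane Doe founded Acme Widgets Inc in Springfield. "
            "According to Dr. Doe, the company indexes Web pages. "
            "Acme Widgets also hired John Smith.")
    refined = extract(page)
    print(refined)
    # {'Acme Widgets Inc': ['Acme Widgets'], 'Jane Doe': ['Doe'], 'John Smith': []}
    conn = sqlite3.connect(":memory:")
    store(conn, "http://example.com/about", refined)
    print(conn.execute("SELECT COUNT(*) FROM names").fetchone()[0])  # 5
```

Visiting longer names first lets the most complete form ("Acme Widgets Inc", "Jane Doe") become the canonical entry, so the shorter variants collapse into it. A bare surname shared by two canonical names is deliberately left unresolved rather than guessed.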

Patent Owner(s)

Patent Owner            Address
ZOOM INFORMATION INC    810 MEMORIAL DRIVE, CAMBRIDGE, MA 02139

Inventor(s)

Inventor Name            Address           Patents Filed  Total Citations
Decary, Michel           Montreal, CA      10             728
Karadimitriou, Kosmas    Shrewsbury, MA    14             717
Rothman-Shore, Jeremy    Cambridge, MA     1              113
Stern, Jonathan          Ra'Anana, IL      43             1028