System and method for automatically extracting interesting phrases in a large dynamic corpus


United States of America Patent

APP PUB NO 20070067157A1
SERIAL NO 11234667


Abstract

A phrase extraction system combines a dictionary method, a statistical/heuristic approach, and a set of pruning steps to extract frequently occurring and interesting phrases from a corpus. The system finds the 'top k' phrases in a corpus, where k is an adjustable parameter. For a time-varying corpus, the system uses historical statistics to extract new and increasingly frequent phrases. The system also finds interesting phrases that occur near a set of user-designated phrases, using these designated phrases as anchor phrases. The system finds frequently occurring and interesting phrases in a corpus that changes over time, as in finding frequent phrases in an ongoing, long-term document feed or a continuous, regular web crawl.
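
The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the stopword list, the n-gram candidate generation, the count ratio used for "increasingly frequent" phrases, and all function names (tokenize, candidate_phrases, count_phrases, rising_phrases, phrases_near) are assumptions chosen for illustration only.

```python
import re
from collections import Counter

# Hypothetical stopword list standing in for the dictionary-based pruning step.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def candidate_phrases(tokens, max_len=3):
    """Yield n-grams of 2..max_len tokens, pruning any that start or end with a stopword."""
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def count_phrases(docs, max_len=3):
    """Count candidate phrases over a corpus; counts.most_common(k) gives the 'top k'."""
    counts = Counter()
    for doc in docs:
        counts.update(candidate_phrases(tokenize(doc), max_len))
    return counts


def rising_phrases(current, history, k=5, min_count=2):
    """Rank phrases by current count relative to historical count (assumed scoring rule).

    Adding 1 to the historical count avoids division by zero for new phrases.
    """
    scored = []
    for phrase, count in current.items():
        if count < min_count:
            continue
        ratio = count / (history.get(phrase, 0) + 1)
        scored.append((ratio, count, phrase))
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [phrase for _, _, phrase in scored[:k]]


def phrases_near(docs, anchor, window=5, max_len=3):
    """Count candidate phrases within `window` tokens before or after an anchor phrase."""
    anchor_tokens = tokenize(anchor)
    n = len(anchor_tokens)
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == anchor_tokens:
                counts.update(candidate_phrases(tokens[max(0, i - window):i], max_len))
                counts.update(candidate_phrases(tokens[i + n:i + n + window], max_len))
    return counts
```

In this sketch, count_phrases(docs).most_common(k) plays the role of the 'top k' query, rising_phrases compares the current counts against a stored Counter of earlier counts, and phrases_near restricts counting to a token window around each anchor occurrence. A production system would need a more careful notion of "interesting" than raw frequency.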

Patent Owner(s)

Patent Owner                                   Address
INTERNATIONAL BUSINESS MACHINES CORPORATION    NEW ORCHARD ROAD ARMONK NY 10504

Inventor(s)

Inventor Name            Address             # of Filed Patents   Total Citations
Kaku, Vinay Kumar        Fremont, CA         1                    64
Kurita, Keiko            Los Gatos, CA       4                    189
Niblack, Carlton Wayne   San Jose, CA        10                   1415
Novak, Jasmine Gina      Mountain View, CA   2                    135
Zhang, Zengyan           San Jose, CA        9                    284
