US Patent Application No: 2002/0091,671


Method and system for data retrieval in large collections of data


Abstract


A method, system and computer readable medium for retrieving relevant data in large collections of documents are disclosed. The method, system and computer readable medium of the present invention include retrieving a document to be indexed, generating a document extract from the document, wherein the document extract comprises a portion of the document, and decomposing the document extract into tokens. The tokens are then stored in a search index, wherein a search engine accesses the search index to retrieve information satisfying a search query. Through aspects of the method, system and computer readable medium of the present invention, the quality of the search result is improved because the retrieved documents are more relevant in view of the semantic concept or notion represented by the search query. Moreover, the storage requirements are reduced, while the processing time for conducting a search is shortened.
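
The flow described in the abstract (retrieve a document, generate an extract, decompose the extract into tokens, store the tokens in a search index, answer a query from the index) can be illustrated with a minimal Python sketch. This is not the implementation disclosed in the application: the extract rule (keep the first two sentences), the regex tokenizer, the AND-style query matching, and every function name below are assumptions chosen only for illustration.

```python
import re
from collections import defaultdict


def make_extract(text, max_sentences=2):
    """Hypothetical extract rule: keep only the first few sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


def tokenize(text):
    """Assumed tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Index only the tokens of each document's extract, not the full text."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(make_extract(text)):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents whose extract contains every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


documents = {
    "a": "Search engines index tokens. Extracts keep indexes small. "
         "Footnotes mention bananas.",
    "b": "Bananas are yellow. They grow in bunches.",
}
index = build_index(documents)
print(search(index, "bananas"))
print(search(index, "index tokens"))
```

In this sketch, document "a" mentions bananas only outside its extract, so a query for "bananas" matches document "b" alone. That mirrors the abstract's two claims in miniature: the index holds fewer tokens, and a passing mention outside the extract does not produce a match.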



Patent Owner(s)

Patent Owner Address Total Patents
INTERNATIONAL BUSINESS MACHINES CORPORATION ARMONK, NY 77020


Inventor(s)

Inventor Name Address # of filed Patents Total Citations
Prokoph, Andreas Boeblingen, DE 5 95

Patent Citation Ranking

Forward Cite Landscape

Patent Info (Count) # Cites Year
 
GOOGLE INC. (40)
7,711,679 Phrase-based detection of duplicate documents in an information retrieval system 19 2004
7,599,914 Phrase-based searching in an information retrieval system 25 2004
7,584,175 Phrase-based generation of document descriptions 31 2004
7,580,921 Phrase identification in an information retrieval system 31 2004
7,536,408 Phrase-based indexing in an information retrieval system 35 2004
7,430,556 Phrase-based indexing in an information retrieval system 0 2004
7,426,507 Automatic taxonomy generation in search results using phrases 56 2004
8,407,239 Multi-stage query processing system and method for use with tokenspace repository 0 2004
7,917,480 Document compression system and method for use with tokenspace repository 4 2004
7,702,618 Information retrieval system for archiving multiple document versions 23 2005
7,567,959 Multiple index based information retrieval system 46 2005
8,713,418 Adding value to a rendered document 1 2005
7,603,345 Detecting spam documents in a phrase based information retrieval system 22 2006
8,166,021 Query phrasification 4 2007
8,166,045 Phrase extraction using subphrase scoring 11 2007
8,086,594 Bifurcated document relevance scoring 5 2007
7,925,655 Query scheduling using hierarchical tiers of index servers 17 2007
7,702,614 Index updating using segment swapping 16 2007
7,693,813 Index server architecture using tiered and sharded phrase posting lists 20 2007
8,117,223 Integrating external related phrase information into a phrase-based indexing information retrieval system 6 2007
8,560,550 Multiple index based information retrieval system 1 2009
8,619,287 System and method for information gathering utilizing form identifiers 0 2009
8,078,629 Detecting spam documents in a phrase based information retrieval system 5 2009
8,090,723 Index server architecture using tiered and sharded phrase posting lists 8 2010
8,612,427 Information retrieval system for archiving multiple document versions 0 2010
8,108,412 Phrase-based detection of duplicate documents in an information retrieval system 4 2010
8,793,162 Adding information or functionality to a rendered document via association with an electronic counterpart 0 2010
8,621,349 Publishing techniques for adding value to a rendered document 0 2010
8,619,147 Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device 0 2010
8,620,760 Methods and systems for initiating application processes by data capture from rendered documents 0 2010
8,799,303 Establishing an interactive environment for rendered documents 0 2010
8,321,445 Generating content snippets using a tokenspace repository 2011
8,402,033 Phrase extraction using subphrase scoring 2 2011
8,489,628 Phrase-based detection of duplicate documents in an information retrieval system 1 2011
8,682,901 Index server architecture using tiered and sharded phrase posting lists 1 2011
8,631,027 Integrated external related phrase information into a phrase-based indexing information retrieval system 0 2012
8,600,975 Query phrasification 0 2012
8,799,099 Processing techniques for text capture from a rendered document 0 2012
8,781,228 Triggering actions in response to optically or acoustically capturing keywords from a rendered document 0 2012
8,831,365 Capturing text from rendered documents using supplement information 0 2013
 
INTERNATIONAL BUSINESS MACHINES CORPORATION (13)
8,214,391 Knowledge-based data mining system 7 2002
7,010,526 Knowledge-based data mining system 6 2002
6,993,534 Data store for knowledge-based data mining system 31 2002
7,254,571 System and method for generating and retrieving different document layouts from a given content 7 2002
7,146,361 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND) 69 2003
7,139,752 System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations 54 2003
7,854,009 Method of securing access to IP LANs 2 2003
7,289,983 Personalized indexing and searching for information in a distributed data processing system 18 2003
8,014,997 Method of search content enhancement 2 2003
7,512,602 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a weighted and (WAND) 2 2006
8,280,903 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND) 1 2008
8,027,966 Method and system for searching a multi-lingual database 1 2008
8,027,994 Searching a multi-lingual database 0 2008
 
VCVC III LLC (11)
7,283,951 Method and system for enhanced data searching 34 2001
7,398,201 Method and system for enhanced data searching 37 2003
7,526,425 Method and system for extending keyword searching to syntactically and semantically annotated data 41 2004
8,856,096 Extending keyword searching to syntactically and semantically annotated data 0 2006
8,594,996 NLP-based entity recognition and disambiguation 0 2008
8,700,604 NLP-based content recommender 0 2008
8,131,540 Method and system for extending keyword searching to syntactically and semantically annotated data 5 2009
8,645,372 Keyword-based search engine results using enhanced query strategies 0 2010
8,645,125 NLP-based systems and methods for providing quotations 0 2011
8,838,633 NLP-based sentiment analysis 0 2011
8,725,739 Category-based content recommendation 0 2011
 
ORACLE OTC SUBSIDIARY LLC (8)
8,832,140 System and method for measuring the quality of document sets 0 2008
8,219,593 System and method for measuring the quality of document sets 2 2008
8,051,073 System and method for measuring the quality of document sets 7 2008
8,051,084 System and method for measuring the quality of document sets 8 2008
8,024,327 System and method for measuring the quality of document sets 10 2008
8,005,643 System and method for measuring the quality of document sets 5 2008
8,560,529 System and method for measuring the quality of document sets 1 2011
8,527,515 System and method for concept visualization 0 2011
 
FACEBOOK, INC. (4)
7,584,194 Method and apparatus for an application crawler 18 2005
7,370,381 Method and apparatus for a ranking engine 29 2005
7,912,836 Method and apparatus for a ranking engine 5 2008
8,788,488 Ranking search results based on recency 0 2012
 
MICROSOFT CORPORATION (3)
8,713,024 Efficient forward ranking in a search engine 0 2010
8,620,907 Matching funnel for large document index 1 2010
8,478,704 Decomposable ranking for efficient precomputing that selects preliminary ranking features comprising static ranking features and dynamic atom-isolated components 0 2010
 
TOPIX LLC (3)
8,271,495 System and method for automating categorization and aggregation of content from network sites 1 2004
7,930,647 System and method for selecting pictures for presentation with text content 1 2005
7,814,089 System and method for presenting categorized content on a site using programmatic and manual selection of content items 3 2007
 
BELLSOUTH INTELLECTUAL PROPERTY CORPORATION (2)
7,409,593 Automated diagnosis for computer networks 11 2003
7,324,986 Automatically facilitated support for complex electronic services 0 2003
 
KABOODLE, INC. (2)
7,630,968 Extracting information from formatted sources 1 2006
7,606,797 Reverse value attribute extraction 1 2006
 
NetBase Solutions, Inc. (2)
8,055,608 Method and apparatus for concept-based classification of natural language discourse 1 2006
8,046,348 Method and apparatus for concept-based searching of natural language discourse 2 2006
 
HARRIS CORPORATION (1)
7,801,887 Method for re-ranking documents retrieved from a document database 4 2004
 
HYPERTEXT SOLUTIONS INC. (1)
7,953,593 Method and system for extending keyword searching to syntactically and semantically annotated data 3 2009
 
INTELLECTUAL VENTURES II LLC (1)
7,735,142 Electronic vulnerability and reliability assessment 2 2007
 
SCHLUMBERGER TECHNOLOGY CORPORATION (1)
8,156,131 Quality measure for a data context service 1 2009
 
VIAVIENTE (1)
7,580,929 Phrase-based personalization of searches in an information retrieval system 29 2004
 
Other [Check patent profile for assignment information] (1)
8,069,162 Enhanced search indexing 1 2010