US Patent Application No: 2002/0091,671

Method and system for data retrieval in large collections of data

Abstract

A method, system and computer readable medium for retrieving relevant data in large collections of documents is disclosed. The method, system and computer readable medium of the present invention includes retrieving a document to be indexed, generating a document extract from the document, wherein the document extract comprises a portion of the document, and decomposing the document extract into tokens. The tokens are then stored in a search index, wherein a search engine accesses the search index to retrieve information satisfying a search query. Through aspects of the method, system and computer readable medium of the present invention, the quality of the search result is improved because the retrieved documents are more relevant in view of the semantic concept or notion represented by the search query. Moreover, the storage requirements are reduced, while expediting the processing time for conducting a search.
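
The steps named in the abstract (retrieve a document, generate an extract, decompose it into tokens, store the tokens in a search index, answer a query from that index) can be illustrated with a minimal sketch. This is not the patented implementation: the extract rule (keep the first few sentences), the tokenizer, the AND-style query matching, and all function names below are assumptions chosen only to make the flow concrete.

```python
import re
from collections import defaultdict


def make_extract(text, max_sentences=2):
    """Hypothetical extract rule: keep only the first few sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


def tokenize(text):
    """Assumed tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents, max_sentences=2):
    """Index only the tokens of each document's extract, not the full text."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(make_extract(text, max_sentences)):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents whose extract contains every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Illustrative data, not taken from the application.
documents = {
    "a": "Search engines index tokens. They answer queries. Bananas are yellow.",
    "b": "Bananas are yellow. They grow in bunches.",
}
index = build_index(documents)
print(search(index, "bananas"))
```

In this sketch, document "a" mentions bananas only outside its extract, so a query for "bananas" matches only document "b". That mirrors the abstract's two claims in miniature: the index holds fewer tokens than a full-text index would, and a passing mention outside the extract does not produce a match.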

Patent Owner(s)

Patent Owner Address Total Patents
INTERNATIONAL BUSINESS MACHINES CORPORATION ARMONK, NY 47274

Inventor(s)

Inventor Name Address # of filed Patents Total Citations
Prokoph, Andreas Boeblingen, DE 3 119

Cited Art Landscape

  • No Cited Art to Display

Forward Cite Landscape

Patent Info (Count) # Cites Year
 
INTERNATIONAL BUSINESS MACHINES CORPORATION (12)
8,214,391 Knowledge-based data mining system 9 2002
* 7,010,526 Knowledge-based data mining system 8 2002
6,993,534 Data store for knowledge-based data mining system 33 2002
* 7,254,571 System and method for generating and retrieving different document layouts from a given content 10 2002
7,146,361 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND) 90 2003
7,139,752 System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations 66 2003
7,289,983 Personalized indexing and searching for information in a distributed data processing system 20 2003
8,014,997 Method of search content enhancement 3 2003
7,512,602 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a weighted and (WAND) 4 2006
8,280,903 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND) 2 2008
8,027,966 Method and system for searching a multi-lingual database 2 2008
8,027,994 Searching a multi-lingual database 0 2008
 
Other [Check patent profile for assignment information] (2)
* 8,069,162 Enhanced search indexing 2 2010
* 2011/0099,134 Method and System for Agent Based Summarization 1 2010
 
BELLSOUTH INTELLECTUAL PROPERTY CORPORATION (3)
7,409,593 Automated diagnosis for computer networks 18 2003
* 7,324,986 Automatically facilitated support for complex electronic services 0 2003
* 2005/0015,667 Automated diagnosis for electronic systems 16 2003
 
HYPERTEXT SOLUTIONS INC. (1)
7,953,593 Method and system for extending keyword searching to syntactically and semantically annotated data 4 2009
 
LINKEDIN CORPORATION (1)
7,854,009 Method of securing access to IP LANs 2 2003
 
ORACLE OTC SUBSIDIARY LLC (11)
8,874,549 System and method for measuring the quality of document sets 0 2008
8,832,140 System and method for measuring the quality of document sets 1 2008
8,219,593 System and method for measuring the quality of document sets 5 2008
8,051,073 System and method for measuring the quality of document sets 12 2008
8,051,084 System and method for measuring the quality of document sets 12 2008
8,024,327 System and method for measuring the quality of document sets 16 2008
8,005,643 System and method for measuring the quality of document sets 10 2008
* 2011/0246,378 IDENTIFYING HIGH VALUE CONTENT AND DETERMINING RESPONSES TO HIGH VALUE CONTENT 1 2010
8,560,529 System and method for measuring the quality of document sets 2 2011
8,527,515 System and method for concept visualization 1 2011
8,935,249 Visualization of concepts within a collection of information 1 2012
 
INTELLECTUAL VENTURES II LLC (1)
7,735,142 Electronic vulnerability and reliability assessment 2 2007
 
VIAVIENTE (1)
7,580,929 Phrase-based personalization of searches in an information retrieval system 36 2004
 
MICROSOFT TECHNOLOGY LICENSING, LLC (3)
8,713,024 Efficient forward ranking in a search engine 0 2010
8,620,907 Matching funnel for large document index 1 2010
8,478,704 Decomposable ranking for efficient precomputing that selects preliminary ranking features comprising static ranking features and dynamic atom-isolated components 0 2010
 
SCHLUMBERGER TECHNOLOGY CORPORATION (2)
* 9,070,172 Method and system for data context service 0 2008
* 8,156,131 Quality measure for a data context service 3 2009
 
HARRIS CORPORATION (1)
* 7,801,887 Method for re-ranking documents retrieved from a document database 4 2004
 
FACEBOOK, INC. (5)
7,584,194 Method and apparatus for an application crawler 25 2005
* 7,370,381 Method and apparatus for a ranking engine 32 2005
7,912,836 Method and apparatus for a ranking engine 6 2008
8,954,416 Method and apparatus for an application crawler 0 2009
8,788,488 Ranking search results based on recency 0 2012
 
VCVCIII LLC (1)
8,954,469 Query templates and labeled search tip system, methods, and techniques 0 2008
 
GOOGLE INC. (51)
7,711,679 Phrase-based detection of duplicate documents in an information retrieval system 26 2004
7,599,914 Phrase-based searching in an information retrieval system 32 2004
* 7,584,175 Phrase-based generation of document descriptions 39 2004
7,580,921 Phrase identification in an information retrieval system 39 2004
7,536,408 Phrase-based indexing in an information retrieval system 44 2004
7,430,556 Phrase-based indexing in an information retrieval system 1 2004
7,426,507 Automatic taxonomy generation in search results using phrases 76 2004
8,407,239 Multi-stage query processing system and method for use with tokenspace repository 0 2004
* 7,917,480 Document compression system and method for use with tokenspace repository 5 2004
7,702,618 Information retrieval system for archiving multiple document versions 30 2005
7,567,959 Multiple index based information retrieval system 53 2005
9,008,447 Method and system for character recognition 0 2005
* 8,713,418 Adding value to a rendered document 1 2005
* 2008/0141,117 Adding Value to a Rendered Document 168 2005
7,603,345 Detecting spam documents in a phrase based information retrieval system 26 2006
8,166,021 Query phrasification 10 2007
8,166,045 Phrase extraction using subphrase scoring 15 2007
8,086,594 Bifurcated document relevance scoring 10 2007
7,925,655 Query scheduling using hierarchical tiers of index servers 21 2007
7,702,614 Index updating using segment swapping 17 2007
7,693,813 Index server architecture using tiered and sharded phrase posting lists 27 2007
8,117,223 Integrating external related phrase information into a phrase-based indexing information retrieval system 10 2007
8,560,550 Multiple index based information retrieval system 1 2009
8,619,287 System and method for information gathering utilizing form identifiers 0 2009
8,078,629 Detecting spam documents in a phrase based information retrieval system 7 2009
8,090,723 Index server architecture using tiered and sharded phrase posting lists 12 2010
8,612,427 Information retrieval system for archiving multiple document versions 1 2010
8,108,412 Phrase-based detection of duplicate documents in an information retrieval system 8 2010
8,990,235 Automatically providing content associated with captured information, such as information captured in real-time 0 2010
8,874,504 Processing techniques for visual capture data from a rendered document 0 2010
8,793,162 Adding information or functionality to a rendered document via association with an electronic counterpart 0 2010
8,903,759 Determining actions involving captured information and electronic content associated with rendered documents 0 2010
8,621,349 Publishing techniques for adding value to a rendered document 0 2010
8,619,147 Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device 0 2010
8,620,760 Methods and systems for initiating application processes by data capture from rendered documents 0 2010
8,799,303 Establishing an interactive environment for rendered documents 0 2010
9,081,799 Using gestalt information to identify locations in printed information 0 2010
8,321,445 Generating content snippets using a tokenspace repository 0 2011
8,402,033 Phrase extraction using subphrase scoring 4 2011
8,489,628 Phrase-based detection of duplicate documents in an information retrieval system 1 2011
8,682,901 Index server architecture using tiered and sharded phrase posting lists 2 2011
8,631,027 Integrated external related phrase information into a phrase-based indexing information retrieval system 1 2012
8,600,975 Query phrasification 0 2012
8,799,099 Processing techniques for text capture from a rendered document 0 2012
8,781,228 Triggering actions in response to optically or acoustically capturing keywords from a rendered document 0 2012
9,098,501 Generating content snippets using a tokenspace repository 0 2012
8,831,365 Capturing text from rendered documents using supplement information 0 2013
8,943,067 Index server architecture using tiered and sharded phrase posting lists 0 2013
9,075,779 Performing actions based on capturing information from rendered documents, such as documents under copyright 0 2013
9,037,573 Phase-based personalization of searches in an information retrieval system 0 2013
9,116,890 Triggering actions in response to optically or acoustically capturing keywords from a rendered document 0 2014
 
KABOODLE, INC. (2)
* 7,630,968 Extracting information from formatted sources 1 2006
* 7,606,797 Reverse value attribute extraction 1 2006
 
VCVC III LLC (13)
7,283,951 Method and system for enhanced data searching 44 2001
7,398,201 Method and system for enhanced data searching 48 2003
7,526,425 Method and system for extending keyword searching to syntactically and semantically annotated data 58 2004
8,856,096 Extending keyword searching to syntactically and semantically annotated data 0 2006
8,594,996 NLP-based entity recognition and disambiguation 1 2008
8,700,604 NLP-based content recommender 1 2008
8,131,540 Method and system for extending keyword searching to syntactically and semantically annotated data 14 2009
8,645,372 Keyword-based search engine results using enhanced query strategies 0 2010
8,645,125 NLP-based systems and methods for providing quotations 0 2011
8,838,633 NLP-based sentiment analysis 1 2011
8,725,739 Category-based content recommendation 0 2011
9,116,995 Cluster-based identification of news stories 0 2012
9,092,416 NLP-based systems and methods for providing quotations 0 2014
 
NetBase Solutions, Inc. (8)
8,055,608 Method and apparatus for concept-based classification of natural language discourse 7 2006
8,046,348 Method and apparatus for concept-based searching of natural language discourse 7 2006
9,047,285 Method and apparatus for frame-based search 0 2008
8,935,152 Method and apparatus for frame-based analysis of search results 0 2008
9,026,529 Method and apparatus for determining search result demographics 0 2010
9,075,799 Methods and apparatus for query formulation 0 2011
9,063,970 Method and apparatus for concept-based ranking of natural language discourse 0 2011
8,949,263 Methods and apparatus for sentiment analysis 0 2012
 
Topix LLC (3)
* 8,271,495 System and method for automating categorization and aggregation of content from network sites 1 2004
7,930,647 System and method for selecting pictures for presentation with text content 3 2005
7,814,089 System and method for presenting categorized content on a site using programmatic and manual selection of content items 3 2007
 
YAHOO! INC. (2)
* 8,984,398 Generation of search result abstracts 0 2008
* 2010/0057,710 GENERATION OF SEARCH RESULT ABSTRACTS 3 2008
* Cited By Examiner