US Patent Application No: 2002/0091,671

Method and system for data retrieval in large collections of data

Abstract

A method, system and computer readable medium for retrieving relevant data in large collections of documents are disclosed. The method, system and computer readable medium of the present invention include retrieving a document to be indexed, generating a document extract from the document, wherein the document extract comprises a portion of the document, and decomposing the document extract into tokens. The tokens are then stored in a search index, wherein a search engine accesses the search index to retrieve information satisfying a search query. Through aspects of the method, system and computer readable medium of the present invention, the quality of the search result is improved because the retrieved documents are more relevant in view of the semantic concept or notion represented by the search query. Moreover, the storage requirements are reduced and the processing time for conducting a search is shortened.
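
The indexing flow described in the abstract (retrieve a document, generate an extract, decompose the extract into tokens, store the tokens in a search index, answer queries from that index) can be illustrated with a minimal Python sketch. This is not the patented implementation. The abstract does not say how the extract is chosen, so taking the first few sentences is an assumption made here for illustration only, and the names `make_extract`, `tokenize`, `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def make_extract(document: str, max_sentences: int = 2) -> str:
    """Return a portion of the document.

    ASSUMPTION: the extract is simply the first `max_sentences` sentences.
    The abstract only requires that the extract be a portion of the document.
    """
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return " ".join(sentences[:max_sentences])


def tokenize(text: str) -> List[str]:
    """Decompose text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: Dict[str, str], max_sentences: int = 2) -> Dict[str, Set[str]]:
    """Build an inverted index (token -> document ids) from extracts only."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(make_extract(text, max_sentences)):
            index[token].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of documents whose extract contains every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents, invented for this example.
docs = {
    "d1": "Search engines index documents. Tokens are stored in an "
          "inverted index. The weather was nice that day.",
    "d2": "Gardening tips for spring. Water the plants daily. "
          "A search engine helped me find this.",
}
index = build_index(docs)
print(search(index, "search index"))
print(search(index, "weather"))
```

Because only the extract is tokenized, the index holds fewer entries than a full-text index would, and a term that appears only outside the extract (here "weather" in d1, or "search" in d2) does not match. That is one simple way to picture the storage and relevance effects the abstract claims.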

Patent Owner(s)

Patent Owner Address Total Patents
INTERNATIONAL BUSINESS MACHINES CORPORATION ARMONK, NY 41033

Inventor(s)

Inventor Name Address # of filed Patents Total Citations
Prokoph, Andreas Boeblingen, DE 4 136

Cited Art Landscape

Patent Info (Count) # Cites Year
 
FUJI XEROX CO., LTD. (1)
* 5,778,400 Apparatus and method for storing, searching for and retrieving text of a structured document provided with tags 40 1996
 
BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY (1)
* 6,253,208 Information access 69 1998
 
RPX CORPORATION (1)
* 6,415,250 System and method for identifying language using morphologically-based techniques 121 1997
 
MICROSOFT TECHNOLOGY LICENSING, LLC (2)
* 6,076,051 Information retrieval utilizing semantic representation of text 181 1997
* 6,631,369 Method and system for incremental web crawling 107 1999
 
XEROX CORPORATION (1)
* 6,857,102 Document re-authoring systems and methods for providing device-independent access to the world wide web 117 1999
 
HARTFORD FIRE INSURANCE COMPANY (1)
* 5,557,515 Computerized system and method for work management 329 1995
 
KABUSHIKI KAISHA TOSHIBA (1)
* 5,907,841 Document detection system with improved document detection efficiency 31 1996
 
FUJITSU LIMITED (1)
* 6,205,456 Summarization apparatus and method 234 1998
 
HITACHI, LTD. (1)
* 6,473,754 METHOD AND SYSTEM FOR EXTRACTING CHARACTERISTIC STRING, METHOD AND SYSTEM FOR SEARCHING FOR RELEVANT DOCUMENT USING THE SAME, STORAGE MEDIUM FOR STORING CHARACTERISTIC STRING EXTRACTION PROGRAM, AND STORAGE MEDIUM FOR STORING RELEVANT DOCUMENT SEARCHING PROGRAM 30 1999
 
VERTICAL SEARCH WORKS, INC. (1)
* 6,243,713 Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types 358 1998
 
ORACLE AMERICA, INC. (1)
* 5,724,571 Method and apparatus for generating query responses in a computer-based document retrieval system 229 1995
 
MCAFEE, INC. (1)
* 6,621,930 Automatic categorization of documents based on textual content 41 2000
* Cited By Examiner

Forward Cite Landscape

Patent Info (Count) # Cites Year
 
Other [Check patent profile for assignment information] (4)
* 2008/0215,614 Pyramid Information Quantification or PIQ or Pyramid Database or Pyramided Database or Pyramided or Selective Pressure Database Management System 6 2006
* 8,069,162 Enhanced search indexing 2 2010
* 2011/0099,134 Method and System for Agent Based Summarization 2 2010
* 2011/0153,577 Query Processing System and Method for Use with Tokenspace Repository 7 2011
 
INTERNATIONAL BUSINESS MACHINES CORPORATION (26)
8,214,391 Knowledge-based data mining system 10 2002
* 7,010,526 Knowledge-based data mining system 8 2002
6,993,534 Data store for knowledge-based data mining system 37 2002
* 2003/0212,649 Knowledge-based data mining system 10 2002
* 2003/0212,699 Data store for knowledge-based data mining system 5 2002
* 2003/0212,675 Knowledge-based data mining system 13 2002
* 7,254,571 System and method for generating and retrieving different document layouts from a given content 11 2002
* 2003/0225,747 System and method for generating and retrieving different document layouts from a given content 6 2002
7,146,361 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND) 107 2003
7,139,752 System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations 78 2003
* 2004/0243,557 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a weighted and (WAND) 17 2003
* 2004/0243,556 System, method and computer program product for performing unstructured information management and automatic text analysis, and including a document common analysis system (CAS) 69 2003
* 2004/0243,645 System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations 23 2003
* 2004/0243,560 System, method and computer program product for performing unstructured information management and automatic text analysis, including an annotation inverted file system facilitating indexing and searching 33 2003
* 2004/0243,554 System, method and computer program product for performing unstructured information management and automatic text analysis 56 2003
7,289,983 Personalized indexing and searching for information in a distributed data processing system 22 2003
8,014,997 Method of search content enhancement 3 2003
* 2005/0138,007 Document enhancement method 28 2003
7,512,602 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a weighted and (WAND) 6 2006
* 2007/0112,763 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a weighted and (WAND) 10 2006
* 2008/0270,396 INDEXING VERSIONED DOCUMENT SEQUENCES 2 2007
* 2008/0016,039 SYSTEM AND METHOD FOR GENERATING AND RETRIEVING DIFFERENT DOCUMENT LAYOUTS FROM A GIVEN CONTENT 2 2007
8,280,903 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND) 4 2008
* 2009/0222,441 System, Method and Computer Program Product for Performing Unstructured Information Management and Automatic Text Analysis, Including a Search Operator Functioning as a Weighted And (WAND) 16 2008
8,027,966 Method and system for searching a multi-lingual database 2 2008
8,027,994 Searching a multi-lingual database 0 2008
 
EXCALIBUR IP, LLC (2)
* 8,984,398 Generation of search result abstracts 0 2008
* 2010/0057,710 GENERATION OF SEARCH RESULT ABSTRACTS 4 2008
 
BELLSOUTH INTELLECTUAL PROPERTY CORPORATION (4)
7,409,593 Automated diagnosis for computer networks 20 2003
* 7,324,986 Automatically facilitated support for complex electronic services 0 2003
* 2005/0038,697 Automatically facilitated marketing and provision of electronic services 19 2003
* 2005/0015,667 Automated diagnosis for electronic systems 18 2003
 
HYPERTEXT SOLUTIONS INC. (1)
7,953,593 Method and system for extending keyword searching to syntactically and semantically annotated data 5 2009
 
LINKEDIN CORPORATION (2)
7,854,009 Method of securing access to IP LANs 2 2003
* 2005/0005,110 Method of securing access to IP LANs 12 2003
 
SAMSUNG ELECTRONICS CO., LTD. (1)
* 2008/0290,792 LIGHT EMITTING MATERIAL AND ORGANIC LIGHT-EMITTING DEVICE 2 2008
 
FUJITSU LIMITED (2)
* 9,405,819 Efficient indexing using compact decision diagrams 0 2008
* 2008/0243,907 Efficient Indexing Using Compact Decision Diagrams 2 2008
 
ORACLE OTC SUBSIDIARY LLC (18)
8,874,549 System and method for measuring the quality of document sets 0 2008
8,832,140 System and method for measuring the quality of document sets 1 2008
8,219,593 System and method for measuring the quality of document sets 6 2008
8,051,073 System and method for measuring the quality of document sets 13 2008
8,051,084 System and method for measuring the quality of document sets 12 2008
8,024,327 System and method for measuring the quality of document sets 19 2008
8,005,643 System and method for measuring the quality of document sets 12 2008
* 2009/0006,383 SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS 6 2008
* 2009/0006,438 SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS 17 2008
* 2009/0006,382 SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS 16 2008
* 2009/0006,385 SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS 4 2008
* 2009/0006,384 SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS 12 2008
* 2009/0006,386 SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS 4 2008
* 2009/0006,387 SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS 9 2008
* 2011/0246,378 IDENTIFYING HIGH VALUE CONTENT AND DETERMINING RESPONSES TO HIGH VALUE CONTENT 3 2010
8,560,529 System and method for measuring the quality of document sets 2 2011
8,527,515 System and method for concept visualization 1 2011
8,935,249 Visualization of concepts within a collection of information 1 2012
 
FINETOOTH ENTERPRISES, INC. (1)
* 2007/0055,670 System and method of extracting knowledge from documents 0 2005
 
AT&T DELAWARE INTELLECTUAL PROPERTY, INC. (1)
* 2008/0288,821 Automated Diagnosis for Electronic Systems 11 2008
 
VIAVIENTE (2)
7,580,929 Phrase-based personalization of searches in an information retrieval system 41 2004
* 2008/0319,971 Phrase-based personalization of searches in an information retrieval system 26 2004
 
MICROSOFT TECHNOLOGY LICENSING, LLC (4)
9,424,351 Hybrid-distribution model for search engine indexes 0 2010
8,713,024 Efficient forward ranking in a search engine 0 2010
8,620,907 Matching funnel for large document index 1 2010
8,478,704 Decomposable ranking for efficient precomputing that selects preliminary ranking features comprising static ranking features and dynamic atom-isolated components 0 2010
 
SCHLUMBERGER TECHNOLOGY CORPORATION (4)
* 9,070,172 Method and system for data context service 0 2008
* 2009/0063,230 METHOD AND SYSTEM FOR DATA CONTEXT SERVICE 6 2008
* 8,156,131 Quality measure for a data context service 5 2009
* 2010/0121,861 QUALITY MEASURE FOR A DATA CONTEXT SERVICE 2 2009
 
HARRIS CORPORATION (3)
* 7,801,887 Method for re-ranking documents retrieved from a document database 5 2004
* 2006/0089,926 Method for re-ranking documents retrieved from a document database 37 2004
* 2011/0016,113 METHOD FOR RE-RANKING DOCUMENTS RETRIEVED FROM A DOCUMENT DATABASE 0 2010
 
INTELLECTUAL VENTURES II LLC (2)
7,735,142 Electronic vulnerability and reliability assessment 2 2007
* 2008/0172,743 Electronic Vulnerability and Reliability Assessment 2 2007
 
FACEBOOK, INC. (10)
7,584,194 Method and apparatus for an application crawler 30 2005
* 7,370,381 Method and apparatus for a ranking engine 34 2005
* 2006/0230,011 Method and apparatus for an application crawler 58 2005
* 2006/0218,141 Method and apparatus for a ranking engine 14 2005
7,912,836 Method and apparatus for a ranking engine 6 2008
* 2008/0201,323 METHOD AND APPARATUS FOR A RANKING ENGINE 3 2008
8,954,416 Method and apparatus for an application crawler 0 2009
* 2009/0216,758 METHOD AND APPARATUS FOR AN APPLICATION CRAWLER 18 2009
9,405,833 Methods for analyzing dynamic web pages 0 2012
8,788,488 Ranking search results based on recency 0 2012
 
VCVC III LLC (1)
8,954,469 Query templates and labeled search tip system, methods, and techniques 0 2008
 
GOOGLE INC. (72)
7,711,679 Phrase-based detection of duplicate documents in an information retrieval system 31 2004
7,599,914 Phrase-based searching in an information retrieval system 36 2004
* 7,584,175 Phrase-based generation of document descriptions 45 2004
7,580,921 Phrase identification in an information retrieval system 45 2004
7,536,408 Phrase-based indexing in an information retrieval system 53 2004
* 2008/0306,943 PHRASE-BASED DETECTION OF DUPLICATE DOCUMENTS IN AN INFORMATION RETRIEVAL SYSTEM 35 2004
7,430,556 Phrase-based indexing in an information retrieval system 1 2004
7,426,507 Automatic taxonomy generation in search results using phrases 91 2004
* 2006/0031,195 Phrase-based searching in an information retrieval system 74 2004
* 2006/0020,571 Phrase-based generation of document descriptions 49 2004
* 2006/0020,607 Phrase-based indexing in an information retrieval system 22 2004
8,407,239 Multi-stage query processing system and method for use with tokenspace repository 1 2004
* 7,917,480 Document compression system and method for use with tokenspace repository 10 2004
* 2007/0220,023 Document compression system and method for use with tokenspace repository 10 2004
* 2006/0036,593 Multi-stage query processing system and method for use with tokenspace repository 79 2004
7,702,618 Information retrieval system for archiving multiple document versions 38 2005
7,567,959 Multiple index based information retrieval system 59 2005
9,008,447 Method and system for character recognition 0 2005
* 8,713,418 Adding value to a rendered document 3 2005
* 2008/0141,117 Adding Value to a Rendered Document 204 2005
7,603,345 Detecting spam documents in a phrase based information retrieval system 30 2006
* 2006/0294,155 Detecting spam documents in a phrase based information retrieval system 51 2006
8,166,021 Query phrasification 19 2007
8,166,045 Phrase extraction using subphrase scoring 24 2007
8,086,594 Bifurcated document relevance scoring 14 2007
7,925,655 Query scheduling using hierarchical tiers of index servers 27 2007
7,702,614 Index updating using segment swapping 22 2007
7,693,813 Index server architecture using tiered and sharded phrase posting lists 54 2007
8,117,223 Integrating external related phrase information into a phrase-based indexing information retrieval system 15 2007
8,560,550 Multiple index based information retrieval system 4 2009
* 2010/0030,773 MULTIPLE INDEX BASED INFORMATION RETRIEVAL SYSTEM 11 2009
8,619,287 System and method for information gathering utilizing form identifiers 0 2009
8,078,629 Detecting spam documents in a phrase based information retrieval system 9 2009
* 2011/0131,223 DETECTING SPAM DOCUMENTS IN A PHRASE BASED INFORMATION RETRIEVAL SYSTEM 14 2009
8,090,723 Index server architecture using tiered and sharded phrase posting lists 18 2010
* 2010/0161,617 INDEX SERVER ARCHITECTURE USING TIERED AND SHARDED PHRASE POSTING LISTS 28 2010
8,612,427 Information retrieval system for archiving multiple document versions 2 2010
8,108,412 Phrase-based detection of duplicate documents in an information retrieval system 12 2010
* 2010/0169,305 INFORMATION RETRIEVAL SYSTEM FOR ARCHIVING MULTIPLE DOCUMENT VERSIONS 11 2010
* 2010/0161,625 PHRASE-BASED DETECTION OF DUPLICATE DOCUMENTS IN AN INFORMATION RETRIEVAL SYSTEM 13 2010
8,990,235 Automatically providing content associated with captured information, such as information captured in real-time 16 2010
8,874,504 Processing techniques for visual capture data from a rendered document 3 2010
8,793,162 Adding information or functionality to a rendered document via association with an electronic counterpart 1 2010
8,903,759 Determining actions involving captured information and electronic content associated with rendered documents 0 2010
8,621,349 Publishing techniques for adding value to a rendered document 0 2010
8,619,147 Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device 2 2010
8,620,760 Methods and systems for initiating application processes by data capture from rendered documents 0 2010
8,799,303 Establishing an interactive environment for rendered documents 0 2010
9,081,799 Using gestalt information to identify locations in printed information 0 2010
9,323,784 Image search using text-based elements within the contents of images 0 2010
8,321,445 Generating content snippets using a tokenspace repository 0 2011
8,402,033 Phrase extraction using subphrase scoring 7 2011
8,489,628 Phrase-based detection of duplicate documents in an information retrieval system 2 2011
8,682,901 Index server architecture using tiered and sharded phrase posting lists 6 2011
8,631,027 Integrated external related phrase information into a phrase-based indexing information retrieval system 2 2012
8,600,975 Query phrasification 2 2012
9,355,169 Phrase extraction using subphrase scoring 0 2012
9,268,852 Search engines and systems with handheld document data capture devices 0 2012
8,799,099 Processing techniques for text capture from a rendered document 0 2012
8,781,228 Triggering actions in response to optically or acoustically capturing keywords from a rendered document 0 2012
9,275,051 Automatic modification of web pages 0 2012
9,098,501 Generating content snippets using a tokenspace repository 0 2012
8,831,365 Capturing text from rendered documents using supplement information 1 2013
9,361,331 Multiple index based information retrieval system 0 2013
8,943,067 Index server architecture using tiered and sharded phrase posting lists 1 2013
9,146,967 Multi-stage query processing system and method for use with tokenspace repository 0 2013
9,075,779 Performing actions based on capturing information from rendered documents, such as documents under copyright 1 2013
9,143,638 Data capture from rendered documents using handheld device 1 2013
9,037,573 Phase-based personalization of searches in an information retrieval system 0 2013
9,384,224 Information retrieval system for archiving multiple document versions 0 2013
9,116,890 Triggering actions in response to optically or acoustically capturing keywords from a rendered document 0 2014
9,223,877 Index server architecture using tiered and sharded phrase posting lists 0 2015
 
Kaboodle, Inc. (3)
* 7,630,968 Extracting information from formatted sources 3 2006
* 7,606,797 Reverse value attribute extraction 1 2006
* 2006/0190,684 Reverse value attribute extraction 10 2006
 
VCVC III LLC (21)
7,283,951 Method and system for enhanced data searching 53 2001
* 2004/0221,235 METHOD AND SYSTEM FOR ENHANCED DATA SEARCHING 20 2001
7,398,201 Method and system for enhanced data searching 57 2003
* 2003/0233,224 Method and system for enhanced data searching 68 2003
7,526,425 Method and system for extending keyword searching to syntactically and semantically annotated data 69 2004
* 2005/0267,871 Method and system for extending keyword searching to syntactically and semantically annotated data 139 2004
8,856,096 Extending keyword searching to syntactically and semantically annotated data 3 2006
* 2007/0156,669 Extending keyword searching to syntactically and semantically annotated data 83 2006
8,594,996 NLP-based entity recognition and disambiguation 6 2008
8,700,604 NLP-based content recommender 1 2008
* 2009/0150,388 NLP-based content recommender 11 2008
8,131,540 Method and system for extending keyword searching to syntactically and semantically annotated data 21 2009
* 2010/0268,600 ENHANCED ADVERTISEMENT TARGETING 11 2010
8,645,372 Keyword-based search engine results using enhanced query strategies 1 2010
8,645,125 NLP-based systems and methods for providing quotations 2 2011
8,838,633 NLP-based sentiment analysis 3 2011
9,405,848 Recommending mobile device activities 0 2011
8,725,739 Category-based content recommendation 0 2011
9,116,995 Cluster-based identification of news stories 0 2012
9,092,416 NLP-based systems and methods for providing quotations 0 2014
9,378,285 Extending keyword searching to syntactically and semantically annotated data 0 2014
 
TOPIX LLC (5)
* 8,271,495 System and method for automating categorization and aggregation of content from network sites 4 2004
7,930,647 System and method for selecting pictures for presentation with text content 3 2005
* 2007/0136,680 System and method for selecting pictures for presentation with text content 5 2005
9,405,732 System and method for displaying quotations 0 2006
7,814,089 System and method for presenting categorized content on a site using programmatic and manual selection of content items 7 2007
 
NETBASE SOLUTIONS, INC. (10)
8,055,608 Method and apparatus for concept-based classification of natural language discourse 9 2006
8,046,348 Method and apparatus for concept-based searching of natural language discourse 10 2006
9,047,285 Method and apparatus for frame-based search 0 2008
8,935,152 Method and apparatus for frame-based analysis of search results 0 2008
9,026,529 Method and apparatus for determining search result demographics 0 2010
9,390,525 Graphical representation of frame instances 0 2011
9,075,799 Methods and apparatus for query formulation 0 2011
9,063,970 Method and apparatus for concept-based ranking of natural language discourse 0 2011
8,949,263 Methods and apparatus for sentiment analysis 1 2012
9,135,243 Methods and apparatus for identification and analysis of temporally differing corpora 0 2013
* Cited By Examiner