Method and system for data retrieval in large collections of data

United States of America Patent

PATENT NO 20020091671
APP PUB NO 20020091671A1
SERIAL NO 09989970

Abstract

A method, system and computer readable medium for retrieving relevant data in large collections of documents is disclosed. The method, system and computer readable medium of the present invention includes retrieving a document to be indexed, generating a document extract from the document, wherein the document extract comprises a portion of the document, and decomposing the document extract into tokens. The tokens are then stored in a search index, wherein a search engine accesses the search index to retrieve information satisfying a search query. Through aspects of the method, system and computer readable medium of the present invention, the quality of the search result is improved because the retrieved documents are more relevant in view of the semantic concept or notion represented by the search query. Moreover, the storage requirements are reduced and the processing time for conducting a search is shortened.
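The steps named in the abstract (generate an extract, decompose it into tokens, store the tokens in a search index, answer a query from that index) can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `generate_extract`, `tokenize` and `SearchIndex` are hypothetical, the "first few sentences" extract rule is an assumed stand-in for whatever extraction the patent actually claims, and the query logic is a plain Boolean AND over an in-memory inverted index.

```python
import re
from collections import defaultdict


def generate_extract(text, max_sentences=2):
    """Return a portion of the document.

    Assumption: the extract is simply the first `max_sentences` sentences.
    The patent only says the extract comprises a portion of the document.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


def tokenize(text):
    """Decompose text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    """Hypothetical in-memory inverted index: token -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, doc_id, text):
        # Only the extract is tokenized and indexed, not the full document,
        # which is what keeps the index smaller than a full-text index.
        for token in tokenize(generate_extract(text)):
            self.postings[token].add(doc_id)

    def search(self, query):
        """Return ids of documents whose extract contains every query token."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


if __name__ == "__main__":
    index = SearchIndex()
    index.add_document(
        "d1",
        "Search engines build indexes. Tokens are stored there. "
        "Unrelated footer about cooking recipes.",
    )
    index.add_document(
        "d2",
        "Cooking recipes are fun. Pasta is easy. "
        "Search engines are mentioned late.",
    )
    print(index.search("search engines"))
    print(index.search("cooking recipes"))
```

In this sketch a document that mentions a query term only outside its extract is not returned for that query, which mirrors the abstract's relevance claim in a simplified form. Because only the extract is indexed, the index holds fewer tokens than the full text would produce.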

Patent Owner(s)

Patent Owner | Address | Total Patents
INTERNATIONAL BUSINESS MACHINES CORPORATION | ARMONK, NY | 44666

Inventor(s)

Inventor Name | Address | # of filed Patents | Total Citations
Prokoph, Andreas | Boeblingen, DE | 4 | 146

Cited Art Landscape

Patent Info (Count) # Cites Year
 
FUJI XEROX CO., LTD. (1)
* 5778400 Apparatus and method for storing, searching for and retrieving text of a structured document provided with tags 40 1996
 
BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY (1)
* 6253208 Information access 72 1998
 
RPX CORPORATION (1)
* 6415250 System and method for identifying language using morphologically-based techniques 138 1997
 
MICROSOFT TECHNOLOGY LICENSING, LLC (2)
* 6076051 Information retrieval utilizing semantic representation of text 205 1997
* 6631369 Method and system for incremental web crawling 111 1999
 
XEROX CORPORATION (1)
* 6857102 Document re-authoring systems and methods for providing device-independent access to the world wide web 124 1999
 
HARTFORD FIRE INSURANCE COMPANY (1)
* 5557515 Computerized system and method for work management 340 1995
 
KABUSHIKI KAISHA TOSHIBA (1)
* 5907841 Document detection system with improved document detection efficiency 31 1996
 
FUJITSU LIMITED (1)
* 6205456 Summarization apparatus and method 251 1998
 
HITACHI, LTD. (1)
* 6473754 METHOD AND SYSTEM FOR EXTRACTING CHARACTERISTIC STRING, METHOD AND SYSTEM FOR SEARCHING FOR RELEVANT DOCUMENT USING THE SAME, STORAGE MEDIUM FOR STORING CHARACTERISTIC STRING EXTRACTION PROGRAM, AND STORAGE MEDIUM FOR STORING RELEVANT DOCUMENT SEARCHING PROGRAM 33 1999
 
VERTICAL SEARCH WORKS, INC. (1)
* 6243713 Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types 379 1998
 
ORACLE AMERICA, INC. (1)
* 5724571 Method and apparatus for generating query responses in a computer-based document retrieval system 244 1995
 
MCAFEE, INC. (1)
* 6621930 Automatic categorization of documents based on textual content 41 2000
* Cited By Examiner

Forward Cite Landscape

Patent Info (Count) # Cites Year
 
Other [Check patent profile for assignment information] (4)
* 2008/0215,614 Pyramid Information Quantification or PIQ or Pyramid Database or Pyramided Database or Pyramided or Selective Pressure Database Management System 6 2006
* 8069162 Enhanced search indexing 3 2010
* 2011/0099,134 Method and System for Agent Based Summarization 3 2010
* 2011/0153,577 Query Processing System and Method for Use with Tokenspace Repository 9 2011
 
INTERNATIONAL BUSINESS MACHINES CORPORATION (26)
8214391 Knowledge-based data mining system 10 2002
* 7010526 Knowledge-based data mining system 9 2002
6993534 Data store for knowledge-based data mining system 38 2002
* 2003/0212,649 Knowledge-based data mining system 11 2002
* 2003/0212,699 Data store for knowledge-based data mining system 7 2002
* 2003/0212,675 Knowledge-based data mining system 14 2002
* 7254571 System and method for generating and retrieving different document layouts from a given content 12 2002
* 2003/0225,747 System and method for generating and retrieving different document layouts from a given content 7 2002
7146361 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND) 120 2003
7139752 System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations 90 2003
* 2004/0243,557 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a weighted and (WAND) 18 2003
* 2004/0243,556 System, method and computer program product for performing unstructured information management and automatic text analysis, and including a document common analysis system (CAS) 78 2003
* 2004/0243,645 System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations 33 2003
* 2004/0243,560 System, method and computer program product for performing unstructured information management and automatic text analysis, including an annotation inverted file system facilitating indexing and searching 34 2003
* 2004/0243,554 System, method and computer program product for performing unstructured information management and automatic text analysis 58 2003
7289983 Personalized indexing and searching for information in a distributed data processing system 25 2003
8014997 Method of search content enhancement 4 2003
* 2005/0138,007 Document enhancement method 33 2003
7512602 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a weighted and (WAND) 8 2006
* 2007/0112,763 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a weighted and (WAND) 10 2006
* 2008/0270,396 INDEXING VERSIONED DOCUMENT SEQUENCES 2 2007
* 2008/0016,039 SYSTEM AND METHOD FOR GENERATING AND RETRIEVING DIFFERENT DOCUMENT LAYOUTS FROM A GIVEN CONTENT 3 2007
8280903 System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND) 5 2008
* 2009/0222,441 System, Method and Computer Program Product for Performing Unstructured Information Management and Automatic Text Analysis, Including a Search Operator Functioning as a Weighted And (WAND) 17 2008
8027966 Method and system for searching a multi-lingual database 2 2008
8027994 Searching a multi-lingual database 0 2008
 
EXCALIBUR IP, LLC (2)
* 8984398 Generation of search result abstracts 0 2008
* 2010/0057,710 GENERATION OF SEARCH RESULT ABSTRACTS 4 2008
 
BELLSOUTH INTELLECTUAL PROPERTY CORPORATION (4)
7409593 Automated diagnosis for computer networks 21 2003
* 7324986 Automatically facilitated support for complex electronic services 0 2003
* 2005/0038,697 Automatically facilitated marketing and provision of electronic services 19 2003
* 2005/0015,667 Automated diagnosis for electronic systems 20 2003
 
HYPERTEXT SOLUTIONS INC. (1)
7953593 Method and system for extending keyword searching to syntactically and semantically annotated data 5 2009
 
LINKEDIN CORPORATION (2)
7854009 Method of securing access to IP LANs 2 2003
* 2005/0005,110 Method of securing access to IP LANs 13 2003
 
SAMSUNG ELECTRONICS CO., LTD. (1)
* 2008/0290,792 LIGHT EMITTING MATERIAL AND ORGANIC LIGHT-EMITTING DEVICE 2 2008
 
FUJITSU LIMITED (2)
* 9405819 Efficient indexing using compact decision diagrams 0 2008
* 2008/0243,907 Efficient Indexing Using Compact Decision Diagrams 2 2008
 
ORACLE OTC SUBSIDIARY LLC (18)
8874549 System and method for measuring the quality of document sets 0 2008
8832140 System and method for measuring the quality of document sets 1 2008
8219593 System and method for measuring the quality of document sets 6 2008
8051073 System and method for measuring the quality of document sets 14 2008
8051084 System and method for measuring the quality of document sets 13 2008
8024327 System and method for measuring the quality of document sets 20 2008
8005643 System and method for measuring the quality of document sets 16 2008
* 2009/0006,383 SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS 6 2008
* 2009/0006,438 SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS 19 2008
* 2009/0006,382 SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS 20 2008
* 2009/0006,385 SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS 4 2008
* 2009/0006,384 SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS 13 2008
* 2009/0006,386 SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS 4 2008
* 2009/0006,387 SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS 10 2008
* 2011/0246,378 IDENTIFYING HIGH VALUE CONTENT AND DETERMINING RESPONSES TO HIGH VALUE CONTENT 5 2010
8560529 System and method for measuring the quality of document sets 2 2011
8527515 System and method for concept visualization 2 2011
8935249 Visualization of concepts within a collection of information 3 2012
 
FINETOOTH ENTERPRISES, INC. (1)
* 2007/0055,670 System and method of extracting knowledge from documents 1 2005
 
AT&T DELAWARE INTELLECTUAL PROPERTY, INC. (1)
* 2008/0288,821 Automated Diagnosis for Electronic Systems 11 2008
 
VIAVIENTE (2)
7580929 Phrase-based personalization of searches in an information retrieval system 45 2004
* 2008/0319,971 Phrase-based personalization of searches in an information retrieval system 31 2004
 
MICROSOFT TECHNOLOGY LICENSING, LLC (5)
9529908 Tiering of posting lists in search engine index 0 2010
9424351 Hybrid-distribution model for search engine indexes 0 2010
8713024 Efficient forward ranking in a search engine 0 2010
8620907 Matching funnel for large document index 1 2010
8478704 Decomposable ranking for efficient precomputing that selects preliminary ranking features comprising static ranking features and dynamic atom-isolated components 0 2010
 
SCHLUMBERGER TECHNOLOGY CORPORATION (4)
* 9070172 Method and system for data context service 0 2008
* 2009/0063,230 METHOD AND SYSTEM FOR DATA CONTEXT SERVICE 8 2008
* 8156131 Quality measure for a data context service 6 2009
* 2010/0121,861 QUALITY MEASURE FOR A DATA CONTEXT SERVICE 7 2009
 
HARRIS CORPORATION (3)
* 7801887 Method for re-ranking documents retrieved from a document database 5 2004
* 2006/0089,926 Method for re-ranking documents retrieved from a document database 38 2004
* 2011/0016,113 METHOD FOR RE-RANKING DOCUMENTS RETRIEVED FROM A DOCUMENT DATABASE 0 2010
 
INTELLECTUAL VENTURES II LLC (2)
7735142 Electronic vulnerability and reliability assessment 2 2007
* 2008/0172,743 Electronic Vulnerability and Reliability Assessment 3 2007
 
FACEBOOK, INC. (10)
7584194 Method and apparatus for an application crawler 31 2005
* 7370381 Method and apparatus for a ranking engine 36 2005
* 2006/0230,011 Method and apparatus for an application crawler 70 2005
* 2006/0218,141 Method and apparatus for a ranking engine 16 2005
7912836 Method and apparatus for a ranking engine 6 2008
* 2008/0201,323 METHOD AND APPARATUS FOR A RANKING ENGINE 3 2008
8954416 Method and apparatus for an application crawler 0 2009
* 2009/0216,758 METHOD AND APPARATUS FOR AN APPLICATION CRAWLER 18 2009
9405833 Methods for analyzing dynamic web pages 0 2012
8788488 Ranking search results based on recency 0 2012
 
VCVCIII LLC (1)
8954469 Query templates and labeled search tip system, methods, and techniques 1 2008
 
GOOGLE INC. (80)
7711679 Phrase-based detection of duplicate documents in an information retrieval system 33 2004
7599914 Phrase-based searching in an information retrieval system 39 2004
* 7584175 Phrase-based generation of document descriptions 49 2004
7580921 Phrase identification in an information retrieval system 49 2004
7536408 Phrase-based indexing in an information retrieval system 58 2004
* 2008/0306,943 PHRASE-BASED DETECTION OF DUPLICATE DOCUMENTS IN AN INFORMATION RETRIEVAL SYSTEM 40 2004
7430556 Phrase-based indexing in an information retrieval system 1 2004
7426507 Automatic taxonomy generation in search results using phrases 99 2004
* 2006/0031,195 Phrase-based searching in an information retrieval system 81 2004
* 2006/0020,571 Phrase-based generation of document descriptions 54 2004
* 2006/0020,607 Phrase-based indexing in an information retrieval system 27 2004
8407239 Multi-stage query processing system and method for use with tokenspace repository 3 2004
* 7917480 Document compression system and method for use with tokenspace repository 11 2004
* 2007/0220,023 Document compression system and method for use with tokenspace repository 12 2004
* 2006/0036,593 Multi-stage query processing system and method for use with tokenspace repository 88 2004
7702618 Information retrieval system for archiving multiple document versions 41 2005
7567959 Multiple index based information retrieval system 63 2005
9008447 Method and system for character recognition 0 2005
* 8713418 Adding value to a rendered document 10 2005
* 2008/0141,117 Adding Value to a Rendered Document 219 2005
7603345 Detecting spam documents in a phrase based information retrieval system 31 2006
* 2006/0294,155 Detecting spam documents in a phrase based information retrieval system 55 2006
8166021 Query phrasification 22 2007
8166045 Phrase extraction using subphrase scoring 29 2007
8086594 Bifurcated document relevance scoring 17 2007
7925655 Query scheduling using hierarchical tiers of index servers 31 2007
7702614 Index updating using segment swapping 28 2007
7693813 Index server architecture using tiered and sharded phrase posting lists 73 2007
8117223 Integrating external related phrase information into a phrase-based indexing information retrieval system 17 2007
8560550 Multiple index based information retrieval system 6 2009
* 2010/0030,773 MULTIPLE INDEX BASED INFORMATION RETRIEVAL SYSTEM 13 2009
8619287 System and method for information gathering utilizing form identifiers 0 2009
* 2010/0182,631 INFORMATION GATHERING SYSTEM AND METHOD 57 2009
8078629 Detecting spam documents in a phrase based information retrieval system 10 2009
* 2011/0131,223 DETECTING SPAM DOCUMENTS IN A PHRASE BASED INFORMATION RETRIEVAL SYSTEM 15 2009
8090723 Index server architecture using tiered and sharded phrase posting lists 21 2010
* 2010/0161,617 INDEX SERVER ARCHITECTURE USING TIERED AND SHARDED PHRASE POSTING LISTS 31 2010
8612427 Information retrieval system for archiving multiple document versions 3 2010
8108412 Phrase-based detection of duplicate documents in an information retrieval system 14 2010
* 2010/0169,305 INFORMATION RETRIEVAL SYSTEM FOR ARCHIVING MULTIPLE DOCUMENT VERSIONS 12 2010
* 2010/0161,625 PHRASE-BASED DETECTION OF DUPLICATE DOCUMENTS IN AN INFORMATION RETRIEVAL SYSTEM 14 2010
8990235 Automatically providing content associated with captured information, such as information captured in real-time 19 2010
8874504 Processing techniques for visual capture data from a rendered document 3 2010
8793162 Adding information or functionality to a rendered document via association with an electronic counterpart 1 2010
8903759 Determining actions involving captured information and electronic content associated with rendered documents 0 2010
8621349 Publishing techniques for adding value to a rendered document 1 2010
8619147 Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device 5 2010
8620760 Methods and systems for initiating application processes by data capture from rendered documents 0 2010
8799303 Establishing an interactive environment for rendered documents 0 2010
9454764 Contextual dynamic advertising based upon captured rendered text 0 2010
9081799 Using gestalt information to identify locations in printed information 0 2010
9323784 Image search using text-based elements within the contents of images 1 2010
8321445 Generating content snippets using a tokenspace repository 1 2011
8402033 Phrase extraction using subphrase scoring 9 2011
8489628 Phrase-based detection of duplicate documents in an information retrieval system 3 2011
8682901 Index server architecture using tiered and sharded phrase posting lists 9 2011
8631027 Integrated external related phrase information into a phrase-based indexing information retrieval system 3 2012
8600975 Query phrasification 4 2012
9355169 Phrase extraction using subphrase scoring 1 2012
9268852 Search engines and systems with handheld document data capture devices 0 2012
8799099 Processing techniques for text capture from a rendered document 0 2012
8781228 Triggering actions in response to optically or acoustically capturing keywords from a rendered document 2 2012
9275051 Automatic modification of web pages 0 2012
9098501 Generating content snippets using a tokenspace repository 1 2012
8831365 Capturing text from rendered documents using supplement information 1 2013
9361331 Multiple index based information retrieval system 1 2013
8943067 Index server architecture using tiered and sharded phrase posting lists 3 2013
9146967 Multi-stage query processing system and method for use with tokenspace repository 0 2013
9075779 Performing actions based on capturing information from rendered documents, such as documents under copyright 1 2013
9143638 Data capture from rendered documents using handheld device 2 2013
9037573 Phase-based personalization of searches in an information retrieval system 1 2013
9384224 Information retrieval system for archiving multiple document versions 1 2013
9501506 Indexing system 0 2013
9483568 Indexing system 0 2013
9116890 Triggering actions in response to optically or acoustically capturing keywords from a rendered document 2 2014
9223877 Index server architecture using tiered and sharded phrase posting lists 2 2015
9569505 Phrase-based searching in an information retrieval system 0 2015
9514134 Triggering actions in response to optically or acoustically capturing keywords from a rendered document 0 2015
9619565 Generating content snippets using a tokenspace repository 0 2015
9633013 Triggering actions in response to optically or acoustically capturing keywords from a rendered document 0 2016
 
Kaboodle, Inc. (3)
* 7630968 Extracting information from formatted sources 4 2006
* 7606797 Reverse value attribute extraction 1 2006
* 2006/0190,684 Reverse value attribute extraction 10 2006
 
VCVC III LLC (24)
7283951 Method and system for enhanced data searching 64 2001
* 2004/0221,235 METHOD AND SYSTEM FOR ENHANCED DATA SEARCHING 28 2001
7398201 Method and system for enhanced data searching 66 2003
* 2003/0233,224 Method and system for enhanced data searching 76 2003
7526425 Method and system for extending keyword searching to syntactically and semantically annotated data 75 2004
* 2005/0267,871 Method and system for extending keyword searching to syntactically and semantically annotated data 152 2004
8856096 Extending keyword searching to syntactically and semantically annotated data 9 2006
* 2007/0156,669 Extending keyword searching to syntactically and semantically annotated data 95 2006
8594996 NLP-based entity recognition and disambiguation 12 2008
8700604 NLP-based content recommender 7 2008
* 2009/0150,388 NLP-based content recommender 18 2008
8131540 Method and system for extending keyword searching to syntactically and semantically annotated data 24 2009
* 2010/0268,600 ENHANCED ADVERTISEMENT TARGETING 13 2010
8645372 Keyword-based search engine results using enhanced query strategies 1 2010
* 2011/0119,243 KEYWORD-BASED SEARCH ENGINE RESULTS USING ENHANCED QUERY STRATEGIES 41 2010
8645125 NLP-based systems and methods for providing quotations 3 2011
8838633 NLP-based sentiment analysis 4 2011
9405848 Recommending mobile device activities 0 2011
8725739 Category-based content recommendation 3 2011
9116995 Cluster-based identification of news stories 0 2012
9613004 NLP-based entity recognition and disambiguation 0 2013
9092416 NLP-based systems and methods for providing quotations 0 2014
9471670 NLP-based content recommender 0 2014
9378285 Extending keyword searching to syntactically and semantically annotated data 0 2014
 
TOPIX LLC (5)
* 8271495 System and method for automating categorization and aggregation of content from network sites 7 2004
7930647 System and method for selecting pictures for presentation with text content 3 2005
* 2007/0136,680 System and method for selecting pictures for presentation with text content 5 2005
9405732 System and method for displaying quotations 0 2006
7814089 System and method for presenting categorized content on a site using programmatic and manual selection of content items 10 2007
 
NETBASE SOLUTIONS, INC. (10)
8055608 Method and apparatus for concept-based classification of natural language discourse 9 2006
8046348 Method and apparatus for concept-based searching of natural language discourse 11 2006
9047285 Method and apparatus for frame-based search 1 2008
8935152 Method and apparatus for frame-based analysis of search results 0 2008
9026529 Method and apparatus for determining search result demographics 1 2010
9390525 Graphical representation of frame instances 0 2011
9075799 Methods and apparatus for query formulation 0 2011
9063970 Method and apparatus for concept-based ranking of natural language discourse 0 2011
8949263 Methods and apparatus for sentiment analysis 1 2012
9135243 Methods and apparatus for identification and analysis of temporally differing corpora 0 2013
* Cited By Examiner