US Patent No: 5,371,807

Method and apparatus for text classification

Abstract

A text classification system and method that can be used by an application for classifying natural language text input into a computer system having a domain-specific knowledge base that includes a plurality of categories. The text classification system classifies natural language input text by first parsing it into a first list of recognized keywords. This list is then used to deduce further facts from the natural language input text, which are compiled into a second list. Next, a numeric similarity score is calculated for each of the plurality of categories in the knowledge base, indicating how similar that category is to the natural language input text. A dynamic threshold is then applied to determine which of the plurality of categories are most similar to the recognized keywords of the natural language input text. A third list is compiled of the categories determined to be most similar to the recognized keywords. An optional rule base can be used to further refine the determination of which categories are most similar to the recognized keywords of the natural language input text. An optional learning capability can also be added to improve the accuracy of the text classification system.
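The pipeline described in the abstract (keyword list, deduced-fact list, per-category similarity score, dynamic threshold, final category list) can be illustrated with a minimal Python sketch. It is not the patented method. The abstract does not specify the knowledge-base format, the inference mechanism, the similarity formula, or how the threshold is computed, so the following are all assumptions: the toy `KNOWLEDGE_BASE` and `INFERENCE_RULES` tables, cosine similarity over term sets, and a threshold set to a fixed fraction (`ratio`) of the best score. The optional rule base and learning capability are not shown.

```python
import math
import re

# Hypothetical domain knowledge base: category -> set of keywords.
KNOWLEDGE_BASE = {
    "printer_problem": {"printer", "paper", "jam", "toner"},
    "network_problem": {"network", "connection", "router", "hardware"},
    "billing_question": {"invoice", "charge", "refund"},
}

# Hypothetical inference rules: a recognized keyword implies further facts.
INFERENCE_RULES = {
    "jam": {"printer"},
    "router": {"network"},
}

# Every term the knowledge base knows about.
VOCABULARY = set().union(*KNOWLEDGE_BASE.values())


def parse_keywords(text):
    """First list: recognized keywords in order of first appearance."""
    keywords = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in VOCABULARY and token not in keywords:
            keywords.append(token)
    return keywords


def deduce_facts(keywords):
    """Second list: facts implied by the keywords (forward chaining to a fixpoint)."""
    facts = set(keywords)
    changed = True
    while changed:
        changed = False
        for premise, conclusions in INFERENCE_RULES.items():
            if premise in facts and not conclusions <= facts:
                facts |= conclusions
                changed = True
    return sorted(facts - set(keywords))


def similarity(evidence, category_terms):
    """Assumed score: cosine similarity between two term sets."""
    if not evidence or not category_terms:
        return 0.0
    overlap = len(evidence & category_terms)
    return overlap / math.sqrt(len(evidence) * len(category_terms))


def classify(text, ratio=0.5):
    """Third list: categories whose score clears a threshold relative to the best score."""
    keywords = parse_keywords(text)
    evidence = set(keywords) | set(deduce_facts(keywords))
    scores = {cat: similarity(evidence, terms) for cat, terms in KNOWLEDGE_BASE.items()}
    best = max(scores.values(), default=0.0)
    if best == 0.0:
        return []
    threshold = ratio * best  # "dynamic" because it moves with the best score (assumption)
    passing = [cat for cat, score in scores.items() if score >= threshold]
    return sorted(passing, key=lambda cat: -scores[cat])  # highest score first
```

For example, `classify("There is a paper jam")` finds the keywords `paper` and `jam`, deduces `printer` from `jam`, and returns `["printer_problem"]`. Tying the threshold to the best score, instead of using a fixed cutoff, keeps one category when it clearly dominates. Lowering `ratio` admits runners-up, so `classify("My router drops the connection and I want a refund", ratio=0.3)` also returns `billing_question`.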

Patent Owner(s)

Patent Owner | Address | Total Patents
HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. | HOUSTON, TX | 25,733

Inventor(s)

Inventor Name Address # of filed Patents Total Citations
Kannan, Narasimhan Colorado Springs, CO 1 150
Register, Michael S Colorado Springs, CO 1 150

Cited Art

Patent Info (Count) # Cites Year
 
INTERNATIONAL BUSINESS MACHINES CORPORATION (2)
4,674,065 System for detecting and correcting contextual errors in a text processing system 111 1985
5,146,406 Computer method for identifying predicate-argument structures in natural language text 62 1989
 
BSO/BURO VOOR SYSTEEMONTWIKKELING B.V. (1)
5,128,865 Method for determining the semantic relatedness of lexical items in a text 77 1990
 
GOOGLE INC. (1)
4,876,731 Neural network model in pattern recognition using probabilistic contextual information 74 1988
 
HITACHI, LTD. (1)
4,682,365 System and method for preparing a recognition dictionary 34 1985
 
NEC CORPORATION (1)
5,050,218 Apparatus for recognizing address appearing on mail article 22 1991
 
NUANCE COMMUNICATIONS, INC. (1)
4,754,489 Means for resolving ambiguities in text based upon character context 58 1985
 
TEXAS INSTRUMENTS INCORPORATED (1)
5,083,268 System and method for parsing natural language by unifying lexical features of words 53 1990
 
OTHER [CHECK PATENT PROFILE FOR ASSIGNMENT INFORMATION] (1)
5,056,021 Method and apparatus for abstracting concepts from natural language 131 1989

Forward Cites

Patent Info (Count) # Cites Year
 
INTERNATIONAL BUSINESS MACHINES CORPORATION (24)
5,548,507 Language identification process using coded language words 71 1994
6,704,698 Word counting natural language determination 18 1996
6,009,382 Word storage table for natural language determination 57 1996
6,002,998 Fast, efficient hardware mechanism for natural language determination 23 1996
6,023,670 Natural language determination using correlation between common words 16 1996
5,913,185 Determining a natural language shift in a computer document 29 1996
6,553,385 Architecture of a framework for information extraction from natural language documents 68 1998
6,212,532 Text categorization toolkit 37 1998
6,510,431 Method and system for the routing of requests using an automated classification and profile matching in a networked environment 13 1999
6,662,168 Coding system for high data volume 3 2000
6,408,277 System and method for automatic task prioritization 46 2000
6,785,683 Categorization and presentation tool for code resources 24 2000
7,493,252 Method and system to analyze data 1 2000
6,760,490 Efficient checking of key-in data entry 10 2000
7,099,855 System and method for electronic communication management 16 2001
8,290,768 System and method for determining a set of attributes based on content of communications 0 2002
7,376,641 Information retrieval from a collection of data 7 2004
7,644,057 System and method for electronic communication management 4 2004
7,849,044 System and method for automatic task prioritization 1 2005
7,266,535 System and method for electronic communication management 5 2005
7,756,810 Software tool for training and testing a knowledge base 1 2007
7,752,159 System and method for classifying text 4 2007
8,086,608 Management of resource identifiers 0 2007
7,702,677 Information retrieval from a collection of data 6 2008
 
YAHOO! INC. (12)
7,711,838 Internet radio and broadcast method 20 2000
7,406,529 System and method for detecting and verifying digitized content over a computer network 24 2001
7,251,665 Determining a known character string equivalent to a query string 3 2001
7,454,509 Online playback system with community bias 8 2001
7,305,483 Method for the real-time distribution of streaming data on a network 3 2002
7,162,482 Information retrieval engine 4 2002
7,574,513 Controllable track-skipping 14 2002
8,005,724 Relationship discovery engine 0 2003
7,707,221 Associating and linking compact disc metadata 5 2003
7,315,899 System for controlling and enforcing playback restrictions for a media file by splitting the media file into usable and unusable portions for playback 1 2005
7,720,852 Information retrieval engine 2 2006
7,546,316 Determining a known character string equivalent to a query string 3 2007
 
ORACLE INTERNATIONAL CORPORATION (7)
6,061,675 Methods and apparatus for classifying terminology utilizing a knowledge catalog 65 1995
5,887,120 Method and apparatus for determining theme for discourse 35 1995
5,930,788 Disambiguation of themes in a document classification system 48 1997
6,199,034 Methods and apparatus for determining theme for discourse 76 1998
6,654,731 Automated integration of terminological information into a knowledge base 20 1999
6,487,545 Methods and apparatus for classifying terminology utilizing a knowledge catalog 50 1999
7,512,575 Automated integration of terminological information into a knowledge base 3 2003
 
NUANCE COMMUNICATIONS, INC. (6)
6,047,251 Automatic language identification system for multilingual optical character recognition 44 1997
6,253,169 Method for improvement accuracy of decision tree based text categorization 58 1998
6,519,557 Software and method for recognizing similarity of documents written in different languages based on a quantitative measure of similarity 66 2000
7,039,579 Monte Carlo method for natural language understanding and speech recognition language models 2 2001
6,928,407 System and method for the automatic discovery of salient segments in speech transcripts 15 2002
7,389,230 System and method for classification of voice signals 5 2003
 
APPLE INC. (5)
7,047,242 Weighted term ranking for on-line query tool 71 1999
6,826,559 Hybrid category mapping for on-line query tool 57 1999
7,024,416 Semi-automatic index term augmentation in document retrieval 3 2002
7,725,424 Use of generalized term frequency scores in information retrieval systems 1 2002
8,095,533 Automatic index term augmentation in document retrieval 0 2004
 
DRUGLOGIC, INC. (5)
6,789,091 Method and system for web-based analysis of drug adverse effects 22 2001
6,778,994 Pharmacovigilance database 6 2001
7,539,684 Processing drug data 3 2002
8,131,769 Processing drug data 0 2009
7,979,373 Method and system for analyzing drug adverse effects 0 2009
 
INGENUITY SYSTEMS, INC. (5)
6,772,160 Techniques for facilitating information acquisition and storage 30 2000
6,741,986 Method and system for performing information extraction and quality control for a knowledgebase 12 2001
7,577,683 Methods for the construction and maintenance of a knowledge representation system 1 2004
7,650,339 Techniques for facilitating information acquisition and storage 0 2004
8,392,353 Computerized knowledge representation system with flexible user entry fields 0 2009
 
MICROSOFT CORPORATION (5)
5,694,559 On-line help method and system utilizing free text query 100 1995
5,970,449 Text normalization using a context-free grammar 35 1997
6,272,456 System and method for identifying the language of written text having a plurality of different length n-gram profiles 61 1998
7,970,600 Using a first natural language parser to train a second parser 0 2004
7,333,965 Classifying text in a code editor using multiple classifiers 1 2006
 
GOZOOM.COM, INC. (4)
8,032,604 Methods and systems for analyzing email messages 1 2009
7,970,845 Methods and systems for suppressing undesireable email messages 3 2009
8,280,971 Suppression of undesirable email messages by emulating vulnerable systems 0 2011
8,285,806 Methods and systems for analyzing email messages 0 2011
 
HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. (4)
6,823,323 Automatic classification method and apparatus 15 2001
7,715,533 Brokering of information acquisition by devices in a wireless network 0 2001
6,968,178 Profiles for information acquisition by devices in a wireless network 52 2001
6,950,646 Information acquisition decision making by devices in a wireless network 8 2001
 
NEURIC TECHNOLOGIES, LLC (4)
7,089,218 Method for inclusion of psychological temperament in an electronic emulation of the human brain 8 2005
7,849,034 Method of emulating human cognition in a brain model containing a plurality of electronically represented neurons 0 2006
8,001,067 Method for substituting an electronic emulation of the human brain into an application to replace a human 0 2007
7,925,492 Method for determining relationships through use of an ordered list between processing nodes in an emulated human brain 0 2007
 
MUSICMATCH, INC. (3)
8,271,333 Content-related wallpaper 2001
7,024,485 System for controlling and enforcing playback restrictions for a media file by splitting the media file into usable and unusable portions for playback 35 2002
7,672,873 Music purchasing and playing system and method 1 2004
 
VERITY, INC. (3)
6,604,090 System and method for selecting responses to user input in an automated interface program 21 1998
6,314,410 System and method for identifying the context of a statement made to a virtual robot 16 1998
6,629,087 Methods for creating and editing topics for virtual robots conversing in natural language 18 1999
 
XEROX CORPORATION (3)
5,659,766 Method and apparatus for inferring the topical content of a document based upon its lexical content without supervision 39 1994
6,973,423 Article and method of automatically determining text genre using surface features of untagged texts 4 1998
7,587,307 Method and apparatus for evaluating machine translation quality 4 2003
 
ATHENAINVEST, INC. (2)
7,734,526 Investment classification and tracking system 4 2007
8,352,347 Investment classification and tracking system using diamond ratings 0 2009
 
CANON KABUSHIKI KAISHA (2)
5,854,860 Image filing apparatus having a character recognition function 2 1995
6,188,977 Natural language processing apparatus and method for converting word notation grammar description data 9 1998
 
GOOGLE INC. (2)
6,167,369 Automatic language identification using both N-gram and word information 114 1998
8,015,173 Techniques for web site integration 1 2005
 
KONINKLIJKE PHILIPS ELECTRONICS N.V. (2)
7,210,157 Apparatus and method of program classification using observed cues in the transcript information 1 2000
6,798,912 Apparatus and method of program classification based on syntax of transcript information 3 2000
 
LUCENT TECHNOLOGIES INC. (2)
5,675,710 Method and apparatus for training a text classifier 117 1995
6,269,153 Methods and apparatus for automatic call routing including disambiguating routing decisions 106 1998
 
LUCIDMEDIA NETWORKS, INC. (2)
6,665,681 System and method for generating a taxonomy from a plurality of documents 31 1999
6,424,982 System and method for parsing a document using one or more break characters 14 1999
 
MITRE CORPORATION, THE (2)
7,765,574 Automated segmentation and information extraction of broadcast news via finite state presentation model 9 2004
7,386,542 Personalized broadcast news navigator 10 2004
 
TRUSTWAVE HOLDINGS, INC. (2)
7,315,891 Employee internet management device 6 2001
7,577,739 Employee internet management device 1 2007
 
VERIZON LABORATORIES INC. (2)
8,275,661 Targeted banner advertisements 0 1999
8,244,795 Page aggregation for web sites 0 2004
 
ACTIONEER, INC. (1)
7,146,381 Information organization and collaboration tool for processing notes and action requests in computer systems 12 1999
 
AT&T CORP. (1)
6,028,970 Method and apparatus for enhancing optical character recognition 22 1997
 
CRYSTAL SEMANTICS LIMITED (1)
7,305,415 Apparatus for classifying or disambiguating data 8 2004
 
DATALIGN, INC. (1)
6,886,011 Good and service description system and method 2 2001
 
EDUCATIONAL TESTING SERVICE (1)
6,115,683 Automatic essay scoring system using content-based techniques 44 1997
 
EMC (BENELUX) B.V., S.A.R.L. (1)
8,244,792 Apparatus and method for information recovery quality assessment in a computer system 0 2006
 
ENTRLEVA, INC. (1)
7,113,954 System and method for generating a taxonomy from a plurality of documents 10 2003
 
ERNST & YOUNG U.S. LLP (1)
7,805,673 Method and apparatus to provide a unified redaction system 3 2006
 
FUJITSU LIMITED (1)
7,003,442 Document file group organizing apparatus and method thereof 11 1999
 
GEORGE MASON INTELLECTUAL PROPERTIES, INC. (1)
5,748,973 Advanced integrated requirements engineering system for CE-based requirements assessment 34 1994
 
INTELLECTUAL VENTURES HOLDING 56 LLC (1)
6,272,490 Document data linking apparatus 7 1998
 
INTELLIGENT RESULTS, INC. (1)
7,249,312 Attribute scoring for unstructured content 7 2002
 
IPHRASE TECHNOLOGIES, INC. (1)
6,961,720 System and method for automatic task prioritization 17 2001
 
JUSTSYSTEMS CORPORATION (1)
6,292,772 Method for identifying the language of individual words 28 1998
 
KABUSHIKI KAISHA TOSHIBA (1)
5,651,101 Knowledge base system for setting attribute value derivation data independently from attribute value derivation procedure and shared data management apparatus for selectively locking attribute 8 1995
 
KENT RIDGE DIGITAL LABS (1)
7,386,560 Method and system for user-configurable clustering of information 2 2001
 
KRONOS TALENT MANAGEMENT INC. (1)
7,555,441 Conceptualization of job candidate information 7 2003
 
LEVERANCE, INC. (1)
5,682,539 Anticipated meaning natural language interface 45 1994
 
LORAL FEDERAL SYSTEMS COMPANY (1)
5,754,671 Method for improving cursive address recognition in mail pieces using adaptive data base management 44 1995
 
LUCIMEDIA NETWORKS, INC. (1)
8,327,265 System and method for parsing a document 0 2000
 
MUSICMATCH (1)
8,352,331 Relationship discovery engine 0 2001
 
NATIONAL RESEARCH COUNCIL OF CANADA (1)
6,470,307 Method and apparatus for automatically identifying keywords within a document 93 1997
 
NATIONAL SECURITY AGENCY (1)
5,991,714 Method of identifying data type and locating in a file 31 1998
 
PANASONIC CORPORATION (1)
7,657,906 Program recommendation apparatus, method and program used in the program recommendation apparatus 2 2004
 
PENDRAGON WIRELESS LLC (1)
6,990,496 System and method for automated classification of text by time slicing 5 2000
 
QWEST COMMUNICATIONS INTERNATIONAL INC. (1)
6,493,694 Method and system for correcting customer service orders 33 1999
 
RICOH COMPANY, LTD. (1)
6,546,383 Method and device for document retrieval 9 2000
 
SA DIGITAL DATA HOLDINGS, L.L.C. (1)
6,295,543 Method of automatically classifying a text appearing in a document when said text has been converted into digital data 15 1998
 
SAP AG (1)
7,386,785 Automatic electronic timesheet filler 2 2004
 
SHL US INC. (1)
8,086,558 Computer-implemented system for human resources management 6 2009
 
SIEMENS ENTERPRISE COMMUNICATIONS GMBH & CO. KG (1)
7,096,179 Text-based automatic content classification and grouping 8 2001
 
SIZATOLA, LLC (1)
7,917,519 Categorized document bases 1 2006
 
SYRACUSE UNIVERSITY (1)
5,873,056 Natural language processing system for semantic vector representation which accounts for lexical ambiguity 180 1993
 
TERADATA US, INC. (1)
7,613,717 Automated system for rating customer feedback 0 2001
 
TONFU CORPORATION (1)
6,931,394 Law retrieval system, law retrieval apparatus and law retrieval program 0 2001
 
UNIVERSITY TECHNOLOGY CORPORATION (1)
6,356,864 Methods for analysis and evaluation of the semantic content of a writing based on vector length 80 1998
 
VALITY TECHNOLOGY INCORPORATED (1)
6,938,053 Categorization based on record linkage theory 11 2001
 
VERIZON CORPORATE SERVICES GROUP INC. (1)
7,565,401 Page aggregation for web sites 0 2004
 
VOICES HEARD MEDIA, INC. (1)
8,060,390 Computer based method for generating representative questions from an audience 0 2006
 
OTHER [CHECK PATENT PROFILE FOR ASSIGNMENT INFORMATION] (3)
5,991,709 Document automated classification/declassification system 29 1997
7,925,612 Method for graphically depicting drug adverse effect risks 0 2001
7,542,961 Method and system for analyzing drug adverse effects 4 2001