Systems and methods for determining the topic structure of a document including text utilize a Probabilistic Latent Semantic Analysis (PLSA) model and select segmentation points based on similarity values between pairs of adjacent text blocks. PLSA forms a framework for both text segmentation and topic identification. The use of PLSA provides an improved representation for the sparse information in a text block, such as a sentence or a sequence of sentences. Topic characterization of each text segment is derived from PLSA parameters that relate words to 'topics', latent variables in the PLSA model, and 'topics' to text segments. A system executing the method exhibits significant performance improvement. Once determined, the topic structure of a document may be employed for document retrieval and/or document summarization.
Please note there is up to 60 days of latency in this Status indicator for certain status conditions. You can obtain up-to-date Status indicator readings by ordering PAIR for the file.
An application with the status "Published" (which means it is pending) may be recently abandoned, but not yet updated to reflect its abandoned status. However, an application filed less than one year ago is unlikely to be abandoned.
A patent with the status "Granted" may be recently expired, but not yet updated to reflect its expired status. However, it is highly unlikely a patent less than 3.5 years old would be expired.
An application with the status "Abandoned" is almost always current, but there is a small chance it was recently revived and the status not yet updated.
This priority date is an estimated earliest
priority date and is purely an estimation. This date should not be
taken as legal conclusion. No representations are made as to the
accuracy of the date listed. Please consult a legal professional
before relying on this date.