Majority schema in semi-structured data

United States of America Patent

PATENT NO 6604099
SERIAL NO 09628097

Abstract

A schema discovery system and associated method discover a majority schema for a set of related, similarly marked-up documents, such as HTML documents. The approach assumes that, although the structure of these documents serves mostly visual purposes, the keywords used in the documents, together with the structural tags, provide hints that allow a rough sketch of the underlying intended schema. It further assumes that, although the HTML documents are marked up differently because of diverse authoring skills, they are closely related in content, so it is reasonable to find a schema that unifies these different schemas and is shared by the majority of the documents. The system applies constraint rules on tree ordering to reduce the computational complexity of arriving at an optimized XML DTD schema. These generalized XML DTD schemas may be used for automated comparison and evaluation of profile documents on the WWW.
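
The idea of a schema "shared by the majority" of differently marked-up documents can be illustrated with a minimal sketch. This is not the patented method: it ignores keywords, tree-ordering constraints, and DTD generation, and simply counts root-to-node tag paths across documents, keeping those that occur in more than half of them. The names `PathCollector`, `tag_paths`, `majority_paths`, and the `threshold` parameter are hypothetical and introduced only for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Collects the set of root-to-node tag paths found in one document."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Return the set of tag paths in a single HTML string."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def majority_paths(documents, threshold=0.5):
    """Return paths present in more than `threshold` of the documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tag_paths(doc))  # each path counted once per document
    cutoff = threshold * len(documents)
    return sorted(path for path, n in counts.items() if n > cutoff)


# Three made-up profile pages with slightly different markup.
docs = [
    "<html><body><h1>Resume</h1><ul><li>Education</li></ul></body></html>",
    "<html><body><h1>CV</h1><ul><li>Education</li></ul>"
    "<table><tr><td>x</td></tr></table></body></html>",
    "<html><body><h2>Resume</h2><ul><li>Skills</li></ul></body></html>",
]
print(majority_paths(docs))
# ['html', 'html/body', 'html/body/h1', 'html/body/ul', 'html/body/ul/li']
```

The `table` and `h2` paths each appear in only one of the three documents, so they fall below the cutoff and are dropped from the majority structure. A fuller treatment along the lines of the abstract would also weigh keywords and emit an XML DTD rather than a flat list of paths.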

Patent Owner(s)

Patent Owner   Address
GOOGLE LLC     1600 AMPHITHEATRE PARKWAY MOUNTAIN VIEW CA 94043

Inventor(s)

Inventor Name             Address        # of Filed Patents   Total Citations
Chung, Christina Yip      Davis, CA      22                   1658
Sundaresan, Neelakantan   San Jose, CA   428                  10718
