Techniques for clustering structurally similar web pages

United States of America Patent

PATENT NO 7680858
APP PUB NO 20080010291A1
SERIAL NO 11481734

Abstract

Web page clustering techniques described herein are URL Clustering and Page Clustering, whereby clustering algorithms cluster together pages that are structurally similar. Regarding URL clustering, because similarly structured pages have similar patterns in their URLs, grouping similar URL patterns will group structurally similar pages. Embodiments of URL clustering may involve: (a) URL normalization and (b) URL variation computation. Regarding page clustering, page feature-based techniques further cluster any given set of homogenous clusters, reducing the number of clusters based on the underlying page code. Embodiments of page clustering may reduce the number of clusters based on the tag probabilities and the tag sequence, utilizing an Approximate Nearest Neighborhood (ANN) graph along with evaluation of intra-cluster and inter-cluster compactness.
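
The URL clustering idea in the abstract (normalize URLs, measure how they vary, and group those sharing a pattern) can be illustrated with a minimal sketch. It is not the patented method: the normalization rules, the `<num>` and `<str>` placeholders, the reading of "variation" as a count of distinct tokens per path depth, and the function names `normalize_url`, `level_variation`, and `cluster_urls` are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to a coarse structural pattern (illustrative rules only)."""
    parts = urlsplit(url.lower())
    pattern = []
    for seg in (s for s in parts.path.split("/") if s):
        if seg.isdigit():
            pattern.append("<num>")   # numeric ids collapse to one placeholder
        elif re.fullmatch(r"[a-z]+", seg):
            pattern.append(seg)       # plain words are kept as structural tokens
        else:
            pattern.append("<str>")   # anything else is treated as free-form
    # Keep query parameter names but drop their values.
    keys = sorted(p.split("=", 1)[0] for p in parts.query.split("&") if p)
    query = "?" + "&".join(keys) if keys else ""
    return parts.netloc + "/" + "/".join(pattern) + query


def level_variation(urls):
    """Count distinct path tokens at each depth (an assumed stand-in for 'variation')."""
    seen = defaultdict(set)
    for url in urls:
        segments = (s for s in urlsplit(url).path.split("/") if s)
        for depth, seg in enumerate(segments):
            seen[depth].add(seg)
    return {depth: len(tokens) for depth, tokens in sorted(seen.items())}


def cluster_urls(urls):
    """Group URLs whose normalized patterns are identical."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[normalize_url(url)].append(url)
    return dict(clusters)


if __name__ == "__main__":
    sample = [
        "http://example.com/product/123",
        "http://example.com/product/456",
        "http://example.com/about",
    ]
    for pattern, members in cluster_urls(sample).items():
        print(pattern, members)
    print(level_variation(sample))
```

In this sketch the two product URLs fall into one cluster because their numeric segment is replaced by the same placeholder, while the per-depth counts hint at which path levels vary enough to be treated as wildcards.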

Patent Owner(s)

Patent Owner: R2 SOLUTIONS LLC
Address: 6136 FRISCO SQUARE BLVD SUITE 400 FRISCO TX 75034

Inventor(s)

Inventor Name           Address          # of Filed Patents   Total Citations
Poola, Krishna Leela    Bangalore, IN    14                   336
Ramanujapuram, Arun     Bangalore, IN    14                   616
