Visual Leaf Page Identification And Processing

Patent No. US11086961 (titled "Visual Leaf Page Identification And Processing") was filed by Google LLC on Apr 5, 2017.

’961 is related to the field of internet search and, more specifically, to improving the relevance of search results, particularly for image-based searches. The background acknowledges that identifying the format and content of web pages is useful for search engine processing, noting that pages with primarily visual content are more useful for some searches than pages with primarily textual content. The patent addresses the problem of identifying and prioritizing "visual leaf pages," which are terminal pages with a dominant intent for salient images representing the topics described on the page.

The underlying idea behind ’961 is to automatically identify visual leaf pages by analyzing the characteristics of web pages and their relationships to other pages. The system identifies "hub pages" that link to visual leaf pages using image-based links. By analyzing the features of the visual leaf pages linked to by a hub page, the system generates cluster data representing the central tendencies of those features. This cluster data is then used to train a classifier that can identify other visual leaf pages, even those not directly linked to by known hub pages.
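The hub-page step can be sketched in Python as a single pass over a link graph. This is a minimal illustration, not the patent's implementation. It assumes links are given as hypothetical `(source_url, target_url, is_image_link)` tuples and that a set of visual leaf pages is already known. The function name `find_hub_pages` and the example URLs are illustrative only.

```python
from collections import defaultdict


def find_hub_pages(links, visual_leaf_pages):
    """Map each hub page to the visual leaf pages it links to via image-based links.

    links: iterable of hypothetical (source_url, target_url, is_image_link) tuples,
           where is_image_link is True when the anchor's content is an image.
    visual_leaf_pages: set of URLs already identified as visual leaf pages.
    """
    hubs = defaultdict(set)
    for source, target, is_image_link in links:
        # Only an image-based link to a known visual leaf page makes the source a hub.
        if is_image_link and target in visual_leaf_pages:
            hubs[source].add(target)
    return dict(hubs)


# Illustrative usage with made-up URLs.
links = [
    ("https://example.com/recipes", "https://example.com/recipes/cake", True),
    ("https://example.com/recipes", "https://example.com/recipes/pie", True),
    ("https://example.com/about", "https://example.com/recipes/cake", False),
]
leaves = {"https://example.com/recipes/cake", "https://example.com/recipes/pie"}
hub_map = find_hub_pages(links, leaves)
```

Here `/about` is not treated as a hub because its link to the cake page is a text link rather than an image-based one.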

The claims of ’961 focus on a method, system, and computer-readable medium for identifying and classifying visual leaf pages. The core process involves, for each host, identifying visual leaf pages, identifying hub pages that link to those visual leaf pages via images, and generating cluster data for each hub page based on the features of the visual leaf pages it links to. The claims emphasize that the cluster data for each hub page is generated separately. The system then classifies new web pages as visual leaf pages by comparing their features to the generated cluster data and increases their search score if the query requests image search results.
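The per-hub cluster generation described in the claims can be sketched as follows. Each hub's central feature values are computed independently from the feature values of the visual leaf pages it links to. The patent summary only refers to a "central tendency", so using the per-feature median here is an assumption; the mean would fit the description equally well. In this sketch the function would be run once per host, and the name `cluster_data_per_hub` is hypothetical.

```python
from statistics import median


def cluster_data_per_hub(hubs, features):
    """Compute a set of central feature values separately for each hub page.

    hubs: hub URL -> set of visual leaf page URLs it links to via images.
    features: visual leaf page URL -> {feature name: numeric value}.
    The per-feature median stands in for the "central tendency" (an assumption).
    """
    clusters = {}
    for hub, leaves in hubs.items():
        vectors = [features[leaf] for leaf in leaves if leaf in features]
        if not vectors:
            continue  # no feature data available for this hub's leaf pages
        names = set().union(*vectors)  # every feature seen on any linked leaf page
        clusters[hub] = {
            name: median(v[name] for v in vectors if name in v)
            for name in sorted(names)
        }
    return clusters
```

Because each hub is processed in its own loop iteration, one hub's cluster never mixes in pages linked only from another hub. This mirrors the claims' emphasis on generating cluster data separately per hub page.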

In practice, the system first crawls the web to identify potential visual leaf pages based on criteria like the prominence of images or videos relative to other content. It then identifies hub pages that link to these pages using image-based links. For each hub page, the system extracts features from the linked visual leaf pages, such as URL depth, the number of images, and the presence of specific metadata. These features are then used to create a cluster representing the typical characteristics of visual leaf pages linked to by that hub page. This process creates a baseline set of data to train a visual leaf page classifier.
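The feature extraction and the image-prominence criterion can be illustrated with a simplified page record. Everything below is hypothetical. The `Page` fields, the use of an `og:image` meta tag as the "specific metadata", and the rule that the largest image must cover at least 40% of the rendered page area are assumptions made for illustration. The patent summary names URL depth, image count, and metadata as example features but does not specify how they are computed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Page:
    # Hypothetical, simplified page record; field names are illustrative.
    url: str
    image_areas: list[float] = field(default_factory=list)  # rendered pixel area per image/video
    total_area: float = 1.0  # rendered pixel area of all page content
    meta_tags: set[str] = field(default_factory=set)  # e.g. {"og:image"}


def url_depth(url: str) -> int:
    """Number of non-empty path segments, e.g. /recipes/cake -> 2."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def extract_features(page: Page) -> dict[str, float]:
    """Numeric feature values for one page (the feature choice is an assumption)."""
    return {
        "url_depth": float(url_depth(page.url)),
        "num_images": float(len(page.image_areas)),
        "has_og_image": 1.0 if "og:image" in page.meta_tags else 0.0,
    }


def is_candidate_visual_leaf(page: Page, min_share: float = 0.4) -> bool:
    """Heuristic: the largest image or video covers at least min_share of the page.

    The 0.4 default is an arbitrary, hypothetical threshold.
    """
    if not page.image_areas or page.total_area <= 0:
        return False
    return max(page.image_areas) / page.total_area >= min_share
```

A real system would derive areas from a rendered layout, which is consistent with the summary's mention of layout information among the feature values. Here the areas are simply supplied by the caller.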

The key differentiation from prior approaches lies in automating visual leaf page identification without human-annotated training data. The system bootstraps the process by using hub pages as a starting point: by clustering visual leaf pages according to the hub pages that link to them and their shared features, it builds a classifier intended to generalize to new, unseen pages. This approach reduces the need for human input and may let the search engine adapt more readily to changes in web content and user search behavior. The system also increases the search score of a classified visual leaf page if the search query requests image search results for a particular type of activity.
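The conditional score increase can be sketched as a small re-ranking step. This is a minimal illustration under stated assumptions. The multiplicative `boost` of 1.5 is a hypothetical value, since the patent summary only says the score is increased. Deciding whether a query "requests image search results" (for example, for shopping or recipes) is out of scope here and is passed in as a boolean. The names `adjust_score` and `rerank` are illustrative.

```python
def adjust_score(base_score, is_visual_leaf, requests_image_results, boost=1.5):
    """Increase a visual leaf page's score only when the query asks for images.

    The multiplicative boost of 1.5 is a hypothetical value.
    """
    if is_visual_leaf and requests_image_results:
        return base_score * boost
    return base_score


def rerank(results, requests_image_results, boost=1.5):
    """results: list of (url, base_score, is_visual_leaf) tuples.

    Returns (url, adjusted_score) pairs sorted from highest to lowest score.
    """
    scored = [
        (url, adjust_score(score, is_leaf, requests_image_results, boost))
        for url, score, is_leaf in results
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With a text page scored 1.0 and a visual leaf page scored 0.8, an image-seeking query lifts the leaf page above the text page. For an ordinary query, the original order is kept.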

How does this patent fit in the bigger picture?

Technical landscape at the time

In the late 2010s, when ’961 was filed, web-based systems commonly relied on distributed architectures to serve content to users. Web pages were typically built with HTML, CSS, and JavaScript, and identifying and classifying different types of web pages based on their content and structure was a common task. Because processing web-scale data remained resource-intensive, techniques for efficiently analyzing and categorizing web pages were valuable.

Novelty and Inventive Step

The examiner approved the application because the prior art, taken as a whole, did not teach or suggest the specific combination of elements recited in the independent claims. Specifically, the prior art failed to suggest generating cluster data representing visual leaf pages by determining feature values for each visual leaf page linked to a hub page, where these feature values include layout information. The prior art also failed to suggest generating central feature values as cluster data, indicative of a central tendency of the visual leaf pages, and classifying web pages based on a classifier trained with this cluster data, while also considering the type of search query.

Claims

This patent includes 20 claims, with independent claims 1, 12, and 17. The independent claims focus on a method, a system, and a computer-readable medium for classifying web pages as visual leaf pages and increasing their search score when a search query requests image search results. The dependent claims generally elaborate on and refine the specifics of the method, system, and computer-readable medium described in the independent claims.

Key Claim Terms

Definitions of key terms used in the patent claims.

Term (Source)	Support in Specification	Interpretation
Cluster data (Claim 1, Claim 12, Claim 17)	“Once the leaf pages and hub pages are identified for a host system, the system, for each hub page of the set of one or more hub pages, generates cluster data representing the visual leaf pages to which the hub page links. This may involve determining, for each visual leaf page to which the hub page links, a set of feature values, and then generating, from the sets of feature values, a set of central feature values as the cluster data for the hub page.”	Data representing the visual leaf pages to which the hub page links.
Image-based link (Claim 1, Claim 12, Claim 17)	“Each identified hub page links to at least one of the visual leaf pages through an image-based link on the hub page. Thereafter, additional leaf pages from the hub pages may be discovered.”	A link on a hub page, formed by an image, that points to at least one of the visual leaf pages.
Set of central feature values (Claim 1, Claim 12, Claim 17)	“This may involve determining, for each visual leaf page to which the hub page links, a set of feature values, and then generating, from the sets of feature values, a set of central feature values as the cluster data for the hub page. The set of central feature values are indicative of a central tendency of each respective pre-defined feature of the visual leaf pages.”	Cluster data for the hub page, indicative of a central tendency of each respective pre-defined feature of the two or more feature values of the visual leaf pages to which the hub page links.
Visual leaf page (Claim 1, Claim 12, Claim 17)	“A visual leaf page is a leaf page that has a dominant intent for one or more salient images representing the topics described in that page. Accordingly, the removal of the images in that page will cause it to become significantly less informative. For example, when a user requests image search results from a search engine for a particular type of activity, such as shopping or looking for recipes, visual leaf pages may provide content that is highly relevant to what the user is searching for, and these visual leaf pages may provide results that are more relevant than results presented by a page that is not a visual leaf page.”	A web page that includes image data defining an image or a video that is prominently displayed relative to all other content of the web page.
Visual leaf page classifier (Claim 1, Claim 12, Claim 17)	“By classifying web pages as visual leaf pages and storing the identification of the visual leaf pages, a classifier system can achieve high precision and recall when returning search results. The system is able to operate without requiring human annotation of training data, reducing the amount of human input needed.”	A classifier trained by a baseline set of data comprising at least the cluster data, used to classify a web page as a visual leaf page based at least in part on a comparison of a set of feature values associated with the web page to the set of central feature values for at least one of the set of two or more hub pages.
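The comparison step in the classifier definition can be illustrated with a nearest-centroid rule. This is a deliberately simple stand-in and not the patent's trained classifier. A page is labeled a visual leaf page if its feature values lie within a distance threshold of at least one hub's set of central feature values. The per-feature `scales`, the `threshold` of 1.0, and the function names are hypothetical choices for illustration.

```python
import math


def distance(features, center, scales):
    """Euclidean distance over the center's features, each divided by a per-feature scale."""
    return math.sqrt(sum(
        ((features.get(name, 0.0) - value) / scales.get(name, 1.0)) ** 2
        for name, value in center.items()
    ))


def classify_visual_leaf(features, clusters, scales, threshold=1.0):
    """Return (is_visual_leaf, nearest_hub).

    clusters: hub URL -> {feature name: central value}.
    A nearest-centroid rule standing in for a trained classifier; the
    scales and threshold are assumptions, not values from the patent.
    """
    best_hub, best_distance = None, math.inf
    for hub, center in clusters.items():
        d = distance(features, center, scales)
        if d < best_distance:
            best_hub, best_distance = hub, d
    return best_distance <= threshold, best_hub
```

Comparing against each hub's cluster separately, rather than a single global average, lets pages resembling any one hub's style of visual leaf page be recognized. That is one plausible reading of why the claims keep cluster data per hub.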

Litigation Cases

Latest US litigation cases involving this patent.

Case Number	Filing Date	Title
1:25-cv-00514	Apr 29, 2025	Accusearch Technologies Llc V. Google Llc


File Wrapper

The dossier documents provide a comprehensive record of the patent's prosecution history - including filings, correspondence, and decisions made by patent offices - and are crucial for understanding the patent's legal journey and any challenges it may have faced during examination.


US11086961

GOOGLE LLC

Application Number: US15479927
Filing Date: Apr 5, 2017
Status: Granted
Expiry Date: May 1, 2039

IP Verse