Patent No. US11086961 (titled "Visual Leaf Page Identification And Processing") was filed by Google Llc on Apr 5, 2017.
’961 is related to the field of internet search and, more specifically, to improving the relevance of search results, particularly for image-based searches. The background acknowledges that identifying the format and content of web pages is useful for search engine processing, noting that pages with primarily visual content are more useful for some searches than pages with primarily textual content. The patent addresses the problem of identifying and prioritizing "visual leaf pages," which are terminal pages with a dominant intent for salient images representing the topics described on the page.
The underlying idea behind ’961 is to automatically identify visual leaf pages by analyzing the characteristics of web pages and their relationships to other pages. The system identifies "hub pages" that link to visual leaf pages using image-based links. By analyzing the features of the visual leaf pages linked to by a hub page, the system generates cluster data representing the central tendencies of those features. This cluster data is then used to train a classifier that can identify other visual leaf pages, even those not directly linked to by known hub pages.
The claims of ’961 focus on a method, system, and computer-readable medium for identifying and classifying visual leaf pages. The core process involves, for each host, identifying visual leaf pages, identifying hub pages that link to those visual leaf pages via images, and generating cluster data for each hub page based on the features of the visual leaf pages it links to. The claims emphasize that the cluster data for each hub page is generated separately. The system then classifies new web pages as visual leaf pages by comparing their features to the generated cluster data and increases their search score if the query requests image search results.
In practice, the system first crawls the web to identify potential visual leaf pages based on criteria like the prominence of images or videos relative to other content. It then identifies hub pages that link to these pages using image-based links. For each hub page, the system extracts features from the linked visual leaf pages, such as URL depth, the number of images, and the presence of specific metadata. These features are then used to create a cluster representing the typical characteristics of visual leaf pages linked to by that hub page. This process creates a baseline set of data to train a visual leaf page classifier.
The key differentiation from prior approaches lies in the automated and unsupervised nature of the visual leaf page identification. Instead of relying on human-annotated training data, the system bootstraps the process by using hub pages as a starting point. By clustering visual leaf pages based on their relationship to hub pages and their shared features, the system can create a robust classifier that generalizes well to new, unseen pages. This approach reduces the need for human input and allows the search engine to adapt dynamically to changes in web content and user search behavior. The system also increases the search score of the classified visual leaf page if the search query requests image search results for a particular type of activity.
In the late 2010s when ’961 was filed, web-based systems commonly relied on distributed architectures to serve content to users. At a time when web pages were typically implemented using HTML, CSS, and JavaScript, identifying and classifying different types of web pages based on their content and structure was a common task. When hardware or software constraints made large-scale data processing non-trivial, techniques for efficiently analyzing and categorizing web pages were valuable.
The examiner approved the application because the prior art, taken as a whole, did not teach or suggest the specific combination of elements recited in the independent claims. Specifically, the prior art failed to suggest generating cluster data representing visual leaf pages by determining feature values for each visual leaf page linked to a hub page, where these feature values include layout information. The prior art also failed to suggest generating central feature values as cluster data, indicative of a central tendency of the visual leaf pages, and classifying web pages based on a classifier trained with this cluster data, while also considering the type of search query.
This patent includes 20 claims, with independent claims 1, 12, and 17. The independent claims focus on a method, a system, and a computer readable medium for classifying web pages as visual leaf pages and increasing their search score based on image search results. The dependent claims generally elaborate on and refine the specifics of the method, system, and computer readable medium described in the independent claims.
Definitions of key terms used in the patent claims.
US Latest litigation cases involving this patent.

The dossier documents provide a comprehensive record of the patent's prosecution history - including filings, correspondence, and decisions made by patent offices - and are crucial for understanding the patent's legal journey and any challenges it may have faced during examination.
Date
Description
Get instant alerts for new documents