Patent No. US9171541 (titled "System and method for hybrid processing in a natural language voice services environment") was filed on Feb 9, 2010. The patent was issued on Oct 27, 2015.
'541 relates to natural language processing, specifically voice services environments. The background involves the increasing complexity of consumer electronic devices and the challenges users face in fully utilizing their features. Existing voice user interfaces often require specific command sequences and suffer from inaccurate speech recognition, which limits their ability to engage users in productive dialogues and to leverage information across different devices and applications.
The underlying idea behind '541 is to enable hybrid processing of natural language utterances by distributing the interpretation and processing workload across multiple devices. This involves a central system, acting as a virtual router, that receives audio and contextual data from various user devices (e.g., a phone and a computer), selects the best audio sample, and coordinates with other devices to determine the user's intent and resolve the request.
The claims of '541 focus on a computer system receiving a natural language utterance from one device and a related non-voice input from another device. The system processes the non-voice input to determine context, then uses this context to interpret the utterance. Based on this interpretation, the system generates a user request and selects a user processing device (which could be the original devices or a third one) that has the content needed to fulfill the request, and transmits the request to that device.
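The claimed sequence can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the `Device` class, the function names, and the toy "first word is the action" interpretation are all hypothetical stand-ins for the real speech and context processing, which the summary does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical device that may hold content and can receive requests."""
    name: str
    content: set = field(default_factory=set)
    inbox: list = field(default_factory=list)


def determine_context(non_voice_input):
    # Assumption: the non-voice input (e.g. highlighted text) names the subject.
    return {"subject": non_voice_input.strip().lower()}


def interpret_utterance(utterance, context):
    # Toy interpretation (assumption): the first spoken word is the action,
    # and the context from the other device supplies the subject.
    words = utterance.lower().split()
    action = words[0] if words else ""
    return {"action": action, "subject": context["subject"]}


def select_processing_device(request, devices):
    # Pick the first device on which the needed content actually resides.
    for device in devices:
        if request["subject"] in device.content:
            return device
    return None


def handle(utterance, non_voice_input, devices):
    """Run the steps in order: context, interpretation, selection, transmission."""
    context = determine_context(non_voice_input)
    request = interpret_utterance(utterance, context)
    target = select_processing_device(request, devices)
    if target is not None:
        target.inbox.append(request)  # stands in for transmitting the request
    return target
```

For instance, calling `handle("find this", "Jazz Festival", devices)` with a hypothetical server device whose `content` includes `"jazz festival"` would place the request in that server's `inbox`, while a request for content that no device holds returns `None`.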
In practice, the invention allows a user to interact with multiple devices at once to complete a task. For example, a user might speak a command into their phone while highlighting text on their computer screen. The system combines the voice input with the highlighted text to understand the user's intent more accurately. The virtual router then routes the request to the device best equipped to handle it, such as a server with a large database or the user's own computer if the relevant data is stored locally.
This approach differs from prior solutions by not being limited to a single device or a pre-defined set of commands. Instead, it leverages the capabilities of multiple devices in a coordinated manner to provide a more intuitive and efficient user experience. By selecting the 'cleanest' audio sample and incorporating contextual information from non-voice inputs, the system aims to improve the accuracy of speech recognition and intent determination, leading to a more seamless and natural interaction.
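How the "cleanest" sample might be chosen can be illustrated with a short sketch. The summary does not say how cleanliness is measured, so the signal-to-noise estimate below is an assumption made purely for illustration, and `AudioSample`, `snr_db`, and `select_cleanest` are hypothetical names.

```python
import math
from typing import NamedTuple, Sequence


class AudioSample(NamedTuple):
    """Hypothetical capture of the same utterance from one device."""
    device: str
    speech: Sequence[float]  # amplitudes recorded while the user spoke
    noise: Sequence[float]   # amplitudes recorded during silence


def rms(values):
    # Root-mean-square level; an empty sequence counts as silence.
    return math.sqrt(sum(v * v for v in values) / len(values)) if values else 0.0


def snr_db(sample):
    # Assumption: "cleanest" means the highest speech-to-noise ratio in decibels.
    speech_level = rms(sample.speech)
    noise_level = rms(sample.noise)
    if speech_level == 0.0:
        return float("-inf")  # no speech captured at all
    if noise_level == 0.0:
        return float("inf")   # no measurable background noise
    return 20.0 * math.log10(speech_level / noise_level)


def select_cleanest(samples):
    # Return the sample with the best estimated ratio (raises on an empty list).
    return max(samples, key=snr_db)
```

Under this assumption, a phone capture with a quiet background would be preferred over a laptop capture of the same utterance recorded in a noisier spot.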
In the late 2000s, when '541 was filed, voice recognition was typically implemented using rigid command-and-control syntaxes that required users to memorize specific keywords. At a time when mobile devices commonly relied on isolated, on-device processing rather than distributed cloud-based coordination, hardware constraints made the real-time interpretation of natural, conversational language non-trivial. Systems of this era often struggled to integrate data across multiple independent devices, meaning that a user's interaction with one hardware interface rarely informed the context or task execution of another nearby device.
The examiner allowed the application because the prior art did not teach a specific method of cooperative natural language processing involving multiple independent devices. Specifically, the allowed claims involve receiving a spoken utterance from a first device and a separate non-voice input from a second independent device to determine context. The examiner noted that the combination of existing technologies did not disclose the unique sequence of using that multi-device context to generate a request, and then selecting a specific processing device to execute the task based on whether the required content actually resides on that device.
This patent contains 31 claims, with independent claims 1, 16, and 25. The independent claims are directed to methods and systems for processing natural language utterances using both voice and non-voice user inputs to determine context and generate user requests. The dependent claims generally elaborate on and refine the specifics of the independent claims, adding details and features to the methods and systems.
