Methods For Neural Network-Based Voice Enhancement And Systems Thereof

Patent No. US12125496 (titled "Methods For Neural Network-Based Voice Enhancement And Systems Thereof") was filed by Sanas.Ai Inc on Apr 24, 2024.

’496 is related to the field of audio processing, specifically voice enhancement. Background noise and unclear speech (e.g., mumbling, slurring) can significantly degrade the quality and intelligibility of audio, hindering communication and the accuracy of speech recognition systems. Traditional voice enhancement techniques often focus on noise reduction, which can inadvertently distort speech features, leading to further inaccuracies. This patent addresses the need for improved voice enhancement that preserves speech characteristics while effectively suppressing noise and enhancing clarity.

The underlying idea behind ’496 is to use a two-stage neural network approach to enhance voice quality in real time. The first neural network reduces the dimensionality of the input audio, keeping the relevant speech content and characteristics while discarding noise. The second neural network then reconstructs the audio from this reduced representation, generating enhanced speech that is clearer and more intelligible.

The claims of ’496 focus on a voice enhancement system, a method for real-time voice enhancement, and a non-transitory computer-readable medium. The independent claims cover the process of fragmenting input audio into frames, converting these frames into low-dimensional representations using a first neural network (which also removes non-content elements like noise), applying a second neural network to generate enhanced target speech frames, and combining these frames into output audio.
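The fragmenting and combining steps that bracket the two networks can be sketched in isolation. The following is a minimal illustration, not the patent's implementation: the function names `fragment` and `combine`, the frame length, the hop size, and the Hann-weighted overlap-add are all assumptions chosen for the example.

```python
import numpy as np

def fragment(audio, frame_len, hop):
    """Split a 1-D signal into overlapping frames, zero-padding the tail."""
    n_frames = 1 + int(np.ceil(max(len(audio) - frame_len, 0) / hop))
    padded = np.zeros((n_frames - 1) * hop + frame_len)
    padded[:len(audio)] = audio
    return np.stack([padded[i * hop:i * hop + frame_len] for i in range(n_frames)])

def combine(frames, hop, out_len):
    """Overlap-add frames with a Hann taper, normalizing by the summed window."""
    n_frames, frame_len = frames.shape
    window = np.hanning(frame_len + 2)[1:-1]   # drop the zero endpoints so weights stay positive
    total = np.zeros((n_frames - 1) * hop + frame_len)
    weight = np.zeros_like(total)
    for i, frame in enumerate(frames):
        start = i * hop
        total[start:start + frame_len] += window * frame
        weight[start:start + frame_len] += window
    return (total / weight)[:out_len]

# Hypothetical input: a short tone; 160-sample frames with 50% overlap (assumed values).
x = np.sin(2 * np.pi * np.arange(1000) / 50.0)
frames = fragment(x, frame_len=160, hop=80)
y = combine(frames, hop=80, out_len=len(x))
```

When the frames are passed through unchanged, `combine` returns the original signal, so any difference in a real system would come from the per-frame processing between the two calls.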

In practice, the system first takes an audio stream and divides it into short frames. The first neural network, having been trained on a large dataset of speech and noise, then transforms each frame into a compact, lower-dimensional representation. This representation is designed to capture the essential features of the speech while discarding irrelevant noise and artifacts. The second neural network then takes these compact representations and reconstructs them into enhanced speech frames, which are then stitched together to form the final output audio.
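The per-frame data flow described above can be illustrated with two small placeholder networks. This is a sketch under stated assumptions: the layer sizes, the fully connected architecture, and the names `init_mlp`, `run_mlp`, and `enhance_stream` are hypothetical, and the weights are random and untrained, so the output shows only the shapes involved and not actual enhancement.

```python
import numpy as np

rng = np.random.default_rng(0)
FRAME_LEN, LATENT_DIM, HIDDEN = 160, 16, 64   # illustrative sizes, not taken from the patent

def init_mlp(sizes):
    """Random weights for a small fully connected network (untrained placeholder)."""
    return [(rng.normal(0, 0.1, (m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]

def run_mlp(params, x):
    """Apply tanh hidden layers and leave the final layer linear."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

encoder = init_mlp([FRAME_LEN, HIDDEN, LATENT_DIM])   # stands in for the first neural network
decoder = init_mlp([LATENT_DIM, HIDDEN, FRAME_LEN])   # stands in for the second neural network

def enhance_stream(frames):
    """Process frames one at a time, as a streaming system might."""
    for frame in frames:
        latent = run_mlp(encoder, frame)    # low-dimensional representation of the frame
        yield run_mlp(decoder, latent)      # reconstructed target speech frame

audio = rng.normal(size=FRAME_LEN * 5)
frames = audio.reshape(-1, FRAME_LEN)       # non-overlapping frames, for brevity only
latent0 = run_mlp(encoder, frames[0])
out = np.concatenate(list(enhance_stream(frames)))
```

Each 160-sample frame is squeezed to 16 values before being expanded back, which is the bottleneck the description relies on; a trained system would learn both mappings from data.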

This approach differs from prior methods by explicitly using a low-dimensional representation to filter out noise and preserve speech characteristics. Instead of directly manipulating the raw audio signal, the system operates on a compressed representation, allowing it to more effectively separate speech from noise and enhance clarity. The two neural networks are trained to work together, with the first network learning to extract relevant features and the second network learning to reconstruct high-quality speech from these features.
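Why a low-dimensional bottleneck can suppress noise is easiest to see in a linear toy case. The sketch below swaps the patent's neural networks for a tied-weight linear encoder and decoder fitted in closed form with PCA; the synthetic data, the dimensions, and the noise level are all assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 2000, 64, 4        # frames, frame dimension, bottleneck size (all illustrative)

basis = rng.normal(size=(k, d))
clean = rng.normal(size=(n, k)) @ basis          # toy "speech" confined to a k-dim subspace
noisy = clean + 0.5 * rng.normal(size=(n, d))    # noise spread over all d dimensions

# Closed-form stand-in for joint training: keep the top-k principal directions.
mean = noisy.mean(axis=0)
_, _, vt = np.linalg.svd(noisy - mean, full_matrices=False)
components = vt[:k]                              # shape (k, d)

def encode(x):
    """Project frames onto the k retained directions (the 'first network')."""
    return (x - mean) @ components.T

def decode(z):
    """Map the low-dimensional codes back to frame space (the 'second network')."""
    return z @ components + mean

restored = decode(encode(noisy))
mse_before = np.mean((noisy - clean) ** 2)
mse_after = np.mean((restored - clean) ** 2)
print(f"MSE vs clean: noisy={mse_before:.4f}, after bottleneck={mse_after:.4f}")
```

Because the noise is spread across all 64 dimensions while the signal occupies only 4, the projection discards most of the noise energy and keeps the signal. Real speech is not confined to a linear subspace, which is one reason nonlinear networks are used instead.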

How does this patent fit into the bigger picture?

Technical landscape at the time

In the early 2020s, when ’496 was filed, voice enhancement was typically implemented with signal processing techniques in the short-time Fourier transform (STFT) domain, often relying on ratio masks and equalization to reduce noise and improve speech clarity. Systems of the time commonly relied on noise reduction algorithms designed to preserve the original speech audio, and hardware or software constraints made it non-trivial to improve intelligibility when the original speech was already degraded by factors such as slurring or mumbling.
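For readers unfamiliar with ratio masking, the idea can be sketched on a single analysis frame. This toy example uses one FFT frame rather than a full STFT, and an oracle ("ideal") mask computed from separately known speech and noise, which a deployed system would have to estimate; the signal, noise level, and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
t = np.arange(n)
speech = np.sin(2 * np.pi * 16 * t / n)   # toy "speech": one tone on an exact FFT bin
noise = 0.5 * rng.normal(size=n)
noisy = speech + noise

S, N, Y = np.fft.rfft(speech), np.fft.rfft(noise), np.fft.rfft(noisy)

# Ideal ratio mask: the per-bin share of energy that belongs to speech.
# It needs the clean speech and the noise separately, so it is an oracle here.
eps = 1e-12
mask = np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + eps)

enhanced = np.fft.irfft(mask * Y, n=n)
err_noisy = np.mean((noisy - speech) ** 2)
err_enhanced = np.mean((enhanced - speech) ** 2)
```

The mask only scales each frequency bin of the noisy input, so it can attenuate noise but cannot restore speech content that was never clearly articulated, which is the limitation the paragraph above points to.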

Novelty and Inventive Step

The examiner allowed the application because the prior art does not teach, disclose, or suggest applying a second neural network to the low-dimensional representations of input speech frames to generate target speech frames. This feature, in combination with all other claim limitations, distinguishes the invention from the prior art.

Claims

This patent contains 20 claims, with independent claims numbered 1, 11, and 16. The independent claims are generally directed to a voice enhancement system, a method for real-time voice enhancement, and a non-transitory computer-readable medium for voice enhancement, respectively, all involving neural networks to process and enhance audio data. The dependent claims generally elaborate on and refine the specifics of the voice enhancement system, method, and computer-readable medium described in the independent claims.

Key Claim Terms

Definitions of key terms used in the patent claims.

Term (Source)	Support from Specification	Interpretation
Input audio data (Claim 1, Claim 11, Claim 16)	“Referring now to background noise, poor enunciation, heavy accents, language barriers and/or mumbled, creaky, slurred, and/or quiet speech, for example. In some examples, the non-content elements 614 may include background noise 616 and other elements 618 such as microphone pops, low-fidelity audio, and/or audio clipping, although other types of background noise can also be used.”	Audio data that is received by the voice enhancement system and processed to enhance the speech content.
Input speech frames (Claim 1, Claim 11, Claim 16)	“The input audio training data 602 in this example may be fragmented into a plurality of input training speech frames 630. Input training speech frames 630 may be converted dynamically to a low-dimensional input audio training data representation 632 by the first neural network 208.”	Fragments of the input audio data, which are processed individually by the neural networks.
Low-dimensional representations (Claim 1, Claim 11, Claim 16)	“The low-dimensional input audio training data representation 632 may comprise multiple low-dimensional representations of input audio training data speech frames 634(1)-634(n). The low-dimensional input audio training data representation 632 may further include one or more portions of the foreground speech content 610 and/or the speech characteristics 612.”	A compressed form of the input speech frames, generated by the first neural network, that omits non-content elements.
Output audio data (Claim 1, Claim 11, Claim 16)	“This technology advantageously improves speech clarity and intelligibility in various applications by utilizing noise suppression algorithms that more accurately estimate the background noise signal from a single microphone recording, thereby suppressing noise without distorting the target or output enhanced speech data.”	The final enhanced audio data, generated by combining the target speech frames, which includes foreground speech content and speech characteristics.
Target speech frames (Claim 1, Claim 11, Claim 16)	“In step 508, the voice enhancement system 100 provides to the second neural network 210 the low-dimensional input audio data representation 404 generated in step 508. Referring now to”	Enhanced speech frames generated by the second neural network from the low-dimensional representations.

File Wrapper

The dossier documents provide a comprehensive record of the patent's prosecution history, including filings, correspondence, and decisions made by patent offices. They are crucial for understanding the patent's legal journey and any challenges it may have faced during examination.

Date	Description
Oct 22, 2024	Digitally signed official patent eGrant document
Oct 22, 2024	eGrant day-of Notification
Oct 2, 2024	Issue Notification
Sep 17, 2024	Issue Fee Payment (PTO-85B)
Sep 17, 2024	Electronic Filing System Acknowledgment Receipt
Sep 17, 2024	Electronic Fee Payment
Jul 23, 2024	Filing Receipt
Jul 18, 2024	Electronic Filing System Acknowledgment Receipt
Jul 18, 2024	Oath or Declaration filed
Jul 18, 2024	Amendment after Notice of Allowance (Rule 312)
Jul 18, 2024	Miscellaneous Communication to Applicant - No Action Count
Jul 15, 2024	Filing Receipt
Jul 9, 2024	Oath or Declaration filed
Jul 9, 2024	Electronic Filing System Acknowledgment Receipt
Jul 5, 2024	Notice of Allowance and Fees Due (PTOL-85)
Jul 5, 2024	List of references cited by examiner
Jul 5, 2024	Issue Information including classification, examiner, name, claim, renumbering, etc.
Jul 5, 2024	Search information including classification, databases and other search related notes
Jul 5, 2024	Index of Claims
Jul 5, 2024	Bibliographic Data Sheet
Jul 5, 2024	Examiner's search strategy and results
Jul 5, 2024	Examiner's search strategy and results
Jun 7, 2024	Track One request Granted
Jun 6, 2024	Fee Worksheet (SB06)
May 9, 2024	Filing Receipt
May 9, 2024	Fee Worksheet (SB06)
May 9, 2024	Miscellaneous Communication to Applicant - No Action Count
May 9, 2024	Welcome Letter from USPTO Director and Deputy Director
Apr 24, 2024	Application body structured text document
Apr 24, 2024	Application Data Sheet
Apr 24, 2024	Electronic Filing System Acknowledgment Receipt
Apr 24, 2024	Electronic Fee Payment
Apr 24, 2024	Drawings-other than black and white line drawings
Apr 24, 2024	Track One Request
Apr 24, 2024	Power of Attorney
Apr 24, 2024	Specification
Apr 24, 2024	Claims
Apr 24, 2024	Abstract
Apr 24, 2024	Drawings-black and white line and/or other drawings
Apr 24, 2024	Drawings-black and white line and/or other drawings
Apr 24, 2024	Placeholder sheet indicating presence of supplemental content in Supplemental Complex Repository for Examiners (SCORE)
Apr 24, 2024	Placeholder sheet indicating presence of supplemental content in Supplemental Complex Repository for Examiners (SCORE)

US12125496

SANAS.AI INC

Application Number: US18644959
Filing Date: Apr 24, 2024
Status: Granted
Expiry Date: Apr 24, 2044