Highly Trainable Neural Network Configuration

Patent No. US10984320 (titled "Highly Trainable Neural Network Configuration") was filed by Tesseract Systems LLC on May 1, 2017.

What is this patent about?

The ’320 patent relates to the field of computer-based neural networks, specifically addressing the challenge of training very deep networks. Traditional deep neural networks, while theoretically powerful, suffer from optimization difficulties as the number of layers increases. A key culprit is the vanishing-gradient problem: the error signal diminishes as it propagates back through the network during training, which hinders learning, especially in the early layers.

The underlying idea behind ’320 is to introduce a gating mechanism within each neuron that controls the flow of information. Instead of always applying a non-linear transformation to the input signal, the neuron can selectively pass the original input through unmodified, pass a transformed version of it, or pass a mix of both. This is achieved using transform and carry gates, which determine the weighting between the transformed and non-transformed components.

The claims of ’320 focus on a computer-based method, a neural network architecture, and a computer-readable medium implementing this architecture. The core element is a neuron that receives an input signal, applies a first non-linear transform to produce a 'plain' signal, and then uses two additional non-linear transforms (gates) to produce a 'transform' signal and a 'carry' signal. The neuron then calculates a weighted sum of the original input signal and the 'plain' signal, where the weights are determined by the 'transform' and 'carry' signals.
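As an illustration (not the patent's own code), the claimed neuron computation can be sketched in NumPy. The function name, parameter names, and the choice of tanh and sigmoid for the three non-linear transforms are assumptions for this example; the patent's claims only require that each transform be non-linear:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_neuron(x, W_h, b_h, W_t, b_t, W_c, b_c):
    """Sketch of the claimed neuron: a weighted sum of the original input
    and the 'plain' signal, with weights given by two learned gates."""
    plain = np.tanh(W_h @ x + b_h)        # first non-linear transform -> 'plain' signal
    transform = sigmoid(W_t @ x + b_t)    # second non-linear transform (first gate) -> 'transform' signal
    carry = sigmoid(W_c @ x + b_c)        # third non-linear transform (second gate) -> 'carry' signal
    return plain * transform + x * carry  # weighted plain signal + weighted input signal component

# Toy usage with random parameters
rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal(d)
weights = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
biases = [np.zeros(d) for _ in range(3)]
y = highway_neuron(x, weights[0], biases[0], weights[1], biases[1], weights[2], biases[2])
```

The last line mirrors the claim language directly: the 'transform' signal weights the 'plain' signal, and the 'carry' signal weights the non-transformed input component.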

In practice, this architecture allows for the creation of 'information highways' through the network. By initializing the network to favor the 'carry' behavior, the original input signal can propagate through many layers without significant attenuation. This makes it easier to train very deep networks because the error signal can propagate back more effectively, even through many layers. The transform gate learns to regulate information flow, allowing the network to dynamically choose between transforming the input or simply passing it through.
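A minimal sketch of such an initialization, under the same assumptions as above (sigmoid gates; all parameter names hypothetical): biasing the transform gate strongly negative and the carry gate strongly positive makes the neuron start out close to an identity mapping, so the input propagates with little attenuation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 4
rng = np.random.default_rng(1)
W_h = rng.standard_normal((d, d)) * 0.1
W_t = rng.standard_normal((d, d)) * 0.1
W_c = rng.standard_normal((d, d)) * 0.1
b_h = np.zeros(d)
b_t = np.full(d, -4.0)  # transform gate starts near 0 (sigmoid(-4) ~ 0.018)
b_c = np.full(d, 4.0)   # carry gate starts near 1 (sigmoid(4) ~ 0.982)

x = rng.standard_normal(d)
plain = np.tanh(W_h @ x + b_h)
y = plain * sigmoid(W_t @ x + b_t) + x * sigmoid(W_c @ x + b_c)

# With this initialization the neuron behaves close to the identity,
# so y stays near x; training can then shift the gate biases as needed.
print(np.max(np.abs(y - x)))
```

During training, the gate parameters are free to move away from this starting point wherever transforming the signal is more useful than carrying it.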

This approach differs from prior solutions that rely on careful initialization schemes or complex training techniques to overcome the optimization challenges of deep networks. By introducing the gating mechanism, ’320 provides a more robust and flexible way to train networks with virtually arbitrary depth, using standard Stochastic Gradient Descent (SGD) with momentum. This allows for the exploration of deeper architectures and potentially more complex problem-solving capabilities without the limitations imposed by traditional training difficulties.
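For reference, the SGD-with-momentum update rule mentioned above can be sketched on a toy one-dimensional objective; the function, learning rate, and momentum value here are illustrative choices, not taken from the patent:

```python
import numpy as np

def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: the velocity accumulates a decaying
    sum of past gradients, and the parameter moves along the velocity."""
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity

# Toy example: minimize f(w) = w^2, whose gradient is 2w
w, v = np.array([5.0]), np.array([0.0])
for _ in range(200):
    w, v = sgd_momentum_step(w, 2 * w, v, lr=0.05, momentum=0.9)
```

The point of the patent's gating mechanism is that this plain optimizer suffices even for very deep stacks, without bespoke initialization schemes or multi-stage training.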

How does this patent fit in bigger picture?

Technical landscape at the time

In the mid-2010s, when ’320 was filed, deep architectures were the standard approach to neural networks, but training very deep networks remained non-trivial. Optimization was difficult, prompting research into careful initialization techniques and multi-stage training approaches.

Novelty and Inventive Step

The examiner allowed the claims because amendments made by the applicant overcame previous rejections. Specifically, the examiner was persuaded that the claims, as amended, contained additional elements (applying several non-linear transforms to signals) that were significantly more than the abstract ideas recited in the claims, thus rendering the claims statutory under 35 U.S.C. 101. The examiner also noted that the prior art did not teach the specific transforms and calculations in the context of the methods, neural network, and computer-readable medium as claimed.

Claims

This patent includes 28 claims, with independent claims 1, 15, 26, and 28. The independent claims generally focus on a computer-based method, a computer-based neural network, a computer-readable medium, and another method, all related to facilitating training in a computer-based neural network using non-linear transforms and weighted sums within neurons. The dependent claims generally elaborate on and refine the specifics of the independent claims, adding details and limitations to the method, network, and medium.

Key Claim Terms

Definitions of key terms used in the patent claims.

Term: Carry signal (Claim 1, Claim 15, Claim 26, Claim 28)
Support in specification: “In yet another aspect, a training method is disclosed that applies to a computer-based neural network. The computer-based neural network includes a plurality of layers of neurons, where each layer of neurons comprises a plurality of neurons. One or more of the neurons includes a means for applying a first non-linear transform to an input signal to produce a plain signal, a first gate configured to apply a second non-linear transform to the input signal to produce a transform signal, and a second gate configured to apply a third non-linear transform to the input signal to produce a carry signal, and a means for calculating a weighted sum of a first component of the input signal and the plain signal.”
Interpretation: A signal produced by applying a third non-linear transform to the input signal at a second gate in the neuron.

Term: Plain signal (Claim 1, Claim 15, Claim 26, Claim 28)
Support in specification: “In one aspect, a computer-based method includes receiving an input signal at a neuron in a computer-based neural network that comprises a plurality of neuron layers, applying a first non-linear transform to the input signal at the neuron to produce a plain signal, and calculating a weighted sum of a first component of the input signal and the plain signal at the neuron.”
Interpretation: A signal produced by applying a first non-linear transform to the input signal at a neuron.

Term: Transform signal (Claim 1, Claim 15, Claim 26, Claim 28)
Support in specification: “In yet another aspect, a training method is disclosed that applies to a computer-based neural network. The computer-based neural network includes a plurality of layers of neurons, where each layer of neurons comprises a plurality of neurons. One or more of the neurons includes a means for applying a first non-linear transform to an input signal to produce a plain signal, a first gate configured to apply a second non-linear transform to the input signal to produce a transform signal, and a second gate configured to apply a third non-linear transform to the input signal to produce a carry signal, and a means for calculating a weighted sum of a first component of the input signal and the plain signal.”
Interpretation: A signal produced by applying a second non-linear transform to the input signal at a first gate in the neuron.

Term: Weighted input signal component (Claim 1, Claim 15, Claim 26, Claim 28)
Support in specification: “The training method includes initializing parameters associated with each of the first, second and/or third of the non-linear transforms for each of a plurality of the neurons in the computer-based neural network, analyzing an input by passing an associated input signal through the computer-based neural network in a forward direction to produce an output signal at an output of the computer-based neural network, propagating an error associated with the output signal back through the computer-based neural network, and determining whether to adjust any parameters (e.g., weights) associated with the computer-based neural network.”
Interpretation: A product of the non-transformed first component of the input signal and the carry signal.

Term: Weighted plain signal (Claim 1, Claim 15, Claim 26, Claim 28)
Support in specification: “The training method includes initializing parameters associated with each of the first, second and/or third of the non-linear transforms for each of a plurality of the neurons in the computer-based neural network, analyzing an input by passing an associated input signal through the computer-based neural network in a forward direction to produce an output signal at an output of the computer-based neural network, propagating an error associated with the output signal back through the computer-based neural network, and determining whether to adjust any parameters (e.g., weights) associated with the computer-based neural network.”
Interpretation: A product of the plain signal and the transform signal.

Patent Family

File Wrapper

The dossier documents provide a comprehensive record of the patent's prosecution history, including filings, correspondence, and decisions made by patent offices, and are crucial for understanding the patent's legal journey and any challenges it faced during examination.


US10984320
TESSERACT SYSTEMS LLC
Application Number: US15582831
Filing Date: May 1, 2017
Status: Granted
Expiry Date: Jan 20, 2040
External Links: Slate, USPTO, Google Patents