The application for Patent No. US12143424 (titled "Rapid predictive analysis of very large data sets using the distributed computational graph") was filed on Jul 21, 2024; the patent issued on Nov 12, 2024.
The '424 patent relates to the field of analyzing large datasets using a distributed computational graph. Traditional data analysis pipelines are often linear and limited in their ability to handle complex, changing data streams. The background highlights the increasing volume of data and the need for tools that can analyze it in real time, draw meaningful conclusions, and enable effective action. Existing systems lack the ability to self-assess and adapt to optimize performance.
The underlying idea behind '424 is to create a system that combines real-time processing of streaming data with the ability to retrieve and analyze relevant stored data. This is achieved through a distributed computational graph where data transformations are represented as nodes and data flow as edges. The system monitors its own operations and intermediate results, adapting its function to optimize performance and ensure reliable conclusions. This involves both a streaming pathway and a batch analysis pathway, allowing for predictive analysis based on both current and historical data.
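The graph structure described above can be illustrated with a minimal sketch. All class and function names below are hypothetical, chosen for illustration; the patent does not specify this API. Transformations are nodes, and the edges along which data flows are the `downstream` links:

```python
# Minimal sketch of a computational graph: transformations are nodes,
# data flow is edges. Names here are illustrative, not from the patent.

class TransformationNode:
    """A node applies one transformation and forwards the result downstream."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.downstream = []  # edges: where this node's output flows next

    def connect(self, node):
        self.downstream.append(node)

    def process(self, record, sink):
        result = self.fn(record)
        if not self.downstream:          # terminal node: emit a conclusion
            sink.append((self.name, result))
        for node in self.downstream:
            node.process(result, sink)

# Build a tiny linear graph: filter -> enrich -> score
filter_node = TransformationNode("filter", lambda r: {k: v for k, v in r.items() if v is not None})
enrich_node = TransformationNode("enrich", lambda r: {**r, "source": "stream"})
score_node  = TransformationNode("score",  lambda r: {**r, "score": len(r)})
filter_node.connect(enrich_node)
enrich_node.connect(score_node)

results = []
filter_node.process({"id": 1, "val": 42, "bad": None}, results)
print(results)  # [('score', {'id': 1, 'val': 42, 'source': 'stream', 'score': 3})]
```

Because each node simply forwards to a list of downstream nodes, the same structure can express branching or merging graphs, not just the linear chain shown here.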
The claims of '424 focus on a distributed computing cluster comprising a first and second plurality of computer systems. The first plurality stores portions of a computational graph, describing the flow of data from a first transformation pipeline to a second. A first computer system receives a stream of input data, processes it using the first pipeline, determines information about the second pipeline, and transmits the output to a second computer system. The second computer system then processes the received data using the second pipeline. A third computer system causes a fourth to execute at least one of the pipelines.
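The claimed handoff between computer systems can be sketched in a single process, with dictionaries standing in for the separate machines. The registry, inbox, and pipeline names are hypothetical stand-ins, not terms from the claims:

```python
# Hypothetical sketch of the claimed flow: a "first system" applies
# pipeline 1, consults the stored graph portion to determine where
# pipeline 2 runs, and transmits its output there. All routing details
# here are illustrative assumptions.

pipeline_registry = {"pipeline2": "system_B"}   # graph portion: pipeline 2's location

def pipeline1(x):
    return x * 2            # first transformation pipeline

def pipeline2(x):
    return x + 1            # second transformation pipeline

inboxes = {"system_B": []}  # stand-in for network transmission

def system_A(stream):
    for x in stream:
        out = pipeline1(x)
        target = pipeline_registry["pipeline2"]  # determine info about pipeline 2
        inboxes[target].append(out)              # transmit to the second system

def system_B():
    return [pipeline2(x) for x in inboxes["system_B"]]

system_A([1, 2, 3])
print(system_B())  # [3, 5, 7]
```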
In practice, the system receives streaming data from various sources, filters it, and splits it into two pathways: a streaming pathway and a batch pathway. The streaming pathway uses a transformation pipeline to perform real-time analysis, while the batch pathway stores and analyzes historical data. A system sanity and retrain module monitors the progress of the analysis, optimizing coordination between the two pathways and adapting the system's behavior as needed. A messaging module facilitates communication between the different components, ensuring that the system can respond to changing conditions and new information.
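The filter-then-split step can be sketched as follows. This is an illustration of the concept, assuming a toy filtering rule and a toy real-time alert rule; it is not the patent's implementation:

```python
# Illustrative sketch: incoming events are filtered and then fanned out
# to a streaming pathway (immediate analysis) and a batch pathway
# (retention for later historical analysis). Rules are toy assumptions.

from collections import deque

batch_store = deque()     # batch pathway: retained for historical analysis
streaming_alerts = []     # streaming pathway: real-time conclusions

def ingest(event):
    # Filter step: drop malformed events before the pathways split
    if "value" not in event:
        return
    batch_store.append(event)          # batch pathway
    if event["value"] > 100:           # streaming pathway: real-time rule
        streaming_alerts.append(event)

for e in [{"value": 50}, {"value": 150}, {"bad": True}, {"value": 200}]:
    ingest(e)

print(len(batch_store), len(streaming_alerts))  # 3 2
```

A batch job could later scan `batch_store` to derive historical baselines, while the streaming rule reacts to each event as it arrives, matching the two-pathway description above.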
The invention differentiates itself from prior approaches by enabling non-linear transformation pipelines, including afferent branch, efferent branch, and cyclical configurations. This allows for more complex and iterative analyses. The system also incorporates a system sanity and retrain module that monitors the progress of the analysis and adjusts the system's behavior to optimize performance. This self-adaptive capability is crucial for handling the vast amounts of data and the dynamic nature of the analysis tasks.
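A cyclical configuration, as contrasted with a linear pipeline, can be illustrated with a loop that feeds a transformation's output back in as input until a convergence test passes. The transformation and threshold here are toy assumptions, not the patent's mechanism:

```python
# Hedged sketch of a cyclical pipeline: the node's output re-enters the
# node until a convergence condition is met. Transformation and target
# values are illustrative.

def refine(estimate):
    # toy transformation: move halfway toward a target value of 100
    return estimate + (100 - estimate) / 2

value, rounds = 0.0, 0
while abs(100 - value) > 1:   # the cycle edge: output loops back as input
    value = refine(value)
    rounds += 1

print(rounds, round(value, 2))  # 7 99.22
```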
In the mid-2010s when ’424 was filed, large-scale data analysis was typically implemented using distributed storage and batch processing frameworks that relied on rigid, linear data pipelines. At a time when systems commonly relied on static map-reduce architectures to handle high-volume information, the orchestration of complex, branching, or non-linear transformations across multiple computing clusters was often limited by manual configuration. Software constraints made the real-time synchronization of streaming data with historical batch records non-trivial, as existing tools frequently lacked the native ability to self-modify or dynamically retrain processing logic based on intermediate operational sanity checks.
The examiner allowed the application because the prior art did not disclose the specific multi-cluster architecture described in the claims. While existing systems could handle distributed computations and data pipelines, they did not feature a secondary group of computer systems specifically configured to manage and trigger the execution of software instructions on a primary group of computer systems. Specifically, the prior art lacked the mechanism where a third computer system directs a fourth computer system to run the specific instructions that apply the first and second transformation pipelines to the data streams.
There are 13 claims in total. Claims 1 and 9 are independent. The independent claims focus on a distributed computing cluster comprising multiple computer systems configured to process data streams using transformation pipelines. The dependent claims generally add specific details or limitations to the elements and functionality described in the independent claims.

The dossier documents provide a comprehensive record of the patent's prosecution history, including filings, correspondence, and decisions made by patent offices, and are crucial for understanding the patent's legal journey and any challenges it may have faced during examination.