The application for Patent No. US12143424 (titled "Rapid predictive analysis of very large data sets using the distributed computational graph") was filed on Jul 21, 2024; the patent issued on Nov 12, 2024.
The '424 patent relates to the field of analyzing large datasets using a distributed computational graph. Traditional data analysis pipelines are often linear and limited in their ability to handle complex, changing data streams. The background highlights the increasing volume of data and the need for tools that can analyze it in real time, draw meaningful conclusions, and enable effective action. Existing systems lack the ability to self-assess and adapt to optimize performance.
The underlying idea behind '424 is to create a system that combines real-time processing of streaming data with the ability to retrieve and analyze relevant stored data. This is achieved through a distributed computational graph where data transformations are represented as nodes and data flow as edges. The system monitors its own operations and intermediate results, adapting its function to optimize performance and ensure reliable conclusions. This involves both a streaming pathway and a batch analysis pathway, allowing for predictive analysis based on both current and historical data.
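The graph structure described above can be illustrated with a minimal sketch. All class and function names below are hypothetical, chosen for illustration; the patent does not specify this API. Transformations are nodes, and the edges along which data flows are the `downstream` links:

```python
# Minimal sketch of a computational graph: transformations are nodes,
# data flow is edges. Names here are illustrative, not from the patent.

class TransformationNode:
    """A node applies one transformation and forwards the result downstream."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.downstream = []  # edges: where this node's output flows next

    def connect(self, node):
        self.downstream.append(node)

    def process(self, record, sink):
        result = self.fn(record)
        if not self.downstream:          # terminal node: emit a conclusion
            sink.append((self.name, result))
        for node in self.downstream:
            node.process(result, sink)

# Build a tiny linear graph: filter -> enrich -> score
filter_node = TransformationNode("filter", lambda r: {k: v for k, v in r.items() if v is not None})
enrich_node = TransformationNode("enrich", lambda r: {**r, "source": "stream"})
score_node  = TransformationNode("score",  lambda r: {**r, "score": len(r)})
filter_node.connect(enrich_node)
enrich_node.connect(score_node)

results = []
filter_node.process({"id": 1, "val": 42, "bad": None}, results)
print(results)  # [('score', {'id': 1, 'val': 42, 'source': 'stream', 'score': 3})]
```

Because each node simply forwards to a list of downstream nodes, the same structure can express branching or merging graphs, not just the linear chain shown here.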
The claims of '424 focus on a distributed computing cluster comprising a first and second plurality of computer systems. The first plurality stores portions of a computational graph, describing the flow of data from a first transformation pipeline to a second. A first computer system receives a stream of input data, processes it using the first pipeline, determines information about the second pipeline, and transmits the output to a second computer system. The second computer system then processes the received data using the second pipeline. A third computer system causes a fourth to execute at least one of the pipelines.
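The claimed handoff between computer systems can be sketched in a single process, with dictionaries standing in for the separate machines. The registry, inbox, and pipeline names are hypothetical stand-ins, not terms from the claims:

```python
# Hypothetical sketch of the claimed flow: a "first system" applies
# pipeline 1, consults the stored graph portion to determine where
# pipeline 2 runs, and transmits its output there. All routing details
# here are illustrative assumptions.

pipeline_registry = {"pipeline2": "system_B"}   # graph portion: pipeline 2's location

def pipeline1(x):
    return x * 2            # first transformation pipeline

def pipeline2(x):
    return x + 1            # second transformation pipeline

inboxes = {"system_B": []}  # stand-in for network transmission

def system_A(stream):
    for x in stream:
        out = pipeline1(x)
        target = pipeline_registry["pipeline2"]  # determine info about pipeline 2
        inboxes[target].append(out)              # transmit to the second system

def system_B():
    return [pipeline2(x) for x in inboxes["system_B"]]

system_A([1, 2, 3])
print(system_B())  # [3, 5, 7]
```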
In practice, the system receives streaming data from various sources, filters it, and splits it into two pathways: a streaming pathway and a batch pathway. The streaming pathway uses a transformation pipeline to perform real-time analysis, while the batch pathway stores and analyzes historical data. A system sanity and retrain module monitors the progress of the analysis, optimizing coordination between the two pathways and adapting the system's behavior as needed. A messaging module facilitates communication between the different components, ensuring that the system can respond to changing conditions and new information.
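The filter-then-split step can be sketched as follows. This is an illustration of the concept, assuming a toy filtering rule and a toy real-time alert rule; it is not the patent's implementation:

```python
# Illustrative sketch: incoming events are filtered and then fanned out
# to a streaming pathway (immediate analysis) and a batch pathway
# (retention for later historical analysis). Rules are toy assumptions.

from collections import deque

batch_store = deque()     # batch pathway: retained for historical analysis
streaming_alerts = []     # streaming pathway: real-time conclusions

def ingest(event):
    # Filter step: drop malformed events before the pathways split
    if "value" not in event:
        return
    batch_store.append(event)          # batch pathway
    if event["value"] > 100:           # streaming pathway: real-time rule
        streaming_alerts.append(event)

for e in [{"value": 50}, {"value": 150}, {"bad": True}, {"value": 200}]:
    ingest(e)

print(len(batch_store), len(streaming_alerts))  # 3 2
```

A batch job could later scan `batch_store` to derive historical baselines, while the streaming rule reacts to each event as it arrives, matching the two-pathway description above.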
The invention differentiates itself from prior approaches by enabling non-linear transformation pipelines, including afferent branch, efferent branch, and cyclical configurations. This allows for more complex and iterative analyses. The system also incorporates a system sanity and retrain module that monitors the progress of the analysis and adjusts the system's behavior to optimize performance. This self-adaptive capability is crucial for handling the vast amounts of data and the dynamic nature of the analysis tasks.
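A cyclical configuration, as contrasted with a linear pipeline, can be illustrated with a loop that feeds a transformation's output back in as input until a convergence test passes. The transformation and threshold here are toy assumptions, not the patent's mechanism:

```python
# Hedged sketch of a cyclical pipeline: the node's output re-enters the
# node until a convergence condition is met. Transformation and target
# values are illustrative.

def refine(estimate):
    # toy transformation: move halfway toward a target value of 100
    return estimate + (100 - estimate) / 2

value, rounds = 0.0, 0
while abs(100 - value) > 1:   # the cycle edge: output loops back as input
    value = refine(value)
    rounds += 1

print(rounds, round(value, 2))  # 7 99.22
```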
In the mid-2010s when ’424 was filed, large-scale data analysis was typically implemented using distributed storage and batch processing frameworks that relied on rigid, linear data pipelines. At a time when systems commonly relied on static map-reduce architectures to handle high-volume information, the orchestration of complex, branching, or non-linear transformations across multiple computing clusters was often limited by manual configuration. Software constraints made the real-time synchronization of streaming data with historical batch records non-trivial, as existing tools frequently lacked the native ability to self-modify or dynamically retrain processing logic based on intermediate operational sanity checks.
The examiner allowed the application because the prior art did not disclose the specific multi-cluster architecture described in the claims. While existing systems could handle distributed computations and data pipelines, they did not feature a secondary group of computer systems specifically configured to manage and trigger the execution of software instructions on a primary group of computer systems. Specifically, the prior art lacked the mechanism where a third computer system directs a fourth computer system to run the specific instructions that apply the first and second transformation pipelines to the data streams.
There are 13 claims in total. Claims 1 and 9 are independent. The independent claims focus on a distributed computing cluster comprising multiple computer systems configured to process data streams using transformation pipelines. The dependent claims generally add specific details or limitations to the elements and functionality described in the independent claims.

The dossier documents provide a comprehensive record of the patent's prosecution history, including filings, correspondence, and decisions made by patent offices, and are crucial for understanding the patent's legal journey and any challenges it may have faced during examination.