Patent No. US11334485 (titled "System And Method For Dynamic Enforcement Of Store Atomicity") was filed by Array Cache Technologies Llc on Dec 16, 2019.
’485 is related to the field of multiprocessor computer architecture, specifically addressing the problem of maintaining store atomicity in systems employing out-of-order execution and store buffers. Modern processors use store buffers to allow cores to continue executing instructions without waiting for stores to be globally visible, which can lead to violations of memory consistency models like Total Store Order (TSO). The patent addresses the challenge of dynamically enforcing store atomicity to ensure program correctness without unduly sacrificing performance.
The underlying idea behind ’485 is to dynamically track potential store atomicity violations arising from store-to-load forwarding . When a core forwards a store from its store buffer to a load, subsequent loads on that core are marked as potentially speculative. This speculation is maintained until the forwarded store becomes globally visible. If, during this speculative period, a younger load experiences a coherence invalidation or cache eviction, it is squashed and re-executed, preventing potential memory ordering violations.
The claims of ’485 focus on a computer system with multiple processor cores, each having a local cache and a store buffer. A key aspect is that a load on a core receiving data from a store in its store buffer temporarily prevents younger loads from committing. Furthermore, if a younger load's address is invalidated or evicted from the cache during this period, the load is squashed. The claims also cover a method for dynamic enforcement of store atomicity, involving executing a load, receiving data from a store buffer, and assigning a speculative state to subsequent loads to prevent premature commitment.
In practice, the invention introduces a new type of speculation called Store-Atomicity Speculative (SA-Speculative). A load becomes SA-Speculative if an older load (in program order) is performed via store-to-load forwarding. The load that receives the forwarded value is not speculative itself, but it makes all subsequent loads potentially speculative. This SA-Speculative state persists until the forwarded store is globally performed. The system uses identifier keys to link stores in the store buffer with a commit gate in the load queue, preventing younger loads from committing until the store is ordered.
This approach differs from prior solutions that treat all loads bypassing unperformed stores as speculative. ’485 only marks loads as speculative when a potential store atomicity violation is detected, minimizing performance overhead. The patent also describes mechanisms for handling squashes, including bulk and on-demand identifier key resets , to ensure that stale keys do not interfere with correct execution. By dynamically enforcing store atomicity only when necessary, the invention provides a more efficient way to maintain memory consistency in out-of-order processors.
In the late 2010s when ’485 was filed, multiprocessor systems commonly relied on shared memory architectures with local caches to improve performance. At a time when store buffers were typically implemented to relax memory ordering constraints, store-to-load forwarding was a known technique to reduce latency. However, hardware or software constraints made ensuring store atomicity non-trivial, especially in out-of-order execution processors.
The examiner allowed the claims because prior art references teach systems with multiple processors and store-to-load forwarding. However, they do not teach or suggest a first processor core load receiving a value from a first processor core store in the store buffer of the first processor core at a first time, and preventing any other first processor core load younger in program order from committing until a second time when the first processor core store is performed globally. Also, the prior art does not teach that between the first time and the second time any other first processor core load younger in program order than the first processor core load and having an address matched by a coherence invalidation caused by another processor core or having an address matched by an eviction in any local cache memory associated with the first processor core is squashed. The examiner also stated that the prior art fails to teach or suggest assigning a speculative state to a subsequent in program order processor core load executed at one of the processor cores in the plurality of processor cores to prevent the subsequent in program order processor core load from committing before the existing processor core store is performed.
This patent contains 21 claims, with independent claims numbered 1, 7, 11, 18, and 20. The independent claims generally focus on a computer system and methods for dynamic enforcement of store atomicity using store buffers and speculative states for processor core loads. The dependent claims generally elaborate on specific features, conditions, and steps related to the computer system and methods described in the independent claims.
Definitions of key terms used in the patent claims.

The dossier documents provide a comprehensive record of the patent's prosecution history - including filings, correspondence, and decisions made by patent offices - and are crucial for understanding the patent's legal journey and any challenges it may have faced during examination.
Date
Description
Get instant alerts for new documents