System And Method For Dynamic Enforcement Of Store Atomicity

Patent No. US11334485 (titled "System And Method For Dynamic Enforcement Of Store Atomicity") was filed by Array Cache Technologies Llc on Dec 16, 2019.

What is this patent about?

’485 is related to the field of multiprocessor computer architecture, specifically addressing the problem of maintaining store atomicity in systems employing out-of-order execution and store buffers. Modern processors use store buffers to allow cores to continue executing instructions without waiting for stores to be globally visible, which can lead to violations of memory consistency models like Total Store Order (TSO). The patent addresses the challenge of dynamically enforcing store atomicity to ensure program correctness without unduly sacrificing performance.

The underlying idea behind ’485 is to dynamically track potential store atomicity violations arising from store-to-load forwarding . When a core forwards a store from its store buffer to a load, subsequent loads on that core are marked as potentially speculative. This speculation is maintained until the forwarded store becomes globally visible. If, during this speculative period, a younger load experiences a coherence invalidation or cache eviction, it is squashed and re-executed, preventing potential memory ordering violations.

The claims of ’485 focus on a computer system with multiple processor cores, each having a local cache and a store buffer. A key aspect is that a load on a core receiving data from a store in its store buffer temporarily prevents younger loads from committing. Furthermore, if a younger load's address is invalidated or evicted from the cache during this period, the load is squashed. The claims also cover a method for dynamic enforcement of store atomicity, involving executing a load, receiving data from a store buffer, and assigning a speculative state to subsequent loads to prevent premature commitment.

In practice, the invention introduces a new type of speculation called Store-Atomicity Speculative (SA-Speculative). A load becomes SA-Speculative if an older load (in program order) is performed via store-to-load forwarding. The load that receives the forwarded value is not speculative itself, but it makes all subsequent loads potentially speculative. This SA-Speculative state persists until the forwarded store is globally performed. The system uses identifier keys to link stores in the store buffer with a commit gate in the load queue, preventing younger loads from committing until the store is ordered.

This approach differs from prior solutions that treat all loads bypassing unperformed stores as speculative. ’485 only marks loads as speculative when a potential store atomicity violation is detected, minimizing performance overhead. The patent also describes mechanisms for handling squashes, including bulk and on-demand identifier key resets , to ensure that stale keys do not interfere with correct execution. By dynamically enforcing store atomicity only when necessary, the invention provides a more efficient way to maintain memory consistency in out-of-order processors.

How does this patent fit in bigger picture?

Technical landscape at the time

In the late 2010s when ’485 was filed, multiprocessor systems commonly relied on shared memory architectures with local caches to improve performance. At a time when store buffers were typically implemented to relax memory ordering constraints, store-to-load forwarding was a known technique to reduce latency. However, hardware or software constraints made ensuring store atomicity non-trivial, especially in out-of-order execution processors.

Novelty and Inventive Step

The examiner allowed the claims because prior art references teach systems with multiple processors and store-to-load forwarding. However, they do not teach or suggest a first processor core load receiving a value from a first processor core store in the store buffer of the first processor core at a first time, and preventing any other first processor core load younger in program order from committing until a second time when the first processor core store is performed globally. Also, the prior art does not teach that between the first time and the second time any other first processor core load younger in program order than the first processor core load and having an address matched by a coherence invalidation caused by another processor core or having an address matched by an eviction in any local cache memory associated with the first processor core is squashed. The examiner also stated that the prior art fails to teach or suggest assigning a speculative state to a subsequent in program order processor core load executed at one of the processor cores in the plurality of processor cores to prevent the subsequent in program order processor core load from committing before the existing processor core store is performed.

Claims

This patent contains 21 claims, with independent claims numbered 1, 7, 11, 18, and 20. The independent claims generally focus on a computer system and methods for dynamic enforcement of store atomicity using store buffers and speculative states for processor core loads. The dependent claims generally elaborate on specific features, conditions, and steps related to the computer system and methods described in the independent claims.

Key Claim Terms New

Definitions of key terms used in the patent claims.

Term (Source)Support for SpecificationInterpretation
Coherence invalidation
(Claim 1)
“A first processor core load on a first processor core receives a value at a first time from a first processor core store in the store buffer of the first processor core and prevents any other first processor core load younger than the first processor core load in program order from committing until at the earliest a second time when the first processor core store is performed globally in the memory system and visible to all cores. In addition, between the first time and the second time any other first processor core load younger in program order than the first processor core load and having an address matched by a coherence invalidation caused by another processor core or having an address matched by an eviction in any local cache memory associated with the first processor core is squashed.”An event caused by another processor core that invalidates a cache line in a local cache memory.
Identifier key
(Claim 11, Claim 18)
“In one embodiment, an identifier key or a unique identifier key encoding a load order is associated with the first processor core load and the first processor core store. Identifier keys can be reused; however, a given identifier key is uniquely associated with a given core or load until that identifier key is removed. In one embodiment, the existing processor core store is marked with an identifier key encoding the given load, and the given processor core load is marked as a store-to-load forwarding processor core load.”A key encoding a load order, associated with a processor core load and a processor core store. It can be reused, but is uniquely associated with a given core or load until removed.
Speculative state
(Claim 7, Claim 20)
“A given processor core load is executed at a given time and at a given processor core in a plurality of processor cores. The given processor core load is associated with a first memory location. Data for the given processor core load are received from an existing processor core store associated with the first memory location and located in a store buffer associated with the given processor core. A speculative state is assigned to a subsequent in program order processor core load executed at one of the processor cores in the plurality of processor cores to prevent the subsequent in program order processor core load from committing before the existing processor core store is performed.”A state assigned to a processor core load to prevent it from committing before an existing processor core store is performed.
Store buffer
(Claim 1, Claim 7, Claim 18, Claim 20)
“A separate store buffer is associated with and operatively coupled to each processor core for placing processor core stores from the associated processor core when the processor core stores are executed, i.e., by the processor core, and before the processor core stores are performed in the shared memory. In one embodiment, each store buffer includes committed processor core stores and uncommitted processor core stores. Exemplary embodiments utilize store buffers and Total Store Order (TSO) to relax the store→load order to accommodate the store buffers.”A storage area associated with each processor core that holds processor core stores before they are performed in the shared memory. It can contain committed and uncommitted stores.
Store-to-load forwarding
(Claim 18)
“Implementations of TSO such as x86-TSO allow a load to be performed with a store-to-load forwarding from the store buffer. If the load is at the head of the ROB, that load can commit. Allowing a core to see its own stores and letting a load take its value from the most recent matching store in the store buffer, if such a store exists, is known as store-to-load forwarding, and store-to-load forwarding is used to safeguard the local thread's sequential program semantics while achieving high performance.”A process where a processor core load receives data from a store in the store buffer of the same processor core before that store is globally visible.

Patent Family

Patent Family

File Wrapper

The dossier documents provide a comprehensive record of the patent's prosecution history - including filings, correspondence, and decisions made by patent offices - and are crucial for understanding the patent's legal journey and any challenges it may have faced during examination.

  • Date

    Description

  • Get instant alerts for new documents

US11334485

ARRAY CACHE TECHNOLOGIES LLC
Application Number
US16715771
Filing Date
Dec 16, 2019
Status
Granted
Expiry Date
Nov 18, 2040
External Links
Slate, USPTO, Google Patents