RFR: JDK-8317661: [REDO] store/load order not preserved when handling memory pool due to weakly ordered memory architecture of aarch64

Damon Fenacci dfenacci at openjdk.org
Wed Oct 18 10:22:13 UTC 2023


On Wed, 18 Oct 2023 05:55:10 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> # Issue
>> An intermittent _Memory Pool not found_ error has been noticed when running  a few tests (_vmTestbase/vm/mlvm/meth/stress/compiler/deoptimize/Test.java_, _vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java_) on _macosx_aarch64_ (production build) with non-segmented code cache.
>> 
>> ## Origin
>> The issue originates from the fact that aarch64 architecture is a weakly ordered memory architecture, i.e. it _permits the observation and completion of memory accesses in a different order from the program order_.
>> 
>> More precisely: while calling `CodeHeapPool::get_memory_usage`, the `used` and `committed` variables are retrieved
>> https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/services/memoryPool.cpp#L181-L182
>> and these are computed based on different variables saved in memory in `CodeCache::allocate` (during `heap->allocate` and `heap->expand_by` to be precise) .https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/code/codeCache.cpp#L535-L537
>> The problem happens when first `heap->expand_by` gets called (which _increases_ `committed`) and then `heap->allocate` gets called in a second loop pass (which _increases_ `used`). Although stores in `CodeCache::allocate` happen in the this order, when reading from memory in `CodeHeapPool::get_memory_usage` it can happen that `used` has the newly computed value, while `committed` is still "old" (because of ARM’s weak memory order). This is a problem, since `committed` must be > than `used`.
>> 
>> # Solution
>> 
>> To avoid this situation we must assure that values used to calculate `committed` are actually saved before the values used to calculate `used` and that the opposite be true for reading. To enforce this we add an `acquire/release` ordering for `CodeHeap::_next_segment` ensuring that when `CodeHeap::allocated_capacity` is called, if `_next_segment` (which is used to calculate `used`) has already been updated in `CodeHeap::allocate`, then all values written before are going to be visible (the ones to calculate `committed`).
>> 
>> Acquiring a `CodeCache_lock` has been attempted in #15819 but resulted in deadlocks (that seem to be unavoidable).
>
> src/hotspot/share/memory/heap.cpp line 306:
> 
>> 304:     block = block_at(_next_segment);
>> 305:     block->initialize(number_of_segments);
>> 306:     Atomic::release_store(&_next_segment, _next_segment + number_of_segments);
> 
> When we do the release here, what previous stores are we trying to order with the reading code below? It is hard to see the complete picture with this code just from the PR changes.

@dholmes-ora you’re right, it is quite difficult to get the whole picture: let me try to make it a bit clearer (I hope it makes sense).

The issue originates here:
https://github.com/openjdk/jdk/blob/138542de7889e8002df0e15a79e31d824c6a0473/src/hotspot/share/services/memoryPool.cpp#L181-L182
We want to ensure that the values used to calculate `committed` are actually saved before the values used to calculate `used` are used.

The problem starts in  `CodeCache::allocate`. First `CodeHeap::allocate` is called and if there is not enough space we call `CodeHeap::expand_by` and retry `CodeHeap::allocate` (while loop):
https://github.com/openjdk/jdk/blob/e510dee162612d9a706ba54d0ab79a49139e77d8/src/hotspot/share/code/codeCache.cpp#L535-L537
During the second call of `CodeHeap::allocate` we update the values used to calculate the `used` variable (`_next_segment` in particular since it is the last to be updated) and we want to make sure that when we read it, all earlier values have been written.

As for the link between `used` and `_next_segment`, `used` is calculated here
https://github.com/openjdk/jdk/blob/e510dee162612d9a706ba54d0ab79a49139e77d8/src/hotspot/share/memory/heap.cpp#L556 from `_next_segment` and `_freelist_segments`. `_next_segment` is the last to be written and it is updated here
https://github.com/openjdk/jdk/blob/e510dee162612d9a706ba54d0ab79a49139e77d8/src/hotspot/share/memory/heap.cpp#L306 which is called from `CodeCache::allocate`
https://github.com/openjdk/jdk/blob/e510dee162612d9a706ba54d0ab79a49139e77d8/src/hotspot/share/code/codeCache.cpp#L535

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16143#discussion_r1363622265


More information about the hotspot-runtime-dev mailing list