RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe
Emanuel Peter
epeter at openjdk.org
Thu Nov 30 10:59:06 UTC 2023
On Tue, 28 Nov 2023 06:23:29 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
> I'm making sure that `allocate_bci_to_data` is only called when holding the `extra_data_lock`, so that no concurrent calls of it can ever occur.
>
> Testing: tier1-3 and stress.
Update:
The current fix is not sufficient. I'll summarize my state of knowledge:
We have these accesses to the extra data:
- read / update of a record (race conditions for counter updates are acceptable; they just make the result imprecise, not incorrect).
- allocating a new record
- cleaning out stale records
I have had conversations with @fisk and @rwestrel about which of these can happen concurrently. There may have been a time when cleaning only happened at a safepoint, so there probably could be no concurrent read or allocation. But this has changed: certainly with ZGC, but also with `WB_ClearMethodState` (which can trigger cleaning at any time).
The data structure uses the concept of "monotonicity" to ensure that reads are safe, even with concurrent allocations. The idea is that new records are only ever appended (into the section with all the `DataLayout::arg_info_data_tag` entries).
https://github.com/openjdk/jdk/blob/a5ccd3beaf069bdfe81736f6c62e5b4b9e18b5fe/src/hotspot/share/oops/methodData.cpp#L1438-L1445
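As a mental model, the reader side relies on something like the sketch below. This is a simplified standalone model, not the actual HotSpot code; `Tag`, `Slot`, `ExtraData` and `find` are made-up names for illustration only.

```c++
// Toy model of the append-only ("monotonic") extra data section.
// All names here are invented for illustration; this is not MethodData's layout.
#include <atomic>
#include <cstddef>

enum Tag : int { no_tag = 0, bit_data_tag, speculative_trap_data_tag, arg_info_data_tag };

struct Slot {
  std::atomic<int> tag{no_tag};  // published last when a record is appended
  int bci = -1;                  // payload, written before the tag
};

struct ExtraData {
  static constexpr size_t N = 16;
  Slot slots[N];

  // Reader: scan until the first unused slot or the trailing arg-info section.
  // As long as writers only ever append (never move records), every offset the
  // reader has already seen stays valid, so no lock is needed for reading.
  Slot* find(int bci) {
    for (size_t i = 0; i < N; i++) {
      int t = slots[i].tag.load(std::memory_order_acquire);
      if (t == no_tag || t == arg_info_data_tag) break;  // end of live records
      if (slots[i].bci == bci) return &slots[i];
    }
    return nullptr;
  }
};
```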
We wrap the extra data in `BitData` or `SpeculativeTrapData` objects, which are nothing but pointers to the extra data array/buffer.
Concurrent allocations of course have to be mutually exclusive, hence the `extra_data_lock`. @tkrodriguez found an instance where the lock was not taken, which is what this bug report is about.
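In the toy model from the previous sketch, the locking discipline for allocation would look roughly like this (a sketch only; the real code takes `MethodData`'s `extra_data_lock`, not a `std::mutex`):

```c++
// Continuing the toy model: appends must be mutually exclusive, so every
// allocation path has to hold a lock. std::mutex stands in for extra_data_lock.
#include <mutex>

struct ExtraDataWithLock : ExtraData {
  std::mutex extra_data_lock;

  Slot* append(int tag, int bci) {
    std::lock_guard<std::mutex> guard(extra_data_lock);
    for (size_t i = 0; i < N; i++) {
      if (slots[i].tag.load(std::memory_order_relaxed) == no_tag) {
        slots[i].bci = bci;                                  // payload first
        slots[i].tag.store(tag, std::memory_order_release);  // publish tag last
        return &slots[i];
      }
    }
    return nullptr;  // section is full
  }
};
```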
The problem now gets more complicated with cleaning. It removes records, and shifts other records "to the left" to compact the extra data. This certainly breaks monotonicity. That would be fine if no read/update or allocation could run concurrently, but that assumption certainly no longer holds.
https://github.com/openjdk/jdk/blob/a5ccd3beaf069bdfe81736f6c62e5b4b9e18b5fe/src/hotspot/share/oops/methodData.cpp#L1737-L1756
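In the same toy model, cleaning is essentially a shift-left compaction along these lines (again only a sketch of the idea, not the real `clean_extra_data`):

```c++
// Toy compaction: drop stale records and shift the survivors to the left.
// This is exactly the operation that breaks the append-only invariant:
// surviving records end up at different offsets than before.
void clean(ExtraData& d, bool (*is_stale)(const Slot&)) {
  size_t dst = 0;
  for (size_t src = 0; src < ExtraData::N; src++) {
    int t = d.slots[src].tag.load(std::memory_order_relaxed);
    if (t == no_tag || t == arg_info_data_tag) break;   // end of live records
    if (is_stale(d.slots[src])) continue;               // remove stale record
    if (dst != src) {
      d.slots[dst].bci = d.slots[src].bci;              // record moves to a new offset
      d.slots[dst].tag.store(t, std::memory_order_relaxed);
    }
    dst++;
  }
  // Clear the tail that the surviving records vacated.
  for (; dst < ExtraData::N; dst++) {
    int t = d.slots[dst].tag.load(std::memory_order_relaxed);
    if (t == no_tag || t == arg_info_data_tag) break;
    d.slots[dst].tag.store(no_tag, std::memory_order_relaxed);
  }
}
```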
We can protect all of these with locks. But even that is not good enough, because concurrent readers make assumptions about the offsets in the extra data array/buffer. Imagine this:
Thread A: starts iterating over the extra data.
Thread B: cleans out some stale records, and shifts around things.
Thread A: continues iterating, with offsets that are now possibly not correct any more.
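Put into the toy model (single-threaded on purpose, just to show the effect of the shift; in HotSpot the two steps run on different threads):

```c++
// Demonstrates the stale-offset hazard from the interleaving above.
#include <cassert>

int main() {
  ExtraDataWithLock d;
  Slot* a = d.append(speculative_trap_data_tag, /*bci=*/7);
  d.append(bit_data_tag, /*bci=*/11);

  // Thread A: remembers a pointer/offset to the record for bci 7.
  assert(a->bci == 7);

  // Thread B: cleaning decides that record is stale and compacts.
  clean(d, [](const Slot& s) { return s.bci == 7; });

  // Thread A: the remembered pointer now refers to a *different* record
  // (the record for bci 11 was shifted into this slot). Any further read
  // through it uses offsets that are no longer correct.
  assert(a->bci == 11);  // silently wrong data, not just an imprecise counter
  return 0;
}
```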
I have another concern about what we do today during allocation:
https://github.com/openjdk/jdk/blob/a5ccd3beaf069bdfe81736f6c62e5b4b9e18b5fe/src/hotspot/share/oops/methodData.cpp#L1503-L1532
We take the lock, and append a new record.
But is this write done in a way that makes a concurrent read safe? It would have to be, since readers currently do not take the lock.
We do a `set_header`, which is a `u8` store to the extra data that includes the tag; the payload object is stored separately. It is not clear to me what is guaranteed to be atomic here; perhaps the header itself is stored atomically. But a concurrent read could certainly find complete garbage instead of the (not yet written) payload object. What if a reader thus sees the tag for a `SpeculativeTrapData`, reads its `data->method()` field, and gets garbage that is not nullptr?
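To make the concern concrete in the toy model: if the tag (header) were published before the payload, a reader could observe a valid-looking tag and then read an unwritten payload. The sketch below shows that unsafe ordering; whether the real `set_header` path provides the required ordering is exactly the open question.

```c++
// UNSAFE publication order in the toy model: the tag (header) is stored
// before the payload. A concurrent reader that already sees the tag may
// read a payload that has not been written yet.
Slot* append_unsafe(ExtraDataWithLock& d, int tag, int bci) {
  std::lock_guard<std::mutex> guard(d.extra_data_lock);
  for (size_t i = 0; i < ExtraData::N; i++) {
    if (d.slots[i].tag.load(std::memory_order_relaxed) == no_tag) {
      d.slots[i].tag.store(tag, std::memory_order_release);  // header visible...
      d.slots[i].bci = bci;                                   // ...before the payload
      return &d.slots[i];
    }
  }
  return nullptr;
}
// A reader that observes the tag here and then reads the not-yet-written payload
// (the analogue of `data->method()` in the real code) would see garbage, not nullptr.
```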
In light of this, I see two alternatives:
1. Add locks everywhere, and make sure no `ProfileData* pdata` pointer leaks out of the locked region.
2.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16840#issuecomment-1833529561