RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet
Tobias Hartmann
thartmann at openjdk.org
Tue May 9 08:53:35 UTC 2023
On Mon, 8 May 2023 15:52:56 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). One thread can set `_is_loaded_computed` before setting `_is_loaded`:
>>
>> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482
>>
>> while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`:
>>
>> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468
>>
>> As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded.
>>
>> Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account:
>>
>> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301
>>
>> A loaded type can therefore be replaced by an unloaded type during GVN.
>>
>> In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. Due to the issues described above, the null-free type is GVN'ed with it's unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to de-reference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert).
>>
>> Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them.
>>
>> The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running.
>>
>> In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. I replaced them by `#ifdef ASSERT` and fixed the verification code that failed to build.
>>
>> Thanks,
>> Tobias
>
> It seems to me that if 2 threads access a variable racily without using atomic, it results in undefined behaviours. This patch removes the logic and therefore removes the race condition. Is there any risk with that in the surrounding code, too? I see you reordered `_hash_computed` and `_exact_klass_computed` assignment, if these reorders matter, should they be at least releasing stores instead? Thanks a lot.
Thanks for looking at this, @merykitty. I did these reorders because I found it weird that we set `..._computed` before initializing the corresponding fields. I thought that for `_hash` and `_exact_klass` it shouldn't really matter if they are racily accessed.
On second thought though, I think you are right that this could lead to undefined behavior, especially if the C++ compiler decides to reorder again. One thread could then observe `_exact_klass_computed == true` and read garbage from `_exact_klass` because another thread is still in the process of computing and setting `_exact_klass`.
Since lock-free data structures are hard to get rid, and the overhead would only be needed for a few shared types, I propose to eagerly compute `_hash` and `_exact_klass`. I pushed an updated fix.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13868#issuecomment-1539709643
More information about the hotspot-compiler-dev
mailing list