RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3]

Wed Oct 23 05:38:11 UTC 2024

On Tue, 22 Oct 2024 19:04:16 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2234:
>> 
>>> 2232:     retry_fast_path = true;
>>> 2233:   } else {
>>> 2234:     relativize_chunk_concurrently(chunk);
>> 
>> Is the `relativize_chunk_concurrently` solution to the race only to have a single flag read in `can_thaw_fast` or is there some other subtlety here? 
>> 
>> While not required for the PR, if it is just to optimise the `can_thaw_fast` check, it can probably be made to work with one load and still allow concurrent gcs do fast_thaw when we only get here due to a lockstack.
>
> Yes, it's just to do a single read. I guess you are thinking of combining flags and lockStackSize into a int16_t?

Something along those lines, yes.

>> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2247:
>> 
>>> 2245:     _thread->lock_stack().move_from_address(tmp_lockstack, lockStackSize);
>>> 2246: 
>>> 2247:     chunk->set_lockstack_size(0);
>> 
>> After some discussion here at the office we think there might be an issue here with simply hiding the oops without clearing them. Below in `recurse_thaw` we `do_barriers`. But it does not touch these lockstack. Missing the SATB store barrier is probably fine from a liveness perspective, because the oops in the lockstack must also be in the frames. But removing the oops without a barrier and clear will probably lead to problems down the line.
>> 
>> Something like the following would probably handle this. Or even fuse the `copy_lockstack` and `clear_lockstack` together into some kind of `transfer_lockstack` which both loads and clears the oops.
>> 
>> 
>> diff --git a/src/hotspot/share/oops/stackChunkOop.cpp b/src/hotspot/share/oops/stackChunkOop.cpp
>> index d3d63533eed..f737bd2db71 100644
>> --- a/src/hotspot/share/oops/stackChunkOop.cpp
>> +++ b/src/hotspot/share/oops/stackChunkOop.cpp
>> @@ -470,6 +470,28 @@ void stackChunkOopDesc::copy_lockstack(oop* dst) {
>>    }
>>  }
>>  
>> +void stackChunkOopDesc::clear_lockstack() {
>> +  const int cnt = lockstack_size();
>> +  const bool requires_gc_barriers = is_gc_mode() || requires_barriers();
>> +  const bool requires_uncompress = has_bitmap() && UseCompressedOops;
>> +  const auto clear_obj = [&](intptr_t* at) {
>> +    if (requires_uncompress) {
>> +      HeapAccess<>::oop_store(reinterpret_cast<narrowOop*>(at), nullptr);
>> +    } else {
>> +      HeapAccess<>::oop_store(reinterpret_cast<oop*>(at), nullptr);
>> +    }
>> +  };
>> +
>> +  if (requires_gc_barriers) {
>> +    intptr_t* lockstack_start = start_address();
>> +    for (int i = 0; i < cnt; i++) {
>> +      clear_obj(&lockstack_start[i]);
>> +    }
>> +  }
>> +  set_lockstack_size(0);
>> +  set_has_lockstack(false);
>> +}
>> +
>>  void stackChunkOopDesc::print_on(bool verbose, outputStream* st) const {
>>    if (*((juint*)this) == badHeapWordVal) {
>>      st->print_cr("BAD WORD");
>> diff --git a/src/hotspot/share/oops/stackChunkOop.hpp b/src/hotspot/share/oops/stackChunkOop.hpp
>> index 28e0576801e..928e94dd695 100644
>> --- a/src/hotspot/share/oops/stackChunkOop.hpp
>> +++ b/src/hotspot/share/oops/stackChunkOop.hpp
>> @@ -167,6 +167,7 @@ class stackChunkOopDesc : public instanceOopDesc {
>>    void fix_thawed_frame(const frame& f, const RegisterMapT* map);
>>  
>>    void copy_lo...
>
> Ok, I'll change copy_lockstack to both load and clear the oops in the same method. Now, when we call do_barriers on recurse_thaw we don't clear the oops, we just load and store the loaded value again. Is it the case that we just need to do a store, so that already works, or are we missing clearing the oops from the copied frames?

The store is the important part for SATB. The fact that do_barriers (only) does a self store seems is an optimisation. As we need to do the store before  we do the copy (to enable a plane memcpy). And clearing is not something that we rely on / need at the moment.  The nicest model would have been to first fix the oops, (mem)copy, then clear them.  But as mentioned, clearing is currently unnecessary. For the lockstack we do not need this optimisation as we do the copy when we do the load barrier. So we can just clear in our store.

It is a little interesting that we template parameterise `do_barriers` on the barrier type and instantiate all the load functions, while only ever using the store version. Guess it is a remnant from some earlier model.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811903902
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811900946