RFR: 8262377: Parallel class resolution loses constant pool error

Fri Mar 5 16:02:42 UTC 2021

On Fri, 5 Mar 2021 06:20:12 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> This PR was to originally add some tests for parallel class loading situations that aren't covered in our internal parallel class loading tests.  The tests found that class loading resolution errors weren't saving the error in the constant pool to implement JVMS 5.4.3.  The compiler was also doing re-resolution rather than using the error saved at that constant pool index.
>> 
>> One of the existing CDS tests verified that the Throwable.cause so this change also adds the cause and cause message to the resolution_errors() saved exceptions.
>> 
>> I didn't squash the commits so it would be easier to see the different changes, but they all go together.
>> 
>> The test description:
>> 
>> Two Threads T1, T2
>> 
>> Three definitions of class A, defined by user defined class loader
>> Class A extends B extends A (CCE)
>> Class A extends B
>> Class A extends C
>> 
>> Five modes:
>> Sequential
>> Concurrent loading with user defined class loader
>> Concurrent loading parallelCapable class loader
>> Wait when loading the superclass with parallelCapable class loader
>> Wait when loading the superclass with user defined class loader
>> 
>> In all cases, after A is parsed and calls resolve_super_or_fail to load B
>> and loading B waits.  Classes ClassInLoader, CP1 and CP2 provide
>> constant pool references to A.
>> 
>> In all cases, when B waits, A is replaced with bytes so A extends C.
>> 
>> Two tests x 3 modes (both threads do the same):
>> (CCE) First test A extends B, which throws CCE.
>>    -- All three modes: first constant pool reference throws CCE, second reference A extends C
>> (B) Second test A extends B which doesn't throw CCE.
>>    -- All three modes: both references A extends B.
>> 
>> The code in SystemDictionary::handle_parallel_super_load treats the parallel case for thread T2 as if T1
>> is not stalled and wins the race to load the class, by attempting to load the same superclass as T1 is
>> currently loading.  
>> 
>> Resolution for a constant pool reference should always fail with the same error even if there are concurrent threads doing that resolution.  Forcing the second thread to resolve the super class of the first, even if the thread has a different set of bytes for the class A, is a way to do that, but this actually exposed that the second successful thread should check the result of the constant pool resolution for the first. So this exposed this bug.
>> 
>> Tested with tier1, on all Oracle supported platforms and tier2-8 on linux-x64-debug and windows-x64-debug.
>
> src/hotspot/share/oops/constantPool.cpp line 555:
> 
>> 553:     throw_resolution_error(this_cp, which, CHECK_NULL);
>> 554:   }
>> 555: 
> 
> I'm unclear how this race is resolved. There must be a serialization point between the two threads otherwise this re-check is just as racy as the original. We need to know that the CP entry is now stable and was resolved either by this thread or the other one. But I can't see where this serialization point arises. ??

Yes this is the wrong commit.  Sorry I thought it would help to have the separate commits.  The first commit should only add the test.

> src/hotspot/share/oops/constantPool.cpp line 555:
> 
>> 553: 
>> 554:   Klass** adr = this_cp->resolved_klasses()->adr_at(resolved_klass_index);
>> 555:   Atomic::release_store(adr, k);
> 
> If we are racing then isn't it the case that we may not have an entry in resolved_klasses()?

The order is: add the klass to resolved_klasses() and then set the tag.  We need to check the tag in order to see whether the klass is correct or not.  This is the way it worked before resolved_klasses() was added, but there were a couple of shortcuts to just check the klass != NULL.  With the race to set UnresolvedClassInError, we need to check the tag first again, because the klass is set to null if the unresolved class has won the race.

> src/hotspot/share/oops/constantPool.cpp line 569:
> 
>> 567:     if (old_tag == JVM_CONSTANT_UnresolvedClassInError) {
>> 568:       // Remove klass.
>> 569:       Atomic::release_store(adr, (Klass*)NULL);
> 
> This is all very unclear to me. The CAS above provides a serialization point for the racing threads, but prior to that we have already published the klass into adr and now we are saying "oops I'd better undo that", but surely it is too late as we have let it escape. ??

It hasn't escaped yet for *this* location in the code, which is what my reading of the JVMS requires.  The newly loaded class is in the SystemDictionary and can be used for future resolutions (see CP2 in the test).  If I'm reading the JVMS incorrectly though, we've got a lot more (invasive) work to do to prevent a class loader from loading a correct class after it's tried to load one with an error.

> src/hotspot/share/oops/constantPool.cpp line 545:
> 
>> 543:     // To preserve old behavior, we return the resolved class.
>> 544:     Klass* klass = this_cp->resolved_klasses()->at(resolved_klass_index);
>> 545:     assert(klass != NULL, "must be resolved if exception was cleared");
> 
> Existing code but this seems a bizarre thing to do. The only way the two resolving threads can disagree about the resolution is if we have a bad classloader. With a bad classloader it is hard to justify any argument about what should happen, so it would have seemed preferrable to me to always report the error. It is a race so the outcome is arbitrary to begin with.

Yes, you need a non well behaved class loader.  I think the key point is that the *first* outcome is the one that must be saved as the resolution for this constant pool index.  This bug was that if the first was already saved as an error, we were overwriting it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2718