JDK-8230459: Test failed to resume JVMCI CompilerThread

David Holmes david.holmes at oracle.com
Tue Nov 5 23:48:13 UTC 2019


Hi Martin,

On 5/11/2019 11:37 pm, Doerr, Martin wrote:
> Hi David
> 
>> With JVMCI compiler threads, each getting a new j.l.Thread oop that
>> lasts for the lifetime of that compiler thread (just like a regular
>> JavaThread) do we even actually need these arrays? I'm unclear what
>> purpose they serve when we are not trying to reuse the oops stored in
>> the array. ??
> 
> Compiler threads can lookup j.l.Thread objects of live compilers by iterating over the arrays.
> That's used to find the last compiler alive or to find a log instance for a compiler.
> Could get designed differently, but that would make the change even bigger.

Yes but given we don't have a clean working fix for current arrangement ...

Is hacking into oopStorage internals the only way forward?

Thanks,
David

> 
> Best regards,
> Martin
> 
> 
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Dienstag, 5. November 2019 13:33
>> To: Doerr, Martin <martin.doerr at sap.com>; dean.long at oracle.com; Kim
>> Barrett <kim.barrett at oracle.com>
>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>
>> On 5/11/2019 6:40 pm, Doerr, Martin wrote:
>>> Hi David
>>>
>>>> I don't understand what you mean. If a compiler thread holds an oop, any
>>>> oop, it must hold it in a Handle to ensure it can't be gc'd.
>>>
>>> The problem is not related to gc.
>>> My change introduces destroy_global for the handles. This means that the
>> OopStorage portion which has held the oop can get freed.
>>> However, other compiler threads are running concurrently. They may
>> execute code which reads the oop from the handle which is freed by this
>> thread. Reading stale data is not a problem here, but reading freed memory
>> may assert or even crash in general.
>>> I can't see how OopStorage supports reading from handles which were
>> freed by destroy_global.
>>
>> With JVMCI compiler threads, each getting a new j.l.Thread oop that
>> lasts for the lifetime of that compiler thread (just like a regular
>> JavaThread) do we even actually need these arrays? I'm unclear what
>> purpose they serve when we are not trying to reuse the oops stored in
>> the array. ??
>>
>> David
>> -----
>>
>>> I think it would be safe if the freeing only occurred at safepoints, but I don't
>> think this is the case.
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> Sent: Dienstag, 5. November 2019 00:19
>>>> To: Doerr, Martin <martin.doerr at sap.com>; dean.long at oracle.com; Kim
>>>> Barrett <kim.barrett at oracle.com>
>>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>
>>>> On 4/11/2019 9:12 pm, Doerr, Martin wrote:
>>>>> Hi all,
>>>>>
>>>>> @Dean
>>>>>> changing can_remove() to CompileBroker::can_remove()?
>>>>> Yes. That would be an option.
>>>>>
>>>>> @Kim, David
>>>>> I think there's another problem with this implementation.
>>>>> It introduces a use-after-free pattern due to concurrency.
>>>>> Compiler threads may still read the oops from the handles after one of
>>>> them has called destroy_global until next safepoint. It doesn't matter
>> which
>>>> values they get in this case, but the VM should not crash. I believe that
>>>> OopStorage allows freeing storage without safepoints, so this may be
>>>> unsafe. Right?
>>>>
>>>> I don't understand what you mean. If a compiler thread holds an oop, any
>>>> oop, it must hold it in a Handle to ensure it can't be gc'd.
>>>>
>>>> David
>>>>
>>>>> If so, I think replacing the oops in the handles (and keeping the handles
>>>> alive) would be better. And also much more simple.
>>>>>
>>>>> Best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: dean.long at oracle.com <dean.long at oracle.com>
>>>>>> Sent: Samstag, 2. November 2019 08:36
>>>>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>>>>> <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
>>>>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-
>> dev at openjdk.java.net
>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>>>
>>>>>> Hi Martin,
>>>>>>
>>>>>> On 10/30/19 3:18 AM, Doerr, Martin wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>>> I don't think factoring out CompileBroker::clear_compiler2_object
>>>> when
>>>>>>>> it is only used once was warranted, but that's call for compiler team
>> to
>>>>>>>> make.
>>>>>>> I did that because _compiler2_objects is private and there's currently
>> no
>>>>>> setter available.
>>>>>>> But let's see what the compiler folks think.
>>>>>>
>>>>>> how about changing can_remove() to CompileBroker::can_remove()?
>>>> Then
>>>>>> you
>>>>>> can access _compiler2_objects directly, right?
>>>>>>
>>>>>> dl
>>>>>>>> Otherwise changes seem fine and I have noted the use of the
>>>>>>>> MutexUnlocker as per your direct email.
>>>>>>> Thanks a lot for reviewing. It was not a trivial one ��
>>>>>>>
>>>>>>> You had noticed an incorrect usage of the CHECK macro. I've created a
>>>> new
>>>>>> bug for that:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8233193
>>>>>>> Would be great if you could take a look if that's what you meant and
>>>> made
>>>>>> adaptions if needed.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>>>> Sent: Mittwoch, 30. Oktober 2019 05:47
>>>>>>>> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
>>>>>>>> <kim.barrett at oracle.com>
>>>>>>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>>>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-
>>>> dev at openjdk.java.net;
>>>>>>>> David Holmes <David.Holmes at oracle.com>
>>>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI
>> CompilerThread
>>>>>>>>
>>>>>>>> Hi Martin,
>>>>>>>>
>>>>>>>> On 29/10/2019 12:06 am, Doerr, Martin wrote:
>>>>>>>>> Hi David and Kim,
>>>>>>>>>
>>>>>>>>> I think it's easier to talk about code. So here's a new webrev:
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr
>>>>>>>> ev.03/
>>>>>>>>
>>>>>>>> I don't think factoring out CompileBroker::clear_compiler2_object
>>>> when
>>>>>>>> it is only used once was warranted, but that's call for compiler team
>> to
>>>>>>>> make. Otherwise changes seem fine and I have noted the use of the
>>>>>>>> MutexUnlocker as per your direct email.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> @Kim:
>>>>>>>>> Thanks for looking at the handle related parts. It's ok if you don't
>> want
>>>> to
>>>>>>>> be a reviewer of the whole change.
>>>>>>>>>> I think it's weird that can_remove() is a predicate with optional
>> side
>>>>>>>>>> effects.  I think it would be simpler to have it be a pure predicate,
>>>>>>>>>> and have the one caller with do_it = true perform the updates.
>> That
>>>>>>>>>> should include NULLing out the handle pointer (perhaps debug-
>> only,
>>>>>> but
>>>>>>>>>> it doesn't cost much to cleanly maintain the data structure).
>>>>>>>>> Nevertheless, it has the advantage that it enforces the update to
>> be
>>>>>>>> consistent.
>>>>>>>>> A caller could use it without holding the lock or mess it up
>> otherwise.
>>>>>>>>> In addition, I don't what to change that as part of this fix.
>>>>>>>>>
>>>>>>>>>> So far as I can tell, THREAD == NULL here.
>>>>>>>>> This is a very tricky part (not my invention):
>>>>>>>>> EXCEPTION_MARK contains an ExceptionMark constructor call
>> which
>>>>>> sets
>>>>>>>> __the_thread__ to Thread::current().
>>>>>>>>> I don't want to publish my opinion about this ��
>>>>>>>>>
>>>>>>>>> @David:
>>>>>>>>> Seems like this option is preferred over option 3
>>>>>>>> (possibly_add_compiler_threads part of webrev.02 and leave the
>>>>>>>> initialization as is).
>>>>>>>>> So when you're ok with it, I'll request a 2nd review from the
>> compiler
>>>>>> folks
>>>>>>>> (I should change the subject to contain RFR).
>>>>>>>>> Thanks,
>>>>>>>>> Martin
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>>>>>> Sent: Montag, 28. Oktober 2019 05:04
>>>>>>>>>> To: Kim Barrett <kim.barrett at oracle.com>
>>>>>>>>>> Cc: Doerr, Martin <martin.doerr at sap.com>; Vladimir Kozlov
>>>>>>>>>> (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
>>>> hotspot-
>>>>>>>>>> compiler-dev at openjdk.java.net
>>>>>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI
>>>> CompilerThread
>>>>>>>>>>
>>>>>>>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
>>>>>>>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes
>>>>>>>> <david.holmes at oracle.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
>>>>>>>>>>>>> Hi Kim,
>>>>>>>>>>>>> I didn't like using the OopStorage stuff directly, either. I just
>> have
>>>>>> not
>>>>>>>>>> seen how to allocate a global handle and add the oop later.
>>>>>>>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not
>> aware
>>>> of
>>>>>>>> that.
>>>>>>>>>>>>> So I can imagine 3 ways to implement it:
>>>>>>>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I
>> just
>>>>>> added
>>>>>>>>>> that to
>>>>>>>>
>>>>>>
>>>>
>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr
>>>>>>>>>> ev.01/
>>>>>>>>>>>>> We may want to improve that further by setting the handle
>>>> pointer
>>>>>> to
>>>>>>>>>> NULL and asserting that it is NULL before adding the new one.
>>>>>>>>>>>>> I had been concerned about NULLs in the array, but looks like
>>>> the
>>>>>>>>>> existing code can deal with that.
>>>>>>>>>>>> I think it would be cleaner to both destroy the global handle
>> and
>>>>>> NULL it
>>>>>>>> in
>>>>>>>>>> the array at the same time.
>>>>>>>>>>>> This comment
>>>>>>>>>>>>
>>>>>>>>>>>> 325         // Old j.l.Thread object can die here.
>>>>>>>>>>>>
>>>>>>>>>>>> Isn't quite accurate. The j.l.Thread is still referenced via ct-
>>>>>>> threadObj()
>>>>>>>> so
>>>>>>>>>> can't "die" until that is also cleared during the actual termination
>>>>>> process.
>>>>>>>>>>> I think if there is such a thread here that it can't die, because the
>>>>>>>>>>> death predicate (the can_remove stuff) won't see that old
>> thread
>>>> as
>>>>>>>>>>> the last thread in _compiler2_objects.  That's what I meant by
>> this:
>>>>>>>>>>>
>>>>>>>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett
>>>> <kim.barrett at oracle.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>> I also think that here:
>>>>>>>>>>>>
>>>>>>>>>>>> 947         jobject thread_handle =
>>>>>>>> JNIHandles::make_global(thread_oop);
>>>>>>>>>>>> 948         _compiler2_objects[i] = thread_handle;
>>>>>>>>>>>>
>>>>>>>>>>>> should assert _compiler2_objects[i] == NULL.  Or if that isn't a
>>>> valid
>>>>>>>>>>>> assertion then I think there are other problems.
>>>>>>>>>>> I think either that comment about an old thread is wrong (and
>> the
>>>>>> NULL
>>>>>>>>>>> assertion I suggested is okay), or I think the whole mechanism
>> here
>>>>>>>>>>> has problems.  Or at least I was unable to figure out how it could
>>>>>> work...
>>>>>>>>>>>
>>>>>>>>>> I'm not following sorry. You can't assert NULL unless it's actually
>> set
>>>>>>>>>> to NULL which it presently isn't. But it could be set NULL as Martin
>>>>>>>>>> suggested:
>>>>>>>>>>
>>>>>>>>>> "We may want to improve that further by setting the handle
>> pointer
>>>> to
>>>>>>>>>> NULL and asserting that it is NULL before adding the new one."
>>>>>>>>>>
>>>>>>>>>> and which I also supported. But that aside once the delete_global
>>>> has
>>>>>>>>>> been called that JNIHandle no longer references the j.l.Thread
>> that it
>>>>>>>>>> did, at which point it is only reachable via the threadObj() of the
>>>>>>>>>> CompilerThread.
>>>>>>>>>>
>>>>>>>>>> David
>>>>>


More information about the hotspot-compiler-dev mailing list