JDK-8230459: Test failed to resume JVMCI CompilerThread
David Holmes
david.holmes at oracle.com
Tue Nov 5 12:33:21 UTC 2019
On 5/11/2019 6:40 pm, Doerr, Martin wrote:
> Hi David
>
>> I don't understand what you mean. If a compiler thread holds an oop, any
>> oop, it must hold it in a Handle to ensure it can't be gc'd.
>
> The problem is not related to gc.
> My change introduces destroy_global for the handles. This means that the OopStorage portion which has held the oop can get freed.
> However, other compiler threads are running concurrently. They may execute code which reads the oop from the handle which is freed by this thread. Reading stale data is not a problem here, but reading freed memory may assert or even crash in general.
> I can't see how OopStorage supports reading from handles which were freed by destroy_global.
With JVMCI compiler threads, each getting a new j.l.Thread oop that
lasts for the lifetime of that compiler thread (just like a regular
JavaThread) do we even actually need these arrays? I'm unclear what
purpose they serve when we are not trying to reuse the oops stored in
the array. ??
David
-----
> I think it would be safe if the freeing only occurred at safepoints, but I don't think this is the case.
>
> Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: David Holmes <david.holmes at oracle.com>
>> Sent: Dienstag, 5. November 2019 00:19
>> To: Doerr, Martin <martin.doerr at sap.com>; dean.long at oracle.com; Kim
>> Barrett <kim.barrett at oracle.com>
>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>
>> On 4/11/2019 9:12 pm, Doerr, Martin wrote:
>>> Hi all,
>>>
>>> @Dean
>>>> changing can_remove() to CompileBroker::can_remove()?
>>> Yes. That would be an option.
>>>
>>> @Kim, David
>>> I think there's another problem with this implementation.
>>> It introduces a use-after-free pattern due to concurrency.
>>> Compiler threads may still read the oops from the handles after one of
>> them has called destroy_global until next safepoint. It doesn't matter which
>> values they get in this case, but the VM should not crash. I believe that
>> OopStorage allows freeing storage without safepoints, so this may be
>> unsafe. Right?
>>
>> I don't understand what you mean. If a compiler thread holds an oop, any
>> oop, it must hold it in a Handle to ensure it can't be gc'd.
>>
>> David
>>
>>> If so, I think replacing the oops in the handles (and keeping the handles
>> alive) would be better. And also much more simple.
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: dean.long at oracle.com <dean.long at oracle.com>
>>>> Sent: Samstag, 2. November 2019 08:36
>>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>>> <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
>>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>
>>>> Hi Martin,
>>>>
>>>> On 10/30/19 3:18 AM, Doerr, Martin wrote:
>>>>> Hi David,
>>>>>
>>>>>> I don't think factoring out CompileBroker::clear_compiler2_object
>> when
>>>>>> it is only used once was warranted, but that's call for compiler team to
>>>>>> make.
>>>>> I did that because _compiler2_objects is private and there's currently no
>>>> setter available.
>>>>> But let's see what the compiler folks think.
>>>>
>>>> how about changing can_remove() to CompileBroker::can_remove()?
>> Then
>>>> you
>>>> can access _compiler2_objects directly, right?
>>>>
>>>> dl
>>>>>> Otherwise changes seem fine and I have noted the use of the
>>>>>> MutexUnlocker as per your direct email.
>>>>> Thanks a lot for reviewing. It was not a trivial one
>>>>>
>>>>> You had noticed an incorrect usage of the CHECK macro. I've created a
>> new
>>>> bug for that:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8233193
>>>>> Would be great if you could take a look if that's what you meant and
>> made
>>>> adaptions if needed.
>>>>>
>>>>> Best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>> Sent: Mittwoch, 30. Oktober 2019 05:47
>>>>>> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
>>>>>> <kim.barrett at oracle.com>
>>>>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
>>>>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-
>> dev at openjdk.java.net;
>>>>>> David Holmes <David.Holmes at oracle.com>
>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>>>>>>
>>>>>> Hi Martin,
>>>>>>
>>>>>> On 29/10/2019 12:06 am, Doerr, Martin wrote:
>>>>>>> Hi David and Kim,
>>>>>>>
>>>>>>> I think it's easier to talk about code. So here's a new webrev:
>>>>>>>
>>>>>>
>>>>
>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr
>>>>>> ev.03/
>>>>>>
>>>>>> I don't think factoring out CompileBroker::clear_compiler2_object
>> when
>>>>>> it is only used once was warranted, but that's call for compiler team to
>>>>>> make. Otherwise changes seem fine and I have noted the use of the
>>>>>> MutexUnlocker as per your direct email.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> @Kim:
>>>>>>> Thanks for looking at the handle related parts. It's ok if you don't want
>> to
>>>>>> be a reviewer of the whole change.
>>>>>>>> I think it's weird that can_remove() is a predicate with optional side
>>>>>>>> effects. I think it would be simpler to have it be a pure predicate,
>>>>>>>> and have the one caller with do_it = true perform the updates. That
>>>>>>>> should include NULLing out the handle pointer (perhaps debug-only,
>>>> but
>>>>>>>> it doesn't cost much to cleanly maintain the data structure).
>>>>>>> Nevertheless, it has the advantage that it enforces the update to be
>>>>>> consistent.
>>>>>>> A caller could use it without holding the lock or mess it up otherwise.
>>>>>>> In addition, I don't what to change that as part of this fix.
>>>>>>>
>>>>>>>> So far as I can tell, THREAD == NULL here.
>>>>>>> This is a very tricky part (not my invention):
>>>>>>> EXCEPTION_MARK contains an ExceptionMark constructor call which
>>>> sets
>>>>>> __the_thread__ to Thread::current().
>>>>>>> I don't want to publish my opinion about this
>>>>>>>
>>>>>>> @David:
>>>>>>> Seems like this option is preferred over option 3
>>>>>> (possibly_add_compiler_threads part of webrev.02 and leave the
>>>>>> initialization as is).
>>>>>>> So when you're ok with it, I'll request a 2nd review from the compiler
>>>> folks
>>>>>> (I should change the subject to contain RFR).
>>>>>>> Thanks,
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>>>> Sent: Montag, 28. Oktober 2019 05:04
>>>>>>>> To: Kim Barrett <kim.barrett at oracle.com>
>>>>>>>> Cc: Doerr, Martin <martin.doerr at sap.com>; Vladimir Kozlov
>>>>>>>> (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
>> hotspot-
>>>>>>>> compiler-dev at openjdk.java.net
>>>>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI
>> CompilerThread
>>>>>>>>
>>>>>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
>>>>>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes
>>>>>> <david.holmes at oracle.com>
>>>>>>>> wrote:
>>>>>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
>>>>>>>>>>> Hi Kim,
>>>>>>>>>>> I didn't like using the OopStorage stuff directly, either. I just have
>>>> not
>>>>>>>> seen how to allocate a global handle and add the oop later.
>>>>>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware
>> of
>>>>>> that.
>>>>>>>>>>> So I can imagine 3 ways to implement it:
>>>>>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just
>>>> added
>>>>>>>> that to
>>>>>>
>>>>
>> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr
>>>>>>>> ev.01/
>>>>>>>>>>> We may want to improve that further by setting the handle
>> pointer
>>>> to
>>>>>>>> NULL and asserting that it is NULL before adding the new one.
>>>>>>>>>>> I had been concerned about NULLs in the array, but looks like
>> the
>>>>>>>> existing code can deal with that.
>>>>>>>>>> I think it would be cleaner to both destroy the global handle and
>>>> NULL it
>>>>>> in
>>>>>>>> the array at the same time.
>>>>>>>>>> This comment
>>>>>>>>>>
>>>>>>>>>> 325 // Old j.l.Thread object can die here.
>>>>>>>>>>
>>>>>>>>>> Isn't quite accurate. The j.l.Thread is still referenced via ct-
>>>>> threadObj()
>>>>>> so
>>>>>>>> can't "die" until that is also cleared during the actual termination
>>>> process.
>>>>>>>>> I think if there is such a thread here that it can't die, because the
>>>>>>>>> death predicate (the can_remove stuff) won't see that old thread
>> as
>>>>>>>>> the last thread in _compiler2_objects. That's what I meant by this:
>>>>>>>>>
>>>>>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett
>> <kim.barrett at oracle.com>
>>>>>>>> wrote:
>>>>>>>>>> I also think that here:
>>>>>>>>>>
>>>>>>>>>> 947 jobject thread_handle =
>>>>>> JNIHandles::make_global(thread_oop);
>>>>>>>>>> 948 _compiler2_objects[i] = thread_handle;
>>>>>>>>>>
>>>>>>>>>> should assert _compiler2_objects[i] == NULL. Or if that isn't a
>> valid
>>>>>>>>>> assertion then I think there are other problems.
>>>>>>>>> I think either that comment about an old thread is wrong (and the
>>>> NULL
>>>>>>>>> assertion I suggested is okay), or I think the whole mechanism here
>>>>>>>>> has problems. Or at least I was unable to figure out how it could
>>>> work...
>>>>>>>>>
>>>>>>>> I'm not following sorry. You can't assert NULL unless it's actually set
>>>>>>>> to NULL which it presently isn't. But it could be set NULL as Martin
>>>>>>>> suggested:
>>>>>>>>
>>>>>>>> "We may want to improve that further by setting the handle pointer
>> to
>>>>>>>> NULL and asserting that it is NULL before adding the new one."
>>>>>>>>
>>>>>>>> and which I also supported. But that aside once the delete_global
>> has
>>>>>>>> been called that JNIHandle no longer references the j.l.Thread that it
>>>>>>>> did, at which point it is only reachable via the threadObj() of the
>>>>>>>> CompilerThread.
>>>>>>>>
>>>>>>>> David
>>>
More information about the hotspot-compiler-dev
mailing list