JDK-8230459: Test failed to resume JVMCI CompilerThread
Doerr, Martin
martin.doerr at sap.com
Tue Nov 5 08:40:02 UTC 2019
Hi David
> I don't understand what you mean. If a compiler thread holds an oop, any
> oop, it must hold it in a Handle to ensure it can't be gc'd.
The problem is not related to gc.
My change introduces destroy_global for the handles. This means that the OopStorage portion which has held the oop can get freed.
However, other compiler threads are running concurrently. They may execute code which reads the oop from the handle which is freed by this thread. Reading stale data is not a problem here, but reading freed memory may assert or even crash in general.
I can't see how OopStorage supports reading from handles which were freed by destroy_global.
I think it would be safe if the freeing only occurred at safepoints, but I don't think this is the case.
Best regards,
Martin
> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Dienstag, 5. November 2019 00:19
> To: Doerr, Martin <martin.doerr at sap.com>; dean.long at oracle.com; Kim
> Barrett <kim.barrett at oracle.com>
> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>
> On 4/11/2019 9:12 pm, Doerr, Martin wrote:
> > Hi all,
> >
> > @Dean
> >> changing can_remove() to CompileBroker::can_remove()?
> > Yes. That would be an option.
> >
> > @Kim, David
> > I think there's another problem with this implementation.
> > It introduces a use-after-free pattern due to concurrency.
> > Compiler threads may still read the oops from the handles after one of
> them has called destroy_global until next safepoint. It doesn't matter which
> values they get in this case, but the VM should not crash. I believe that
> OopStorage allows freeing storage without safepoints, so this may be
> unsafe. Right?
>
> I don't understand what you mean. If a compiler thread holds an oop, any
> oop, it must hold it in a Handle to ensure it can't be gc'd.
>
> David
>
> > If so, I think replacing the oops in the handles (and keeping the handles
> alive) would be better. And also much more simple.
> >
> > Best regards,
> > Martin
> >
> >
> >> -----Original Message-----
> >> From: dean.long at oracle.com <dean.long at oracle.com>
> >> Sent: Samstag, 2. November 2019 08:36
> >> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
> >> <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
> >> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> >> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >>
> >> Hi Martin,
> >>
> >> On 10/30/19 3:18 AM, Doerr, Martin wrote:
> >>> Hi David,
> >>>
> >>>> I don't think factoring out CompileBroker::clear_compiler2_object
> when
> >>>> it is only used once was warranted, but that's call for compiler team to
> >>>> make.
> >>> I did that because _compiler2_objects is private and there's currently no
> >> setter available.
> >>> But let's see what the compiler folks think.
> >>
> >> how about changing can_remove() to CompileBroker::can_remove()?
> Then
> >> you
> >> can access _compiler2_objects directly, right?
> >>
> >> dl
> >>>> Otherwise changes seem fine and I have noted the use of the
> >>>> MutexUnlocker as per your direct email.
> >>> Thanks a lot for reviewing. It was not a trivial one
> >>>
> >>> You had noticed an incorrect usage of the CHECK macro. I've created a
> new
> >> bug for that:
> >>> https://bugs.openjdk.java.net/browse/JDK-8233193
> >>> Would be great if you could take a look if that's what you meant and
> made
> >> adaptions if needed.
> >>>
> >>> Best regards,
> >>> Martin
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: David Holmes <david.holmes at oracle.com>
> >>>> Sent: Mittwoch, 30. Oktober 2019 05:47
> >>>> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
> >>>> <kim.barrett at oracle.com>
> >>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> >>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-
> dev at openjdk.java.net;
> >>>> David Holmes <David.Holmes at oracle.com>
> >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >>>>
> >>>> Hi Martin,
> >>>>
> >>>> On 29/10/2019 12:06 am, Doerr, Martin wrote:
> >>>>> Hi David and Kim,
> >>>>>
> >>>>> I think it's easier to talk about code. So here's a new webrev:
> >>>>>
> >>>>
> >>
> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr
> >>>> ev.03/
> >>>>
> >>>> I don't think factoring out CompileBroker::clear_compiler2_object
> when
> >>>> it is only used once was warranted, but that's call for compiler team to
> >>>> make. Otherwise changes seem fine and I have noted the use of the
> >>>> MutexUnlocker as per your direct email.
> >>>>
> >>>> Thanks,
> >>>> David
> >>>> -----
> >>>>
> >>>>> @Kim:
> >>>>> Thanks for looking at the handle related parts. It's ok if you don't want
> to
> >>>> be a reviewer of the whole change.
> >>>>>> I think it's weird that can_remove() is a predicate with optional side
> >>>>>> effects. I think it would be simpler to have it be a pure predicate,
> >>>>>> and have the one caller with do_it = true perform the updates. That
> >>>>>> should include NULLing out the handle pointer (perhaps debug-only,
> >> but
> >>>>>> it doesn't cost much to cleanly maintain the data structure).
> >>>>> Nevertheless, it has the advantage that it enforces the update to be
> >>>> consistent.
> >>>>> A caller could use it without holding the lock or mess it up otherwise.
> >>>>> In addition, I don't what to change that as part of this fix.
> >>>>>
> >>>>>> So far as I can tell, THREAD == NULL here.
> >>>>> This is a very tricky part (not my invention):
> >>>>> EXCEPTION_MARK contains an ExceptionMark constructor call which
> >> sets
> >>>> __the_thread__ to Thread::current().
> >>>>> I don't want to publish my opinion about this
> >>>>>
> >>>>> @David:
> >>>>> Seems like this option is preferred over option 3
> >>>> (possibly_add_compiler_threads part of webrev.02 and leave the
> >>>> initialization as is).
> >>>>> So when you're ok with it, I'll request a 2nd review from the compiler
> >> folks
> >>>> (I should change the subject to contain RFR).
> >>>>> Thanks,
> >>>>> Martin
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: David Holmes <david.holmes at oracle.com>
> >>>>>> Sent: Montag, 28. Oktober 2019 05:04
> >>>>>> To: Kim Barrett <kim.barrett at oracle.com>
> >>>>>> Cc: Doerr, Martin <martin.doerr at sap.com>; Vladimir Kozlov
> >>>>>> (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
> hotspot-
> >>>>>> compiler-dev at openjdk.java.net
> >>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI
> CompilerThread
> >>>>>>
> >>>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
> >>>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes
> >>>> <david.holmes at oracle.com>
> >>>>>> wrote:
> >>>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
> >>>>>>>>> Hi Kim,
> >>>>>>>>> I didn't like using the OopStorage stuff directly, either. I just have
> >> not
> >>>>>> seen how to allocate a global handle and add the oop later.
> >>>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware
> of
> >>>> that.
> >>>>>>>>> So I can imagine 3 ways to implement it:
> >>>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just
> >> added
> >>>>>> that to
> >>>>
> >>
> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr
> >>>>>> ev.01/
> >>>>>>>>> We may want to improve that further by setting the handle
> pointer
> >> to
> >>>>>> NULL and asserting that it is NULL before adding the new one.
> >>>>>>>>> I had been concerned about NULLs in the array, but looks like
> the
> >>>>>> existing code can deal with that.
> >>>>>>>> I think it would be cleaner to both destroy the global handle and
> >> NULL it
> >>>> in
> >>>>>> the array at the same time.
> >>>>>>>> This comment
> >>>>>>>>
> >>>>>>>> 325 // Old j.l.Thread object can die here.
> >>>>>>>>
> >>>>>>>> Isn't quite accurate. The j.l.Thread is still referenced via ct-
> >>> threadObj()
> >>>> so
> >>>>>> can't "die" until that is also cleared during the actual termination
> >> process.
> >>>>>>> I think if there is such a thread here that it can't die, because the
> >>>>>>> death predicate (the can_remove stuff) won't see that old thread
> as
> >>>>>>> the last thread in _compiler2_objects. That's what I meant by this:
> >>>>>>>
> >>>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett
> <kim.barrett at oracle.com>
> >>>>>> wrote:
> >>>>>>>> I also think that here:
> >>>>>>>>
> >>>>>>>> 947 jobject thread_handle =
> >>>> JNIHandles::make_global(thread_oop);
> >>>>>>>> 948 _compiler2_objects[i] = thread_handle;
> >>>>>>>>
> >>>>>>>> should assert _compiler2_objects[i] == NULL. Or if that isn't a
> valid
> >>>>>>>> assertion then I think there are other problems.
> >>>>>>> I think either that comment about an old thread is wrong (and the
> >> NULL
> >>>>>>> assertion I suggested is okay), or I think the whole mechanism here
> >>>>>>> has problems. Or at least I was unable to figure out how it could
> >> work...
> >>>>>>>
> >>>>>> I'm not following sorry. You can't assert NULL unless it's actually set
> >>>>>> to NULL which it presently isn't. But it could be set NULL as Martin
> >>>>>> suggested:
> >>>>>>
> >>>>>> "We may want to improve that further by setting the handle pointer
> to
> >>>>>> NULL and asserting that it is NULL before adding the new one."
> >>>>>>
> >>>>>> and which I also supported. But that aside once the delete_global
> has
> >>>>>> been called that JNIHandle no longer references the j.l.Thread that it
> >>>>>> did, at which point it is only reachable via the threadObj() of the
> >>>>>> CompilerThread.
> >>>>>>
> >>>>>> David
> >
More information about the hotspot-compiler-dev
mailing list