JDK-8230459: Test failed to resume JVMCI CompilerThread
Doerr, Martin
martin.doerr at sap.com
Mon Nov 4 11:12:45 UTC 2019
Hi all,
@Dean
> changing can_remove() to CompileBroker::can_remove()?
Yes. That would be an option.
@Kim, David
I think there's another problem with this implementation.
It introduces a use-after-free pattern due to concurrency.
Compiler threads may still read the oops from the handles after one of them has called destroy_global until next safepoint. It doesn't matter which values they get in this case, but the VM should not crash. I believe that OopStorage allows freeing storage without safepoints, so this may be unsafe. Right?
If so, I think replacing the oops in the handles (and keeping the handles alive) would be better. And also much more simple.
Best regards,
Martin
> -----Original Message-----
> From: dean.long at oracle.com <dean.long at oracle.com>
> Sent: Samstag, 2. November 2019 08:36
> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
> <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
>
> Hi Martin,
>
> On 10/30/19 3:18 AM, Doerr, Martin wrote:
> > Hi David,
> >
> >> I don't think factoring out CompileBroker::clear_compiler2_object when
> >> it is only used once was warranted, but that's call for compiler team to
> >> make.
> > I did that because _compiler2_objects is private and there's currently no
> setter available.
> > But let's see what the compiler folks think.
>
> how about changing can_remove() to CompileBroker::can_remove()? Then
> you
> can access _compiler2_objects directly, right?
>
> dl
> >> Otherwise changes seem fine and I have noted the use of the
> >> MutexUnlocker as per your direct email.
> > Thanks a lot for reviewing. It was not a trivial one
> >
> > You had noticed an incorrect usage of the CHECK macro. I've created a new
> bug for that:
> > https://bugs.openjdk.java.net/browse/JDK-8233193
> > Would be great if you could take a look if that's what you meant and made
> adaptions if needed.
> >
> > Best regards,
> > Martin
> >
> >
> >> -----Original Message-----
> >> From: David Holmes <david.holmes at oracle.com>
> >> Sent: Mittwoch, 30. Oktober 2019 05:47
> >> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
> >> <kim.barrett at oracle.com>
> >> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> >> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net;
> >> David Holmes <David.Holmes at oracle.com>
> >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >>
> >> Hi Martin,
> >>
> >> On 29/10/2019 12:06 am, Doerr, Martin wrote:
> >>> Hi David and Kim,
> >>>
> >>> I think it's easier to talk about code. So here's a new webrev:
> >>>
> >>
> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr
> >> ev.03/
> >>
> >> I don't think factoring out CompileBroker::clear_compiler2_object when
> >> it is only used once was warranted, but that's call for compiler team to
> >> make. Otherwise changes seem fine and I have noted the use of the
> >> MutexUnlocker as per your direct email.
> >>
> >> Thanks,
> >> David
> >> -----
> >>
> >>> @Kim:
> >>> Thanks for looking at the handle related parts. It's ok if you don't want to
> >> be a reviewer of the whole change.
> >>>> I think it's weird that can_remove() is a predicate with optional side
> >>>> effects. I think it would be simpler to have it be a pure predicate,
> >>>> and have the one caller with do_it = true perform the updates. That
> >>>> should include NULLing out the handle pointer (perhaps debug-only,
> but
> >>>> it doesn't cost much to cleanly maintain the data structure).
> >>> Nevertheless, it has the advantage that it enforces the update to be
> >> consistent.
> >>> A caller could use it without holding the lock or mess it up otherwise.
> >>> In addition, I don't what to change that as part of this fix.
> >>>
> >>>> So far as I can tell, THREAD == NULL here.
> >>> This is a very tricky part (not my invention):
> >>> EXCEPTION_MARK contains an ExceptionMark constructor call which
> sets
> >> __the_thread__ to Thread::current().
> >>> I don't want to publish my opinion about this
> >>>
> >>> @David:
> >>> Seems like this option is preferred over option 3
> >> (possibly_add_compiler_threads part of webrev.02 and leave the
> >> initialization as is).
> >>> So when you're ok with it, I'll request a 2nd review from the compiler
> folks
> >> (I should change the subject to contain RFR).
> >>> Thanks,
> >>> Martin
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: David Holmes <david.holmes at oracle.com>
> >>>> Sent: Montag, 28. Oktober 2019 05:04
> >>>> To: Kim Barrett <kim.barrett at oracle.com>
> >>>> Cc: Doerr, Martin <martin.doerr at sap.com>; Vladimir Kozlov
> >>>> (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>; hotspot-
> >>>> compiler-dev at openjdk.java.net
> >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >>>>
> >>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
> >>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes
> >> <david.holmes at oracle.com>
> >>>> wrote:
> >>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
> >>>>>>> Hi Kim,
> >>>>>>> I didn't like using the OopStorage stuff directly, either. I just have
> not
> >>>> seen how to allocate a global handle and add the oop later.
> >>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware of
> >> that.
> >>>>>>> So I can imagine 3 ways to implement it:
> >>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just
> added
> >>>> that to
> >>
> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr
> >>>> ev.01/
> >>>>>>> We may want to improve that further by setting the handle pointer
> to
> >>>> NULL and asserting that it is NULL before adding the new one.
> >>>>>>> I had been concerned about NULLs in the array, but looks like the
> >>>> existing code can deal with that.
> >>>>>> I think it would be cleaner to both destroy the global handle and
> NULL it
> >> in
> >>>> the array at the same time.
> >>>>>> This comment
> >>>>>>
> >>>>>> 325 // Old j.l.Thread object can die here.
> >>>>>>
> >>>>>> Isn't quite accurate. The j.l.Thread is still referenced via ct-
> >threadObj()
> >> so
> >>>> can't "die" until that is also cleared during the actual termination
> process.
> >>>>> I think if there is such a thread here that it can't die, because the
> >>>>> death predicate (the can_remove stuff) won't see that old thread as
> >>>>> the last thread in _compiler2_objects. That's what I meant by this:
> >>>>>
> >>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett <kim.barrett at oracle.com>
> >>>> wrote:
> >>>>>> I also think that here:
> >>>>>>
> >>>>>> 947 jobject thread_handle =
> >> JNIHandles::make_global(thread_oop);
> >>>>>> 948 _compiler2_objects[i] = thread_handle;
> >>>>>>
> >>>>>> should assert _compiler2_objects[i] == NULL. Or if that isn't a valid
> >>>>>> assertion then I think there are other problems.
> >>>>> I think either that comment about an old thread is wrong (and the
> NULL
> >>>>> assertion I suggested is okay), or I think the whole mechanism here
> >>>>> has problems. Or at least I was unable to figure out how it could
> work...
> >>>>>
> >>>> I'm not following sorry. You can't assert NULL unless it's actually set
> >>>> to NULL which it presently isn't. But it could be set NULL as Martin
> >>>> suggested:
> >>>>
> >>>> "We may want to improve that further by setting the handle pointer to
> >>>> NULL and asserting that it is NULL before adding the new one."
> >>>>
> >>>> and which I also supported. But that aside once the delete_global has
> >>>> been called that JNIHandle no longer references the j.l.Thread that it
> >>>> did, at which point it is only reachable via the threadObj() of the
> >>>> CompilerThread.
> >>>>
> >>>> David
More information about the hotspot-compiler-dev
mailing list