JDK-8230459: Test failed to resume JVMCI CompilerThread

Doerr, Martin martin.doerr at sap.com
Mon Nov 4 16:43:01 UTC 2019


Hi again,

it is possible to release the handles, but it comes with a much higher complexity:
http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webrev.04/

If we can replace oops in handles like
NativeAccess<>::oop_store((oop*)compiler2_object(i), thread_oop());
we only need to change one place (possibly_add_compiler_threads) and that's it.

Best regards,
Martin


> -----Original Message-----
> From: Doerr, Martin
> Sent: Montag, 4. November 2019 12:13
> To: 'dean.long at oracle.com' <dean.long at oracle.com>; David Holmes
> <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: RE: JDK-8230459: Test failed to resume JVMCI CompilerThread
> 
> Hi all,
> 
> @Dean
> > changing can_remove() to CompileBroker::can_remove()?
> Yes. That would be an option.
> 
> @Kim, David
> I think there's another problem with this implementation.
> It introduces a use-after-free pattern due to concurrency.
> Compiler threads may still read the oops from the handles after one of them
> has called destroy_global until next safepoint. It doesn't matter which values
> they get in this case, but the VM should not crash. I believe that OopStorage
> allows freeing storage without safepoints, so this may be unsafe. Right?
> 
> If so, I think replacing the oops in the handles (and keeping the handles alive)
> would be better. And also much more simple.
> 
> Best regards,
> Martin
> 
> 
> > -----Original Message-----
> > From: dean.long at oracle.com <dean.long at oracle.com>
> > Sent: Samstag, 2. November 2019 08:36
> > To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
> > <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
> > Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> > <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> > Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >
> > Hi Martin,
> >
> > On 10/30/19 3:18 AM, Doerr, Martin wrote:
> > > Hi David,
> > >
> > >> I don't think factoring out CompileBroker::clear_compiler2_object when
> > >> it is only used once was warranted, but that's call for compiler team to
> > >> make.
> > > I did that because _compiler2_objects is private and there's currently no
> > setter available.
> > > But let's see what the compiler folks think.
> >
> > how about changing can_remove() to CompileBroker::can_remove()? Then
> > you
> > can access _compiler2_objects directly, right?
> >
> > dl
> > >> Otherwise changes seem fine and I have noted the use of the
> > >> MutexUnlocker as per your direct email.
> > > Thanks a lot for reviewing. It was not a trivial one ��
> > >
> > > You had noticed an incorrect usage of the CHECK macro. I've created a
> new
> > bug for that:
> > > https://bugs.openjdk.java.net/browse/JDK-8233193
> > > Would be great if you could take a look if that's what you meant and
> made
> > adaptions if needed.
> > >
> > > Best regards,
> > > Martin
> > >
> > >
> > >> -----Original Message-----
> > >> From: David Holmes <david.holmes at oracle.com>
> > >> Sent: Mittwoch, 30. Oktober 2019 05:47
> > >> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
> > >> <kim.barrett at oracle.com>
> > >> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> > >> <vladimir.kozlov at oracle.com>; hotspot-compiler-
> dev at openjdk.java.net;
> > >> David Holmes <David.Holmes at oracle.com>
> > >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> > >>
> > >> Hi Martin,
> > >>
> > >> On 29/10/2019 12:06 am, Doerr, Martin wrote:
> > >>> Hi David and Kim,
> > >>>
> > >>> I think it's easier to talk about code. So here's a new webrev:
> > >>>
> > >>
> >
> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr
> > >> ev.03/
> > >>
> > >> I don't think factoring out CompileBroker::clear_compiler2_object when
> > >> it is only used once was warranted, but that's call for compiler team to
> > >> make. Otherwise changes seem fine and I have noted the use of the
> > >> MutexUnlocker as per your direct email.
> > >>
> > >> Thanks,
> > >> David
> > >> -----
> > >>
> > >>> @Kim:
> > >>> Thanks for looking at the handle related parts. It's ok if you don't want
> to
> > >> be a reviewer of the whole change.
> > >>>> I think it's weird that can_remove() is a predicate with optional side
> > >>>> effects.  I think it would be simpler to have it be a pure predicate,
> > >>>> and have the one caller with do_it = true perform the updates.  That
> > >>>> should include NULLing out the handle pointer (perhaps debug-only,
> > but
> > >>>> it doesn't cost much to cleanly maintain the data structure).
> > >>> Nevertheless, it has the advantage that it enforces the update to be
> > >> consistent.
> > >>> A caller could use it without holding the lock or mess it up otherwise.
> > >>> In addition, I don't what to change that as part of this fix.
> > >>>
> > >>>> So far as I can tell, THREAD == NULL here.
> > >>> This is a very tricky part (not my invention):
> > >>> EXCEPTION_MARK contains an ExceptionMark constructor call which
> > sets
> > >> __the_thread__ to Thread::current().
> > >>> I don't want to publish my opinion about this ��
> > >>>
> > >>> @David:
> > >>> Seems like this option is preferred over option 3
> > >> (possibly_add_compiler_threads part of webrev.02 and leave the
> > >> initialization as is).
> > >>> So when you're ok with it, I'll request a 2nd review from the compiler
> > folks
> > >> (I should change the subject to contain RFR).
> > >>> Thanks,
> > >>> Martin
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: David Holmes <david.holmes at oracle.com>
> > >>>> Sent: Montag, 28. Oktober 2019 05:04
> > >>>> To: Kim Barrett <kim.barrett at oracle.com>
> > >>>> Cc: Doerr, Martin <martin.doerr at sap.com>; Vladimir Kozlov
> > >>>> (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
> hotspot-
> > >>>> compiler-dev at openjdk.java.net
> > >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI
> CompilerThread
> > >>>>
> > >>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
> > >>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes
> > >> <david.holmes at oracle.com>
> > >>>> wrote:
> > >>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
> > >>>>>>> Hi Kim,
> > >>>>>>> I didn't like using the OopStorage stuff directly, either. I just have
> > not
> > >>>> seen how to allocate a global handle and add the oop later.
> > >>>>>>> Thanks for pointing me to JVMCI::make_global. I was not aware
> of
> > >> that.
> > >>>>>>> So I can imagine 3 ways to implement it:
> > >>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just
> > added
> > >>>> that to
> > >>
> >
> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr
> > >>>> ev.01/
> > >>>>>>> We may want to improve that further by setting the handle
> pointer
> > to
> > >>>> NULL and asserting that it is NULL before adding the new one.
> > >>>>>>> I had been concerned about NULLs in the array, but looks like the
> > >>>> existing code can deal with that.
> > >>>>>> I think it would be cleaner to both destroy the global handle and
> > NULL it
> > >> in
> > >>>> the array at the same time.
> > >>>>>> This comment
> > >>>>>>
> > >>>>>> 325         // Old j.l.Thread object can die here.
> > >>>>>>
> > >>>>>> Isn't quite accurate. The j.l.Thread is still referenced via ct-
> > >threadObj()
> > >> so
> > >>>> can't "die" until that is also cleared during the actual termination
> > process.
> > >>>>> I think if there is such a thread here that it can't die, because the
> > >>>>> death predicate (the can_remove stuff) won't see that old thread as
> > >>>>> the last thread in _compiler2_objects.  That's what I meant by this:
> > >>>>>
> > >>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett
> <kim.barrett at oracle.com>
> > >>>> wrote:
> > >>>>>> I also think that here:
> > >>>>>>
> > >>>>>> 947         jobject thread_handle =
> > >> JNIHandles::make_global(thread_oop);
> > >>>>>> 948         _compiler2_objects[i] = thread_handle;
> > >>>>>>
> > >>>>>> should assert _compiler2_objects[i] == NULL.  Or if that isn't a valid
> > >>>>>> assertion then I think there are other problems.
> > >>>>> I think either that comment about an old thread is wrong (and the
> > NULL
> > >>>>> assertion I suggested is okay), or I think the whole mechanism here
> > >>>>> has problems.  Or at least I was unable to figure out how it could
> > work...
> > >>>>>
> > >>>> I'm not following sorry. You can't assert NULL unless it's actually set
> > >>>> to NULL which it presently isn't. But it could be set NULL as Martin
> > >>>> suggested:
> > >>>>
> > >>>> "We may want to improve that further by setting the handle pointer
> to
> > >>>> NULL and asserting that it is NULL before adding the new one."
> > >>>>
> > >>>> and which I also supported. But that aside once the delete_global has
> > >>>> been called that JNIHandle no longer references the j.l.Thread that it
> > >>>> did, at which point it is only reachable via the threadObj() of the
> > >>>> CompilerThread.
> > >>>>
> > >>>> David



More information about the hotspot-compiler-dev mailing list