JDK-8230459: Test failed to resume JVMCI CompilerThread

Doerr, Martin martin.doerr at sap.com
Tue Nov 5 11:07:17 UTC 2019


Btw. this seems to be the issue Vladimir has found by running more tests:
https://bugs.openjdk.java.net/secure/attachment/85306/hs_err_pid93932.log


> -----Original Message-----
> From: Doerr, Martin
> Sent: Dienstag, 5. November 2019 09:40
> To: David Holmes <david.holmes at oracle.com>; dean.long at oracle.com; Kim
> Barrett <kim.barrett at oracle.com>
> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: RE: JDK-8230459: Test failed to resume JVMCI CompilerThread
> 
> Hi David
> 
> > I don't understand what you mean. If a compiler thread holds an oop, any
> > oop, it must hold it in a Handle to ensure it can't be gc'd.
> 
> The problem is not related to gc.
> My change introduces destroy_global for the handles. This means that the
> OopStorage portion which has held the oop can get freed.
> However, other compiler threads are running concurrently. They may
> execute code which reads the oop from the handle which is freed by this
> thread. Reading stale data is not a problem here, but reading freed memory
> may assert or even crash in general.
> I can't see how OopStorage supports reading from handles which were freed
> by destroy_global.
> 
> I think it would be safe if the freeing only occurred at safepoints, but I don't
> think this is the case.
> 
> Best regards,
> Martin
> 
> 
> > -----Original Message-----
> > From: David Holmes <david.holmes at oracle.com>
> > Sent: Dienstag, 5. November 2019 00:19
> > To: Doerr, Martin <martin.doerr at sap.com>; dean.long at oracle.com; Kim
> > Barrett <kim.barrett at oracle.com>
> > Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> > <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net
> > Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> >
> > On 4/11/2019 9:12 pm, Doerr, Martin wrote:
> > > Hi all,
> > >
> > > @Dean
> > >> changing can_remove() to CompileBroker::can_remove()?
> > > Yes. That would be an option.
> > >
> > > @Kim, David
> > > I think there's another problem with this implementation.
> > > It introduces a use-after-free pattern due to concurrency.
> > > Compiler threads may still read the oops from the handles after one of
> > them has called destroy_global until next safepoint. It doesn't matter
> which
> > values they get in this case, but the VM should not crash. I believe that
> > OopStorage allows freeing storage without safepoints, so this may be
> > unsafe. Right?
> >
> > I don't understand what you mean. If a compiler thread holds an oop, any
> > oop, it must hold it in a Handle to ensure it can't be gc'd.
> >
> > David
> >
> > > If so, I think replacing the oops in the handles (and keeping the handles
> > alive) would be better. And also much more simple.
> > >
> > > Best regards,
> > > Martin
> > >
> > >
> > >> -----Original Message-----
> > >> From: dean.long at oracle.com <dean.long at oracle.com>
> > >> Sent: Samstag, 2. November 2019 08:36
> > >> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
> > >> <david.holmes at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
> > >> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> > >> <vladimir.kozlov at oracle.com>; hotspot-compiler-
> dev at openjdk.java.net
> > >> Subject: Re: JDK-8230459: Test failed to resume JVMCI CompilerThread
> > >>
> > >> Hi Martin,
> > >>
> > >> On 10/30/19 3:18 AM, Doerr, Martin wrote:
> > >>> Hi David,
> > >>>
> > >>>> I don't think factoring out CompileBroker::clear_compiler2_object
> > when
> > >>>> it is only used once was warranted, but that's call for compiler team to
> > >>>> make.
> > >>> I did that because _compiler2_objects is private and there's currently
> no
> > >> setter available.
> > >>> But let's see what the compiler folks think.
> > >>
> > >> how about changing can_remove() to CompileBroker::can_remove()?
> > Then
> > >> you
> > >> can access _compiler2_objects directly, right?
> > >>
> > >> dl
> > >>>> Otherwise changes seem fine and I have noted the use of the
> > >>>> MutexUnlocker as per your direct email.
> > >>> Thanks a lot for reviewing. It was not a trivial one ��
> > >>>
> > >>> You had noticed an incorrect usage of the CHECK macro. I've created a
> > new
> > >> bug for that:
> > >>> https://bugs.openjdk.java.net/browse/JDK-8233193
> > >>> Would be great if you could take a look if that's what you meant and
> > made
> > >> adaptions if needed.
> > >>>
> > >>> Best regards,
> > >>> Martin
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: David Holmes <david.holmes at oracle.com>
> > >>>> Sent: Mittwoch, 30. Oktober 2019 05:47
> > >>>> To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett
> > >>>> <kim.barrett at oracle.com>
> > >>>> Cc: Vladimir Kozlov (vladimir.kozlov at oracle.com)
> > >>>> <vladimir.kozlov at oracle.com>; hotspot-compiler-
> > dev at openjdk.java.net;
> > >>>> David Holmes <David.Holmes at oracle.com>
> > >>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI
> CompilerThread
> > >>>>
> > >>>> Hi Martin,
> > >>>>
> > >>>> On 29/10/2019 12:06 am, Doerr, Martin wrote:
> > >>>>> Hi David and Kim,
> > >>>>>
> > >>>>> I think it's easier to talk about code. So here's a new webrev:
> > >>>>>
> > >>>>
> > >>
> >
> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr
> > >>>> ev.03/
> > >>>>
> > >>>> I don't think factoring out CompileBroker::clear_compiler2_object
> > when
> > >>>> it is only used once was warranted, but that's call for compiler team to
> > >>>> make. Otherwise changes seem fine and I have noted the use of the
> > >>>> MutexUnlocker as per your direct email.
> > >>>>
> > >>>> Thanks,
> > >>>> David
> > >>>> -----
> > >>>>
> > >>>>> @Kim:
> > >>>>> Thanks for looking at the handle related parts. It's ok if you don't
> want
> > to
> > >>>> be a reviewer of the whole change.
> > >>>>>> I think it's weird that can_remove() is a predicate with optional side
> > >>>>>> effects.  I think it would be simpler to have it be a pure predicate,
> > >>>>>> and have the one caller with do_it = true perform the updates.
> That
> > >>>>>> should include NULLing out the handle pointer (perhaps debug-
> only,
> > >> but
> > >>>>>> it doesn't cost much to cleanly maintain the data structure).
> > >>>>> Nevertheless, it has the advantage that it enforces the update to be
> > >>>> consistent.
> > >>>>> A caller could use it without holding the lock or mess it up otherwise.
> > >>>>> In addition, I don't what to change that as part of this fix.
> > >>>>>
> > >>>>>> So far as I can tell, THREAD == NULL here.
> > >>>>> This is a very tricky part (not my invention):
> > >>>>> EXCEPTION_MARK contains an ExceptionMark constructor call which
> > >> sets
> > >>>> __the_thread__ to Thread::current().
> > >>>>> I don't want to publish my opinion about this ��
> > >>>>>
> > >>>>> @David:
> > >>>>> Seems like this option is preferred over option 3
> > >>>> (possibly_add_compiler_threads part of webrev.02 and leave the
> > >>>> initialization as is).
> > >>>>> So when you're ok with it, I'll request a 2nd review from the
> compiler
> > >> folks
> > >>>> (I should change the subject to contain RFR).
> > >>>>> Thanks,
> > >>>>> Martin
> > >>>>>
> > >>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: David Holmes <david.holmes at oracle.com>
> > >>>>>> Sent: Montag, 28. Oktober 2019 05:04
> > >>>>>> To: Kim Barrett <kim.barrett at oracle.com>
> > >>>>>> Cc: Doerr, Martin <martin.doerr at sap.com>; Vladimir Kozlov
> > >>>>>> (vladimir.kozlov at oracle.com) <vladimir.kozlov at oracle.com>;
> > hotspot-
> > >>>>>> compiler-dev at openjdk.java.net
> > >>>>>> Subject: Re: JDK-8230459: Test failed to resume JVMCI
> > CompilerThread
> > >>>>>>
> > >>>>>> On 28/10/2019 1:42 pm, Kim Barrett wrote:
> > >>>>>>>> On Oct 27, 2019, at 6:04 PM, David Holmes
> > >>>> <david.holmes at oracle.com>
> > >>>>>> wrote:
> > >>>>>>>> On 24/10/2019 12:47 am, Doerr, Martin wrote:
> > >>>>>>>>> Hi Kim,
> > >>>>>>>>> I didn't like using the OopStorage stuff directly, either. I just
> have
> > >> not
> > >>>>>> seen how to allocate a global handle and add the oop later.
> > >>>>>>>>> Thanks for pointing me to JVMCI::make_global. I was not
> aware
> > of
> > >>>> that.
> > >>>>>>>>> So I can imagine 3 ways to implement it:
> > >>>>>>>>> 1. Use JNIHandles::destroy_global to release the handles. I just
> > >> added
> > >>>>>> that to
> > >>>>
> > >>
> >
> http://cr.openjdk.java.net/~mdoerr/8230459_JVMCI_thread_objects/webr
> > >>>>>> ev.01/
> > >>>>>>>>> We may want to improve that further by setting the handle
> > pointer
> > >> to
> > >>>>>> NULL and asserting that it is NULL before adding the new one.
> > >>>>>>>>> I had been concerned about NULLs in the array, but looks like
> > the
> > >>>>>> existing code can deal with that.
> > >>>>>>>> I think it would be cleaner to both destroy the global handle and
> > >> NULL it
> > >>>> in
> > >>>>>> the array at the same time.
> > >>>>>>>> This comment
> > >>>>>>>>
> > >>>>>>>> 325         // Old j.l.Thread object can die here.
> > >>>>>>>>
> > >>>>>>>> Isn't quite accurate. The j.l.Thread is still referenced via ct-
> > >>> threadObj()
> > >>>> so
> > >>>>>> can't "die" until that is also cleared during the actual termination
> > >> process.
> > >>>>>>> I think if there is such a thread here that it can't die, because the
> > >>>>>>> death predicate (the can_remove stuff) won't see that old thread
> > as
> > >>>>>>> the last thread in _compiler2_objects.  That's what I meant by
> this:
> > >>>>>>>
> > >>>>>>>> On Oct 25, 2019, at 4:05 PM, Kim Barrett
> > <kim.barrett at oracle.com>
> > >>>>>> wrote:
> > >>>>>>>> I also think that here:
> > >>>>>>>>
> > >>>>>>>> 947         jobject thread_handle =
> > >>>> JNIHandles::make_global(thread_oop);
> > >>>>>>>> 948         _compiler2_objects[i] = thread_handle;
> > >>>>>>>>
> > >>>>>>>> should assert _compiler2_objects[i] == NULL.  Or if that isn't a
> > valid
> > >>>>>>>> assertion then I think there are other problems.
> > >>>>>>> I think either that comment about an old thread is wrong (and the
> > >> NULL
> > >>>>>>> assertion I suggested is okay), or I think the whole mechanism
> here
> > >>>>>>> has problems.  Or at least I was unable to figure out how it could
> > >> work...
> > >>>>>>>
> > >>>>>> I'm not following sorry. You can't assert NULL unless it's actually set
> > >>>>>> to NULL which it presently isn't. But it could be set NULL as Martin
> > >>>>>> suggested:
> > >>>>>>
> > >>>>>> "We may want to improve that further by setting the handle
> pointer
> > to
> > >>>>>> NULL and asserting that it is NULL before adding the new one."
> > >>>>>>
> > >>>>>> and which I also supported. But that aside once the delete_global
> > has
> > >>>>>> been called that JNIHandle no longer references the j.l.Thread that
> it
> > >>>>>> did, at which point it is only reachable via the threadObj() of the
> > >>>>>> CompilerThread.
> > >>>>>>
> > >>>>>> David
> > >


More information about the hotspot-compiler-dev mailing list