RFR (M) 8212160: JVMTI agent crashes with "assert(_value != 0LL) failed: resolving NULL _value"
coleen.phillimore at oracle.com
Tue Dec 3 13:35:58 UTC 2019
On 12/3/19 8:31 AM, David Holmes wrote:
> On 3/12/2019 11:08 pm, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 12/2/19 11:52 PM, David Holmes wrote:
>>> Hi Coleen,
>>>
>>> On 3/12/2019 12:43 am, coleen.phillimore at oracle.com wrote:
>>>>
>>>>
>>>> On 11/26/19 7:03 PM, David Holmes wrote:
>>>>> (adding runtime as well)
>>>>>
>>>>> Hi Coleen,
>>>>>
>>>>> On 27/11/2019 12:22 am, coleen.phillimore at oracle.com wrote:
>>>>>> Summary: Add local deferred event list to thread to post events
>>>>>> outside CodeCache_lock.
>>>>>>
>>>>>> This patch builds on the patch for JDK-8173361. With this patch,
>>>>>> I made the JvmtiDeferredEventQueue an instance class (not
>>>>>> AllStatic) and have one per thread. The CodeBlob event that used
>>>>>> to drop the CodeCache_lock and raced with the sweeper thread,
>>>>>> adds the events it wants to post to its thread local list, and
>>>>>> processes it outside the lock. The list is walked in GC and by
>>>>>> the sweeper to keep the nmethods from being unloaded and zombied,
>>>>>> respectively.
>>>>>
>>>>> Sorry I don't understand why we would want/need a deferred event
>>>>> queue for every JavaThread? Isn't this only relevant for
>>>>> non-JavaThreads that need to have the ServiceThread process the
>>>>> deferred event?
>>>>
>>>> I thought I'd written this in the bug but I had only discussed this
>>>> with Erik. I've added a comment to the bug to explain why I added
>>>> the per-JavaThread queue. In order to process these events after
>>>> the CodeCache_lock is dropped, I have to queue them somewhere safe.
>>>> The ServiceThread queue is safe, *but* the ServiceThread can't keep
>>>> up with the events, especially from this test case. So the test
>>>> case gets a native OOM.
>>>>
>>>> So I've added the safe queue as a field to each JavaThread because
>>>> multiple JavaThreads could be posting these events at the same
>>>> time, and there didn't seem to be a better safe place to cache
>>>> them, without adding another layer of queuing code.
>>>
>>> I think I'm getting the picture now. At the time the events are
>>> generated we can't post them directly because the current thread is
>>> inside compiler code. Hence the events must be deferred. Using the
>>> ServiceThread to handle the deferred events is one way to deal with
>>> this - but it can't keep up in this scenario. So instead we store
>>> the events in the current thread and when the current thread returns
>>> to code where it is safe to post the events, it does so itself. Is
>>> that generally correct?
>>
>> Yes.
>>>
>>> I admit I'm not keen on adding this additional field per-thread just
>>> for a temporary usage. Some kind of stack allocated helper would be
>>> preferable, but would need to be passed through the call chain so
>>> that the events could be added to it.
>>
>> Right, and GC and nmethods_do have to find it somehow. It wasn't my
>> first choice of where to put it either, because there are already too
>> many things in JavaThread. Might be time for a future cleanup of
>> Thread.
>
> I see.
>
>>>
>>> Also I'm not clear why we aggressively delete the _jvmti_event_queue
>>> after posting the events. I'd be worried about the overhead we are
>>> introducing for creating and deleting this queue. When the
>>> JvmtiDeferredEventQueue data structure was intended only for use by
>>> the ServiceThread its dynamic node allocation may have made more
>>> sense. But now that seems like a liability to me - if
>>> JvmtiDeferredEvents could be linked directly we wouldn't need
>>> dynamic nodes, nor dynamic per-thread queues (just a per-thread
>>> pointer).
>>
>> I'm not following. The queue is for multiple events that might be
>> posted while in the CodeCache_lock, so they need to be in order and
>> linked together. While we post them and take them off, if the
>> callback safepoints (maybe calls back into the JVM), we don't want to
>> have GC or nmethods_do walk the one that's been posted already. So a
>> queue seems to make sense.
>
> Yes but you can make a queue just by having each event have a _next
> pointer, rather than dynamically creating nodes to hold the event.
> Each event is its own queue node implicitly.
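The intrusive-link alternative David sketches above would look something like this (illustrative names only, not the HotSpot code): each event embeds its own `_next` field, so the per-thread queue is just a pair of pointers and no separate nodes are allocated or freed.

```cpp
#include <cstddef>

// Intrusive variant (illustrative, not HotSpot code): each event carries
// its own link field, so it is implicitly its own queue node.
struct IntrusiveEvent {
  int payload;
  IntrusiveEvent* _next;
};

struct EventList {
  IntrusiveEvent* _head = nullptr;
  IntrusiveEvent* _tail = nullptr;

  void enqueue(IntrusiveEvent* e) {
    e->_next = nullptr;
    if (_tail != nullptr) _tail->_next = e; else _head = e;
    _tail = e;
  }

  // After the last event is taken off, _head is already NULL, so there
  // is no queue object left to delete.
  IntrusiveEvent* dequeue() {
    IntrusiveEvent* e = _head;
    if (e != nullptr) {
      _head = e->_next;
      if (_head == nullptr) _tail = nullptr;
    }
    return e;
  }
};
```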
>
>> One thing that I experimented with was to have the ServiceThread take
>> ownership of the queue into its own local queue and post all the
>> events, which could be a future enhancement. It didn't help my OOM
>> situation.
>
> Your OOM situation seems to be a basic case of overwhelming the
> ServiceThread. A single ServiceThread will always have a limit on how
> many events it can handle. Maybe this test is being too unrealistic in
> its expectations of the current design?
I think the JVMTI API where you can generate a COMPILED_METHOD_LOAD event
for all the events in the queue is going to be overwhelming unless it
waits for the events to be posted.
>
>> Deleting the queue after all the events are posted allows
>> JavaThread::oops_do and nmethods_do only a null check to deal with
>> this jvmti wart.
>
> If the nodes are not dynamically allocated you don't need to delete
> anything; you just set the queue-head pointer to NULL - actually it
> will already be NULL once the last event has been processed.
I could revisit the data structure as a future RFE. The goal was to
reuse code that's already there, and I don't think there's a significant
difference in performance. I did some measurements of the stress case
and the times were equivalent, actually better in the new code.
Thanks,
Coleen
>
> David
> -----
>
>> Thanks,
>> Coleen
>>>
>>> Just some thoughts.
>>>
>>> Thanks,
>>> David
>>>
>>>> I did write comments to this effect here:
>>>>
>>>> http://cr.openjdk.java.net/~coleenp/2019/8212160.01/webrev/src/hotspot/share/prims/jvmtiCodeBlobEvents.cpp.udiff.html
>>>>
>>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>>>
>>>>> David
>>>>>
>>>>>> Also, the jmethod_id field in nmethod was only used as a boolean
>>>>>> so don't create a jmethod_id until needed for
>>>>>> post_compiled_method_unload.
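The jmethod_id change could be modeled loosely like this (hypothetical names; the real change lives in nmethod and the JVMTI posting code): keep only a flag where the id used to be cached, and allocate the jmethodID lazily when the unload event actually needs it.

```cpp
#include <cstdint>

// Stand-in for the JNI handle type (illustrative only).
typedef std::uintptr_t jmethodID;

static int g_ids_created = 0;

// Models jmethodID allocation, which the patch defers until needed.
static jmethodID make_jmethod_id() {
  return static_cast<jmethodID>(++g_ids_created);
}

// Loose model of the nmethod change: the old jmethod_id field was only
// tested as a boolean, so it is replaced by a flag, and a real id is
// created only when post_compiled_method_unload() runs.
struct FakeNMethod {
  bool _load_event_posted = false;

  void post_compiled_method_load() { _load_event_posted = true; }

  // Returns 0 (no id) if no load event was posted for this nmethod.
  jmethodID post_compiled_method_unload() {
    return _load_event_posted ? make_jmethod_id() : 0;
  }
};
```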
>>>>>>
>>>>>> Ran hs tier1-8 on linux-x64-debug and the stress test that
>>>>>> crashed in the original bug report.
>>>>>>
>>>>>> open webrev at
>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8212160.01/webrev
>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8212160
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>
>>
More information about the serviceability-dev
mailing list