RFR (M) 8212160: JVMTI agent crashes with "assert(_value != 0LL) failed: resolving NULL _value"

David Holmes david.holmes at oracle.com
Wed Dec 4 04:39:20 UTC 2019



On 3/12/2019 11:35 pm, coleen.phillimore at oracle.com wrote:
> 
> 
> On 12/3/19 8:31 AM, David Holmes wrote:
>> On 3/12/2019 11:08 pm, coleen.phillimore at oracle.com wrote:
>>>
>>>
>>> On 12/2/19 11:52 PM, David Holmes wrote:
>>>> Hi Coleen,
>>>>
>>>> On 3/12/2019 12:43 am, coleen.phillimore at oracle.com wrote:
>>>>>
>>>>>
>>>>> On 11/26/19 7:03 PM, David Holmes wrote:
>>>>>> (adding runtime as well)
>>>>>>
>>>>>> Hi Coleen,
>>>>>>
>>>>>> On 27/11/2019 12:22 am, coleen.phillimore at oracle.com wrote:
>>>>>>> Summary: Add local deferred event list to thread to post events 
>>>>>>> outside CodeCache_lock.
>>>>>>>
>>>>>>> This patch builds on the patch for JDK-8173361.  With this patch, 
>>>>>>> I made the JvmtiDeferredEventQueue an instance class (not 
>>>>>>> AllStatic) and gave each thread one.  The CodeBlob event code that 
>>>>>>> used to drop the CodeCache_lock and race with the sweeper thread 
>>>>>>> now adds the events it wants to post to its thread-local list and 
>>>>>>> processes that list outside the lock.  The list is walked during 
>>>>>>> GC and by the sweeper to keep the nmethods from being unloaded and 
>>>>>>> zombied, respectively.
>>>>>>
>>>>>> Sorry I don't understand why we would want/need a deferred event 
>>>>>> queue for every JavaThread? Isn't this only relevant for 
>>>>>> non-JavaThreads that need to have the ServiceThread process the 
>>>>>> deferred event?
>>>>>
>>>>> I thought I'd written this in the bug but I had only discussed this 
>>>>> with Erik.  I've added a comment to the bug to explain why I added 
>>>>> the per-JavaThread queue.  In order to process these events after 
>>>>> the CodeCache_lock is dropped, I have to queue them somewhere safe. 
>>>>> The ServiceThread queue is safe, *but* the ServiceThread can't keep 
>>>>> up with the events, especially from this test case.  So the test 
>>>>> case gets a native OOM.
>>>>>
>>>>> So I've added the safe queue as a field to each JavaThread because 
>>>>> multiple JavaThreads could be posting these events at the same 
>>>>> time, and there didn't seem to be a better safe place to cache 
>>>>> them, without adding another layer of queuing code.
>>>>
>>>> I think I'm getting the picture now. At the time the events are 
>>>> generated we can't post them directly because the current thread is 
>>>> inside compiler code. Hence the events must be deferred. Using the 
>>>> ServiceThread to handle the deferred events is one way to deal with 
>>>> this - but it can't keep up in this scenario. So instead we store 
>>>> the events in the current thread and when the current thread returns 
>>>> to code where it is safe to post the events, it does so itself. Is 
>>>> that generally correct?
>>>
>>> Yes.
>>>>
>>>> I admit I'm not keen on adding this additional field per-thread just 
>>>> for a temporary usage. Some kind of stack allocated helper would be 
>>>> preferable, but would need to be passed through the call chain so 
>>>> that the events could be added to it.
>>>
>>> Right, and the GC and nmethods_do have to find it somehow.  It wasn't 
>>> my first choice of where to put it either, because there are too many 
>>> things in JavaThread already.  Might be time for a future cleanup of 
>>> Thread.
>>
>> I see.
>>
>>>>
>>>> Also I'm not clear why we aggressively delete the _jvmti_event_queue 
>>>> after posting the events. I'd be worried about the overhead we are 
>>>> introducing for creating and deleting this queue. When the 
>>>> JvmtiDeferredEventQueue data structure was intended only for use by 
>>>> the ServiceThread its dynamic node allocation may have made more 
>>>> sense. But now that seems like a liability to me - if 
>>>> JvmtiDeferredEvents could be linked directly we wouldn't need 
>>>> dynamic nodes, nor dynamic per-thread queues (just a per-thread 
>>>> pointer).
>>>
>>> I'm not following.  The queue is for multiple events that might be 
>>> posted while in the CodeCache_lock, so they need to be in order and 
>>> linked together.  As we post them and take them off, if the callback 
>>> safepoints (it may call back into the JVM), we don't want GC or 
>>> nmethods_do to walk the ones that have already been posted.  So a 
>>> queue seems to make sense.
>>
>> Yes but you can make a queue just by having each event have a _next 
>> pointer, rather than dynamically creating nodes to hold the event. 
>> Each event is its own queue node implicitly.
>>
>>> One thing that I experimented with was to have the ServiceThread take 
>>> ownership of the queue in its local thread queue and post them all, 
>>> which could be a future enhancement.  It didn't help my OOM situation.
>>
>> Your OOM situation seems to be a basic case of overwhelming the 
>> ServiceThread. A single ServiceThread will always have a limit on how 
>> many events it can handle. Maybe this test is being too unrealistic in 
>> its expectations of the current design?
> 
> I think the JVMTI API where you can generate a COMPILED_METHOD_LOAD for 
> all the events in the queue is going to be overwhelming unless it waits 
> for the events to be posted.

Taking things off the service thread would seem to be a good thing then :)

>>
>>> Deleting the queue after all the events are posted means 
>>> JavaThread::oops_do and nmethods_do need only a null check to deal 
>>> with this JVMTI wart.
>>
>> If the nodes are not dynamically allocated you don't need to delete; 
>> you just set the queue-head pointer to NULL - actually it will already 
>> be NULL once the last event has been processed.
> 
> I could revisit the data structure as a future RFE.  The goal was to 
> reuse code that's already there, and I don't think there's a significant 
> difference in performance.  I did some measurement of the stress case 
> and the times were equivalent, actually better in the new code.

Okay.

Thanks,
David

> 
> Thanks,
> Coleen
>>
>> David
>> -----
>>
>>> Thanks,
>>> Coleen
>>>>
>>>> Just some thoughts.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> I did write comments to this effect here:
>>>>>
>>>>> http://cr.openjdk.java.net/~coleenp/2019/8212160.01/webrev/src/hotspot/share/prims/jvmtiCodeBlobEvents.cpp.udiff.html 
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>>>
>>>>>> David
>>>>>>
>>>>>>> Also, the jmethod_id field in nmethod was only used as a boolean, 
>>>>>>> so don't create a jmethod_id until it is needed for 
>>>>>>> post_compiled_method_unload.
>>>>>>>
>>>>>>> Ran hs tier1-8 on linux-x64-debug and the stress test that 
>>>>>>> crashed in the original bug report.
>>>>>>>
>>>>>>> open webrev at 
>>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8212160.01/webrev
>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8212160
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>
>>>
> 

