RFR (M) 8212160: JVMTI agent crashes with "assert(_value != 0LL) failed: resolving NULL _value"

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Wed Dec 4 12:21:49 UTC 2019



On 12/3/19 11:39 PM, David Holmes wrote:
>
>
> On 3/12/2019 11:35 pm, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 12/3/19 8:31 AM, David Holmes wrote:
>>> On 3/12/2019 11:08 pm, coleen.phillimore at oracle.com wrote:
>>>>
>>>>
>>>> On 12/2/19 11:52 PM, David Holmes wrote:
>>>>> Hi Coleen,
>>>>>
>>>>> On 3/12/2019 12:43 am, coleen.phillimore at oracle.com wrote:
>>>>>>
>>>>>>
>>>>>> On 11/26/19 7:03 PM, David Holmes wrote:
>>>>>>> (adding runtime as well)
>>>>>>>
>>>>>>> Hi Coleen,
>>>>>>>
>>>>>>> On 27/11/2019 12:22 am, coleen.phillimore at oracle.com wrote:
>>>>>>>> Summary: Add local deferred event list to thread to post events 
>>>>>>>> outside CodeCache_lock.
>>>>>>>>
>>>>>>>> This patch builds on the patch for JDK-8173361. With this 
>>>>>>>> patch, I made JvmtiDeferredEventQueue an instance class 
>>>>>>>> (not AllStatic) and gave each thread one. The CodeBlob event 
>>>>>>>> code, which used to drop the CodeCache_lock and race with the 
>>>>>>>> sweeper thread, now adds the events it wants to post to its 
>>>>>>>> thread-local list and processes them outside the lock.  The 
>>>>>>>> list is walked during GC and by the sweeper to keep the 
>>>>>>>> nmethods from being unloaded and zombied, respectively.
>>>>>>>
>>>>>>> Sorry I don't understand why we would want/need a deferred event 
>>>>>>> queue for every JavaThread? Isn't this only relevant for 
>>>>>>> non-JavaThreads that need to have the ServiceThread process the 
>>>>>>> deferred event?
>>>>>>
>>>>>> I thought I'd written this in the bug but I had only discussed 
>>>>>> this with Erik.  I've added a comment to the bug to explain why I 
>>>>>> added the per-JavaThread queue.  In order to process these events 
>>>>>> after the CodeCache_lock is dropped, I have to queue them 
>>>>>> somewhere safe. The ServiceThread queue is safe, *but* the 
>>>>>> ServiceThread can't keep up with the events, especially from this 
>>>>>> test case.  So the test case gets a native OOM.
>>>>>>
>>>>>> So I've added the safe queue as a field to each JavaThread 
>>>>>> because multiple JavaThreads could be posting these events at the 
>>>>>> same time, and there didn't seem to be a better safe place to 
>>>>>> cache them, without adding another layer of queuing code.
>>>>>
>>>>> I think I'm getting the picture now. At the time the events are 
>>>>> generated we can't post them directly because the current thread 
>>>>> is inside compiler code. Hence the events must be deferred. Using 
>>>>> the ServiceThread to handle the deferred events is one way to deal 
>>>>> with this - but it can't keep up in this scenario. So instead we 
>>>>> store the events in the current thread and when the current thread 
>>>>> returns to code where it is safe to post the events, it does so 
>>>>> itself. Is that generally correct?
>>>>
>>>> Yes.
>>>>>
>>>>> I admit I'm not keen on adding this additional field per-thread 
>>>>> just for a temporary usage. Some kind of stack allocated helper 
>>>>> would be preferable, but would need to be passed through the call 
>>>>> chain so that the events could be added to it.
>>>>
>>>> Right, and the GC and nmethods_do have to find it somehow. It also 
>>>> wasn't my first choice of where to put it, because there are too 
>>>> many things in JavaThread.  Might be time for a future cleanup of 
>>>> Thread.
>>>
>>> I see.
>>>
>>>>>
>>>>> Also I'm not clear why we aggressively delete the 
>>>>> _jvmti_event_queue after posting the events. I'd be worried about 
>>>>> the overhead we are introducing for creating and deleting this 
>>>>> queue. When the JvmtiDeferredEventQueue data structure was 
>>>>> intended only for use by the ServiceThread its dynamic node 
>>>>> allocation may have made more sense. But now that seems like a 
>>>>> liability to me - if JvmtiDeferredEvents could be linked directly 
>>>>> we wouldn't need dynamic nodes, nor dynamic per-thread queues 
>>>>> (just a per-thread pointer).
>>>>
>>>> I'm not following.  The queue is for multiple events that might be 
>>>> posted while in the CodeCache_lock, so they need to be in order and 
>>>> linked together.  While we post them and take them off, if the 
>>>> callback safepoints (or maybe calls back into the JVM), we don't 
>>>> want GC or nmethods_do to walk the one that's already been posted. 
>>>> So a queue seems to make sense.
>>>
>>> Yes but you can make a queue just by having each event have a _next 
>>> pointer, rather than dynamically creating nodes to hold the event. 
>>> Each event is its own queue node implicitly.
>>>
>>>> One thing that I experimented with was to have the ServiceThread 
>>>> take ownership of the queue into its own local queue and post all 
>>>> the events, which could be a future enhancement.  It didn't help my 
>>>> OOM situation.
>>>
>>> Your OOM situation seems to be a basic case of overwhelming the 
>>> ServiceThread. A single ServiceThread will always have a limit on 
>>> how many events it can handle. Maybe this test is being too 
>>> unrealistic in its expectations of the current design?
>>
>> I think the JVMTI API where you can generate a COMPILED_METHOD_LOAD 
>> for all the events in the queue is going to be overwhelming unless it 
>> waits for the events to be posted.
>
> Taking things off the service thread would seem to be a good thing 
> then :)
>
>>>
>>>> Deleting the queue after all the events are posted means 
>>>> JavaThread::oops_do and nmethods_do need only a null check to deal 
>>>> with this jvmti wart.
>>>
>>> If the nodes are not dynamically allocated you don't need to delete 
>>> the queue; you just set the queue-head pointer to NULL - actually it 
>>> will already be NULL once the last event has been processed.
>>
>> I could revisit the data structure as a future RFE.  The goal was to 
>> reuse code that's already there, and I don't think there's a 
>> significant difference in performance.  I did some measurement of the 
>> stress case and the times were equivalent, actually better in the new 
>> code.
>
> Okay.

Is this a code review then?  I think Serguei promised to review the code 
too.

thanks,
Coleen
>
> Thanks,
> David
>
>>
>> Thanks,
>> Coleen
>>>
>>> David
>>> -----
>>>
>>>> Thanks,
>>>> Coleen
>>>>>
>>>>> Just some thoughts.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> I did write comments to this effect here:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8212160.01/webrev/src/hotspot/share/prims/jvmtiCodeBlobEvents.cpp.udiff.html 
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>> Also, the jmethod_id field in nmethod was only used as a 
>>>>>>>> boolean, so don't create a jmethod_id until it's needed for 
>>>>>>>> post_compiled_method_unload.
>>>>>>>>
>>>>>>>> Ran hs tier1-8 on linux-x64-debug and the stress test that 
>>>>>>>> crashed in the original bug report.
>>>>>>>>
>>>>>>>> open webrev at 
>>>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8212160.01/webrev
>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8212160
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Coleen
>>>>>>
>>>>
>>


