RFR (M) 8212160: JVMTI agent crashes with "assert(_value != 0LL) failed: resolving NULL _value"

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Wed Dec 4 13:45:12 UTC 2019


Thanks, David!
Coleen

On 12/4/19 8:06 AM, David Holmes wrote:
>
> On 4/12/2019 10:21 pm, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 12/3/19 11:39 PM, David Holmes wrote:
>>>
>>>
>>> On 3/12/2019 11:35 pm, coleen.phillimore at oracle.com wrote:
>>>>
>>>>
>>>> On 12/3/19 8:31 AM, David Holmes wrote:
>>>>> On 3/12/2019 11:08 pm, coleen.phillimore at oracle.com wrote:
>>>>>>
>>>>>>
>>>>>> On 12/2/19 11:52 PM, David Holmes wrote:
>>>>>>> Hi Coleen,
>>>>>>>
>>>>>>> On 3/12/2019 12:43 am, coleen.phillimore at oracle.com wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/26/19 7:03 PM, David Holmes wrote:
>>>>>>>>> (adding runtime as well)
>>>>>>>>>
>>>>>>>>> Hi Coleen,
>>>>>>>>>
>>>>>>>>> On 27/11/2019 12:22 am, coleen.phillimore at oracle.com wrote:
>>>>>>>>>> Summary: Add a thread-local deferred event list so events 
>>>>>>>>>> can be posted outside the CodeCache_lock.
>>>>>>>>>>
>>>>>>>>>> This patch builds on the patch for JDK-8173361. With this 
>>>>>>>>>> patch, I made JvmtiDeferredEventQueue an instance class 
>>>>>>>>>> (not AllStatic) so there can be one per thread. The CodeBlob 
>>>>>>>>>> event code, which used to drop the CodeCache_lock and race 
>>>>>>>>>> with the sweeper thread, now adds the events it wants to 
>>>>>>>>>> post to its thread-local list and processes them outside 
>>>>>>>>>> the lock.  The list is walked during GC and by the sweeper 
>>>>>>>>>> to keep the nmethods from being unloaded and zombied, 
>>>>>>>>>> respectively.
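>>>>>>>>>>
>>>>>>>>>> For reviewers without the webrev open, the control flow is 
>>>>>>>>>> roughly this (a sketch only; enqueue_deferred_event and 
>>>>>>>>>> post_deferred_events are illustrative names, not 
>>>>>>>>>> necessarily what the webrev uses):
>>>>>>>>>>
>>>>>>>>>>   JavaThread* thread = JavaThread::current();
>>>>>>>>>>   {
>>>>>>>>>>     // Hold the lock only to walk the code cache and queue
>>>>>>>>>>     // events on the current thread; don't post under it.
>>>>>>>>>>     MutexLocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
>>>>>>>>>>     NMethodIterator iter(NMethodIterator::only_alive_and_not_unloading);
>>>>>>>>>>     while (iter.next()) {
>>>>>>>>>>       thread->enqueue_deferred_event(
>>>>>>>>>>           JvmtiDeferredEvent::compiled_method_load_event(iter.method()));
>>>>>>>>>>     }
>>>>>>>>>>   }
>>>>>>>>>>   // Lock released: posting can safepoint safely now.  GC and
>>>>>>>>>>   // the sweeper find the queue through the thread, so the
>>>>>>>>>>   // queued nmethods stay alive until each event is posted.
>>>>>>>>>>   thread->post_deferred_events(env);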
>>>>>>>>>
>>>>>>>>> Sorry, I don't understand why we would want/need a deferred 
>>>>>>>>> event queue for every JavaThread. Isn't this only relevant for 
>>>>>>>>> non-JavaThreads that need to have the ServiceThread process 
>>>>>>>>> the deferred event?
>>>>>>>>
>>>>>>>> I thought I'd written this up in the bug, but I had only 
>>>>>>>> discussed it with Erik.  I've added a comment to the bug to 
>>>>>>>> explain why 
>>>>>>>> I added the per-JavaThread queue.  In order to process these 
>>>>>>>> events after the CodeCache_lock is dropped, I have to queue 
>>>>>>>> them somewhere safe. The ServiceThread queue is safe, *but* the 
>>>>>>>> ServiceThread can't keep up with the events, especially from 
>>>>>>>> this test case.  So the test case gets a native OOM.
>>>>>>>>
>>>>>>>> So I've added the safe queue as a field to each JavaThread 
>>>>>>>> because multiple JavaThreads could be posting these events at 
>>>>>>>> the same time, and there didn't seem to be a better safe place 
>>>>>>>> to cache them without adding another layer of queuing code.
>>>>>>>
>>>>>>> I think I'm getting the picture now. At the time the events are 
>>>>>>> generated we can't post them directly because the current thread 
>>>>>>> is inside compiler code. Hence the events must be deferred. 
>>>>>>> Using the ServiceThread to handle the deferred events is one way 
>>>>>>> to deal with this - but it can't keep up in this scenario. So 
>>>>>>> instead we store the events in the current thread and when the 
>>>>>>> current thread returns to code where it is safe to post the 
>>>>>>> events, it does so itself. Is that generally correct?
>>>>>>
>>>>>> Yes.
>>>>>>>
>>>>>>> I admit I'm not keen on adding this additional field per-thread 
>>>>>>> just for temporary usage. Some kind of stack-allocated helper 
>>>>>>> would be preferable, but it would need to be passed through the 
>>>>>>> call chain so that the events could be added to it.
>>>>>>
>>>>>> Right, and GC and nmethods_do have to find it somehow. It 
>>>>>> wasn't my first choice of where to put it either, because there 
>>>>>> are already too many things in JavaThread. It might be time for 
>>>>>> a future cleanup of Thread.
>>>>>
>>>>> I see.
>>>>>
>>>>>>>
>>>>>>> Also I'm not clear why we aggressively delete the 
>>>>>>> _jvmti_event_queue after posting the events. I'd be worried 
>>>>>>> about the overhead we are introducing for creating and deleting 
>>>>>>> this queue. When the JvmtiDeferredEventQueue data structure was 
>>>>>>> intended only for use by the ServiceThread, its dynamic node 
>>>>>>> allocation may have made more sense. But now that seems like a 
>>>>>>> liability to me - if JvmtiDeferredEvents could be linked 
>>>>>>> directly we wouldn't need dynamic nodes, nor dynamic per-thread 
>>>>>>> queues (just a per-thread pointer).
>>>>>>
>>>>>> I'm not following.  The queue is for multiple events that might 
>>>>>> be posted while the CodeCache_lock is held, so they need to be 
>>>>>> kept in order and linked together.  While we post them and take 
>>>>>> them off, if the callback safepoints (maybe calls back into the 
>>>>>> JVM), we don't want GC or nmethods_do to walk the ones that have 
>>>>>> already been posted. So a queue seems to make sense.
>>>>>
>>>>> Yes, but you can make a queue just by giving each event a _next 
>>>>> pointer, rather than dynamically creating nodes to hold the 
>>>>> events. Each event is then implicitly its own queue node.
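>>>>>
>>>>> Something like (just to illustrate the idea, not tested):
>>>>>
>>>>>   class JvmtiDeferredEvent {
>>>>>     JvmtiDeferredEvent* _next;  // intrusive link: the event is the node
>>>>>     // ... existing event payload ...
>>>>>   };
>>>>>
>>>>>   // The per-thread "queue" then shrinks to a single head pointer
>>>>>   // in JavaThread, NULL when there is nothing to post:
>>>>>   JvmtiDeferredEvent* _jvmti_event_head;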
>>>>>
>>>>>> One thing that I experimented with was having the ServiceThread 
>>>>>> take ownership of the queue, move it to its own thread-local 
>>>>>> queue, and post all the events from there; that could be a 
>>>>>> future enhancement.  It didn't help my OOM situation.
>>>>>
>>>>> Your OOM situation seems to be a basic case of overwhelming the 
>>>>> ServiceThread. A single ServiceThread will always have a limit on 
>>>>> how many events it can handle. Maybe this test is too unrealistic 
>>>>> in its expectations of the current design?
>>>>
>>>> I think the JVMTI API where you can generate a 
>>>> COMPILED_METHOD_LOAD for all the events in the queue is going to 
>>>> be overwhelming unless it waits for the events to be posted.
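>>>>
>>>> (For reference, that's an agent calling GenerateEvents, e.g.:)
>>>>
>>>>   // JVMTI agent code: asks the VM to replay a CompiledMethodLoad
>>>>   // event for every method currently in the code cache.
>>>>   jvmtiError err =
>>>>       jvmti->GenerateEvents(JVMTI_EVENT_COMPILED_METHOD_LOAD);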
>>>
>>> Taking things off the service thread would seem to be a good thing 
>>> then :)
>>>
>>>>>
>>>>>> Deleting the queue after all the events are posted means that 
>>>>>> JavaThread::oops_do and nmethods_do only need a null check to 
>>>>>> deal with this jvmti wart.
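>>>>>>
>>>>>> i.e. roughly (a sketch, not the exact webrev code):
>>>>>>
>>>>>>   void JavaThread::oops_do(OopClosure* f, CodeBlobClosure* cf) {
>>>>>>     // ... existing oops_do work ...
>>>>>>     if (_jvmti_event_queue != NULL) {
>>>>>>       // Keep the not-yet-posted events' nmethods from being
>>>>>>       // unloaded while they sit on the queue.
>>>>>>       _jvmti_event_queue->oops_do(f, cf);
>>>>>>     }
>>>>>>   }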
>>>>>
>>>>> If the nodes are not dynamically allocated you don't need to 
>>>>> delete anything; you just set the queue-head pointer to NULL - 
>>>>> actually it will already be NULL once the last event has been 
>>>>> processed.
>>>>
>>>> I could revisit the data structure as a future RFE.  The goal 
>>>> here was to reuse code that's already there, and I don't think 
>>>> there's a significant difference in performance.  I did some 
>>>> measurements of the stress case and the times were equivalent, 
>>>> actually better in the new code.
>>>
>>> Okay.
>>
>> Is this a code review then?  I think Serguei promised to review the 
>> code too.
>
> Yes this is a review.
>
> Thanks,
> David
>
>> thanks,
>> Coleen
>>>
>>> Thanks,
>>> David
>>>
>>>>
>>>> Thanks,
>>>> Coleen
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>>
>>>>>>> Just some thoughts.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> I did write comments to this effect here:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8212160.01/webrev/src/hotspot/share/prims/jvmtiCodeBlobEvents.cpp.udiff.html 
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> Also, the jmethod_id field in nmethod was only used as a 
>>>>>>>>>> boolean, so we no longer create a jmethodID until it's 
>>>>>>>>>> needed for post_compiled_method_unload.
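>>>>>>>>>>
>>>>>>>>>> Roughly (a sketch; load_reported() stands in for whatever 
>>>>>>>>>> the webrev calls the flag accessor):
>>>>>>>>>>
>>>>>>>>>>   // The nmethod now records only a flag that the load event
>>>>>>>>>>   // was posted; the jmethodID is created on demand when the
>>>>>>>>>>   // unload event is actually queued.
>>>>>>>>>>   if (nm->load_reported()) {
>>>>>>>>>>     jmethodID mid = nm->method()->jmethod_id();  // created lazily
>>>>>>>>>>     enqueue(JvmtiDeferredEvent::compiled_method_unload_event(
>>>>>>>>>>         mid, nm->code_begin()));
>>>>>>>>>>   }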
>>>>>>>>>>
>>>>>>>>>> Ran hs tier1-8 on linux-x64-debug and the stress test that 
>>>>>>>>>> crashed in the original bug report.
>>>>>>>>>>
>>>>>>>>>> open webrev at 
>>>>>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8212160.01/webrev
>>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8212160
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Coleen
>>>>>>>>
>>>>>>
>>>>
>>


