RFR (M) 8212160: JVMTI agent crashes with "assert(_value != 0LL) failed: resolving NULL _value"

coleen.phillimore at oracle.com
Tue Dec 3 13:08:25 UTC 2019



On 12/2/19 11:52 PM, David Holmes wrote:
> Hi Coleen,
>
> On 3/12/2019 12:43 am, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 11/26/19 7:03 PM, David Holmes wrote:
>>> (adding runtime as well)
>>>
>>> Hi Coleen,
>>>
>>> On 27/11/2019 12:22 am, coleen.phillimore at oracle.com wrote:
>>>> Summary: Add a local deferred event list to each thread, to post 
>>>> events outside the CodeCache_lock.
>>>>
>>>> This patch builds on the patch for JDK-8173361.  With this patch, I 
>>>> made JvmtiDeferredEventQueue an instance class (not AllStatic), 
>>>> with one instance per thread.  The CodeBlob event code, which used 
>>>> to drop the CodeCache_lock and race with the sweeper thread, now 
>>>> adds the events it wants to post to its thread-local list and 
>>>> processes them outside the lock.  The list is walked during GC and 
>>>> by the sweeper to keep the nmethods from being unloaded and 
>>>> zombied, respectively.
>>>
>>> Sorry, I don't understand why we would want/need a deferred event 
>>> queue for every JavaThread.  Isn't this only relevant for 
>>> non-JavaThreads that need to have the ServiceThread process the 
>>> deferred events?
>>
>> I thought I'd written this in the bug, but I had only discussed it 
>> with Erik.  I've added a comment to the bug to explain why I added 
>> the per-JavaThread queue.  In order to process these events after the 
>> CodeCache_lock is dropped, I have to queue them somewhere safe.  The 
>> ServiceThread queue is safe, *but* the ServiceThread can't keep up 
>> with the events, especially from this test case.  So the test case 
>> gets a native OOM.
>>
>> So I've added the safe queue as a field of each JavaThread, because 
>> multiple JavaThreads could be posting these events at the same time 
>> and there didn't seem to be a better safe place to cache them 
>> without adding another layer of queuing code.
>
> I think I'm getting the picture now. At the time the events are 
> generated we can't post them directly because the current thread is 
> inside compiler code. Hence the events must be deferred. Using the 
> ServiceThread to handle the deferred events is one way to deal with 
> this - but it can't keep up in this scenario. So instead we store the 
> events in the current thread and when the current thread returns to 
> code where it is safe to post the events, it does so itself. Is that 
> generally correct?

Yes.
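
To make that concrete, here's a little standalone model of the flow 
(not the webrev code: CodeCache_lock is stood in by a std::mutex, and 
the event by a std::function):

    #include <deque>
    #include <functional>
    #include <mutex>

    std::mutex code_cache_lock;   // stands in for CodeCache_lock
    // Model of the per-JavaThread deferred event queue.
    thread_local std::deque<std::function<void()>> deferred_events;

    void generate_compiled_method_load_events() {
      {
        std::lock_guard<std::mutex> guard(code_cache_lock);
        // Can't call the agent here: the callback could block or
        // safepoint while we hold the lock.  Just remember the event.
        deferred_events.push_back([] { /* post CompiledMethodLoad */ });
      }
      // Lock dropped: the same thread drains its own queue, so no
      // single consumer (the ServiceThread) has to keep up with all
      // the producers, and events can't pile up into a native OOM.
      while (!deferred_events.empty()) {
        auto event = deferred_events.front();
        deferred_events.pop_front();
        event();   // now safe to call into the agent
      }
    }
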
>
> I admit I'm not keen on adding this additional per-thread field just 
> for temporary usage.  Some kind of stack-allocated helper would be 
> preferable, but it would need to be passed through the call chain so 
> that the events could be added to it.

Right, and the GC and nmethods_do have to find it somehow.  It wasn't 
my first choice of where to put it either, because there are already 
too many things in JavaThread.  Might be time for a future cleanup of 
Thread.
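
In model form, the reason a JavaThread field works is that the GC and 
the sweeper already iterate over the threads, so oops_do/nmethods_do 
can find any pending events without a separate registry (field and 
method shapes here are illustrative, not the webrev's):

    class JvmtiDeferredEventQueue;   // per-thread pending JVMTI events

    class JavaThreadModel {
      JvmtiDeferredEventQueue* _jvmti_event_queue = nullptr;
     public:
      void oops_do(/* OopClosure* f */) {
        // ... normal thread roots ...
        if (_jvmti_event_queue != nullptr) {
          // keep oops referenced by pending events alive (no unloading)
        }
      }
      void nmethods_do(/* CodeBlobClosure* cf */) {
        if (_jvmti_event_queue != nullptr) {
          // keep pending nmethods from being zombied by the sweeper
        }
      }
    };
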
>
> Also, I'm not clear on why we aggressively delete the _jvmti_event_queue 
> after posting the events.  I'd be worried about the overhead we are 
> introducing by creating and deleting this queue.  When the 
> JvmtiDeferredEventQueue data structure was intended only for use by 
> the ServiceThread, its dynamic node allocation may have made more 
> sense.  But now that seems like a liability to me: if 
> JvmtiDeferredEvents could be linked directly, we wouldn't need dynamic 
> nodes, nor dynamic per-thread queues (just a per-thread pointer).

I'm not following.  The queue holds multiple events that might be 
posted while the CodeCache_lock is held, so they need to be kept in 
order and linked together.  As we post the events and take them off 
the queue, if a callback safepoints (or calls back into the JVM), we 
don't want GC or nmethods_do to walk the ones that have already been 
posted.  So a queue seems to make sense.
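
A minimal model of the queue itself: a singly linked FIFO where the 
poster unlinks a node *before* posting it, so a concurrent scan never 
revisits an already-posted event (again a sketch, not the actual 
JvmtiDeferredEventQueue code):

    #include <cstdio>

    struct QueueNode {
      int payload;          // stands in for a JvmtiDeferredEvent
      QueueNode* next;
    };

    struct DeferredQueueModel {
      QueueNode* _head = nullptr;
      QueueNode* _tail = nullptr;

      void enqueue(int payload) {   // events stay in posting order
        QueueNode* node = new QueueNode{payload, nullptr};
        if (_tail == nullptr) _head = node; else _tail->next = node;
        _tail = node;
      }

      // Unlink the oldest node before the callback runs; if the
      // callback safepoints, a scan sees only still-pending nodes.
      QueueNode* dequeue() {
        QueueNode* node = _head;
        if (node != nullptr) {
          _head = node->next;
          if (_head == nullptr) _tail = nullptr;
        }
        return node;   // caller posts, then deletes, the node
      }

      // What GC/the sweeper conceptually do with the queue.
      void scan() const {
        for (QueueNode* n = _head; n != nullptr; n = n->next)
          std::printf("pending event: %d\n", n->payload);
      }
    };
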

One thing I experimented with was having the ServiceThread take 
ownership of a JavaThread's queue into its own thread-local queue and 
post all the events, which could be a future enhancement.  It didn't 
help my OOM situation.

Deleting the queue after all the events are posted means that 
JavaThread::oops_do and nmethods_do need only a null check to deal 
with this JVMTI wart.
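
Something like this (illustrative names, and a std::deque standing in 
for the real queue; it assumes the caller only drains a queue that 
exists):

    #include <deque>

    struct Event { /* stands in for a JvmtiDeferredEvent */ };

    // Drain the thread's queue outside the CodeCache_lock, then delete
    // it, so the steady-state cost in oops_do/nmethods_do is just the
    // null check on the field.
    void post_deferred_events(std::deque<Event>*& queue_field) {
      while (!queue_field->empty()) {
        Event e = queue_field->front();
        queue_field->pop_front();  // unlink before posting
        // post(e) ... may safepoint or call back into the VM
        (void)e;
      }
      delete queue_field;
      queue_field = nullptr;       // back to the cheap null-check state
    }
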

Thanks,
Coleen
>
> Just some thoughts.
>
> Thanks,
> David
>
>> I did write comments to this effect here:
>>
>> http://cr.openjdk.java.net/~coleenp/2019/8212160.01/webrev/src/hotspot/share/prims/jvmtiCodeBlobEvents.cpp.udiff.html 
>>
>>
>> Thanks,
>> Coleen
>>
>>>
>>> David
>>>
>>>> Also, the jmethod_id field in nmethod was only used as a boolean, 
>>>> so don't create a jmethod_id until it's needed for 
>>>> post_compiled_method_unload.
>>>>
>>>> Ran hs tier1-8 on linux-x64-debug and the stress test that crashed 
>>>> in the original bug report.
>>>>
>>>> open webrev at 
>>>> http://cr.openjdk.java.net/~coleenp/2019/8212160.01/webrev
>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8212160
>>>>
>>>> Thanks,
>>>> Coleen
>>


