RFR (M) 8212160: JVMTI agent crashes with "assert(_value != 0LL) failed: resolving NULL _value"

Tue Dec 3 13:31:22 UTC 2019

On 3/12/2019 11:08 pm, coleen.phillimore at oracle.com wrote:
> 
> 
> On 12/2/19 11:52 PM, David Holmes wrote:
>> Hi Coleen,
>>
>> On 3/12/2019 12:43 am, coleen.phillimore at oracle.com wrote:
>>>
>>>
>>> On 11/26/19 7:03 PM, David Holmes wrote:
>>>> (adding runtime as well)
>>>>
>>>> Hi Coleen,
>>>>
>>>> On 27/11/2019 12:22 am, coleen.phillimore at oracle.com wrote:
>>>>> Summary: Add local deferred event list to thread to post events 
>>>>> outside CodeCache_lock.
>>>>>
>>>>> This patch builds on the patch for JDK-8173361.  With this patch, I 
>>>>> made the JvmtiDeferredEventQueue an instance class (not AllStatic) 
>>>>> and have one per thread.  The CodeBlob event that used to drop the 
>>>>> CodeCache_lock and raced with the sweeper thread, adds the events 
>>>>> it wants to post to its thread local list, and processes it outside 
>>>>> the lock.  The list is walked in GC and by the sweeper to keep the 
>>>>> nmethods from being unloaded and zombied, respectively.
>>>>
>>>> Sorry I don't understand why we would want/need a deferred event 
>>>> queue for every JavaThread? Isn't this only relevant for 
>>>> non-JavaThreads that need to have the ServiceThread process the 
>>>> deferred event?
>>>
>>> I thought I'd written this in the bug but I had only discussed this 
>>> with Erik.  I've added a comment to the bug to explain why I added 
>>> the per-JavaThread queue.  In order to process these events after the 
>>> CodeCache_lock is dropped, I have to queue them somewhere safe. The 
>>> ServiceThread queue is safe, *but* the ServiceThread can't keep up 
>>> with the events, especially from this test case.  So the test case 
>>> gets a native OOM.
>>>
>>> So I've added the safe queue as a field to each JavaThread because 
>>> multiple JavaThreads could be posting these events at the same time, 
>>> and there didn't seem to be a better safe place to cache them, 
>>> without adding another layer of queuing code.
>>
>> I think I'm getting the picture now. At the time the events are 
>> generated we can't post them directly because the current thread is 
>> inside compiler code. Hence the events must be deferred. Using the 
>> ServiceThread to handle the deferred events is one way to deal with 
>> this - but it can't keep up in this scenario. So instead we store the 
>> events in the current thread and when the current thread returns to 
>> code where it is safe to post the events, it does so itself. Is that 
>> generally correct?
> 
> Yes.
>>
>> I admit I'm not keen on adding this additional field per-thread just 
>> for a temporary usage. Some kind of stack allocated helper would be 
>> preferable, but would need to be passed through the call chain so that 
>> the events could be added to it.
> 
> Right, and the GC and nmethods_do has to find it somehow.  It wasn't my 
> first choice of where to put it also because there is too many things in 
> JavaThread.  Might be time for a future cleanup of Thread.

I see.

>>
>> Also I'm not clear why we aggressively delete the _jvmti_event_queue 
>> after posting the events. I'd be worried about the overhead we are 
>> introducing for creating and deleting this queue. When the 
>> JvmtiDeferredEventQueue data structure was intended only for use by 
>> the ServiceThread its dynamic node allocation may have made more 
>> sense. But now that seems like a liability to me - if 
>> JvmtiDeferredEvents could be linked directly we wouldn't need dynamic 
>> nodes, nor dynamic per-thread queues (just a per-thread pointer).
> 
> I'm not following.  The queue is for multiple events that might be 
> posted while in the CodeCache_lock, so they need to be in order and 
> linked together.  While we post them and take them off, if the callback 
> safepoints (maybe calls back into the JVM), we don't want to have GC or 
> nmethods_do walk the one that's been posted already. So a queue seems to 
> make sense.

Yes but you can make a queue just by having each event have a _next 
pointer, rather than dynamically creating nodes to hold the event. Each 
event is its own queue node implicitly.

> One thing that I experimented with was to have the ServiceThread take 
> ownership of the queue in it's local thread queue and post them all, 
> which could be a future enhancement.  It didn't help my OOM situation.

Your OOM situation seems to be a basic case of overwhelming the 
ServiceThread. A single serviceThread will always have a limit on how 
many events it can handle. Maybe this test is being too unrealistic in 
its expectations of the current design?

> Deleting the queue after all the events are posted allows 
> JavaThread::oops_do and nmethods_do only a null check to deal with this 
> jvmti wart.

If the nodes are not dynamically allocated you don't need to delete you 
just set the queue-head pointer to NULL - actually it will already be 
NULL once the last event has been processed.

David
-----

> Thanks,
> Coleen
>>
>> Just some thoughts.
>>
>> Thanks,
>> David
>>
>>> I did write comments to this effect here:
>>>
>>> http://cr.openjdk.java.net/~coleenp/2019/8212160.01/webrev/src/hotspot/share/prims/jvmtiCodeBlobEvents.cpp.udiff.html 
>>>
>>>
>>> Thanks,
>>> Coleen
>>>
>>>>
>>>> David
>>>>
>>>>> Also, the jmethod_id field in nmethod was only used as a boolean so 
>>>>> don't create a jmethod_id until needed for 
>>>>> post_compiled_method_unload.
>>>>>
>>>>> Ran hs tier1-8 on linux-x64-debug and the stress test that crashed 
>>>>> in the original bug report.
>>>>>
>>>>> open webrev at 
>>>>> http://cr.openjdk.java.net/~coleenp/2019/8212160.01/webrev
>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8212160
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>
>