RFR (S) 8173361: various crashes in JvmtiExport::post_compiled_method_load

Sat Nov 16 12:55:10 UTC 2019

On 11/15/19 11:17 PM, serguei.spitsyn at oracle.com wrote:
> Hi Coleen,
>
> On 11/15/19 2:12 PM, coleen.phillimore at oracle.com wrote:
>>
>> Hi, I've been working on answers to these questions, so I'll start 
>> with this one.
>>
>> The nmethodLocker keeps the nmethod from being reclaimed (made_zombie 
>> or memory released) by the sweeper, but the nmethod could be 
>> unloaded.  Unloading the nmethod clears the Method* _method field.
>
> Yes, I see it is done in the nmethod::make_unloaded().
>
>> The post_compiled_method_load event needs the _method field to look 
>> at things like inlining and ScopeDesc fields.   If the nmethod is 
>> unloaded, some of the oops are dead.  There are "holder" oops that 
>> correspond to the metadata in the nmethod.  If these oops are dead, 
>> causing the nmethod to get unloaded, then the metadata may not be valid.
>>
>> So my change 02 looks for a NULL nmethod._method field to tell 
>> whether we can post information about the nmethod.
>>
>> There's code in nmethod.cpp like:
>>
>> jmethodID nmethod::get_and_cache_jmethod_id() {
>>   if (_jmethod_id == NULL) {
>>     // Cache the jmethod_id since it can no longer be looked up once the
>>     // method itself has been marked for unloading.
>>     _jmethod_id = method()->jmethod_id();
>>   }
>>   return _jmethod_id;
>> }
>>
>> Which was added when post_method_load and unload were turned into 
>> deferred events.
>
> Could we cache the jmethodID in the 
> JvmtiDeferredEvent::compiled_method_load_event
> similarly as we do in the 
> JvmtiDeferredEvent::compiled_method_unload_event?
> This would help to get rid of the dependency on the nmethod::_method.
> Do we depend on any other nmethod fields?

Yes, there are other nmethod metadata that we rely on to print inline 
information, and this function 
JvmtiCodeBlobEvents::build_jvmti_addr_location_map because it uses the 
ScopeDesc data in the nmethod.

We do cache the jmethodID but that's not good enough.  See my last 
comment in the bug report.  The jmethodID can point to an unloaded method.

I tried a version of keeping the nmethod alive, but the GC folks will 
hate it.  And it doesn't work and I hate it.

My version 01 is the best, with the caveat that maybe it should check 
for _method == NULL instead of nmethod->is_alive().  I have to talk to 
Erik to see if there's a race with concurrent class unloading.

Any application that depends on a compiled method loading event on a 
class that could be unloaded is a buggy application.  Applications 
should not rely on when the JIT compiler decides to compile a method!  
This happens to us for a stress test.  Most applications will get most 
of their compiled method loading events as they normally do.

Thanks,
Coleen

>
>
> Thanks,
> Serguei
>
>> I put more debugging in the bug to show this crash was from an 
>> unloaded nmethod.
>>
>> Coleen
>>
>>
>> On 11/15/19 4:45 PM, serguei.spitsyn at oracle.com wrote:
>>> Hi Coleen,
>>>
>>> I have some questions.
>>>
>>> Both the compiler method load and unload are posted as deferred events.
>>> Both events keep the nmethod alive until the ServiceThread processes 
>>> the event.
>>>
>>> The implementation is:
>>>
>>> JvmtiDeferredEvent 
>>> JvmtiDeferredEvent::compiled_method_load_event(nmethod* nm) {
>>>   . . .
>>>   // Keep the nmethod alive until the ServiceThread can process
>>>   // this deferred event.
>>>   nmethodLocker::lock_nmethod(nm);
>>>   return event;
>>> }
>>>
>>> JvmtiDeferredEvent 
>>> JvmtiDeferredEvent::compiled_method_unload_event(nmethod* nm, 
>>> jmethodID id, const void* code) {
>>>   . . .
>>>   // Keep the nmethod alive until the ServiceThread can process
>>>   // this deferred event. This will keep the memory for the
>>>   // generated code from being reused too early. We pass
>>>   // zombie_ok == true here so that our nmethod that was just
>>>   // made into a zombie can be locked.
>>>   nmethodLocker::lock_nmethod(nm, true /* zombie_ok */);
>>>   return event;
>>> }
>>>
>>> void JvmtiDeferredEvent::post() {
>>>   assert(ServiceThread::is_service_thread(Thread::current()),
>>>          "Service thread must post enqueued events");
>>>   switch(_type) {
>>>     case TYPE_COMPILED_METHOD_LOAD: {
>>>       nmethod* nm = _event_data.compiled_method_load;
>>>       JvmtiExport::post_compiled_method_load(nm);
>>>       // done with the deferred event so unlock the nmethod
>>>       nmethodLocker::unlock_nmethod(nm);
>>>       break;
>>>     }
>>>     case TYPE_COMPILED_METHOD_UNLOAD: {
>>>       nmethod* nm = _event_data.compiled_method_unload.nm;
>>>       JvmtiExport::post_compiled_method_unload(
>>>         _event_data.compiled_method_unload.method_id,
>>>         _event_data.compiled_method_unload.code_begin);
>>>       // done with the deferred event so unlock the nmethod
>>>       nmethodLocker::unlock_nmethod(nm);
>>>       break;
>>>     }
>>>     . . .
>>>   }
>>> }
>>>
>>> Then I wonder how is it possible for the nmethod to be not alive here?:
>>> 2168 void JvmtiExport::post_compiled_method_load(nmethod *nm) {
>>> . . .
>>> 2173 // It's not safe to look at metadata for unloaded methods.
>>> 2174 if (!nm->is_alive()) {
>>> 2175 return;
>>> 2176 }
>>> At least, it lokks like something else is broken.
>>> Do I miss something important here?
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 11/14/19 5:15 PM, coleen.phillimore at oracle.com wrote:
>>>> Summary: Don't post information which uses metadata from unloaded 
>>>> nmethods
>>>>
>>>> Tested tier1-3 and 100 times with test that failed (reproduced 
>>>> failure without the fix).
>>>>
>>>> open webrev at 
>>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.01/webrev
>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8173361
>>>>
>>>> Thanks,
>>>> Coleen
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20191116/0ed02ab4/attachment.html>