RFR (S) 8173361: various crashes in JvmtiExport::post_compiled_method_load

Mon Nov 25 13:47:48 UTC 2019

Thanks for the code review, Serguei!
Coleen

On 11/22/19 6:34 PM, serguei.spitsyn at oracle.com wrote:
> Hi Coleen,
>
> +1
>
> Thanks,
> Serguei
>
>
> On 11/22/19 14:53, Daniel D. Daugherty wrote:
>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.04.incr/webrev
>>
>> src/hotspot/share/prims/jvmtiImpl.cpp
>>     No comments.
>>
>> src/hotspot/share/prims/jvmtiImpl.hpp
>>     No comments.
>>
>> src/hotspot/share/runtime/serviceThread.cpp
>>     No comments.
>>
>> Thumbs up.
>>
>> Dan
>>
>>
>> On 11/22/19 2:15 PM, coleen.phillimore at oracle.com wrote:
>>>
>>> Dan, Thank you for reviewing this!
>>>
>>> On 11/22/19 12:49 PM, Daniel D. Daugherty wrote:
>>>> Hi Coleen,
>>>>
>>>> Sorry for the delay in getting back to this re-review.
>>>>
>>>>
>>>> On 11/21/19 9:12 AM, coleen.phillimore at oracle.com wrote:
>>>>>
>>>>> Please review a new version of this change that keeps the nmethod 
>>>>> from being unloaded, after it is added to the deferred event queue:
>>>>>
>>>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.03/webrev/index.html
>>>>
>>>> src/hotspot/share/code/nmethod.cpp
>>>>     No comments.
>>>>
>>>> src/hotspot/share/oops/instanceKlass.cpp
>>>>     No comments.
>>>>
>>>> src/hotspot/share/prims/jvmtiExport.cpp
>>>>     No comments.
>>>>
>>>> src/hotspot/share/prims/jvmtiImpl.cpp
>>>>     Nice solution with the new oops_do() and nmethods_do() functions!
>>> Erik's insistance!
>>>>
>>>>     old L988: void JvmtiDeferredEventQueue::enqueue(const 
>>>> JvmtiDeferredEvent& event) {
>>>>     new L998: void 
>>>> JvmtiDeferredEventQueue::enqueue(JvmtiDeferredEvent event) {
>>>>         Not sure why this was changed.
>>>>
>>>>         Update: Looks like Serguei raised the issue and Coleen has 
>>>> already
>>>>         resolved it.
>>>
>>> Yes.
>>>>
>>>> src/hotspot/share/prims/jvmtiImpl.hpp
>>>>     old L494:     QueueNode(const JvmtiDeferredEvent& event)
>>>>     new L498:     QueueNode(JvmtiDeferredEvent& event)
>>>>         Why was this changed?
>>>>
>>>>         Update: Not clear if this was covered by Coleen's reply to 
>>>> Serguei.
>>>>
>>>>     old L497:     const JvmtiDeferredEvent& event() const { return 
>>>> _event; }
>>>>     new L501:     JvmtiDeferredEvent& event() { return _event; }
>>>>         Why was this changed?
>>>>
>>>>         Update: Coleen's reply to Serguei explained this. Perhaps add:
>>>>                   // Not const because of oops_do() and nmethods_do().
>>>>
>>>>     old L509:   static void enqueue(const JvmtiDeferredEvent& 
>>>> event) NOT_JVMTI_RETURN;
>>>>     new L513:   static void enqueue(JvmtiDeferredEvent event) 
>>>> NOT_JVMTI_RETURN;
>>>>         Why was this changed?
>>>>
>>>>         Update: Looks like Serguei raised the issue and Coleen has 
>>>> already
>>>>         resolved it.
>>>
>>> Yes, I fixed these.
>>>>
>>>> src/hotspot/share/runtime/mutexLocker.cpp
>>>>     This change is going to require some testing to make sure we don't
>>>>     have any new deadlock scenarios.
>>>
>>> Luckily, I've previously added an implicit NoSafepointVerifier to 
>>> locks that are _allow_vm_block = true, like this one.
>>> + def(JmethodIdCreation_lock , PaddedMutex , leaf, true, 
>>> _safepoint_check_never); // used for creating jmethodIDs.
>>> which prevents one class of deadlock. If we take out another lock 
>>> with a higher rank, we'll get the ranking assert.
>>>
>>> This lock prevents insertion into an array, and has little outside 
>>> calls.
>>>
>>> I'm running tests in tier 1-6 but any code that travels through this 
>>> should get these assertion checks, rather than deadlocking.
>>>
>>>>
>>>> src/hotspot/share/runtime/serviceThread.cpp
>>>>     L50 - nit - why the extra blank line?
>>>
>>> To separate static data member definitions from functions.  I 
>>> removed it.
>>>>
>>>> src/hotspot/share/runtime/serviceThread.hpp
>>>>     Thanks for cleaning up the static:
>>>>
>>>>       ServiceThread::is_service_thread(Thread* thread)
>>>>
>>>>     stuff. Having it be different than the other threads was
>>>>     a bit jarring.
>>>>
>>>> src/hotspot/share/runtime/thread.hpp
>>>>     No comments.
>>>>
>>>> Thumbs up. My only comments are nits so I don't need to see a
>>>> new webrev if you decide to fix them.
>>>
>>> So it turns out that in stress testing my fix 
>>> forhttps://bugs.openjdk.java.net/browse/JDK-8212160
>>>
>>> Because I was in the area and thought this was a duplicate of that 
>>> bug (it is not).   I found that calling oops_do and nmethods_do the 
>>> ServiceThread  needs to hold the Service_lock, because other threads 
>>> can be adding things to the global queue while the sweeper thread is 
>>> calling this in a handshake.
>>>
>>> I am now retesting this change with the changes above, and with the 
>>> Service_lock.   So far my stress tests for JDK-81212160 and the 
>>> stress test for this bug pass, but I'm going to run through all the 
>>> tiers 1-6 over the weekend.
>>>
>>> Please have a look at the changes in the meantime.
>>>
>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.04.incr/webrev
>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.04/webrev
>>>
>>> Thanks,
>>> Coleen
>>>>
>>>> Dan
>>>>
>>>>>
>>>>> Ran the test that failed 100 times without failure, tier1 on 
>>>>> Oracle supported platforms, and tier2-3 including jvmti and jdi 
>>>>> tests locally.
>>>>>
>>>>> See bug for more details about the crash.
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8173361
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>> On 11/18/19 10:09 PM, coleen.phillimore at oracle.com wrote:
>>>>>>
>>>>>> Hi Serguei,
>>>>>>
>>>>>> Sorry for not sending an update.  I talked to Erik and am working 
>>>>>> on a version that keeps the nmethod from being unloaded while 
>>>>>> it's in the deferred event queue, with a version that the GC 
>>>>>> people will like, and I like.  I'm testing it out now.
>>>>>>
>>>>>> Thanks!
>>>>>> Coleen
>>>>>>
>>>>>>
>>>>>> On 11/18/19 10:03 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>> Hi Coleen,
>>>>>>>
>>>>>>> Sorry for the latency, I had to investigate it a little bit.
>>>>>>> I still have some doubt your fix is right thing to do.
>>>>>>>
>>>>>>>
>>>>>>> On 11/16/19 04:55, coleen.phillimore at oracle.com wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/15/19 11:17 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>>>> Hi Coleen,
>>>>>>>>>
>>>>>>>>> On 11/15/19 2:12 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>>>
>>>>>>>>>> Hi, I've been working on answers to these questions, so I'll 
>>>>>>>>>> start with this one.
>>>>>>>>>>
>>>>>>>>>> The nmethodLocker keeps the nmethod from being reclaimed 
>>>>>>>>>> (made_zombie or memory released) by the sweeper, but the 
>>>>>>>>>> nmethod could be unloaded. Unloading the nmethod clears the 
>>>>>>>>>> Method* _method field.
>>>>>>>>>
>>>>>>>>> Yes, I see it is done in the nmethod::make_unloaded().
>>>>>>>>>
>>>>>>>>>> The post_compiled_method_load event needs the _method field 
>>>>>>>>>> to look at things like inlining and ScopeDesc fields.   If 
>>>>>>>>>> the nmethod is unloaded, some of the oops are dead.  There 
>>>>>>>>>> are "holder" oops that correspond to the metadata in the 
>>>>>>>>>> nmethod.  If these oops are dead, causing the nmethod to get 
>>>>>>>>>> unloaded, then the metadata may not be valid.
>>>>>>>>>>
>>>>>>>>>> So my change 02 looks for a NULL nmethod._method field to 
>>>>>>>>>> tell whether we can post information about the nmethod.
>>>>>>>>>>
>>>>>>>>>> There's code in nmethod.cpp like:
>>>>>>>>>>
>>>>>>>>>> jmethodID nmethod::get_and_cache_jmethod_id() {
>>>>>>>>>>   if (_jmethod_id == NULL) {
>>>>>>>>>>     // Cache the jmethod_id since it can no longer be looked 
>>>>>>>>>> up once the
>>>>>>>>>>     // method itself has been marked for unloading.
>>>>>>>>>>     _jmethod_id = method()->jmethod_id();
>>>>>>>>>>   }
>>>>>>>>>>   return _jmethod_id;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Which was added when post_method_load and unload were turned 
>>>>>>>>>> into deferred events.
>>>>>>>>>
>>>>>>>>> Could we cache the jmethodID in the 
>>>>>>>>> JvmtiDeferredEvent::compiled_method_load_event
>>>>>>>>> similarly as we do in the 
>>>>>>>>> JvmtiDeferredEvent::compiled_method_unload_event?
>>>>>>>>> This would help to get rid of the dependency on the 
>>>>>>>>> nmethod::_method.
>>>>>>>>> Do we depend on any other nmethod fields?
>>>>>>>>
>>>>>>>> Yes, there are other nmethod metadata that we rely on to print 
>>>>>>>> inline information, and this function 
>>>>>>>> JvmtiCodeBlobEvents::build_jvmti_addr_location_map because it 
>>>>>>>> uses the ScopeDesc data in the nmethod.
>>>>>>>
>>>>>>> One possible approach is to prepare and cache all this information
>>>>>>> in the nmethod::post_compiled_method_load_event() before the
>>>>>>> JvmtiDeferredEvent::compiled_method_load_event() is called.
>>>>>>> The event parameters are:
>>>>>>> typedef struct {
>>>>>>>      const void* start_address;
>>>>>>>      jlocation location;
>>>>>>> } jvmtiAddrLocationMap;
>>>>>>> CompiledMethodLoad(jvmtiEnv *jvmti_env,
>>>>>>>              jmethodID method,
>>>>>>>              jint code_size,
>>>>>>>              const void* code_addr,
>>>>>>>              jint map_length,
>>>>>>>              const jvmtiAddrLocationMap* map,
>>>>>>>              const void* compile_info)
>>>>>>> Some of these addresses above could be not accessible when an 
>>>>>>> event is posted.
>>>>>>> Not sure yet if it is Okay.
>>>>>>> The question is if this kind of refactoring is worth and right 
>>>>>>> thing to do.
>>>>>>>
>>>>>>>>
>>>>>>>> We do cache the jmethodID but that's not good enough.  See my 
>>>>>>>> last comment in the bug report.  The jmethodID can point to an 
>>>>>>>> unloaded method.
>>>>>>>
>>>>>>> This looks like it is done a little bit late.
>>>>>>> It'd better to do it before the event is deferred (see above).
>>>>>>>
>>>>>>>> I tried a version of keeping the nmethod alive, but the GC 
>>>>>>>> folks will hate it.  And it doesn't work and I hate it.
>>>>>>>
>>>>>>> From serviceability point of view this is the best and most 
>>>>>>> consistent approach.
>>>>>>> I seems to me, it was initially designed this way.
>>>>>>> The downside is it adds some extra complexity to the GC.
>>>>>>>
>>>>>>>> My version 01 is the best, with the caveat that maybe it should 
>>>>>>>> check for _method == NULL instead of nmethod->is_alive().  I 
>>>>>>>> have to talk to Erik to see if there's a race with concurrent 
>>>>>>>> class unloading.
>>>>>>>>
>>>>>>>> Any application that depends on a compiled method loading event 
>>>>>>>> on a class that could be unloaded is a buggy application.  
>>>>>>>> Applications should not rely on when the JIT compiler decides 
>>>>>>>> to compile a method! This happens to us for a stress test.  
>>>>>>>> Most applications will get most of their compiled method 
>>>>>>>> loading events as they normally do.
>>>>>>>
>>>>>>> It is not an application that relies on the compiled method 
>>>>>>> loading event.
>>>>>>> It is about profiling tools to be able to get correct 
>>>>>>> information about what is going on with compilations.
>>>>>>> My concern is that if we skip such compiled method load events 
>>>>>>> then profilers have no way
>>>>>>> to find out there many unneeded compilations that are thrown 
>>>>>>> away without any real use.
>>>>>>> Also, it is not clear what happens with the subsequent compiled 
>>>>>>> method unload events.
>>>>>>> Are they going to be skipped as well or they can appear and 
>>>>>>> confuse profilers?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>>> I put more debugging in the bug to show this crash was from 
>>>>>>>>>> an unloaded nmethod.
>>>>>>>>>>
>>>>>>>>>> Coleen
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/15/19 4:45 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>> Hi Coleen,
>>>>>>>>>>>
>>>>>>>>>>> I have some questions.
>>>>>>>>>>>
>>>>>>>>>>> Both the compiler method load and unload are posted as 
>>>>>>>>>>> deferred events.
>>>>>>>>>>> Both events keep the nmethod alive until the ServiceThread 
>>>>>>>>>>> processes the event.
>>>>>>>>>>>
>>>>>>>>>>> The implementation is:
>>>>>>>>>>>
>>>>>>>>>>> JvmtiDeferredEvent 
>>>>>>>>>>> JvmtiDeferredEvent::compiled_method_load_event(nmethod* nm) {
>>>>>>>>>>>   . . .
>>>>>>>>>>>   // Keep the nmethod alive until the ServiceThread can process
>>>>>>>>>>>   // this deferred event.
>>>>>>>>>>>   nmethodLocker::lock_nmethod(nm);
>>>>>>>>>>>   return event;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> JvmtiDeferredEvent 
>>>>>>>>>>> JvmtiDeferredEvent::compiled_method_unload_event(nmethod* 
>>>>>>>>>>> nm, jmethodID id, const void* code) {
>>>>>>>>>>>   . . .
>>>>>>>>>>>   // Keep the nmethod alive until the ServiceThread can process
>>>>>>>>>>>   // this deferred event. This will keep the memory for the
>>>>>>>>>>>   // generated code from being reused too early. We pass
>>>>>>>>>>>   // zombie_ok == true here so that our nmethod that was just
>>>>>>>>>>>   // made into a zombie can be locked.
>>>>>>>>>>>   nmethodLocker::lock_nmethod(nm, true /* zombie_ok */);
>>>>>>>>>>>   return event;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> void JvmtiDeferredEvent::post() {
>>>>>>>>>>> assert(ServiceThread::is_service_thread(Thread::current()),
>>>>>>>>>>>          "Service thread must post enqueued events");
>>>>>>>>>>>   switch(_type) {
>>>>>>>>>>>     case TYPE_COMPILED_METHOD_LOAD: {
>>>>>>>>>>>       nmethod* nm = _event_data.compiled_method_load;
>>>>>>>>>>> JvmtiExport::post_compiled_method_load(nm);
>>>>>>>>>>>       // done with the deferred event so unlock the nmethod
>>>>>>>>>>>       nmethodLocker::unlock_nmethod(nm);
>>>>>>>>>>>       break;
>>>>>>>>>>>     }
>>>>>>>>>>>     case TYPE_COMPILED_METHOD_UNLOAD: {
>>>>>>>>>>>       nmethod* nm = _event_data.compiled_method_unload.nm;
>>>>>>>>>>> JvmtiExport::post_compiled_method_unload(
>>>>>>>>>>> _event_data.compiled_method_unload.method_id,
>>>>>>>>>>> _event_data.compiled_method_unload.code_begin);
>>>>>>>>>>>       // done with the deferred event so unlock the nmethod
>>>>>>>>>>>       nmethodLocker::unlock_nmethod(nm);
>>>>>>>>>>>       break;
>>>>>>>>>>>     }
>>>>>>>>>>>     . . .
>>>>>>>>>>>   }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Then I wonder how is it possible for the nmethod to be not 
>>>>>>>>>>> alive here?:
>>>>>>>>>>> 2168 void JvmtiExport::post_compiled_method_load(nmethod *nm) {
>>>>>>>>>>> . . .
>>>>>>>>>>> 2173 // It's not safe to look at metadata for unloaded methods.
>>>>>>>>>>> 2174 if (!nm->is_alive()) {
>>>>>>>>>>> 2175 return;
>>>>>>>>>>> 2176 }
>>>>>>>>>>> At least, it lokks like something else is broken.
>>>>>>>>>>> Do I miss something important here?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Serguei
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11/14/19 5:15 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>>>>> Summary: Don't post information which uses metadata from 
>>>>>>>>>>>> unloaded nmethods
>>>>>>>>>>>>
>>>>>>>>>>>> Tested tier1-3 and 100 times with test that failed 
>>>>>>>>>>>> (reproduced failure without the fix).
>>>>>>>>>>>>
>>>>>>>>>>>> open webrev at 
>>>>>>>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.01/webrev
>>>>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8173361
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Coleen
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20191125/5ac37905/attachment-0001.html>