RFR (S) 8173361: various crashes in JvmtiExport::post_compiled_method_load

Daniel D. Daugherty daniel.daugherty at oracle.com
Fri Nov 22 17:49:24 UTC 2019


Hi Coleen,

Sorry for the delay in getting back to this re-review.


On 11/21/19 9:12 AM, coleen.phillimore at oracle.com wrote:
>
> Please review a new version of this change that keeps the nmethod from 
> being unloaded, after it is added to the deferred event queue:
>
> http://cr.openjdk.java.net/~coleenp/2019/8173361.03/webrev/index.html

src/hotspot/share/code/nmethod.cpp
     No comments.

src/hotspot/share/oops/instanceKlass.cpp
     No comments.

src/hotspot/share/prims/jvmtiExport.cpp
     No comments.

src/hotspot/share/prims/jvmtiImpl.cpp
     Nice solution with the new oops_do() and nmethods_do() functions!

     old L988: void JvmtiDeferredEventQueue::enqueue(const 
JvmtiDeferredEvent& event) {
     new L998: void JvmtiDeferredEventQueue::enqueue(JvmtiDeferredEvent 
event) {
         Not sure why this was changed.

         Update: Looks like Serguei raised the issue and Coleen has already
         resolved it.

src/hotspot/share/prims/jvmtiImpl.hpp
     old L494:     QueueNode(const JvmtiDeferredEvent& event)
     new L498:     QueueNode(JvmtiDeferredEvent& event)
         Why was this changed?

         Update: Not clear if this was covered by Coleen's reply to Serguei.

     old L497:     const JvmtiDeferredEvent& event() const { return 
_event; }
     new L501:     JvmtiDeferredEvent& event() { return _event; }
         Why was this changed?

         Update: Coleen's reply to Serguei explained this. Perhaps add:
                   // Not const because of oops_do() and nmethods_do().

     old L509:   static void enqueue(const JvmtiDeferredEvent& event) 
NOT_JVMTI_RETURN;
     new L513:   static void enqueue(JvmtiDeferredEvent event) 
NOT_JVMTI_RETURN;
         Why was this changed?

         Update: Looks like Serguei raised the issue and Coleen has already
         resolved it.

src/hotspot/share/runtime/mutexLocker.cpp
     This change is going to require some testing to make sure we don't
     have any new deadlock scenarios.

src/hotspot/share/runtime/serviceThread.cpp
     L50 - nit - why the extra blank line?

src/hotspot/share/runtime/serviceThread.hpp
     Thanks for cleaning up the static:

       ServiceThread::is_service_thread(Thread* thread)

     stuff. Having it be different than the other threads was
     a bit jarring.

src/hotspot/share/runtime/thread.hpp
     No comments.

Thumbs up. My only comments are nits so I don't need to see a
new webrev if you decide to fix them.

Dan

>
> Ran the test that failed 100 times without failure, tier1 on Oracle 
> supported platforms, and tier2-3 including jvmti and jdi tests locally.
>
> See bug for more details about the crash.
>
> https://bugs.openjdk.java.net/browse/JDK-8173361
>
> Thanks,
> Coleen
>
> On 11/18/19 10:09 PM, coleen.phillimore at oracle.com wrote:
>>
>> Hi Serguei,
>>
>> Sorry for not sending an update.  I talked to Erik and am working on 
>> a version that keeps the nmethod from being unloaded while it's in 
>> the deferred event queue, with a version that the GC people will 
>> like, and I like.  I'm testing it out now.
>>
>> Thanks!
>> Coleen
>>
>>
>> On 11/18/19 10:03 PM, serguei.spitsyn at oracle.com wrote:
>>> Hi Coleen,
>>>
>>> Sorry for the latency, I had to investigate it a little bit.
>>> I still have some doubt your fix is right thing to do.
>>>
>>>
>>> On 11/16/19 04:55, coleen.phillimore at oracle.com wrote:
>>>>
>>>>
>>>> On 11/15/19 11:17 PM, serguei.spitsyn at oracle.com wrote:
>>>>> Hi Coleen,
>>>>>
>>>>> On 11/15/19 2:12 PM, coleen.phillimore at oracle.com wrote:
>>>>>>
>>>>>> Hi, I've been working on answers to these questions, so I'll 
>>>>>> start with this one.
>>>>>>
>>>>>> The nmethodLocker keeps the nmethod from being reclaimed 
>>>>>> (made_zombie or memory released) by the sweeper, but the nmethod 
>>>>>> could be unloaded.  Unloading the nmethod clears the Method* 
>>>>>> _method field.
>>>>>
>>>>> Yes, I see it is done in the nmethod::make_unloaded().
>>>>>
>>>>>> The post_compiled_method_load event needs the _method field to 
>>>>>> look at things like inlining and ScopeDesc fields.   If the 
>>>>>> nmethod is unloaded, some of the oops are dead.  There are 
>>>>>> "holder" oops that correspond to the metadata in the nmethod.  If 
>>>>>> these oops are dead, causing the nmethod to get unloaded, then 
>>>>>> the metadata may not be valid.
>>>>>>
>>>>>> So my change 02 looks for a NULL nmethod._method field to tell 
>>>>>> whether we can post information about the nmethod.
>>>>>>
>>>>>> There's code in nmethod.cpp like:
>>>>>>
>>>>>> jmethodID nmethod::get_and_cache_jmethod_id() {
>>>>>>   if (_jmethod_id == NULL) {
>>>>>>     // Cache the jmethod_id since it can no longer be looked up 
>>>>>> once the
>>>>>>     // method itself has been marked for unloading.
>>>>>>     _jmethod_id = method()->jmethod_id();
>>>>>>   }
>>>>>>   return _jmethod_id;
>>>>>> }
>>>>>>
>>>>>> Which was added when post_method_load and unload were turned into 
>>>>>> deferred events.
>>>>>
>>>>> Could we cache the jmethodID in the 
>>>>> JvmtiDeferredEvent::compiled_method_load_event
>>>>> similarly as we do in the 
>>>>> JvmtiDeferredEvent::compiled_method_unload_event?
>>>>> This would help to get rid of the dependency on the nmethod::_method.
>>>>> Do we depend on any other nmethod fields?
>>>>
>>>> Yes, there are other nmethod metadata that we rely on to print 
>>>> inline information, and this function 
>>>> JvmtiCodeBlobEvents::build_jvmti_addr_location_map because it uses 
>>>> the ScopeDesc data in the nmethod.
>>>
>>> One possible approach is to prepare and cache all this information
>>> in the nmethod::post_compiled_method_load_event() before the
>>> JvmtiDeferredEvent::compiled_method_load_event() is called.
>>> The event parameters are:
>>> typedef struct {
>>>      const void* start_address;
>>>      jlocation location;
>>> } jvmtiAddrLocationMap;
>>> CompiledMethodLoad(jvmtiEnv *jvmti_env,
>>>              jmethodID method,
>>>              jint code_size,
>>>              const void* code_addr,
>>>              jint map_length,
>>>              const jvmtiAddrLocationMap* map,
>>>              const void* compile_info)
>>> Some of these addresses above could be not accessible when an event 
>>> is posted.
>>> Not sure yet if it is Okay.
>>> The question is if this kind of refactoring is worth and right thing 
>>> to do.
>>>
>>>>
>>>> We do cache the jmethodID but that's not good enough.  See my last 
>>>> comment in the bug report.  The jmethodID can point to an unloaded 
>>>> method.
>>>
>>> This looks like it is done a little bit late.
>>> It'd better to do it before the event is deferred (see above).
>>>
>>>> I tried a version of keeping the nmethod alive, but the GC folks 
>>>> will hate it.  And it doesn't work and I hate it.
>>>
>>> From serviceability point of view this is the best and most 
>>> consistent approach.
>>> I seems to me, it was initially designed this way.
>>> The downside is it adds some extra complexity to the GC.
>>>
>>>> My version 01 is the best, with the caveat that maybe it should 
>>>> check for _method == NULL instead of nmethod->is_alive().  I have 
>>>> to talk to Erik to see if there's a race with concurrent class 
>>>> unloading.
>>>>
>>>> Any application that depends on a compiled method loading event on 
>>>> a class that could be unloaded is a buggy application.  
>>>> Applications should not rely on when the JIT compiler decides to 
>>>> compile a method!  This happens to us for a stress test.  Most 
>>>> applications will get most of their compiled method loading events 
>>>> as they normally do.
>>>
>>> It is not an application that relies on the compiled method loading 
>>> event.
>>> It is about profiling tools to be able to get correct information 
>>> about what is going on with compilations.
>>> My concern is that if we skip such compiled method load events then 
>>> profilers have no way
>>> to find out there many unneeded compilations that are thrown away 
>>> without any real use.
>>> Also, it is not clear what happens with the subsequent compiled 
>>> method unload events.
>>> Are they going to be skipped as well or they can appear and confuse 
>>> profilers?
>>>
>>>
>>> Thanks,
>>> Serguei
>>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>> I put more debugging in the bug to show this crash was from an 
>>>>>> unloaded nmethod.
>>>>>>
>>>>>> Coleen
>>>>>>
>>>>>>
>>>>>> On 11/15/19 4:45 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>> Hi Coleen,
>>>>>>>
>>>>>>> I have some questions.
>>>>>>>
>>>>>>> Both the compiler method load and unload are posted as deferred 
>>>>>>> events.
>>>>>>> Both events keep the nmethod alive until the ServiceThread 
>>>>>>> processes the event.
>>>>>>>
>>>>>>> The implementation is:
>>>>>>>
>>>>>>> JvmtiDeferredEvent 
>>>>>>> JvmtiDeferredEvent::compiled_method_load_event(nmethod* nm) {
>>>>>>>   . . .
>>>>>>>   // Keep the nmethod alive until the ServiceThread can process
>>>>>>>   // this deferred event.
>>>>>>>   nmethodLocker::lock_nmethod(nm);
>>>>>>>   return event;
>>>>>>> }
>>>>>>>
>>>>>>> JvmtiDeferredEvent 
>>>>>>> JvmtiDeferredEvent::compiled_method_unload_event(nmethod* nm, 
>>>>>>> jmethodID id, const void* code) {
>>>>>>>   . . .
>>>>>>>   // Keep the nmethod alive until the ServiceThread can process
>>>>>>>   // this deferred event. This will keep the memory for the
>>>>>>>   // generated code from being reused too early. We pass
>>>>>>>   // zombie_ok == true here so that our nmethod that was just
>>>>>>>   // made into a zombie can be locked.
>>>>>>>   nmethodLocker::lock_nmethod(nm, true /* zombie_ok */);
>>>>>>>   return event;
>>>>>>> }
>>>>>>>
>>>>>>> void JvmtiDeferredEvent::post() {
>>>>>>> assert(ServiceThread::is_service_thread(Thread::current()),
>>>>>>>          "Service thread must post enqueued events");
>>>>>>>   switch(_type) {
>>>>>>>     case TYPE_COMPILED_METHOD_LOAD: {
>>>>>>>       nmethod* nm = _event_data.compiled_method_load;
>>>>>>>       JvmtiExport::post_compiled_method_load(nm);
>>>>>>>       // done with the deferred event so unlock the nmethod
>>>>>>>       nmethodLocker::unlock_nmethod(nm);
>>>>>>>       break;
>>>>>>>     }
>>>>>>>     case TYPE_COMPILED_METHOD_UNLOAD: {
>>>>>>>       nmethod* nm = _event_data.compiled_method_unload.nm;
>>>>>>>       JvmtiExport::post_compiled_method_unload(
>>>>>>>         _event_data.compiled_method_unload.method_id,
>>>>>>> _event_data.compiled_method_unload.code_begin);
>>>>>>>       // done with the deferred event so unlock the nmethod
>>>>>>>       nmethodLocker::unlock_nmethod(nm);
>>>>>>>       break;
>>>>>>>     }
>>>>>>>     . . .
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> Then I wonder how is it possible for the nmethod to be not alive 
>>>>>>> here?:
>>>>>>> 2168 void JvmtiExport::post_compiled_method_load(nmethod *nm) {
>>>>>>> . . .
>>>>>>> 2173 // It's not safe to look at metadata for unloaded methods.
>>>>>>> 2174 if (!nm->is_alive()) {
>>>>>>> 2175 return;
>>>>>>> 2176 }
>>>>>>> At least, it lokks like something else is broken.
>>>>>>> Do I miss something important here?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>
>>>>>>>
>>>>>>> On 11/14/19 5:15 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>> Summary: Don't post information which uses metadata from 
>>>>>>>> unloaded nmethods
>>>>>>>>
>>>>>>>> Tested tier1-3 and 100 times with test that failed (reproduced 
>>>>>>>> failure without the fix).
>>>>>>>>
>>>>>>>> open webrev at 
>>>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.01/webrev
>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8173361
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Coleen
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20191122/1162cd63/attachment-0001.html>


More information about the serviceability-dev mailing list