RFR (S) 8173361: various crashes in JvmtiExport::post_compiled_method_load
Daniel D. Daugherty
daniel.daugherty at oracle.com
Fri Nov 22 22:53:12 UTC 2019
> http://cr.openjdk.java.net/~coleenp/2019/8173361.04.incr/webrev
src/hotspot/share/prims/jvmtiImpl.cpp
No comments.
src/hotspot/share/prims/jvmtiImpl.hpp
No comments.
src/hotspot/share/runtime/serviceThread.cpp
No comments.
Thumbs up.
Dan
On 11/22/19 2:15 PM, coleen.phillimore at oracle.com wrote:
>
> Dan, Thank you for reviewing this!
>
> On 11/22/19 12:49 PM, Daniel D. Daugherty wrote:
>> Hi Coleen,
>>
>> Sorry for the delay in getting back to this re-review.
>>
>>
>> On 11/21/19 9:12 AM, coleen.phillimore at oracle.com wrote:
>>>
>>> Please review a new version of this change that keeps the nmethod
>>> from being unloaded, after it is added to the deferred event queue:
>>>
>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.03/webrev/index.html
>>
>> src/hotspot/share/code/nmethod.cpp
>> No comments.
>>
>> src/hotspot/share/oops/instanceKlass.cpp
>> No comments.
>>
>> src/hotspot/share/prims/jvmtiExport.cpp
>> No comments.
>>
>> src/hotspot/share/prims/jvmtiImpl.cpp
>> Nice solution with the new oops_do() and nmethods_do() functions!
> Erik's insistance!
>>
>> old L988: void JvmtiDeferredEventQueue::enqueue(const
>> JvmtiDeferredEvent& event) {
>> new L998: void
>> JvmtiDeferredEventQueue::enqueue(JvmtiDeferredEvent event) {
>> Not sure why this was changed.
>>
>> Update: Looks like Serguei raised the issue and Coleen has
>> already
>> resolved it.
>
> Yes.
>>
>> src/hotspot/share/prims/jvmtiImpl.hpp
>> old L494: QueueNode(const JvmtiDeferredEvent& event)
>> new L498: QueueNode(JvmtiDeferredEvent& event)
>> Why was this changed?
>>
>> Update: Not clear if this was covered by Coleen's reply to
>> Serguei.
>>
>> old L497: const JvmtiDeferredEvent& event() const { return
>> _event; }
>> new L501: JvmtiDeferredEvent& event() { return _event; }
>> Why was this changed?
>>
>> Update: Coleen's reply to Serguei explained this. Perhaps add:
>> // Not const because of oops_do() and nmethods_do().
>>
>> old L509: static void enqueue(const JvmtiDeferredEvent& event)
>> NOT_JVMTI_RETURN;
>> new L513: static void enqueue(JvmtiDeferredEvent event)
>> NOT_JVMTI_RETURN;
>> Why was this changed?
>>
>> Update: Looks like Serguei raised the issue and Coleen has
>> already
>> resolved it.
>
> Yes, I fixed these.
>>
>> src/hotspot/share/runtime/mutexLocker.cpp
>> This change is going to require some testing to make sure we don't
>> have any new deadlock scenarios.
>
> Luckily, I've previously added an implicit NoSafepointVerifier to
> locks that are _allow_vm_block = true, like this one.
> + def(JmethodIdCreation_lock , PaddedMutex , leaf, true,
> _safepoint_check_never); // used for creating jmethodIDs.
> which prevents one class of deadlock. If we take out another lock with
> a higher rank, we'll get the ranking assert.
>
> This lock prevents insertion into an array, and has little outside calls.
>
> I'm running tests in tier 1-6 but any code that travels through this
> should get these assertion checks, rather than deadlocking.
>
>>
>> src/hotspot/share/runtime/serviceThread.cpp
>> L50 - nit - why the extra blank line?
>
> To separate static data member definitions from functions. I removed it.
>>
>> src/hotspot/share/runtime/serviceThread.hpp
>> Thanks for cleaning up the static:
>>
>> ServiceThread::is_service_thread(Thread* thread)
>>
>> stuff. Having it be different than the other threads was
>> a bit jarring.
>>
>> src/hotspot/share/runtime/thread.hpp
>> No comments.
>>
>> Thumbs up. My only comments are nits so I don't need to see a
>> new webrev if you decide to fix them.
>
> So it turns out that in stress testing my fix
> forhttps://bugs.openjdk.java.net/browse/JDK-8212160
>
> Because I was in the area and thought this was a duplicate of that bug
> (it is not). I found that calling oops_do and nmethods_do the
> ServiceThread needs to hold the Service_lock, because other threads
> can be adding things to the global queue while the sweeper thread is
> calling this in a handshake.
>
> I am now retesting this change with the changes above, and with the
> Service_lock. So far my stress tests for JDK-81212160 and the stress
> test for this bug pass, but I'm going to run through all the tiers 1-6
> over the weekend.
>
> Please have a look at the changes in the meantime.
>
> http://cr.openjdk.java.net/~coleenp/2019/8173361.04.incr/webrev
> http://cr.openjdk.java.net/~coleenp/2019/8173361.04/webrev
>
> Thanks,
> Coleen
>>
>> Dan
>>
>>>
>>> Ran the test that failed 100 times without failure, tier1 on Oracle
>>> supported platforms, and tier2-3 including jvmti and jdi tests locally.
>>>
>>> See bug for more details about the crash.
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8173361
>>>
>>> Thanks,
>>> Coleen
>>>
>>> On 11/18/19 10:09 PM, coleen.phillimore at oracle.com wrote:
>>>>
>>>> Hi Serguei,
>>>>
>>>> Sorry for not sending an update. I talked to Erik and am working
>>>> on a version that keeps the nmethod from being unloaded while it's
>>>> in the deferred event queue, with a version that the GC people will
>>>> like, and I like. I'm testing it out now.
>>>>
>>>> Thanks!
>>>> Coleen
>>>>
>>>>
>>>> On 11/18/19 10:03 PM, serguei.spitsyn at oracle.com wrote:
>>>>> Hi Coleen,
>>>>>
>>>>> Sorry for the latency, I had to investigate it a little bit.
>>>>> I still have some doubt your fix is right thing to do.
>>>>>
>>>>>
>>>>> On 11/16/19 04:55, coleen.phillimore at oracle.com wrote:
>>>>>>
>>>>>>
>>>>>> On 11/15/19 11:17 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>> Hi Coleen,
>>>>>>>
>>>>>>> On 11/15/19 2:12 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>
>>>>>>>> Hi, I've been working on answers to these questions, so I'll
>>>>>>>> start with this one.
>>>>>>>>
>>>>>>>> The nmethodLocker keeps the nmethod from being reclaimed
>>>>>>>> (made_zombie or memory released) by the sweeper, but the
>>>>>>>> nmethod could be unloaded. Unloading the nmethod clears the
>>>>>>>> Method* _method field.
>>>>>>>
>>>>>>> Yes, I see it is done in the nmethod::make_unloaded().
>>>>>>>
>>>>>>>> The post_compiled_method_load event needs the _method field to
>>>>>>>> look at things like inlining and ScopeDesc fields. If the
>>>>>>>> nmethod is unloaded, some of the oops are dead. There are
>>>>>>>> "holder" oops that correspond to the metadata in the nmethod.
>>>>>>>> If these oops are dead, causing the nmethod to get unloaded,
>>>>>>>> then the metadata may not be valid.
>>>>>>>>
>>>>>>>> So my change 02 looks for a NULL nmethod._method field to tell
>>>>>>>> whether we can post information about the nmethod.
>>>>>>>>
>>>>>>>> There's code in nmethod.cpp like:
>>>>>>>>
>>>>>>>> jmethodID nmethod::get_and_cache_jmethod_id() {
>>>>>>>> if (_jmethod_id == NULL) {
>>>>>>>> // Cache the jmethod_id since it can no longer be looked up
>>>>>>>> once the
>>>>>>>> // method itself has been marked for unloading.
>>>>>>>> _jmethod_id = method()->jmethod_id();
>>>>>>>> }
>>>>>>>> return _jmethod_id;
>>>>>>>> }
>>>>>>>>
>>>>>>>> Which was added when post_method_load and unload were turned
>>>>>>>> into deferred events.
>>>>>>>
>>>>>>> Could we cache the jmethodID in the
>>>>>>> JvmtiDeferredEvent::compiled_method_load_event
>>>>>>> similarly as we do in the
>>>>>>> JvmtiDeferredEvent::compiled_method_unload_event?
>>>>>>> This would help to get rid of the dependency on the
>>>>>>> nmethod::_method.
>>>>>>> Do we depend on any other nmethod fields?
>>>>>>
>>>>>> Yes, there are other nmethod metadata that we rely on to print
>>>>>> inline information, and this function
>>>>>> JvmtiCodeBlobEvents::build_jvmti_addr_location_map because it
>>>>>> uses the ScopeDesc data in the nmethod.
>>>>>
>>>>> One possible approach is to prepare and cache all this information
>>>>> in the nmethod::post_compiled_method_load_event() before the
>>>>> JvmtiDeferredEvent::compiled_method_load_event() is called.
>>>>> The event parameters are:
>>>>> typedef struct {
>>>>> const void* start_address;
>>>>> jlocation location;
>>>>> } jvmtiAddrLocationMap;
>>>>> CompiledMethodLoad(jvmtiEnv *jvmti_env,
>>>>> jmethodID method,
>>>>> jint code_size,
>>>>> const void* code_addr,
>>>>> jint map_length,
>>>>> const jvmtiAddrLocationMap* map,
>>>>> const void* compile_info)
>>>>> Some of these addresses above could be not accessible when an
>>>>> event is posted.
>>>>> Not sure yet if it is Okay.
>>>>> The question is if this kind of refactoring is worth and right
>>>>> thing to do.
>>>>>
>>>>>>
>>>>>> We do cache the jmethodID but that's not good enough. See my last
>>>>>> comment in the bug report. The jmethodID can point to an
>>>>>> unloaded method.
>>>>>
>>>>> This looks like it is done a little bit late.
>>>>> It'd better to do it before the event is deferred (see above).
>>>>>
>>>>>> I tried a version of keeping the nmethod alive, but the GC folks
>>>>>> will hate it. And it doesn't work and I hate it.
>>>>>
>>>>> From serviceability point of view this is the best and most
>>>>> consistent approach.
>>>>> I seems to me, it was initially designed this way.
>>>>> The downside is it adds some extra complexity to the GC.
>>>>>
>>>>>> My version 01 is the best, with the caveat that maybe it should
>>>>>> check for _method == NULL instead of nmethod->is_alive(). I have
>>>>>> to talk to Erik to see if there's a race with concurrent class
>>>>>> unloading.
>>>>>>
>>>>>> Any application that depends on a compiled method loading event
>>>>>> on a class that could be unloaded is a buggy application.
>>>>>> Applications should not rely on when the JIT compiler decides to
>>>>>> compile a method! This happens to us for a stress test. Most
>>>>>> applications will get most of their compiled method loading
>>>>>> events as they normally do.
>>>>>
>>>>> It is not an application that relies on the compiled method
>>>>> loading event.
>>>>> It is about profiling tools to be able to get correct information
>>>>> about what is going on with compilations.
>>>>> My concern is that if we skip such compiled method load events
>>>>> then profilers have no way
>>>>> to find out there many unneeded compilations that are thrown away
>>>>> without any real use.
>>>>> Also, it is not clear what happens with the subsequent compiled
>>>>> method unload events.
>>>>> Are they going to be skipped as well or they can appear and
>>>>> confuse profilers?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>
>>>>>>>> I put more debugging in the bug to show this crash was from an
>>>>>>>> unloaded nmethod.
>>>>>>>>
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/15/19 4:45 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>>>> Hi Coleen,
>>>>>>>>>
>>>>>>>>> I have some questions.
>>>>>>>>>
>>>>>>>>> Both the compiler method load and unload are posted as
>>>>>>>>> deferred events.
>>>>>>>>> Both events keep the nmethod alive until the ServiceThread
>>>>>>>>> processes the event.
>>>>>>>>>
>>>>>>>>> The implementation is:
>>>>>>>>>
>>>>>>>>> JvmtiDeferredEvent
>>>>>>>>> JvmtiDeferredEvent::compiled_method_load_event(nmethod* nm) {
>>>>>>>>> . . .
>>>>>>>>> // Keep the nmethod alive until the ServiceThread can process
>>>>>>>>> // this deferred event.
>>>>>>>>> nmethodLocker::lock_nmethod(nm);
>>>>>>>>> return event;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> JvmtiDeferredEvent
>>>>>>>>> JvmtiDeferredEvent::compiled_method_unload_event(nmethod* nm,
>>>>>>>>> jmethodID id, const void* code) {
>>>>>>>>> . . .
>>>>>>>>> // Keep the nmethod alive until the ServiceThread can process
>>>>>>>>> // this deferred event. This will keep the memory for the
>>>>>>>>> // generated code from being reused too early. We pass
>>>>>>>>> // zombie_ok == true here so that our nmethod that was just
>>>>>>>>> // made into a zombie can be locked.
>>>>>>>>> nmethodLocker::lock_nmethod(nm, true /* zombie_ok */);
>>>>>>>>> return event;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> void JvmtiDeferredEvent::post() {
>>>>>>>>> assert(ServiceThread::is_service_thread(Thread::current()),
>>>>>>>>> "Service thread must post enqueued events");
>>>>>>>>> switch(_type) {
>>>>>>>>> case TYPE_COMPILED_METHOD_LOAD: {
>>>>>>>>> nmethod* nm = _event_data.compiled_method_load;
>>>>>>>>> JvmtiExport::post_compiled_method_load(nm);
>>>>>>>>> // done with the deferred event so unlock the nmethod
>>>>>>>>> nmethodLocker::unlock_nmethod(nm);
>>>>>>>>> break;
>>>>>>>>> }
>>>>>>>>> case TYPE_COMPILED_METHOD_UNLOAD: {
>>>>>>>>> nmethod* nm = _event_data.compiled_method_unload.nm;
>>>>>>>>> JvmtiExport::post_compiled_method_unload(
>>>>>>>>> _event_data.compiled_method_unload.method_id,
>>>>>>>>> _event_data.compiled_method_unload.code_begin);
>>>>>>>>> // done with the deferred event so unlock the nmethod
>>>>>>>>> nmethodLocker::unlock_nmethod(nm);
>>>>>>>>> break;
>>>>>>>>> }
>>>>>>>>> . . .
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Then I wonder how is it possible for the nmethod to be not
>>>>>>>>> alive here?:
>>>>>>>>> 2168 void JvmtiExport::post_compiled_method_load(nmethod *nm) {
>>>>>>>>> . . .
>>>>>>>>> 2173 // It's not safe to look at metadata for unloaded methods.
>>>>>>>>> 2174 if (!nm->is_alive()) {
>>>>>>>>> 2175 return;
>>>>>>>>> 2176 }
>>>>>>>>> At least, it lokks like something else is broken.
>>>>>>>>> Do I miss something important here?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/14/19 5:15 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>>> Summary: Don't post information which uses metadata from
>>>>>>>>>> unloaded nmethods
>>>>>>>>>>
>>>>>>>>>> Tested tier1-3 and 100 times with test that failed
>>>>>>>>>> (reproduced failure without the fix).
>>>>>>>>>>
>>>>>>>>>> open webrev at
>>>>>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.01/webrev
>>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8173361
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Coleen
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20191122/0b994815/attachment-0001.html>
More information about the serviceability-dev
mailing list