RFR (S) 8173361: various crashes in JvmtiExport::post_compiled_method_load
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Thu Nov 21 18:22:53 UTC 2019
On 11/21/19 10:32 AM, erik.osterlund at oracle.com wrote:
> Hi Coleen,
>
> Thanks for removing the nmethodLocker. I'm on a mission to remove all
> nmethod lockers, and this one is really nasty.
Thanks Erik, and thank you for the help. I'm trying to help you get rid
of nmethodLockers but there's another nasty one.
Coleen
> Looks good.
>
> Thanks,
> /Erik
>
> On 11/21/19 3:12 PM, coleen.phillimore at oracle.com wrote:
>>
>> Please review a new version of this change that keeps the nmethod
>> from being unloaded, after it is added to the deferred event queue:
>>
>> http://cr.openjdk.java.net/~coleenp/2019/8173361.03/webrev/index.html
>>
>> Ran the test that failed 100 times without failure, tier1 on Oracle
>> supported platforms, and tier2-3 including jvmti and jdi tests locally.
>>
>> See bug for more details about the crash.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8173361
>>
>> Thanks,
>> Coleen
>>
>> On 11/18/19 10:09 PM, coleen.phillimore at oracle.com wrote:
>>>
>>> Hi Serguei,
>>>
>>> Sorry for not sending an update. I talked to Erik and am working on
>>> a version that keeps the nmethod from being unloaded while it's in
>>> the deferred event queue, with a version that the GC people will
>>> like, and I like. I'm testing it out now.
>>>
>>> Thanks!
>>> Coleen
>>>
>>>
>>> On 11/18/19 10:03 PM, serguei.spitsyn at oracle.com wrote:
>>>> Hi Coleen,
>>>>
>>>> Sorry for the latency, I had to investigate it a little bit.
>>>> I still have some doubt your fix is right thing to do.
>>>>
>>>>
>>>> On 11/16/19 04:55, coleen.phillimore at oracle.com wrote:
>>>>>
>>>>>
>>>>> On 11/15/19 11:17 PM, serguei.spitsyn at oracle.com wrote:
>>>>>> Hi Coleen,
>>>>>>
>>>>>> On 11/15/19 2:12 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>
>>>>>>> Hi, I've been working on answers to these questions, so I'll
>>>>>>> start with this one.
>>>>>>>
>>>>>>> The nmethodLocker keeps the nmethod from being reclaimed
>>>>>>> (made_zombie or memory released) by the sweeper, but the nmethod
>>>>>>> could be unloaded. Unloading the nmethod clears the Method*
>>>>>>> _method field.
>>>>>>
>>>>>> Yes, I see it is done in the nmethod::make_unloaded().
>>>>>>
>>>>>>> The post_compiled_method_load event needs the _method field to
>>>>>>> look at things like inlining and ScopeDesc fields. If the
>>>>>>> nmethod is unloaded, some of the oops are dead. There are
>>>>>>> "holder" oops that correspond to the metadata in the nmethod.
>>>>>>> If these oops are dead, causing the nmethod to get unloaded,
>>>>>>> then the metadata may not be valid.
>>>>>>>
>>>>>>> So my change 02 looks for a NULL nmethod._method field to tell
>>>>>>> whether we can post information about the nmethod.
>>>>>>>
>>>>>>> There's code in nmethod.cpp like:
>>>>>>>
>>>>>>> jmethodID nmethod::get_and_cache_jmethod_id() {
>>>>>>> if (_jmethod_id == NULL) {
>>>>>>> // Cache the jmethod_id since it can no longer be looked up
>>>>>>> once the
>>>>>>> // method itself has been marked for unloading.
>>>>>>> _jmethod_id = method()->jmethod_id();
>>>>>>> }
>>>>>>> return _jmethod_id;
>>>>>>> }
>>>>>>>
>>>>>>> Which was added when post_method_load and unload were turned
>>>>>>> into deferred events.
>>>>>>
>>>>>> Could we cache the jmethodID in the
>>>>>> JvmtiDeferredEvent::compiled_method_load_event
>>>>>> similarly as we do in the
>>>>>> JvmtiDeferredEvent::compiled_method_unload_event?
>>>>>> This would help to get rid of the dependency on the nmethod::_method.
>>>>>> Do we depend on any other nmethod fields?
>>>>>
>>>>> Yes, there are other nmethod metadata that we rely on to print
>>>>> inline information, and this function
>>>>> JvmtiCodeBlobEvents::build_jvmti_addr_location_map because it uses
>>>>> the ScopeDesc data in the nmethod.
>>>>
>>>> One possible approach is to prepare and cache all this information
>>>> in the nmethod::post_compiled_method_load_event() before the
>>>> JvmtiDeferredEvent::compiled_method_load_event() is called.
>>>> The event parameters are:
>>>> typedef struct {
>>>> const void* start_address;
>>>> jlocation location;
>>>> } jvmtiAddrLocationMap;
>>>> CompiledMethodLoad(jvmtiEnv *jvmti_env,
>>>> jmethodID method,
>>>> jint code_size,
>>>> const void* code_addr,
>>>> jint map_length,
>>>> const jvmtiAddrLocationMap* map,
>>>> const void* compile_info)
>>>> Some of these addresses above could be not accessible when an event
>>>> is posted.
>>>> Not sure yet if it is Okay.
>>>> The question is if this kind of refactoring is worth and right
>>>> thing to do.
>>>>
>>>>>
>>>>> We do cache the jmethodID but that's not good enough. See my last
>>>>> comment in the bug report. The jmethodID can point to an unloaded
>>>>> method.
>>>>
>>>> This looks like it is done a little bit late.
>>>> It'd better to do it before the event is deferred (see above).
>>>>
>>>>> I tried a version of keeping the nmethod alive, but the GC folks
>>>>> will hate it. And it doesn't work and I hate it.
>>>>
>>>> From serviceability point of view this is the best and most
>>>> consistent approach.
>>>> I seems to me, it was initially designed this way.
>>>> The downside is it adds some extra complexity to the GC.
>>>>
>>>>> My version 01 is the best, with the caveat that maybe it should
>>>>> check for _method == NULL instead of nmethod->is_alive(). I have
>>>>> to talk to Erik to see if there's a race with concurrent class
>>>>> unloading.
>>>>>
>>>>> Any application that depends on a compiled method loading event on
>>>>> a class that could be unloaded is a buggy application.
>>>>> Applications should not rely on when the JIT compiler decides to
>>>>> compile a method! This happens to us for a stress test. Most
>>>>> applications will get most of their compiled method loading events
>>>>> as they normally do.
>>>>
>>>> It is not an application that relies on the compiled method loading
>>>> event.
>>>> It is about profiling tools to be able to get correct information
>>>> about what is going on with compilations.
>>>> My concern is that if we skip such compiled method load events then
>>>> profilers have no way
>>>> to find out there many unneeded compilations that are thrown away
>>>> without any real use.
>>>> Also, it is not clear what happens with the subsequent compiled
>>>> method unload events.
>>>> Are they going to be skipped as well or they can appear and confuse
>>>> profilers?
>>>>
>>>>
>>>> Thanks,
>>>> Serguei
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>>> I put more debugging in the bug to show this crash was from an
>>>>>>> unloaded nmethod.
>>>>>>>
>>>>>>> Coleen
>>>>>>>
>>>>>>>
>>>>>>> On 11/15/19 4:45 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>>> Hi Coleen,
>>>>>>>>
>>>>>>>> I have some questions.
>>>>>>>>
>>>>>>>> Both the compiler method load and unload are posted as deferred
>>>>>>>> events.
>>>>>>>> Both events keep the nmethod alive until the ServiceThread
>>>>>>>> processes the event.
>>>>>>>>
>>>>>>>> The implementation is:
>>>>>>>>
>>>>>>>> JvmtiDeferredEvent
>>>>>>>> JvmtiDeferredEvent::compiled_method_load_event(nmethod* nm) {
>>>>>>>> . . .
>>>>>>>> // Keep the nmethod alive until the ServiceThread can process
>>>>>>>> // this deferred event.
>>>>>>>> nmethodLocker::lock_nmethod(nm);
>>>>>>>> return event;
>>>>>>>> }
>>>>>>>>
>>>>>>>> JvmtiDeferredEvent
>>>>>>>> JvmtiDeferredEvent::compiled_method_unload_event(nmethod* nm,
>>>>>>>> jmethodID id, const void* code) {
>>>>>>>> . . .
>>>>>>>> // Keep the nmethod alive until the ServiceThread can process
>>>>>>>> // this deferred event. This will keep the memory for the
>>>>>>>> // generated code from being reused too early. We pass
>>>>>>>> // zombie_ok == true here so that our nmethod that was just
>>>>>>>> // made into a zombie can be locked.
>>>>>>>> nmethodLocker::lock_nmethod(nm, true /* zombie_ok */);
>>>>>>>> return event;
>>>>>>>> }
>>>>>>>>
>>>>>>>> void JvmtiDeferredEvent::post() {
>>>>>>>> assert(ServiceThread::is_service_thread(Thread::current()),
>>>>>>>> "Service thread must post enqueued events");
>>>>>>>> switch(_type) {
>>>>>>>> case TYPE_COMPILED_METHOD_LOAD: {
>>>>>>>> nmethod* nm = _event_data.compiled_method_load;
>>>>>>>> JvmtiExport::post_compiled_method_load(nm);
>>>>>>>> // done with the deferred event so unlock the nmethod
>>>>>>>> nmethodLocker::unlock_nmethod(nm);
>>>>>>>> break;
>>>>>>>> }
>>>>>>>> case TYPE_COMPILED_METHOD_UNLOAD: {
>>>>>>>> nmethod* nm = _event_data.compiled_method_unload.nm;
>>>>>>>> JvmtiExport::post_compiled_method_unload(
>>>>>>>> _event_data.compiled_method_unload.method_id,
>>>>>>>> _event_data.compiled_method_unload.code_begin);
>>>>>>>> // done with the deferred event so unlock the nmethod
>>>>>>>> nmethodLocker::unlock_nmethod(nm);
>>>>>>>> break;
>>>>>>>> }
>>>>>>>> . . .
>>>>>>>> }
>>>>>>>> }
>>>>>>>>
>>>>>>>> Then I wonder how is it possible for the nmethod to be not
>>>>>>>> alive here?:
>>>>>>>> 2168 void JvmtiExport::post_compiled_method_load(nmethod *nm) {
>>>>>>>> . . .
>>>>>>>> 2173 // It's not safe to look at metadata for unloaded methods.
>>>>>>>> 2174 if (!nm->is_alive()) {
>>>>>>>> 2175 return;
>>>>>>>> 2176 }
>>>>>>>> At least, it lokks like something else is broken.
>>>>>>>> Do I miss something important here?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Serguei
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/14/19 5:15 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>> Summary: Don't post information which uses metadata from
>>>>>>>>> unloaded nmethods
>>>>>>>>>
>>>>>>>>> Tested tier1-3 and 100 times with test that failed (reproduced
>>>>>>>>> failure without the fix).
>>>>>>>>>
>>>>>>>>> open webrev at
>>>>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8173361.01/webrev
>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8173361
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20191121/b89cdc33/attachment-0001.html>
More information about the serviceability-dev
mailing list