C1 code installation and JNIHandles::deleted_handle() oop

Tue Nov 14 17:26:35 UTC 2017

On 14.11.2017 at 18:00, Roman Kennke wrote:
> On 14.11.2017 at 17:04, Vladimir Ivanov wrote:
>> Thanks, now I see that as well: OopRecorder::find_index() can 
>> delegate to ObjectLookup::find_index() which does resolve the handle 
>> w/o transitioning to VM.
>>
>> But I don't believe you hit that path: ObjectLookup was added as part 
>> of JVMCI and is guarded by a flag (deduplicate) which is turned on 
>> only for JVMCI.
> Ah ok. Didn't know that.
>
> However, as Aleksey pointed out, we hit JNIHandles::resolve() in the
> product path, JVMCI or not, and this touches the naked oop by
> comparing it with another oop. This doesn't sound like a reliable
> thing to do.
Scratch that. Looking at the code paths again, this doesn't seem to be
true: we hit JNIHandles::resolve() only in the assert and in ObjectLookup
(I trust you that the latter is JVMCI-only). I'm not sure comparing oops
in assert paths can be made robust. It certainly is a race, and it
doesn't feel right.
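To make the race concrete, here is a deliberately simplified, single-threaded C++ model. It is not HotSpot code: ToyHeap, Handle, resolve() and full_gc_move() are hypothetical names, and "addresses" are just vector indices. The model assumes only what the thread describes: a handle points at a slot holding the object's current address, and a moving GC updates that slot. A naked address a compiler thread loaded before the move is not updated, so comparing or dereferencing it afterwards is unsafe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: an "address" is an index into ToyHeap::cells.
using Address = std::size_t;

struct ToyHeap {
  std::vector<int> cells;
};

// A handle points at a slot that holds the object's current address.
// The GC knows about the slot and updates it; it does not know about
// copies of the address that a thread loaded earlier.
struct Handle {
  Address* slot;
};

// Resolving a handle loads the current address from the slot.
Address resolve(const Handle& h) {
  return *h.slot;
}

// A moving collection: copy the object to a new location, clobber the
// old location, and update the handle slot (the GC root).
void full_gc_move(ToyHeap& heap, Address* slot) {
  Address from = *slot;
  assert(from < heap.cells.size());
  int value = heap.cells[from];
  heap.cells.push_back(value);
  heap.cells[from] = -1;  // the old copy is garbage after the move
  *slot = heap.cells.size() - 1;
}
```

In this model, a thread that calls resolve() and then lets full_gc_move() run before using the result ends up holding an index that now refers to the clobbered cell, while a fresh resolve() returns the new location.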

I wonder if we should introduce a CollectedHeap::is_in_or_null(jobject) 
method, and let the GC figure it out. It might actually have a way to 
check it (simple address range check) without sending the thread to VM 
state.
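A minimal sketch of the kind of check meant here, assuming the GC can expose its reserved range as a pair of addresses. ReservedRange and this free function is_in_or_null() are hypothetical illustrations, not the CollectedHeap API. The sketch takes an already-loaded address rather than a jobject, so how the handle slot itself is read safely is outside its scope. The point is that it only compares pointer values and never reads object memory, so no barrier or oop comparison is involved.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the GC's reserved heap range [start, end).
struct ReservedRange {
  std::uintptr_t start;  // inclusive
  std::uintptr_t end;    // exclusive
};

// Returns true if p is null or lies inside [start, end). Only the pointer
// value is inspected; the object it points to is never dereferenced.
bool is_in_or_null(const ReservedRange& r, const void* p) {
  assert(r.start <= r.end);
  if (p == nullptr) {
    return true;
  }
  std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
  return a >= r.start && a < r.end;
}
```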

Roman

>
> This simple change seems to fix it:
> https://paste.fedoraproject.org/paste/poQ5caTCuN6jHSGbK1n0iQ
>
> Doing more testing...
>
> Roman
>
>> Anyway, I'll file a bug to investigate ObjectLookup::find_index().
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 11/14/17 6:45 PM, Roman Kennke wrote:
>>> The code below the assert also unwraps the oop and does lookups with
>>> it. I'm not at my computer, but I can dig out the relevant parts when
>>> I'm back at work...
>>>
>>> Roman
>>>
>>>
>>> On 14 November 2017 16:36:10 CET, Vladimir Ivanov
>>> <vladimir.x.ivanov at oracle.com> wrote:
>>>
>>>     Aleksey,
>>>
>>>     I agree with your and Roman's analysis: compilers shouldn't touch
>>>     naked oops unless the thread is in the _thread_in_vm state.
>>>
>>>     Looking at the crash log, the problematic code is under assert:
>>>
>>>     void ConstantOopWriteValue::write_on(DebugInfoWriteStream* stream) {
>>>         assert(JNIHandles::resolve(value()) == NULL ||
>>>                Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
>>>                "Should be in heap");
>>>                "Should be in heap");
>>>         stream->write_int(CONSTANT_OOP_CODE);
>>>         stream->write_handle(value());
>>>     }
>>>
>>>     So, the proper fix would be to make the verification code more
>>>     robust.
>>>
>>>     Best regards,
>>>     Vladimir Ivanov
>>>
>>>     On 11/14/17 5:16 PM, Aleksey Shipilev wrote:
>>>
>>>         Hi,
>>>
>>>         In some of our aggressive test configurations for Shenandoah,
>>>         we sometimes see the following failure:
>>> http://cr.openjdk.java.net/~shade/shenandoah/c1-race-fail-hs_err.log
>>>
>>>         It seems to happen when C1 code installation is happening
>>>         during Full GC.
>>>
>>>         The actual failure is caused by touching the
>>>         JNIHandles::deleted_handle() oop in JNIHandles::guard_value()
>>>         during JNIHandles::resolve() of the constant oop handle, while
>>>         we are recording the debugging information for a C1-generated
>>>         Java call:
>>> http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220 
>>>
>>>
>>>         The C1 thread is in the _thread_in_native state, so the runtime
>>>         treats it as if it were at a safepoint, but the thread still
>>>         touches the deleted_handle() oop. When Shenandoah dives into
>>>         Full GC and moves that object at the same time, everything
>>>         crashes and burns.
>>>
>>>         Is C1 (and any other compiler thread) supposed to transition to
>>>         _thread_in_vm when touching naked oops, and thus coordinate with
>>>         safepoints? I see VM_ENTRY_MARK all over ci*, which seems to
>>>         transition there before accessing the heap. Does that mean we
>>>         need the same around every JNIHandles::resolve() too?
>>>
>>>         Or is there some other mechanism that is supposed to get
>>>         compiler threads to coordinate with GC?
>>>
>>>         Thanks,
>>>         -Aleksey
>>>
>>>
>
>
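Aleksey's question about VM_ENTRY_MARK, and Vladimir's answer that compilers shouldn't touch naked oops outside _thread_in_vm, can be pictured with the toy C++ model below. ThreadState, ToyThread, safepoint_may_proceed() and ScopedVMEntry are hypothetical names, not HotSpot's. The real thread-state transitions also block and poll for safepoints, which this sketch deliberately leaves out. It only shows the shape of the idea: while in native, the thread is treated as safepoint-safe, so the GC may move objects; a scoped transition to the VM state prevents that for the duration of the heap access.

```cpp
#include <cassert>

// Toy model of the two thread states discussed in this thread.
// Not HotSpot code: names only mirror _thread_in_native / _thread_in_vm.
enum class ThreadState { in_native, in_vm };

struct ToyThread {
  ThreadState state = ThreadState::in_native;
};

// In this model a safepoint (and thus a moving GC) may proceed while the
// thread is in native code, because that thread promises not to touch
// naked oops.
bool safepoint_may_proceed(const ToyThread& t) {
  return t.state == ThreadState::in_native;
}

// Hypothetical RAII guard in the spirit of VM_ENTRY_MARK: enter the VM
// state for the lifetime of the scope, restore the saved state on exit.
// A real transition would also wait for an in-progress safepoint first.
class ScopedVMEntry {
 public:
  explicit ScopedVMEntry(ToyThread& t) : thread_(t), saved_(t.state) {
    thread_.state = ThreadState::in_vm;
  }
  ~ScopedVMEntry() { thread_.state = saved_; }
  ScopedVMEntry(const ScopedVMEntry&) = delete;
  ScopedVMEntry& operator=(const ScopedVMEntry&) = delete;

 private:
  ToyThread& thread_;
  ThreadState saved_;
};
```

Scoping the transition this way means a check like the heap assert in ConstantOopWriteValue::write_on() would run only while the model thread blocks safepoints, and the previous state is restored even on an early return.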