C1 code installation and JNIHandle::deleted_handle() oop

Tue Nov 14 17:21:46 UTC 2017

On 11/14/17 8:00 PM, Roman Kennke wrote:
> On 14.11.2017 at 17:04, Vladimir Ivanov wrote:
>> Thanks, now I see that as well: OopRecorder::find_index() can delegate 
>> to ObjectLookup::find_index() which does resolve the handle w/o 
>> transitioning to VM.
>>
>> But I don't believe you hit that path: ObjectLookup was added as part 
>> of JVMCI and is guarded by a flag (deduplicate) which is turned on 
>> only for JVMCI.
> Ah ok. Didn't know that.
> 
> However, as Aleksey pointed out, we hit JNIHandles::resolve() in the
> product path, JVMCI or not, and this touches the naked oop by comparing
> it with another oop. This doesn't sound like a reliable thing to do.

Can you double-check that you observe the crash with product binaries as
well? My current understanding is that it happens only with debug builds.

Best regards,
Vladimir Ivanov

> This simple change seems to fix it:
> https://paste.fedoraproject.org/paste/poQ5caTCuN6jHSGbK1n0iQ
> 
> Doing more testing...
> 
> Roman
> 
>> Anyway, I'll file a bug to investigate ObjectLookup::find_index().
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 11/14/17 6:45 PM, Roman Kennke wrote:
>>> The code below the assert also unwraps the oop and does lookups with 
>>> it. I'm not on my computer but I can dig out the relevant parts when 
>>> I'm back at work...
>>>
>>> Roman
>>>
>>>
>>> On 14 November 2017 16:36:10 CET, Vladimir Ivanov
>>> <vladimir.x.ivanov at oracle.com> wrote:
>>>
>>>     Aleksey,
>>>
>>>     I agree with your and Roman's analysis: compilers shouldn't touch
>>>     naked oops unless the thread is in _thread_in_vm mode.
>>>
>>>     Looking at the crash log, the problematic code is under assert:
>>>
>>>     void ConstantOopWriteValue::write_on(DebugInfoWriteStream* stream) {
>>>         assert(JNIHandles::resolve(value()) == NULL ||
>>>                Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
>>>                "Should be in heap");
>>>         stream->write_int(CONSTANT_OOP_CODE);
>>>         stream->write_handle(value());
>>>     }
>>>
>>>     So, the proper fix would be to make the verification code more
>>>     robust.
>>>
>>>     Best regards,
>>>     Vladimir Ivanov
>>>
>>>     On 11/14/17 5:16 PM, Aleksey Shipilev wrote:
>>>
>>>         Hi,
>>>
>>>         In some of our aggressive test configurations for Shenandoah, we
>>>         sometimes see the following failure:
>>> http://cr.openjdk.java.net/~shade/shenandoah/c1-race-fail-hs_err.log
>>>
>>>         It seems to happen when C1 code installation is happening during
>>>         Full GC.
>>>
>>>         The actual failure is caused by touching the
>>>         JNIHandles::deleted_handle() oop in
>>>         JNIHandles::guard_value() during JNIHandles::resolve() against
>>>         the constant oop handle when we are
>>>         recording the debugging information for a C1-generated Java call:
>>> http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220 
>>>
>>>
>>>         The C1 thread is in _thread_in_native state, and so the runtime
>>>         thinks the thread is at a safepoint,
>>>         but the thread touches the deleted_handle() oop. When Shenandoah
>>>         dives into Full GC and moves that
>>>         object at the same time, everything crashes and burns.
>>>
>>>         Is C1 (and any other compiler thread) supposed to transition to
>>>         the _thread_in_vm state when touching naked oops,
>>>         and thus coordinate with safepoints? I see VM_ENTRY_MARK all
>>>         over ci* that seems to transition there
>>>         before accessing the heap. Does that mean we need the same
>>>         everywhere around JNIHandles::resolve too?
>>>
>>>         Or is there some other mechanism that is supposed to get
>>>         compiler threads to coordinate with GC?
>>>
>>>         Thanks,
>>>         -Aleksey
>>>
>>>
> 
>
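
For readers following the thread, here is a minimal, self-contained
sketch of what "making the verification code more robust" could look
like: the heap-membership check resolves the handle only when the thread
is in the VM state, and is skipped otherwise. This is not the patch
linked above (its contents are not shown in this thread) and it does not
use HotSpot code. ThreadState, FakeHeap, FakeHandle, resolve(),
verify_constant_oop() and write_constant_oop() are all hypothetical
stand-ins invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the two thread states discussed in the thread.
enum class ThreadState { InNative, InVM };

// Hypothetical stand-in for the reserved heap range.
struct FakeHeap {
  const char* base;
  std::size_t size;
  bool is_in_reserved(const void* p) const {
    const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(base);
    const std::uintptr_t a  = reinterpret_cast<std::uintptr_t>(p);
    return a >= lo && a - lo < size;
  }
};

// Hypothetical handle: a pointer to a slot that holds the "naked oop".
struct FakeHandle { void* const* slot; };

// Counts how often the naked oop is actually read (for demonstration).
static int naked_oop_reads = 0;

// Models resolving a handle: dereferencing the slot reads the naked oop.
inline void* resolve(const FakeHandle& h) {
  if (h.slot == nullptr) return nullptr;
  ++naked_oop_reads;
  return *h.slot;
}

// Verification that never touches the naked oop outside the VM state.
inline bool verify_constant_oop(ThreadState state, const FakeHandle& h,
                                const FakeHeap& heap) {
  if (state != ThreadState::InVM) {
    return true;  // cannot look safely, so skip the check
  }
  void* obj = resolve(h);  // resolved once, reused for both tests
  return obj == nullptr || heap.is_in_reserved(obj);
}

// Mirrors the shape of the quoted write_on(): the check lives under
// assert, so it runs only when assertions are enabled.
inline void write_constant_oop(ThreadState state, const FakeHandle& h,
                               const FakeHeap& heap) {
  assert(verify_constant_oop(state, h, heap) && "Should be in heap");
  (void)state; (void)h; (void)heap;  // silence warnings under NDEBUG
  // A real writer would now emit the handle itself, not the resolved oop.
}
```

Skipping the check in the native state trades verification coverage for
safety; the alternative raised in the thread is to transition the thread
into the VM state before resolving, which this sketch does not model.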