RFR: JDK-8203172: Primitive heap access for interpreter BarrierSetAssembler/aarch64

Mon Jun 4 16:43:47 UTC 2018

Hi Roman,

On 2018-06-04 17:24, Roman Kennke wrote:
> Ok, right. Very good catch!
>
> This should do it, right? Sorry, I couldn't easily make an incremental diff:
>
> http://cr.openjdk.java.net/~rkennke/JDK-8203172/webrev.01/

Unfortunately, I think there is one more problem for you.
The signal handler is supposed to catch SIGSEGV caused by speculative 
loads shot from the fantastic jni fast get field code. But it currently 
expects an exact PC match:

address JNI_FastGetField::find_slowcase_pc(address pc) {
   for (int i=0; i<count; i++) {
     if (speculative_load_pclist[i] == pc) {
       return slowcase_entry_pclist[i];
     }
   }
   return (address)-1;
}

As this is written now, speculative_load_pclist registers the __ pc() 
right before the access_load_at call, so only the first instruction 
emitted by access_load_at is recognized by the handler. This constrains 
whatever is done inside of access_load_at: it may only speculatively 
load in the first assembled instruction.

If you imagine a scenario where you have a GC with Brooks pointers that 
also uncommits memory (like Shenandoah I presume), then I imagine you 
would need something more here. If you start with a forwarding pointer 
load, then that can trap (which is probably caught by the exact PC 
match). But then there will be a subsequent load of the value in the 
to-space object, which will not be protected. But this is also loaded 
speculatively (as the subsequent safepoint counter check could 
invalidate the result), and could therefore crash the VM unless 
protected, as the signal handler code fails to recognize this is a 
speculative load from jni fast get field.

I imagine the solution to this would be to let speculative_load_pclist 
specify a range for fuzzy SIGSEGV matching in the signal handler, rather 
than an exact PC (i.e. speculative_load_pclist_start and 
speculative_load_pclist_end). That would give you enough freedom to use 
Brooks pointers in there. Sometimes I wonder if the lengths we go to in 
order to maintain jni fast get field are *really* worth it.
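
To make the range idea concrete, here is a minimal standalone sketch of 
what fuzzy matching could look like. It is not the HotSpot code: the 
arrays speculative_load_pclist_start and speculative_load_pclist_end use 
the names suggested above, while range_count, LIST_CAPACITY, NO_SLOWCASE 
and register_speculative_range are hypothetical names made up for this 
illustration. The end address is assumed to be exclusive, i.e. the pc 
right after the last instruction emitted by access_load_at.

```cpp
#include <cassert>
#include <cstdint>

typedef unsigned char* address;

// Hypothetical stand-ins for the tables kept by JNI_FastGetField.
static const int LIST_CAPACITY = 8;
static address speculative_load_pclist_start[LIST_CAPACITY];
static address speculative_load_pclist_end[LIST_CAPACITY];   // exclusive (assumption)
static address slowcase_entry_pclist[LIST_CAPACITY];
static int range_count = 0;

// Sentinel mirroring the (address)-1 "not found" value of the original lookup.
static const address NO_SLOWCASE =
    reinterpret_cast<address>(static_cast<intptr_t>(-1));

// Record the whole code range emitted for one speculative access,
// together with the slow case entry to resume at if it traps.
static void register_speculative_range(address start, address end,
                                       address slowcase) {
  assert(range_count < LIST_CAPACITY);
  assert(start < end);
  speculative_load_pclist_start[range_count] = start;
  speculative_load_pclist_end[range_count]   = end;
  slowcase_entry_pclist[range_count]         = slowcase;
  range_count++;
}

// Range-based variant of find_slowcase_pc: any faulting pc inside
// [start, end) is treated as a speculative load from the fast path.
static address find_slowcase_pc(address pc) {
  for (int i = 0; i < range_count; i++) {
    if (pc >= speculative_load_pclist_start[i] &&
        pc <  speculative_load_pclist_end[i]) {
      return slowcase_entry_pclist[i];
    }
  }
  return NO_SLOWCASE;
}
```

With something along these lines, the generator would take __ pc() 
before and after the access_load_at call and register both, so a trap 
in either the forwarding pointer load or the subsequent value load 
would map to the same slow case entry.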

> Unfortunately, I cannot really test it because of:
> http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2018-May/005843.html

That is unfortunate. If I were you, I would not dare to change anything 
in jni fast get field without testing it - it is very error-prone.

Thanks,
/Erik

> Roman
>
>
>> Hi Roman,
>>
>> Oh man, I was hoping I would never have to look at jni fast get field
>> again. Here goes...
>>
>>   speculative_load_pclist[count] = __ pc();   // Used by the segfault handler
>>   __ access_load_at(type, IN_HEAP, noreg /* tos: r0/v0 */,
>>                     Address(robj, roffset), noreg, noreg);
>>
>> I see that here you load straight to tos, which is r0 for integral
>> types. But r0 is also c_rarg0. So it seems like if after loading the
>> primitive to r0, the subsequent safepoint counter check fails, then the
>> code will revert back to a slowpath call, but this time with c_rarg0
>> clobbered, leading to a broken JNI env pointer being passed in to the
>> slow path C function. That does not seem right to me.
>>
>> This JNI fast get field code is so error prone. :(
>>
>> Unfortunately, the proposed API cannot load floating point numbers to
>> anything but ToS, which seems like a problem in the jni fast get field
>> code.
>> I think to make this work properly, you need to load integral types to
>> result and not ToS, so that you do not clobber r0, and rely on ToS being
>> v0 for floating point types, which does not clobber r0. That way we can
>> dance around the issue for now I suppose.
>>
>> Thanks,
>> /Erik
>>
>> On 2018-05-14 22:23, Roman Kennke wrote:
>>> Similar to x86
>>> (http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-May/032114.html)
>>> here comes the primitive heap access changes for aarch64:
>>>
>>> http://cr.openjdk.java.net/~rkennke/JDK-8203172/webrev.00/
>>>
>>> Some notes:
>>> - array access used to compute base_obj + index, and then use indexed
>>> addressing with base_offset. This means we cannot get base_obj in the
>>> BarrierSetAssembler API, but we need that, e.g. for resolving the target
>>> object via forwarding pointer. I changed (base_obj+index)+base_offset to
>>> base_obj+(index+base_offset) in all the relevant places.
>>>
>>> - in jniFastGetField_aarch64.cpp, we are using a trick to ensure correct
>>> ordering of the field load with the load of the safepoint counter: we make
>>> them address dependent. For float and double loads this meant loading the
>>> value as int/long, and then later moving those into v0. This doesn't
>>> work when going through the BarrierSetAssembler API: it loads straight
>>> to v0. Instead I am inserting a LoadLoad membar for float/double (which
>>> should be rare enough anyway).
>>>
>>> Other than that it's pretty much analogous to x86.
>>>
>>> Testing: no regressions in hotspot/tier1
>>>
>>> Can I please get a review?
>>>
>>> Thanks, Roman
>>>
>