RFR: JDK-8203172: Primitive heap access for interpreter BarrierSetAssembler/aarch64

Erik Österlund erik.osterlund at oracle.com
Mon Jun 4 20:16:05 UTC 2018


Hi Roman,

On 2018-06-04 21:42, Roman Kennke wrote:
> Am 04.06.2018 um 18:43 schrieb Erik Österlund:
>> Hi Roman,
>>
>> On 2018-06-04 17:24, Roman Kennke wrote:
>>> Ok, right. Very good catch!
>>>
>>> This should do it, right? Sorry, I couldn't easily make an incremental
>>> diff:
>>>
>>> http://cr.openjdk.java.net/~rkennke/JDK-8203172/webrev.01/
>> Unfortunately, I think there is one more problem for you.
>> The signal handler is supposed to catch SIGSEGV caused by speculative
>> loads shot from the fantastic jni fast get field code. But it currently
>> expects an exact PC match:
>>
>> address JNI_FastGetField::find_slowcase_pc(address pc) {
>>    for (int i=0; i<count; i++) {
>>      if (speculative_load_pclist[i] == pc) {
>>        return slowcase_entry_pclist[i];
>>      }
>>    }
>>    return (address)-1;
>> }
>>
>> This means that the way this is written now, speculative_load_pclist
>> registers the __ pc() right before the access_load_at call. This puts
>> constraints on whatever is done inside of access_load_at to only
>> speculatively load on the first assembled instruction.
>>
>> If you imagine a scenario where you have a GC with Brooks pointers that
>> also uncommits memory (like Shenandoah I presume), then I imagine you
>> would need something more here. If you start with a forwarding pointer
>> load, then that can trap (which is probably caught by the exact PC
>> match). But then there will be a subsequent load of the value in the
>> to-space object, which will not be protected. But this is also loaded
>> speculatively (as the subsequent safepoint counter check could
>> invalidate the result), and could therefore crash the VM unless
>> protected, as the signal handler code fails to recognize this is a
>> speculative load from jni fast get field.
>>
>> I imagine the solution to this would be to let speculative_load_pclist
>> specify a range for fuzzy SIGSEGV matching in the signal handler, rather
>> than an exact PC (i.e. speculative_load_pclist_start and
>> speculative_load_pclist_end). That would give you enough freedom to use
>> Brooks pointers in there. Sometimes I wonder if the lengths we go to
>> maintain jni fast get field is *really* worth it.
> I are probably right in general. But I also think we are fine with
> Shenandoah. Both the fwd ptr load and the field load are constructed
> with the same base operand. If the oop is NULL (or invalid memory) it
> will blow up on fwdptr load just the same as it would blow up on field
> load. We maintain an invariant that the fwd ptr of a valid oop results
> in a valid (and equivalent) oop. I therefore think we are fine for now.
> Should a GC ever need anything else here, I'd worry about it then. Until
> this happens, let's just hope to never need to touch this code again ;-)

No I'm afraid that is not safe. After loading the forwarding pointer, 
the thread could be preempted, then any number of GC cycles could pass, 
which means that the address that the at some point read forwarding 
pointer points to, could be uncommitted memory. In fact it is unsafe 
even without uncommitted memory. Because after resolving the jobject to 
some address in the heap, the thread could get preempted, and any number 
of GC cycles could pass, causing the forwarding pointer to be read from 
some address in the heap that no longer is the forwarding pointer of an 
object, but rather a random integer. This causes the second load to blow 
up, even without uncommitting memory.

Here is an attempt at showing different things that can go wrong:

obj = *jobject
// preempted for N GC cycles, meaning obj might 1) be a valid pointer to 
an object, or 2) be a random pointer inside of the heap or outside of 
the heap

forward_pointer = *obj // may 1) crash with SIGSEGV, 2) read a random 
pointer, no longer representing the forwarding pointer, or 3) read a 
consistent forwarding pointer

// preempted for N GC cycles, causing forward_pointer to point at pretty 
much anything

result = *(forward_pointer + offset) // may 1) read a valid primitive 
value, if previous two loads were not messed up, or 2) read some random 
value that no longer corresponds to the object field, or 3) crash 
because either the forwarding pointer did point at something valid that 
subsequently got relocated and uncommitted before the load hits, or 
because the forwarding pointer never pointed to anything valid in the 
first place, because the forwarding pointer load read a random pointer 
due to the object relocating after the jobject was resolved.

The summary is that both loads need protection due to how the thread in 
native state runs freely without necessarily caring about the GC running 
any number of GC cycles concurrently, making the memory super slippery, 
which risks crashing the VM without the proper protection.

>
>>> Unfortunately, I cannot really test it because of:
>>> http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2018-May/005843.html
>>>
>> That is unfortunate. If I were you, I would not dare to change anything
>> in jni fast get field without testing it - it is very error prone.
>
> Yeah. I guess I'll just wait with testing until this is resolved. Or
> else resolve it myself.

Yeah.

> Can I consider this change reviewed by you?

I think we should agree about the safety of doing this for Shenandoah in 
particular first. I still think we need the PC range as opposed to exact 
PC to be caught in the signal handler for this to be safe for your GC 
algorithm.

Thanks,
/Erik

> Thanks,
> Roman
>
>
>> Thanks,
>> /Erik
>>
>>> Roman
>>>
>>>
>>>> Hi Roman,
>>>>
>>>> Oh man, I was hoping I would never have to look at jni fast get field
>>>> again. Here goes...
>>>>
>>>>    93   speculative_load_pclist[count] = __ pc();   // Used by the
>>>> segfault handler
>>>>    94   __ access_load_at(type, IN_HEAP, noreg /* tos: r0/v0 */,
>>>> Address(robj, roffset), noreg, noreg);
>>>>    95
>>>>
>>>> I see that here you load straight to tos, which is r0 for integral
>>>> types. But r0 is also c_rarg0. So it seems like if after loading the
>>>> primitive to r0, the subsequent safepoint counter check fails, then the
>>>> code will revert back to a slowpath call, but this time with c_rarg0
>>>> clobbered, leading to a broken JNI env pointer being passed in to the
>>>> slow path C function. That does not seem right to me.
>>>>
>>>> This JNI fast get field code is so error prone. :(
>>>>
>>>> Unfortunately, the proposed API can not load floating point numbers to
>>>> anything but ToS, which seems like a problem in the jni fast get field
>>>> code.
>>>> I think to make this work properly, you need to load integral types to
>>>> result and not ToS, so that you do not clobber r0, and rely on ToS being
>>>> v0 for floating point types, which does not clobber r0. That way we can
>>>> dance around the issue for now I suppose.
>>>>
>>>> Thanks,
>>>> /Erik
>>>>
>>>> On 2018-05-14 22:23, Roman Kennke wrote:
>>>>> Similar to x86
>>>>> (http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-May/032114.html)
>>>>>
>>>>> here comes the primitive heap access changes for aarch64:
>>>>>
>>>>> http://cr.openjdk.java.net/~rkennke/JDK-8203172/webrev.00/
>>>>>
>>>>> Some notes:
>>>>> - array access used to compute base_obj + index, and then use indexed
>>>>> addressing with base_offset. This means we cannot get base_obj in the
>>>>> BarrierSetAssembler API, but we need that, e.g. for resolving the
>>>>> target
>>>>> object via forwarding pointer. I changed
>>>>> (base_obj+index)+base_offset to
>>>>> base_obj+(index+base_offset) in all the relevant places.
>>>>>
>>>>> - in jniFastGetField_aarch64.cpp, we are using a trick to ensure
>>>>> correct
>>>>> ordering field-load with the load of the safepoint counter: we make
>>>>> them
>>>>> address dependend. For float and double loads this meant to load the
>>>>> value as int/long, and then later moving those into v0. This doesn't
>>>>> work when going through the BarrierSetAssembler API: it loads straight
>>>>> to v0. Instead I am inserting a LoadLoad membar for float/double (which
>>>>> should be rare enough anyway).
>>>>>
>>>>> Other than that it's pretty much analogous to x86.
>>>>>
>>>>> Testing: no regressions in hotspot/tier1
>>>>>
>>>>> Can I please get a review?
>>>>>
>>>>> Thanks, Roman
>>>>>
>



More information about the hotspot-dev mailing list