RFR: JDK-8203172: Primitive heap access for interpreter BarrierSetAssembler/aarch64

Tue Jun 5 10:31:17 UTC 2018

Hi Erik,

We will set -UseFastJNIAccessors in Shenandoah for now. We might get
back to extending the PC list to a range later, but we have more
pressing issues to solve at this moment. Interesting question regarding
this patch is, do we want to keep the BarrierSetAssembler stuff in JNI
fast-get-field code, or do we want to rip it out. I tend to leave it in
anyway.

Roman

> Hi Roman,
> 
> On 2018-06-04 22:49, Roman Kennke wrote:
>> Am 04.06.2018 um 22:16 schrieb Erik Österlund:
>>> Hi Roman,
>>>
>>> On 2018-06-04 21:42, Roman Kennke wrote:
>>>> Am 04.06.2018 um 18:43 schrieb Erik Österlund:
>>>>> Hi Roman,
>>>>>
>>>>> On 2018-06-04 17:24, Roman Kennke wrote:
>>>>>> Ok, right. Very good catch!
>>>>>>
>>>>>> This should do it, right? Sorry, I couldn't easily make an
>>>>>> incremental
>>>>>> diff:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~rkennke/JDK-8203172/webrev.01/
>>>>> Unfortunately, I think there is one more problem for you.
>>>>> The signal handler is supposed to catch SIGSEGV caused by speculative
>>>>> loads shot from the fantastic jni fast get field code. But it
>>>>> currently
>>>>> expects an exact PC match:
>>>>>
>>>>> address JNI_FastGetField::find_slowcase_pc(address pc) {
>>>>>     for (int i=0; i<count; i++) {
>>>>>       if (speculative_load_pclist[i] == pc) {
>>>>>         return slowcase_entry_pclist[i];
>>>>>       }
>>>>>     }
>>>>>     return (address)-1;
>>>>> }
>>>>>
>>>>> This means that the way this is written now, speculative_load_pclist
>>>>> registers the __ pc() right before the access_load_at call. This puts
>>>>> constraints on whatever is done inside of access_load_at to only
>>>>> speculatively load on the first assembled instruction.
>>>>>
>>>>> If you imagine a scenario where you have a GC with Brooks pointers
>>>>> that
>>>>> also uncommits memory (like Shenandoah I presume), then I imagine you
>>>>> would need something more here. If you start with a forwarding pointer
>>>>> load, then that can trap (which is probably caught by the exact PC
>>>>> match). But then there will be a subsequent load of the value in the
>>>>> to-space object, which will not be protected. But this is also loaded
>>>>> speculatively (as the subsequent safepoint counter check could
>>>>> invalidate the result), and could therefore crash the VM unless
>>>>> protected, as the signal handler code fails to recognize this is a
>>>>> speculative load from jni fast get field.
>>>>>
>>>>> I imagine the solution to this would be to let speculative_load_pclist
>>>>> specify a range for fuzzy SIGSEGV matching in the signal handler,
>>>>> rather
>>>>> than an exact PC (i.e. speculative_load_pclist_start and
>>>>> speculative_load_pclist_end). That would give you enough freedom to
>>>>> use
>>>>> Brooks pointers in there. Sometimes I wonder if the lengths we go to
>>>>> maintain jni fast get field is *really* worth it.
>>>> I are probably right in general. But I also think we are fine with
>>>> Shenandoah. Both the fwd ptr load and the field load are constructed
>>>> with the same base operand. If the oop is NULL (or invalid memory) it
>>>> will blow up on fwdptr load just the same as it would blow up on field
>>>> load. We maintain an invariant that the fwd ptr of a valid oop results
>>>> in a valid (and equivalent) oop. I therefore think we are fine for now.
>>>> Should a GC ever need anything else here, I'd worry about it then.
>>>> Until
>>>> this happens, let's just hope to never need to touch this code again
>>>> ;-)
>>> No I'm afraid that is not safe. After loading the forwarding pointer,
>>> the thread could be preempted, then any number of GC cycles could pass,
>>> which means that the address that the at some point read forwarding
>>> pointer points to, could be uncommitted memory. In fact it is unsafe
>>> even without uncommitted memory. Because after resolving the jobject to
>>> some address in the heap, the thread could get preempted, and any number
>>> of GC cycles could pass, causing the forwarding pointer to be read from
>>> some address in the heap that no longer is the forwarding pointer of an
>>> object, but rather a random integer. This causes the second load to blow
>>> up, even without uncommitting memory.
>>>
>>> Here is an attempt at showing different things that can go wrong:
>>>
>>> obj = *jobject
>>> // preempted for N GC cycles, meaning obj might 1) be a valid pointer to
>>> an object, or 2) be a random pointer inside of the heap or outside of
>>> the heap
>>>
>>> forward_pointer = *obj // may 1) crash with SIGSEGV, 2) read a random
>>> pointer, no longer representing the forwarding pointer, or 3) read a
>>> consistent forwarding pointer
>>>
>>> // preempted for N GC cycles, causing forward_pointer to point at pretty
>>> much anything
>>>
>>> result = *(forward_pointer + offset) // may 1) read a valid primitive
>>> value, if previous two loads were not messed up, or 2) read some random
>>> value that no longer corresponds to the object field, or 3) crash
>>> because either the forwarding pointer did point at something valid that
>>> subsequently got relocated and uncommitted before the load hits, or
>>> because the forwarding pointer never pointed to anything valid in the
>>> first place, because the forwarding pointer load read a random pointer
>>> due to the object relocating after the jobject was resolved.
>>>
>>> The summary is that both loads need protection due to how the thread in
>>> native state runs freely without necessarily caring about the GC running
>>> any number of GC cycles concurrently, making the memory super slippery,
>>> which risks crashing the VM without the proper protection.
>> AWW WTF!? We are in native state in this code?
> 
> Yes. This is one of the most dangerous code paths we have in the VM I
> think.
> 
>> It might be easier to just call bsa->resolve_for_read() (which emits the
>> fwd ptr load), then issue another:
>>
>> speculative_load_pclist[count] = __ pc();
>>
>> need to juggle with the counter and double-emit slowcase_entry_pclist,
>> and all this conditionally for Shenandoah. Gaa.
> 
> I think that by just having the speculative load PC list take a range as
> opposed to a precise PC, and check that a given PC is in that range, and
> not just exactly equal to a PC, the problem is solved for everyone.
> 
>> Or just FLAG_SET_DEFAULT(UseFastJNIAccessors,false) in Shenandoah.
> 
> Yeah, sometimes you wonder if it's really worth the maintenance to keep
> this thing.
> 
>> Funny how we had this code in Shenandoah literally for years, and
>> nobody's ever tripped over it.
> 
> Yeah it is a rather nasty race to detect.
> 
>> It's one of those cases where I almost suspect it's been done in Java1.0
>> when lots of JNI code was in use because some stuff couldn't be done in
>> fast in Java, but nowadays doesn't really make a difference. *Sigh*
> 
> :)
> 
>>>>>> Unfortunately, I cannot really test it because of:
>>>>>> http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2018-May/005843.html
>>>>>>
>>>>>>
>>>>>>
>>>>> That is unfortunate. If I were you, I would not dare to change
>>>>> anything
>>>>> in jni fast get field without testing it - it is very error prone.
>>>> Yeah. I guess I'll just wait with testing until this is resolved. Or
>>>> else resolve it myself.
>>> Yeah.
>>>
>>>> Can I consider this change reviewed by you?
>>> I think we should agree about the safety of doing this for Shenandoah in
>>> particular first. I still think we need the PC range as opposed to exact
>>> PC to be caught in the signal handler for this to be safe for your GC
>>> algorithm.
>>
>> Yeah, I agree. I need to think this through a little bit.
> 
> Yeah. Still think the PC range check solution should do the trick.
> 
>> Thanks for pointing out this bug. I can already see nightly builds
>> suddenly starting to fail over it, now that it's known :-)
> 
> No problem!
> 
> Thanks,
> /Erik
> 
>> Roman
>>
>>
>