RFR: JDK-8203172: Primitive heap access for interpreter BarrierSetAssembler/aarch64

Tue Jun 5 19:26:25 UTC 2018

As mentioned in another thread, we in Shenandoah have decided to skip
JNI fast getfield stuff for now. We'll probably address it and implement
the extended range speculative PC thing later, in a separate RFE. I
ripped out the jniFastGetField changes from the patch:

http://cr.openjdk.java.net/~rkennke/JDK-8203172/webrev.02/

Is it good now to push?

Roman

> Hi Roman,
> 
> On 2018-06-04 22:49, Roman Kennke wrote:
>> Am 04.06.2018 um 22:16 schrieb Erik Österlund:
>>> Hi Roman,
>>>
>>> On 2018-06-04 21:42, Roman Kennke wrote:
>>>> Am 04.06.2018 um 18:43 schrieb Erik Österlund:
>>>>> Hi Roman,
>>>>>
>>>>> On 2018-06-04 17:24, Roman Kennke wrote:
>>>>>> Ok, right. Very good catch!
>>>>>>
>>>>>> This should do it, right? Sorry, I couldn't easily make an
>>>>>> incremental
>>>>>> diff:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~rkennke/JDK-8203172/webrev.01/
>>>>> Unfortunately, I think there is one more problem for you.
>>>>> The signal handler is supposed to catch SIGSEGV caused by speculative
>>>>> loads shot from the fantastic jni fast get field code. But it
>>>>> currently
>>>>> expects an exact PC match:
>>>>>
>>>>> address JNI_FastGetField::find_slowcase_pc(address pc) {
>>>>>     for (int i=0; i<count; i++) {
>>>>>       if (speculative_load_pclist[i] == pc) {
>>>>>         return slowcase_entry_pclist[i];
>>>>>       }
>>>>>     }
>>>>>     return (address)-1;
>>>>> }
>>>>>
>>>>> This means that the way this is written now, speculative_load_pclist
>>>>> registers the __ pc() right before the access_load_at call. This puts
>>>>> constraints on whatever is done inside of access_load_at to only
>>>>> speculatively load on the first assembled instruction.
>>>>>
>>>>> If you imagine a scenario where you have a GC with Brooks pointers
>>>>> that
>>>>> also uncommits memory (like Shenandoah I presume), then I imagine you
>>>>> would need something more here. If you start with a forwarding pointer
>>>>> load, then that can trap (which is probably caught by the exact PC
>>>>> match). But then there will be a subsequent load of the value in the
>>>>> to-space object, which will not be protected. But this is also loaded
>>>>> speculatively (as the subsequent safepoint counter check could
>>>>> invalidate the result), and could therefore crash the VM unless
>>>>> protected, as the signal handler code fails to recognize this is a
>>>>> speculative load from jni fast get field.
>>>>>
>>>>> I imagine the solution to this would be to let speculative_load_pclist
>>>>> specify a range for fuzzy SIGSEGV matching in the signal handler,
>>>>> rather
>>>>> than an exact PC (i.e. speculative_load_pclist_start and
>>>>> speculative_load_pclist_end). That would give you enough freedom to
>>>>> use
>>>>> Brooks pointers in there. Sometimes I wonder if the lengths we go to
>>>>> maintain jni fast get field is *really* worth it.
>>>> I are probably right in general. But I also think we are fine with
>>>> Shenandoah. Both the fwd ptr load and the field load are constructed
>>>> with the same base operand. If the oop is NULL (or invalid memory) it
>>>> will blow up on fwdptr load just the same as it would blow up on field
>>>> load. We maintain an invariant that the fwd ptr of a valid oop results
>>>> in a valid (and equivalent) oop. I therefore think we are fine for now.
>>>> Should a GC ever need anything else here, I'd worry about it then.
>>>> Until
>>>> this happens, let's just hope to never need to touch this code again
>>>> ;-)
>>> No I'm afraid that is not safe. After loading the forwarding pointer,
>>> the thread could be preempted, then any number of GC cycles could pass,
>>> which means that the address that the at some point read forwarding
>>> pointer points to, could be uncommitted memory. In fact it is unsafe
>>> even without uncommitted memory. Because after resolving the jobject to
>>> some address in the heap, the thread could get preempted, and any number
>>> of GC cycles could pass, causing the forwarding pointer to be read from
>>> some address in the heap that no longer is the forwarding pointer of an
>>> object, but rather a random integer. This causes the second load to blow
>>> up, even without uncommitting memory.
>>>
>>> Here is an attempt at showing different things that can go wrong:
>>>
>>> obj = *jobject
>>> // preempted for N GC cycles, meaning obj might 1) be a valid pointer to
>>> an object, or 2) be a random pointer inside of the heap or outside of
>>> the heap
>>>
>>> forward_pointer = *obj // may 1) crash with SIGSEGV, 2) read a random
>>> pointer, no longer representing the forwarding pointer, or 3) read a
>>> consistent forwarding pointer
>>>
>>> // preempted for N GC cycles, causing forward_pointer to point at pretty
>>> much anything
>>>
>>> result = *(forward_pointer + offset) // may 1) read a valid primitive
>>> value, if previous two loads were not messed up, or 2) read some random
>>> value that no longer corresponds to the object field, or 3) crash
>>> because either the forwarding pointer did point at something valid that
>>> subsequently got relocated and uncommitted before the load hits, or
>>> because the forwarding pointer never pointed to anything valid in the
>>> first place, because the forwarding pointer load read a random pointer
>>> due to the object relocating after the jobject was resolved.
>>>
>>> The summary is that both loads need protection due to how the thread in
>>> native state runs freely without necessarily caring about the GC running
>>> any number of GC cycles concurrently, making the memory super slippery,
>>> which risks crashing the VM without the proper protection.
>> AWW WTF!? We are in native state in this code?
> 
> Yes. This is one of the most dangerous code paths we have in the VM I
> think.
> 
>> It might be easier to just call bsa->resolve_for_read() (which emits the
>> fwd ptr load), then issue another:
>>
>> speculative_load_pclist[count] = __ pc();
>>
>> need to juggle with the counter and double-emit slowcase_entry_pclist,
>> and all this conditionally for Shenandoah. Gaa.
> 
> I think that by just having the speculative load PC list take a range as
> opposed to a precise PC, and check that a given PC is in that range, and
> not just exactly equal to a PC, the problem is solved for everyone.
> 
>> Or just FLAG_SET_DEFAULT(UseFastJNIAccessors,false) in Shenandoah.
> 
> Yeah, sometimes you wonder if it's really worth the maintenance to keep
> this thing.
> 
>> Funny how we had this code in Shenandoah literally for years, and
>> nobody's ever tripped over it.
> 
> Yeah it is a rather nasty race to detect.
> 
>> It's one of those cases where I almost suspect it's been done in Java1.0
>> when lots of JNI code was in use because some stuff couldn't be done in
>> fast in Java, but nowadays doesn't really make a difference. *Sigh*
> 
> :)
> 
>>>>>> Unfortunately, I cannot really test it because of:
>>>>>> http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2018-May/005843.html
>>>>>>
>>>>>>
>>>>>>
>>>>> That is unfortunate. If I were you, I would not dare to change
>>>>> anything
>>>>> in jni fast get field without testing it - it is very error prone.
>>>> Yeah. I guess I'll just wait with testing until this is resolved. Or
>>>> else resolve it myself.
>>> Yeah.
>>>
>>>> Can I consider this change reviewed by you?
>>> I think we should agree about the safety of doing this for Shenandoah in
>>> particular first. I still think we need the PC range as opposed to exact
>>> PC to be caught in the signal handler for this to be safe for your GC
>>> algorithm.
>>
>> Yeah, I agree. I need to think this through a little bit.
> 
> Yeah. Still think the PC range check solution should do the trick.
> 
>> Thanks for pointing out this bug. I can already see nightly builds
>> suddenly starting to fail over it, now that it's known :-)
> 
> No problem!
> 
> Thanks,
> /Erik
> 
>> Roman
>>
>>
>