RFR: JDK-8203172: Primitive heap access for interpreter BarrierSetAssembler/aarch64

Mon Jun 4 21:21:44 UTC 2018

Hi Roman,

On 2018-06-04 22:49, Roman Kennke wrote:
> On 04.06.2018 at 22:16, Erik Österlund wrote:
>> Hi Roman,
>>
>> On 2018-06-04 21:42, Roman Kennke wrote:
>>> On 04.06.2018 at 18:43, Erik Österlund wrote:
>>>> Hi Roman,
>>>>
>>>> On 2018-06-04 17:24, Roman Kennke wrote:
>>>>> Ok, right. Very good catch!
>>>>>
>>>>> This should do it, right? Sorry, I couldn't easily make an incremental
>>>>> diff:
>>>>>
>>>>> http://cr.openjdk.java.net/~rkennke/JDK-8203172/webrev.01/
>>>> Unfortunately, I think there is one more problem for you.
>>>> The signal handler is supposed to catch SIGSEGV caused by speculative
>>>> loads shot from the fantastic jni fast get field code. But it currently
>>>> expects an exact PC match:
>>>>
>>>> address JNI_FastGetField::find_slowcase_pc(address pc) {
>>>>     for (int i=0; i<count; i++) {
>>>>       if (speculative_load_pclist[i] == pc) {
>>>>         return slowcase_entry_pclist[i];
>>>>       }
>>>>     }
>>>>     return (address)-1;
>>>> }
>>>>
>>>> This means that the way this is written now, speculative_load_pclist
>>>> registers the __ pc() right before the access_load_at call. This puts
>>>> constraints on whatever is done inside of access_load_at to only
>>>> speculatively load on the first assembled instruction.
>>>>
>>>> If you imagine a scenario where you have a GC with Brooks pointers that
>>>> also uncommits memory (like Shenandoah I presume), then I imagine you
>>>> would need something more here. If you start with a forwarding pointer
>>>> load, then that can trap (which is probably caught by the exact PC
>>>> match). But then there will be a subsequent load of the value in the
>>>> to-space object, which will not be protected. But this is also loaded
>>>> speculatively (as the subsequent safepoint counter check could
>>>> invalidate the result), and could therefore crash the VM unless
>>>> protected, as the signal handler code fails to recognize this is a
>>>> speculative load from jni fast get field.
>>>>
>>>> I imagine the solution to this would be to let speculative_load_pclist
>>>> specify a range for fuzzy SIGSEGV matching in the signal handler, rather
>>>> than an exact PC (i.e. speculative_load_pclist_start and
>>>> speculative_load_pclist_end). That would give you enough freedom to use
>>>> Brooks pointers in there. Sometimes I wonder if the lengths we go to
>>>> maintain jni fast get field is *really* worth it.
>>> You are probably right in general. But I also think we are fine with
>>> Shenandoah. Both the fwd ptr load and the field load are constructed
>>> with the same base operand. If the oop is NULL (or invalid memory) it
>>> will blow up on fwdptr load just the same as it would blow up on field
>>> load. We maintain an invariant that the fwd ptr of a valid oop results
>>> in a valid (and equivalent) oop. I therefore think we are fine for now.
>>> Should a GC ever need anything else here, I'd worry about it then. Until
>>> this happens, let's just hope to never need to touch this code again ;-)
>> No I'm afraid that is not safe. After loading the forwarding pointer,
>> the thread could be preempted, then any number of GC cycles could pass,
>> which means that the address the previously read forwarding pointer
>> points to could by then be uncommitted memory. In fact it is unsafe
>> even without uncommitted memory. Because after resolving the jobject to
>> some address in the heap, the thread could get preempted, and any number
>> of GC cycles could pass, causing the forwarding pointer to be read from
>> some address in the heap that no longer is the forwarding pointer of an
>> object, but rather a random integer. This causes the second load to blow
>> up, even without uncommitting memory.
>>
>> Here is an attempt at showing different things that can go wrong:
>>
>> obj = *jobject
>> // preempted for N GC cycles, meaning obj might 1) be a valid pointer to
>> // an object, or 2) be a random pointer inside of the heap or outside of
>> // the heap
>>
>> forward_pointer = *obj // may 1) crash with SIGSEGV, 2) read a random
>> // pointer, no longer representing the forwarding pointer, or 3) read a
>> // consistent forwarding pointer
>>
>> // preempted for N GC cycles, causing forward_pointer to point at pretty
>> // much anything
>>
>> result = *(forward_pointer + offset) // may 1) read a valid primitive
>> // value, if previous two loads were not messed up, or 2) read some random
>> // value that no longer corresponds to the object field, or 3) crash
>> // because either the forwarding pointer did point at something valid that
>> // subsequently got relocated and uncommitted before the load hits, or
>> // because the forwarding pointer never pointed to anything valid in the
>> // first place, because the forwarding pointer load read a random pointer
>> // due to the object relocating after the jobject was resolved.
>>
>> The summary is that both loads need protection due to how the thread in
>> native state runs freely without necessarily caring about the GC running
>> any number of GC cycles concurrently, making the memory super slippery,
>> which risks crashing the VM without the proper protection.
> AWW WTF!? We are in native state in this code?

Yes. This is one of the most dangerous code paths we have in the VM I 
think.

> It might be easier to just call bsa->resolve_for_read() (which emits the
> fwd ptr load), then issue another:
>
> speculative_load_pclist[count] = __ pc();
>
> need to juggle with the counter and double-emit slowcase_entry_pclist,
> and all this conditionally for Shenandoah. Gaa.

I think that by just having the speculative load PC list take a range as 
opposed to a precise PC, and check that a given PC is in that range, and 
not just exactly equal to a PC, the problem is solved for everyone.
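
Roughly what I have in mind, as a minimal standalone sketch rather than a
patch. Only find_slowcase_pc and the speculative_load_pclist_start/_end
names come from this thread; the address typedef, the capacity, range_count
and register_speculative_range are hypothetical stand-ins so the snippet
compiles outside the VM:

```cpp
#include <cassert>
#include <cstdint>

typedef unsigned char* address;  // stand-in for HotSpot's address type

// Hypothetical bookkeeping: one [start, end) PC range per fast accessor.
static const int LIST_CAPACITY = 8;  // illustrative size only
static address speculative_load_pclist_start[LIST_CAPACITY];
static address speculative_load_pclist_end[LIST_CAPACITY];
static address slowcase_entry_pclist[LIST_CAPACITY];
static int range_count = 0;

// Record the PC range that covers every speculative load emitted for one
// accessor (e.g. the forwarding pointer load and the field load).
static void register_speculative_range(address start, address end,
                                       address slowcase) {
  assert(range_count < LIST_CAPACITY && start < end);
  speculative_load_pclist_start[range_count] = start;
  speculative_load_pclist_end[range_count]   = end;
  slowcase_entry_pclist[range_count]         = slowcase;
  range_count++;
}

// Signal handler lookup: match any faulting PC inside a registered range,
// instead of requiring an exact PC match.
static address find_slowcase_pc(address pc) {
  for (int i = 0; i < range_count; i++) {
    if (pc >= speculative_load_pclist_start[i] &&
        pc <  speculative_load_pclist_end[i]) {
      return slowcase_entry_pclist[i];
    }
  }
  return (address)(intptr_t)-1;  // not a speculative load
}
```

The generator would then take __ pc() before and after the access_load_at
call and register both, so whatever loads the barrier emits in between are
covered without the caller knowing how many there are.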

> Or just FLAG_SET_DEFAULT(UseFastJNIAccessors,false) in Shenandoah.

Yeah, sometimes you wonder if it's really worth the maintenance to keep 
this thing.

> Funny how we had this code in Shenandoah literally for years, and
> nobody's ever tripped over it.

Yeah it is a rather nasty race to detect.

> It's one of those cases where I almost suspect it's been done in Java 1.0
> when lots of JNI code was in use because some stuff couldn't be done
> fast in Java, but nowadays doesn't really make a difference. *Sigh*

:)

>>>>> Unfortunately, I cannot really test it because of:
>>>>> http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2018-May/005843.html
>>>>>
>>>>>
>>>> That is unfortunate. If I were you, I would not dare to change anything
>>>> in jni fast get field without testing it - it is very error prone.
>>> Yeah. I guess I'll just wait with testing until this is resolved. Or
>>> else resolve it myself.
>> Yeah.
>>
>>> Can I consider this change reviewed by you?
>> I think we should agree about the safety of doing this for Shenandoah in
>> particular first. I still think we need the PC range as opposed to exact
>> PC to be caught in the signal handler for this to be safe for your GC
>> algorithm.
>
> Yeah, I agree. I need to think this through a little bit.

Yeah. Still think the PC range check solution should do the trick.

> Thanks for pointing out this bug. I can already see nightly builds
> suddenly starting to fail over it, now that it's known :-)

No problem!

Thanks,
/Erik

> Roman