RFR: JDK-8203172: Primitive heap access for interpreter BarrierSetAssembler/aarch64

Fri Jun 8 20:19:54 UTC 2018

Ping?

> As mentioned in another thread, we in Shenandoah have decided to skip
> JNI fast getfield stuff for now. We'll probably address it and implement
> the extended range speculative PC thing later, in a separate RFE. I
> ripped out the jniFastGetField changes from the patch:
> 
> http://cr.openjdk.java.net/~rkennke/JDK-8203172/webrev.02/
> 
> Is it good now to push?
> 
> Roman
> 
>> Hi Roman,
>>
>> On 2018-06-04 22:49, Roman Kennke wrote:
>>> Am 04.06.2018 um 22:16 schrieb Erik Österlund:
>>>> Hi Roman,
>>>>
>>>> On 2018-06-04 21:42, Roman Kennke wrote:
>>>>> Am 04.06.2018 um 18:43 schrieb Erik Österlund:
>>>>>> Hi Roman,
>>>>>>
>>>>>> On 2018-06-04 17:24, Roman Kennke wrote:
>>>>>>> Ok, right. Very good catch!
>>>>>>>
>>>>>>> This should do it, right? Sorry, I couldn't easily make an
>>>>>>> incremental
>>>>>>> diff:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~rkennke/JDK-8203172/webrev.01/
>>>>>> Unfortunately, I think there is one more problem for you.
>>>>>> The signal handler is supposed to catch SIGSEGV caused by speculative
>>>>>> loads shot from the fantastic jni fast get field code. But it
>>>>>> currently
>>>>>> expects an exact PC match:
>>>>>>
>>>>>> address JNI_FastGetField::find_slowcase_pc(address pc) {
>>>>>>     for (int i=0; i<count; i++) {
>>>>>>       if (speculative_load_pclist[i] == pc) {
>>>>>>         return slowcase_entry_pclist[i];
>>>>>>       }
>>>>>>     }
>>>>>>     return (address)-1;
>>>>>> }
>>>>>>
>>>>>> This means that the way this is written now, speculative_load_pclist
>>>>>> registers the __ pc() right before the access_load_at call. This puts
>>>>>> constraints on whatever is done inside of access_load_at to only
>>>>>> speculatively load on the first assembled instruction.
>>>>>>
>>>>>> If you imagine a scenario where you have a GC with Brooks pointers
>>>>>> that
>>>>>> also uncommits memory (like Shenandoah I presume), then I imagine you
>>>>>> would need something more here. If you start with a forwarding pointer
>>>>>> load, then that can trap (which is probably caught by the exact PC
>>>>>> match). But then there will be a subsequent load of the value in the
>>>>>> to-space object, which will not be protected. But this is also loaded
>>>>>> speculatively (as the subsequent safepoint counter check could
>>>>>> invalidate the result), and could therefore crash the VM unless
>>>>>> protected, as the signal handler code fails to recognize this is a
>>>>>> speculative load from jni fast get field.
>>>>>>
>>>>>> I imagine the solution to this would be to let speculative_load_pclist
>>>>>> specify a range for fuzzy SIGSEGV matching in the signal handler,
>>>>>> rather
>>>>>> than an exact PC (i.e. speculative_load_pclist_start and
>>>>>> speculative_load_pclist_end). That would give you enough freedom to
>>>>>> use
>>>>>> Brooks pointers in there. Sometimes I wonder if the lengths we go to
>>>>>> maintain jni fast get field is *really* worth it.
>>>>> I are probably right in general. But I also think we are fine with
>>>>> Shenandoah. Both the fwd ptr load and the field load are constructed
>>>>> with the same base operand. If the oop is NULL (or invalid memory) it
>>>>> will blow up on fwdptr load just the same as it would blow up on field
>>>>> load. We maintain an invariant that the fwd ptr of a valid oop results
>>>>> in a valid (and equivalent) oop. I therefore think we are fine for now.
>>>>> Should a GC ever need anything else here, I'd worry about it then.
>>>>> Until
>>>>> this happens, let's just hope to never need to touch this code again
>>>>> ;-)
>>>> No I'm afraid that is not safe. After loading the forwarding pointer,
>>>> the thread could be preempted, then any number of GC cycles could pass,
>>>> which means that the address that the at some point read forwarding
>>>> pointer points to, could be uncommitted memory. In fact it is unsafe
>>>> even without uncommitted memory. Because after resolving the jobject to
>>>> some address in the heap, the thread could get preempted, and any number
>>>> of GC cycles could pass, causing the forwarding pointer to be read from
>>>> some address in the heap that no longer is the forwarding pointer of an
>>>> object, but rather a random integer. This causes the second load to blow
>>>> up, even without uncommitting memory.
>>>>
>>>> Here is an attempt at showing different things that can go wrong:
>>>>
>>>> obj = *jobject
>>>> // preempted for N GC cycles, meaning obj might 1) be a valid pointer to
>>>> an object, or 2) be a random pointer inside of the heap or outside of
>>>> the heap
>>>>
>>>> forward_pointer = *obj // may 1) crash with SIGSEGV, 2) read a random
>>>> pointer, no longer representing the forwarding pointer, or 3) read a
>>>> consistent forwarding pointer
>>>>
>>>> // preempted for N GC cycles, causing forward_pointer to point at pretty
>>>> much anything
>>>>
>>>> result = *(forward_pointer + offset) // may 1) read a valid primitive
>>>> value, if previous two loads were not messed up, or 2) read some random
>>>> value that no longer corresponds to the object field, or 3) crash
>>>> because either the forwarding pointer did point at something valid that
>>>> subsequently got relocated and uncommitted before the load hits, or
>>>> because the forwarding pointer never pointed to anything valid in the
>>>> first place, because the forwarding pointer load read a random pointer
>>>> due to the object relocating after the jobject was resolved.
>>>>
>>>> The summary is that both loads need protection due to how the thread in
>>>> native state runs freely without necessarily caring about the GC running
>>>> any number of GC cycles concurrently, making the memory super slippery,
>>>> which risks crashing the VM without the proper protection.
>>> AWW WTF!? We are in native state in this code?
>>
>> Yes. This is one of the most dangerous code paths we have in the VM I
>> think.
>>
>>> It might be easier to just call bsa->resolve_for_read() (which emits the
>>> fwd ptr load), then issue another:
>>>
>>> speculative_load_pclist[count] = __ pc();
>>>
>>> need to juggle with the counter and double-emit slowcase_entry_pclist,
>>> and all this conditionally for Shenandoah. Gaa.
>>
>> I think that by just having the speculative load PC list take a range as
>> opposed to a precise PC, and check that a given PC is in that range, and
>> not just exactly equal to a PC, the problem is solved for everyone.
>>
>>> Or just FLAG_SET_DEFAULT(UseFastJNIAccessors,false) in Shenandoah.
>>
>> Yeah, sometimes you wonder if it's really worth the maintenance to keep
>> this thing.
>>
>>> Funny how we had this code in Shenandoah literally for years, and
>>> nobody's ever tripped over it.
>>
>> Yeah it is a rather nasty race to detect.
>>
>>> It's one of those cases where I almost suspect it's been done in Java1.0
>>> when lots of JNI code was in use because some stuff couldn't be done in
>>> fast in Java, but nowadays doesn't really make a difference. *Sigh*
>>
>> :)
>>
>>>>>>> Unfortunately, I cannot really test it because of:
>>>>>>> http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2018-May/005843.html
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> That is unfortunate. If I were you, I would not dare to change
>>>>>> anything
>>>>>> in jni fast get field without testing it - it is very error prone.
>>>>> Yeah. I guess I'll just wait with testing until this is resolved. Or
>>>>> else resolve it myself.
>>>> Yeah.
>>>>
>>>>> Can I consider this change reviewed by you?
>>>> I think we should agree about the safety of doing this for Shenandoah in
>>>> particular first. I still think we need the PC range as opposed to exact
>>>> PC to be caught in the signal handler for this to be safe for your GC
>>>> algorithm.
>>>
>>> Yeah, I agree. I need to think this through a little bit.
>>
>> Yeah. Still think the PC range check solution should do the trick.
>>
>>> Thanks for pointing out this bug. I can already see nightly builds
>>> suddenly starting to fail over it, now that it's known :-)
>>
>> No problem!
>>
>> Thanks,
>> /Erik
>>
>>> Roman
>>>
>>>
>>
> 
>