RFR: JDK-8203172: Primitive heap access for interpreter BarrierSetAssembler/aarch64

Fri Jun 8 21:29:55 UTC 2018

Hi Erik,

thanks for reviewing, and especially for pointing out the nasty
jnifastgetfield bug :-)

Cheers, Roman

> Hi Roman,
> 
> Looks good.
> 
> Thanks,
> /Erik
> 
> On 2018-06-08 22:19, Roman Kennke wrote:
>> Ping?
>>
>>> As mentioned in another thread, we in Shenandoah have decided to skip
>>> JNI fast getfield stuff for now. We'll probably address it and implement
>>> the extended range speculative PC thing later, in a separate RFE. I
>>> ripped out the jniFastGetField changes from the patch:
>>>
>>> http://cr.openjdk.java.net/~rkennke/JDK-8203172/webrev.02/
>>>
>>> Is it good now to push?
>>>
>>> Roman
>>>
>>>> Hi Roman,
>>>>
>>>> On 2018-06-04 22:49, Roman Kennke wrote:
>>>>> Am 04.06.2018 um 22:16 schrieb Erik Österlund:
>>>>>> Hi Roman,
>>>>>>
>>>>>> On 2018-06-04 21:42, Roman Kennke wrote:
>>>>>>> Am 04.06.2018 um 18:43 schrieb Erik Österlund:
>>>>>>>> Hi Roman,
>>>>>>>>
>>>>>>>> On 2018-06-04 17:24, Roman Kennke wrote:
>>>>>>>>> Ok, right. Very good catch!
>>>>>>>>>
>>>>>>>>> This should do it, right? Sorry, I couldn't easily make an
>>>>>>>>> incremental
>>>>>>>>> diff:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~rkennke/JDK-8203172/webrev.01/
>>>>>>>> Unfortunately, I think there is one more problem for you.
>>>>>>>> The signal handler is supposed to catch SIGSEGV caused by
>>>>>>>> speculative
>>>>>>>> loads shot from the fantastic jni fast get field code. But it
>>>>>>>> currently
>>>>>>>> expects an exact PC match:
>>>>>>>>
>>>>>>>> address JNI_FastGetField::find_slowcase_pc(address pc) {
>>>>>>>>      for (int i=0; i<count; i++) {
>>>>>>>>        if (speculative_load_pclist[i] == pc) {
>>>>>>>>          return slowcase_entry_pclist[i];
>>>>>>>>        }
>>>>>>>>      }
>>>>>>>>      return (address)-1;
>>>>>>>> }
>>>>>>>>
>>>>>>>> This means that the way this is written now,
>>>>>>>> speculative_load_pclist
>>>>>>>> registers the __ pc() right before the access_load_at call. This
>>>>>>>> puts
>>>>>>>> constraints on whatever is done inside of access_load_at to only
>>>>>>>> speculatively load on the first assembled instruction.
>>>>>>>>
>>>>>>>> If you imagine a scenario where you have a GC with Brooks pointers
>>>>>>>> that
>>>>>>>> also uncommits memory (like Shenandoah I presume), then I
>>>>>>>> imagine you
>>>>>>>> would need something more here. If you start with a forwarding
>>>>>>>> pointer
>>>>>>>> load, then that can trap (which is probably caught by the exact PC
>>>>>>>> match). But then there will be a subsequent load of the value in
>>>>>>>> the
>>>>>>>> to-space object, which will not be protected. But this is also
>>>>>>>> loaded
>>>>>>>> speculatively (as the subsequent safepoint counter check could
>>>>>>>> invalidate the result), and could therefore crash the VM unless
>>>>>>>> protected, as the signal handler code fails to recognize this is a
>>>>>>>> speculative load from jni fast get field.
>>>>>>>>
>>>>>>>> I imagine the solution to this would be to let
>>>>>>>> speculative_load_pclist
>>>>>>>> specify a range for fuzzy SIGSEGV matching in the signal handler,
>>>>>>>> rather
>>>>>>>> than an exact PC (i.e. speculative_load_pclist_start and
>>>>>>>> speculative_load_pclist_end). That would give you enough freedom to
>>>>>>>> use
>>>>>>>> Brooks pointers in there. Sometimes I wonder if the lengths we
>>>>>>>> go to
>>>>>>>> maintain jni fast get field is *really* worth it.
>>>>>>> I are probably right in general. But I also think we are fine with
>>>>>>> Shenandoah. Both the fwd ptr load and the field load are constructed
>>>>>>> with the same base operand. If the oop is NULL (or invalid
>>>>>>> memory) it
>>>>>>> will blow up on fwdptr load just the same as it would blow up on
>>>>>>> field
>>>>>>> load. We maintain an invariant that the fwd ptr of a valid oop
>>>>>>> results
>>>>>>> in a valid (and equivalent) oop. I therefore think we are fine
>>>>>>> for now.
>>>>>>> Should a GC ever need anything else here, I'd worry about it then.
>>>>>>> Until
>>>>>>> this happens, let's just hope to never need to touch this code again
>>>>>>> ;-)
>>>>>> No I'm afraid that is not safe. After loading the forwarding pointer,
>>>>>> the thread could be preempted, then any number of GC cycles could
>>>>>> pass,
>>>>>> which means that the address that the at some point read forwarding
>>>>>> pointer points to, could be uncommitted memory. In fact it is unsafe
>>>>>> even without uncommitted memory. Because after resolving the
>>>>>> jobject to
>>>>>> some address in the heap, the thread could get preempted, and any
>>>>>> number
>>>>>> of GC cycles could pass, causing the forwarding pointer to be read
>>>>>> from
>>>>>> some address in the heap that no longer is the forwarding pointer
>>>>>> of an
>>>>>> object, but rather a random integer. This causes the second load
>>>>>> to blow
>>>>>> up, even without uncommitting memory.
>>>>>>
>>>>>> Here is an attempt at showing different things that can go wrong:
>>>>>>
>>>>>> obj = *jobject
>>>>>> // preempted for N GC cycles, meaning obj might 1) be a valid
>>>>>> pointer to
>>>>>> an object, or 2) be a random pointer inside of the heap or outside of
>>>>>> the heap
>>>>>>
>>>>>> forward_pointer = *obj // may 1) crash with SIGSEGV, 2) read a random
>>>>>> pointer, no longer representing the forwarding pointer, or 3) read a
>>>>>> consistent forwarding pointer
>>>>>>
>>>>>> // preempted for N GC cycles, causing forward_pointer to point at
>>>>>> pretty
>>>>>> much anything
>>>>>>
>>>>>> result = *(forward_pointer + offset) // may 1) read a valid primitive
>>>>>> value, if previous two loads were not messed up, or 2) read some
>>>>>> random
>>>>>> value that no longer corresponds to the object field, or 3) crash
>>>>>> because either the forwarding pointer did point at something valid
>>>>>> that
>>>>>> subsequently got relocated and uncommitted before the load hits, or
>>>>>> because the forwarding pointer never pointed to anything valid in the
>>>>>> first place, because the forwarding pointer load read a random
>>>>>> pointer
>>>>>> due to the object relocating after the jobject was resolved.
>>>>>>
>>>>>> The summary is that both loads need protection due to how the
>>>>>> thread in
>>>>>> native state runs freely without necessarily caring about the GC
>>>>>> running
>>>>>> any number of GC cycles concurrently, making the memory super
>>>>>> slippery,
>>>>>> which risks crashing the VM without the proper protection.
>>>>> AWW WTF!? We are in native state in this code?
>>>> Yes. This is one of the most dangerous code paths we have in the VM I
>>>> think.
>>>>
>>>>> It might be easier to just call bsa->resolve_for_read() (which
>>>>> emits the
>>>>> fwd ptr load), then issue another:
>>>>>
>>>>> speculative_load_pclist[count] = __ pc();
>>>>>
>>>>> need to juggle with the counter and double-emit slowcase_entry_pclist,
>>>>> and all this conditionally for Shenandoah. Gaa.
>>>> I think that by just having the speculative load PC list take a
>>>> range as
>>>> opposed to a precise PC, and check that a given PC is in that range,
>>>> and
>>>> not just exactly equal to a PC, the problem is solved for everyone.
>>>>
>>>>> Or just FLAG_SET_DEFAULT(UseFastJNIAccessors,false) in Shenandoah.
>>>> Yeah, sometimes you wonder if it's really worth the maintenance to keep
>>>> this thing.
>>>>
>>>>> Funny how we had this code in Shenandoah literally for years, and
>>>>> nobody's ever tripped over it.
>>>> Yeah it is a rather nasty race to detect.
>>>>
>>>>> It's one of those cases where I almost suspect it's been done in
>>>>> Java1.0
>>>>> when lots of JNI code was in use because some stuff couldn't be
>>>>> done in
>>>>> fast in Java, but nowadays doesn't really make a difference. *Sigh*
>>>> :)
>>>>
>>>>>>>>> Unfortunately, I cannot really test it because of:
>>>>>>>>> http://mail.openjdk.java.net/pipermail/aarch64-port-dev/2018-May/005843.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> That is unfortunate. If I were you, I would not dare to change
>>>>>>>> anything
>>>>>>>> in jni fast get field without testing it - it is very error prone.
>>>>>>> Yeah. I guess I'll just wait with testing until this is resolved. Or
>>>>>>> else resolve it myself.
>>>>>> Yeah.
>>>>>>
>>>>>>> Can I consider this change reviewed by you?
>>>>>> I think we should agree about the safety of doing this for
>>>>>> Shenandoah in
>>>>>> particular first. I still think we need the PC range as opposed to
>>>>>> exact
>>>>>> PC to be caught in the signal handler for this to be safe for your GC
>>>>>> algorithm.
>>>>> Yeah, I agree. I need to think this through a little bit.
>>>> Yeah. Still think the PC range check solution should do the trick.
>>>>
>>>>> Thanks for pointing out this bug. I can already see nightly builds
>>>>> suddenly starting to fail over it, now that it's known :-)
>>>> No problem!
>>>>
>>>> Thanks,
>>>> /Erik
>>>>
>>>>> Roman
>>>>>
>>>>>
>>>
>>
>