RFR: JDK-8213199: GC abstraction for Assembler::needs_explicit_null_check()

Erik Österlund erik.osterlund at oracle.com
Mon Nov 5 13:08:54 UTC 2018


Hi Roman,

So I get what you are saying right, we are using the bool 
needs_explicit_null_check(intptr_t offset) function both with offsets 
(what the parameter seems to suggest should be passed in), but also with 
raw addresses from the signal handler, posing as offsets. And we 
happened to get away with that because the check was written in such a 
clever way that it would return the right answer regardless of whether 
it was an offset or an address, saving us from writing those ~3 lines 
extra code for a new function expecting an address.

Except now with negative offsets, that also additionally get masked by 
the AArch64 linux signal handler, this suddenly falls apart and we can 
no longer seamlessly pass in offsets and addresses to get consistent 
answers. The same function suddenly handles different combinations of 
masked/unmasked positive/negative addresses, with/without heap base, as 
well as positive/negative offsets.

The first obvious suggestion that comes to mind is to sign extend the 
high order byte inside of the signal handler on AArch64, to fix the 
address before it gets passed in to needs_explicit_null_check, as 
opposed to letting the shared needs_explicit_null_check function take 
that responsibility, handling not just offsets and virtual addresses, 
but additionally also masked virtual addresses from the signal handler.

So with the AArch64 stuff gone from this function, I still can't help 
but think that the conditions in this code would be possible to write in 
a trivially understandable way that doesn't require me to scratch my 
head, if there was one function expecting addresses and one function 
expecting offsets. And in that case, I think it would look trivial to 
have a cell header check in those functions, which would fit in well. I 
think.

Thoughts?

Thanks,
/Erik

On 2018-11-02 23:31, Roman Kennke wrote:
> Hi Erik,
>
> I had a discussion with Andrew Dinn (who actually wrote that code back
> then) and will copy his explanation here, I think it covers the situation:
>
> ""
> Let's just recap without considering Shenandoah:
>
> MacroAssembler::needs_explicit_null_check is called from several places,
> not just the SEGV handler.
>
> In those other cases offset is a putative field offset for an object.
>
> With uncompressed pointers the method has to decide whether this offset
> will lie within the read-protected page at address 0 i.e. 0 <= offset <
> os::vm_page_size(). If that is true then a null check can be omitted for
> a read or write operation. If not an explicit check is needed.
>
> With compressed pointers when Universe::narrow_oop_base() != NULL the
> test also checks whether this offset may lie within the read-protected
> page at the heap base only -- oh -- that's the same test! Of course,
> we'll see that this is different when the SEGV handler makes the call.
>
> n.b. I think this case is included because there is the possibility that
> a null pointer can be represented by the narrow oop base address.
> However, I think in reality special case checks in all encode decode
> operations ensure that a null oop is always represented as zero (for
> AArch64 at least). I'm not absolutely sure of that though.
>
> Now, when called from the SEGV handler offset is actually the offending
> read/write address that caused the SEGV. In that case a null pointer
> will only have been dereferenced if the address lies in the protected
> zero page or in the protected heap base page. The first if clause
> handles the compressed pointer case and reduces it to the uncompressed
> pointer case by subtracting base when offset >= base.
>
> So, either way the offending address is potentially a null pointer
> dereference under exactly the same conditions as for those other calls.
>
> Next, how does Shenandoah modify this? Well, it looks like that depends
> on how this method is called. If it can be invoked from any of those
> other places to check whether a Brooks pointer read or write is ok to
> validate with an implicit null check then the offset may be passed as -8
> i.e. 0xfffffffffffffff8. However, that's actually not enough.
>
> When invoked from the handler because of an access at of (0 + offset)
> then offset may legitimately be 0x00fffffffffffff8 or lie between 0 and
> os::vm_page_size().
>
> When invoked form the handler because of a dereference of
> (narrow_oop_base + offset) then the subtract i the if clause means that
> offset may legitimately be 0xfffffffffffffff8 or lie between 0 and
> os::vm_page_size().
>
> So, the masking in the Shenandoah case is to reduce those two special
> cases into one.
> ""
>
> Roman
>
>
>
>> Hi Roman,
>>
>> On 2018-11-01 17:58, Roman Kennke wrote:
>>> Hi Erik,
>>>
>>>> Would you mind explaining how you need to override this and why? I'm
>>>> afraid I did not quite get it from your description in the RFC (which is
>>>> also why I didn't come up with a solution either).
>>>>
>>>> Am I correct that you need to return false if the passed in offset is
>>>> the cell header offset -8, or is there anything else at all required to
>>>> that logic?
>>> No, it's really just that. Plus take care of it in the case of combined
>>> narrow-oops-offset plus -8.
>>>> You mentioned something about the high order byte being
>>>> masked on AArch64, but didn't quite connect the dot to how that is
>>>> related to this code. Is it?
>>> Yes, we also need to mask the high order byte in AArch64 because of the
>>> way addressing works on that platform, and because -8 flips on those
>>> negative bits.
>> Okay. Looking at
>> https://builds.shipilev.net/patch-openjdk-shenandoah-jdk/latest/src/hotspot/share/asm/assembler.cpp.udiff.html
>> that you posted before, assuming this is how it will be extended.
>>
>> So in the case with a heap base, the resulting address will never be
>> negative, and you know it will trap (presuming there is an mprotected
>> page at offset -8 from the heap base pointer. So it seems in this case,
>> the check could have just been offset == -8.
>>
>> So let's say we access null -8 without a heap base then. The address
>> becomes 0xFFFFFFFFFFFFFFF8. When that traps, the value you see in the
>> signal handler becomes 0x00FFFFFFFFFFFFF8 because of the virtual address
>> masking of the high order byte. I presume that is what the AArch64 code
>> is dealing with. But I don't quite understand why. Addresses starting
>> with 0x00FF are not valid in user space so we never need to worry about
>> any memory being mapped in there accidentally. So I'm not sure what this
>> code protects against.
>>
>> So that takes me back to my original intuition: isn't this the same as
>> checking at the very top: if (offset == BrooksPointer::byte_offset())
>> return false;
>>
>> If it really is as simple as just that, and I did not miss something not
>> obvious, then perhaps we would be better off abstracting a cell header
>> check here, as opposed to wrapping the whole thing in a virtual member
>> function. Especially since there are now a number of these occurrences
>> where some basic knowledge about cell header size that is 0 for all GCs
>> except Shenandoah where it is 8, leaves us wrapping a whole bunch of
>> stuff behind a GC interface, only to handle that one difference.
>>
>> While I don't know exactly what this cell abstraction would reasonably
>> look like in its full shape, perhaps we could start with something
>> simple like a virtual member function size_t cell_header_size() on
>> CollectedHeap with a suitable comment explaining that a cell header is a
>> GC thing we sometimes put above the object header. And then see if there
>> are more related functions that lead us to a helper class for cells or
>> something like that.
>>
>> I'm open to suggestions of course. Thought I'd check with you if this
>> sounds sane to you. Of course if my assumption is wrong that this check
>> could be written much simpler than it looks like, then I suppose I need
>> to give up on that idea. It all boils down to that really.
>>
>> Thanks,
>> /Erik
>>
>>> Also, the mach5 job came back with FAILED (see below). Can somebody with
>>> access check and see what's up?
>>>
>>> Thanks,
>>> Roman
>>>
>>>
>>> Build Details: 2018-11-01-1146402.roman.source
>>> 0 Failed Tests
>>> Mach5 Tasks Results Summary
>>>
>>>       EXECUTED_WITH_FAILURE: 6
>>>       KILLED: 0
>>>       UNABLE_TO_RUN: 21
>>>       PASSED: 48
>>>       FAILED: 0
>>>       NA: 0
>>>       Build
>>>
>>>           6 Executed with failure
>>>               solaris-sparcv9-solaris-sparcv9-build-8 error while
>>> building, return value: 2
>>>               solaris-sparcv9-debug-solaris-sparcv9-build-9 error while
>>> building, return value: 2
>>>               windows-x64-windows-x64-build-10 error while building,
>>> return value: 2
>>>               windows-x64-debug-windows-x64-build-11 error while building,
>>> return value: 2
>>>               windows-x64-open-windows-x64-build-12 error while building,
>>> return value: 2
>>>               windows-x64-open-debug-windows-x64-build-13 error while
>>> building, return value: 2
>>>           2 Not run
>>>               solaris-sparcv9-install-solaris-sparcv9-build-16 Dependency
>>> task failed: mach5...-8300-solaris-sparcv9-solaris-sparcv9-build-8
>>>               windows-x64-install-windows-x64-build-17 Dependency task
>>> failed: YJftjiBUYc
>>>
>>>       Test
>>>
>>>           19 Not run
>>>
>>> tier1-solaris-sparc-open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-57
>>>
>>> Dependency task failed:
>>> mach5...-8300-solaris-sparcv9-solaris-sparcv9-build-8
>>>
>>> tier1-solaris-sparc-open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-debug-58
>>>
>>> Dependency task failed:
>>> mach5...solaris-sparcv9-debug-solaris-sparcv9-build-9
>>>
>>> tier1-product-open_test_hotspot_jtreg_tier1_common-windows-x64-20
>>> Dependency task failed: YJftjiBUYc
>>>
>>> tier1-debug-open_test_hotspot_jtreg_tier1_common-windows-x64-debug-26
>>> Dependency task failed: YJftjiBVYc
>>>
>>> tier1-debug-open_test_hotspot_jtreg_tier1_compiler_1-windows-x64-debug-29
>>> Dependency
>>> task failed: YJftjiBVYc
>>>
>>> tier1-debug-open_test_hotspot_jtreg_tier1_compiler_2-windows-x64-debug-32
>>> Dependency
>>> task failed: YJftjiBVYc
>>>
>>> tier1-debug-open_test_hotspot_jtreg_tier1_compiler_3-windows-x64-debug-35
>>> Dependency
>>> task failed: YJftjiBVYc
>>>
>>> tier1-debug-open_test_hotspot_jtreg_tier1_compiler_not_xcomp-windows-x64-debug-38
>>>
>>> Dependency task failed: YJftjiBVYc
>>>
>>> tier1-debug-open_test_hotspot_jtreg_tier1_gc_1-windows-x64-debug-41
>>> Dependency task failed: YJftjiBVYc
>>>
>>> tier1-debug-open_test_hotspot_jtreg_tier1_gc_2-windows-x64-debug-44
>>> Dependency task failed: YJftjiBVYc
>>>               See all 19...
>>>
>>>
>>>
>>>> Thanks,
>>>> /Erik
>>>>
>>>> On 2018-11-01 12:20, Roman Kennke wrote:
>>>>> Hi Kim, thanks for reviewing! I'll push it through jdk/submit now.
>>>>>
>>>>> Erik: ok for you too?
>>>>>
>>>>> Thanks,
>>>>> Roman
>>>>>
>>>>>
>>>>>>> On Oct 31, 2018, at 6:14 PM, Roman Kennke <rkennke at redhat.com> wrote:
>>>>>>>
>>>>>>> Hi Erik,
>>>>>>>
>>>>>>> right. Fixed this, and what what Kim mentioned plus a missing
>>>>>>> include:
>>>>>>>
>>>>>>> Incremental:
>>>>>>> http://cr.openjdk.java.net/~rkennke/JDK-8213199/webrev.01.diff/
>>>>>>> Full:
>>>>>>> http://cr.openjdk.java.net/~rkennke/JDK-8213199/webrev.01/
>>>>>>>
>>>>>>> Ok now?
>>>>>> Looks good.
>>>>>>




More information about the hotspot-gc-dev mailing list