RFR: JDK-8213199: GC abstraction for Assembler::needs_explicit_null_check()

Roman Kennke rkennke at redhat.com
Mon Nov 5 15:25:06 UTC 2018


Another way to look at it is this:

The method checks if:

1. An offset that, would it be dereferenced on NULL (0) hits the
protected 0-page (0<=offset<page_size). Or
2. An actual address (the fault address in the signal handler) hits the
protected 0-page.

... which basically amounts to the exact same check. Extra handling is
there to:

- handle narrow-oops
- mask aarch64 address (should be moved to signal handler)
- with Shenandoah, handle -8 offset, which would wrap-around the fault
address to 0xFF... if dereferenced on NULL

The proposed patch seems like a reasonably good solution to me. And I
have tried a bunch of different approaches too (basically keeping the
main 'algorithm' in place, and only add a hook or two to handle
Shenandoah), which didn't work out quite as nicely.


Roman

> Hi Roman,
> 
> So I get what you are saying right, we are using the bool
> needs_explicit_null_check(intptr_t offset) function both with offsets
> (what the parameter seems to suggest should be passed in), but also with
> raw addresses from the signal handler, posing as offsets. And we
> happened to get away with that because the check was written in such a
> clever way that it would return the right answer regardless of whether
> it was an offset or an address, saving us from writing those ~3 lines
> extra code for a new function expecting an address.
> 
> Except now with negative offsets, that also additionally get masked by
> the AArch64 linux signal handler, this suddenly falls apart and we can
> no longer seamlessly pass in offsets and addresses to get consistent
> answers. The same function suddenly handles different combinations of
> masked/unmasked positive/negative addresses, with/without heap base, as
> well as positive/negative offsets.
> 
> The first obvious suggestion that comes to mind is to sign extend the
> high order byte inside of the signal handler on AArch64, to fix the
> address before it gets passed in to needs_explicit_null_check, as
> opposed to letting the shared needs_explicit_null_check function take
> that responsibility, handling not just offsets and virtual addresses,
> but additionally also masked virtual addresses from the signal handler.
> 
> So with the AArch64 stuff gone from this function, I still can't help
> but think that the conditions in this code would be possible to write in
> a trivially understandable way that doesn't require me to scratch my
> head, if there was one function expecting addresses and one function
> expecting offsets. And in that case, I think it would look trivial to
> have a cell header check in those functions, which would fit in well. I
> think.
> 
> Thoughts?
> 
> Thanks,
> /Erik
> 
> On 2018-11-02 23:31, Roman Kennke wrote:
>> Hi Erik,
>>
>> I had a discussion with Andrew Dinn (who actually wrote that code back
>> then) and will copy his explanation here, I think it covers the
>> situation:
>>
>> ""
>> Let's just recap without considering Shenandoah:
>>
>> MacroAssembler::needs_explicit_null_check is called from several places,
>> not just the SEGV handler.
>>
>> In those other cases offset is a putative field offset for an object.
>>
>> With uncompressed pointers the method has to decide whether this offset
>> will lie within the read-protected page at address 0 i.e. 0 <= offset <
>> os::vm_page_size(). If that is true then a null check can be omitted for
>> a read or write operation. If not an explicit check is needed.
>>
>> With compressed pointers when Universe::narrow_oop_base() != NULL the
>> test also checks whether this offset may lie within the read-protected
>> page at the heap base only -- oh -- that's the same test! Of course,
>> we'll see that this is different when the SEGV handler makes the call.
>>
>> n.b. I think this case is included because there is the possibility that
>> a null pointer can be represented by the narrow oop base address.
>> However, I think in reality special case checks in all encode decode
>> operations ensure that a null oop is always represented as zero (for
>> AArch64 at least). I'm not absolutely sure of that though.
>>
>> Now, when called from the SEGV handler offset is actually the offending
>> read/write address that caused the SEGV. In that case a null pointer
>> will only have been dereferenced if the address lies in the protected
>> zero page or in the protected heap base page. The first if clause
>> handles the compressed pointer case and reduces it to the uncompressed
>> pointer case by subtracting base when offset >= base.
>>
>> So, either way the offending address is potentially a null pointer
>> dereference under exactly the same conditions as for those other calls.
>>
>> Next, how does Shenandoah modify this? Well, it looks like that depends
>> on how this method is called. If it can be invoked from any of those
>> other places to check whether a Brooks pointer read or write is ok to
>> validate with an implicit null check then the offset may be passed as -8
>> i.e. 0xfffffffffffffff8. However, that's actually not enough.
>>
>> When invoked from the handler because of an access at of (0 + offset)
>> then offset may legitimately be 0x00fffffffffffff8 or lie between 0 and
>> os::vm_page_size().
>>
>> When invoked form the handler because of a dereference of
>> (narrow_oop_base + offset) then the subtract i the if clause means that
>> offset may legitimately be 0xfffffffffffffff8 or lie between 0 and
>> os::vm_page_size().
>>
>> So, the masking in the Shenandoah case is to reduce those two special
>> cases into one.
>> ""
>>
>> Roman
>>
>>
>>
>>> Hi Roman,
>>>
>>> On 2018-11-01 17:58, Roman Kennke wrote:
>>>> Hi Erik,
>>>>
>>>>> Would you mind explaining how you need to override this and why? I'm
>>>>> afraid I did not quite get it from your description in the RFC
>>>>> (which is
>>>>> also why I didn't come up with a solution either).
>>>>>
>>>>> Am I correct that you need to return false if the passed in offset is
>>>>> the cell header offset -8, or is there anything else at all
>>>>> required to
>>>>> that logic?
>>>> No, it's really just that. Plus take care of it in the case of combined
>>>> narrow-oops-offset plus -8.
>>>>> You mentioned something about the high order byte being
>>>>> masked on AArch64, but didn't quite connect the dot to how that is
>>>>> related to this code. Is it?
>>>> Yes, we also need to mask the high order byte in AArch64 because of the
>>>> way addressing works on that platform, and because -8 flips on those
>>>> negative bits.
>>> Okay. Looking at
>>> https://builds.shipilev.net/patch-openjdk-shenandoah-jdk/latest/src/hotspot/share/asm/assembler.cpp.udiff.html
>>>
>>> that you posted before, assuming this is how it will be extended.
>>>
>>> So in the case with a heap base, the resulting address will never be
>>> negative, and you know it will trap (presuming there is an mprotected
>>> page at offset -8 from the heap base pointer. So it seems in this case,
>>> the check could have just been offset == -8.
>>>
>>> So let's say we access null -8 without a heap base then. The address
>>> becomes 0xFFFFFFFFFFFFFFF8. When that traps, the value you see in the
>>> signal handler becomes 0x00FFFFFFFFFFFFF8 because of the virtual address
>>> masking of the high order byte. I presume that is what the AArch64 code
>>> is dealing with. But I don't quite understand why. Addresses starting
>>> with 0x00FF are not valid in user space so we never need to worry about
>>> any memory being mapped in there accidentally. So I'm not sure what this
>>> code protects against.
>>>
>>> So that takes me back to my original intuition: isn't this the same as
>>> checking at the very top: if (offset == BrooksPointer::byte_offset())
>>> return false;
>>>
>>> If it really is as simple as just that, and I did not miss something not
>>> obvious, then perhaps we would be better off abstracting a cell header
>>> check here, as opposed to wrapping the whole thing in a virtual member
>>> function. Especially since there are now a number of these occurrences
>>> where some basic knowledge about cell header size that is 0 for all GCs
>>> except Shenandoah where it is 8, leaves us wrapping a whole bunch of
>>> stuff behind a GC interface, only to handle that one difference.
>>>
>>> While I don't know exactly what this cell abstraction would reasonably
>>> look like in its full shape, perhaps we could start with something
>>> simple like a virtual member function size_t cell_header_size() on
>>> CollectedHeap with a suitable comment explaining that a cell header is a
>>> GC thing we sometimes put above the object header. And then see if there
>>> are more related functions that lead us to a helper class for cells or
>>> something like that.
>>>
>>> I'm open to suggestions of course. Thought I'd check with you if this
>>> sounds sane to you. Of course if my assumption is wrong that this check
>>> could be written much simpler than it looks like, then I suppose I need
>>> to give up on that idea. It all boils down to that really.
>>>
>>> Thanks,
>>> /Erik
>>>
>>>> Also, the mach5 job came back with FAILED (see below). Can somebody
>>>> with
>>>> access check and see what's up?
>>>>
>>>> Thanks,
>>>> Roman
>>>>
>>>>
>>>> Build Details: 2018-11-01-1146402.roman.source
>>>> 0 Failed Tests
>>>> Mach5 Tasks Results Summary
>>>>
>>>>       EXECUTED_WITH_FAILURE: 6
>>>>       KILLED: 0
>>>>       UNABLE_TO_RUN: 21
>>>>       PASSED: 48
>>>>       FAILED: 0
>>>>       NA: 0
>>>>       Build
>>>>
>>>>           6 Executed with failure
>>>>               solaris-sparcv9-solaris-sparcv9-build-8 error while
>>>> building, return value: 2
>>>>               solaris-sparcv9-debug-solaris-sparcv9-build-9 error while
>>>> building, return value: 2
>>>>               windows-x64-windows-x64-build-10 error while building,
>>>> return value: 2
>>>>               windows-x64-debug-windows-x64-build-11 error while
>>>> building,
>>>> return value: 2
>>>>               windows-x64-open-windows-x64-build-12 error while
>>>> building,
>>>> return value: 2
>>>>               windows-x64-open-debug-windows-x64-build-13 error while
>>>> building, return value: 2
>>>>           2 Not run
>>>>               solaris-sparcv9-install-solaris-sparcv9-build-16
>>>> Dependency
>>>> task failed: mach5...-8300-solaris-sparcv9-solaris-sparcv9-build-8
>>>>               windows-x64-install-windows-x64-build-17 Dependency task
>>>> failed: YJftjiBUYc
>>>>
>>>>       Test
>>>>
>>>>           19 Not run
>>>>
>>>> tier1-solaris-sparc-open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-57
>>>>
>>>>
>>>> Dependency task failed:
>>>> mach5...-8300-solaris-sparcv9-solaris-sparcv9-build-8
>>>>
>>>> tier1-solaris-sparc-open_test_hotspot_jtreg_tier1_common-solaris-sparcv9-debug-58
>>>>
>>>>
>>>> Dependency task failed:
>>>> mach5...solaris-sparcv9-debug-solaris-sparcv9-build-9
>>>>
>>>> tier1-product-open_test_hotspot_jtreg_tier1_common-windows-x64-20
>>>> Dependency task failed: YJftjiBUYc
>>>>
>>>> tier1-debug-open_test_hotspot_jtreg_tier1_common-windows-x64-debug-26
>>>> Dependency task failed: YJftjiBVYc
>>>>
>>>> tier1-debug-open_test_hotspot_jtreg_tier1_compiler_1-windows-x64-debug-29
>>>>
>>>> Dependency
>>>> task failed: YJftjiBVYc
>>>>
>>>> tier1-debug-open_test_hotspot_jtreg_tier1_compiler_2-windows-x64-debug-32
>>>>
>>>> Dependency
>>>> task failed: YJftjiBVYc
>>>>
>>>> tier1-debug-open_test_hotspot_jtreg_tier1_compiler_3-windows-x64-debug-35
>>>>
>>>> Dependency
>>>> task failed: YJftjiBVYc
>>>>
>>>> tier1-debug-open_test_hotspot_jtreg_tier1_compiler_not_xcomp-windows-x64-debug-38
>>>>
>>>>
>>>> Dependency task failed: YJftjiBVYc
>>>>
>>>> tier1-debug-open_test_hotspot_jtreg_tier1_gc_1-windows-x64-debug-41
>>>> Dependency task failed: YJftjiBVYc
>>>>
>>>> tier1-debug-open_test_hotspot_jtreg_tier1_gc_2-windows-x64-debug-44
>>>> Dependency task failed: YJftjiBVYc
>>>>               See all 19...
>>>>
>>>>
>>>>
>>>>> Thanks,
>>>>> /Erik
>>>>>
>>>>> On 2018-11-01 12:20, Roman Kennke wrote:
>>>>>> Hi Kim, thanks for reviewing! I'll push it through jdk/submit now.
>>>>>>
>>>>>> Erik: ok for you too?
>>>>>>
>>>>>> Thanks,
>>>>>> Roman
>>>>>>
>>>>>>
>>>>>>>> On Oct 31, 2018, at 6:14 PM, Roman Kennke <rkennke at redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Erik,
>>>>>>>>
>>>>>>>> right. Fixed this, and what what Kim mentioned plus a missing
>>>>>>>> include:
>>>>>>>>
>>>>>>>> Incremental:
>>>>>>>> http://cr.openjdk.java.net/~rkennke/JDK-8213199/webrev.01.diff/
>>>>>>>> Full:
>>>>>>>> http://cr.openjdk.java.net/~rkennke/JDK-8213199/webrev.01/
>>>>>>>>
>>>>>>>> Ok now?
>>>>>>> Looks good.
>>>>>>>
> 



More information about the hotspot-dev mailing list