Problem in MacroAssembler::needs_explicit_null_check
The Problem ----------- This is a follow-up of my review of the Shenandoah patch addressing a curiosity that was found in the fault address values passed into MacroAssembler::needs_explicit_null_check from the SEGV handler. It turns out that when we dereference address -8 (aka 0xffff_ffff_ffff_fff8 aka Brooks pointer load via null) the fault address passed into the SEGV handler in siginfo->si_addr is 0x00ff_ffff_ffff_fff8. The top 8 bits of the address are zeroed. Why only 56 bits of address? Indeed, why are there not 48 bits of address? The relevant info can be found at http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/BABBEFA... To summarize: The virtual address space requires use of all 64 bits but it is in two parts with a hole in the middle i.e. addresses in the range 0x1_0000_0000_0000 to 0xFFFE_FFFF_FFFF_FFFF inclusive are not mappable. The top and bottom mappable regions each allow for 48 bits worth of vmem. So, the total available space adds up to 2 * 48 bits worth of space. However these two regions can only be mapped onto an underlying physical address space of 48 bits. So, only 48 bits worth of vmem can be in use at any given time. There are two corresponding sets of vmem to pmem translation tables, one for use in app and kernel land (TTBR0) the other kernel land only (TTBR1). TTRB0 maps from the low 48 bit space i.e. top 16 bits of vmem addresses are all 0, TTBR0 mappings are active in all execution levels (0-3). TTBR1 maps from the high 48-bit space where the top 16 bits are all 1. TTBR1 mappings are only active at execution level 3. So, accessing an address in this range from user land results in an automatic SEGV. The gap between these two spaces (i.e. where the top 16 bits are mixed) cannot be mapped via either translation table. The crucial comment on the page linked above is "You can enable VA tagging to exclude the top eight bits from the check" In this case the top 8 bits of an address used by a program /don't/ actually need to be all 0s or all 1s for addressing still to work. Only the top 56 bits are assumed to be valid. This latter option is documented here http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05... It is enabled by writing a relevant bit in the translation table management register. What does this mean for us? --------------------------- It is probably the case that Linux on AArch64 is enabling VA tagging but it may not be If VA tagging is enabled then it may be that in consequence the hardware is only passing the top 56 bits to the trap handler and, thence, to the signal handler. Alternatively, it may be that the hardware passes the full address whatever the hardware setting and the OS trap handler is doing the masking off because it has noticed that VA tagging has been enabled. A third possibility is that VA tagging may not be enabled but the trap handler may be doing it irrespective of the setting i.e. we may never see 0xffff. I suspect this last case is unlikely. Anyway the upshot is we probably cannot rely on VA tagging being on or off. Nor, probably, can we rely on the top 8 bits being zeroed if it is on or being left unchanged if it is off. Whichever of these is the case we do know that we are not using tag bits in Java. So, we should only see either 0x0000, 0x00ff or 0xffff in the top 16 bits of any offset coming into method needs_explicit_null_check. The first case indicates an address in the valid vmem range and needs comparing to the heap base to determine whether to compare against the null check table. The second and third cases can only arise in two ways -- either because of a Brooks pointer load via a null oop or because something tried to load a garbage address in the kernel only range. In either of the last two cases we don't need to adjust the offset to do a heap check. This cannot be a heap address. So, when using Shenandoah on AArch64 For the case where (top 16 bits == 0xffff) we just need to test whether (offset == -8). For the case where (top 16 bits == 0x00ff) we can restore the top 8 bits to 0xff, on the assumption that these were the bits in the address that caused the fault i.e. offset = ((offset << 8) >> 8)] and then test whether (offset == -8). What does this mean for the current code? ----------------------------------------- Here is the current Shenandoah repo code bool MacroAssembler::needs_explicit_null_check(intptr_t offset) { // Exception handler checks the nmethod's implicit null checks table // only when this method returns false. #ifdef AARCH64 // AArch64 uses 48-bit addresses const uintptr_t address_bits = (uintptr_t)0xfffffffffffful; #else const uintptr_t address_bits = ~(uintptr_t)0; #endif #ifdef _LP64 if (UseCompressedOops && Universe::narrow_oop_base() != NULL) { assert (Universe::heap() != NULL, "java heap should be initialized"); // The first page after heap_base is unmapped and // the 'offset' is equal to [heap_base + offset] for // narrow oop implicit null checks. uintptr_t base = (uintptr_t)Universe::narrow_oop_base(); int adj = MIN2(0, UseShenandoahGC ? BrooksPointer::byte_offset() : 0); if ((uintptr_t)((offset - adj) & address_bits) >= base) { // Normalize offset for the next check. offset = (intptr_t)(pointer_delta((void*)offset, (void*)base, 1)); } } #endif if (UseShenandoahGC) { if ((offset & address_bits) == (BrooksPointer::byte_offset() & address_bits)) { return false; } } return offset < 0 || os::vm_page_size() <= offset; } To fix this I think we need to special case AArch64 processing separate from non-AArch64 processing. In the former case the Shenandoah checks can be done under #ifdef AARCH64 before the #ifdef LP64 block. This makes the check currently inserted after the #ifdef LP64 block redundant on AArch64 and hence require guarding with #ifndef AARCH64. Here is my suggested alternative: bool MacroAssembler::needs_explicit_null_check(intptr_t offset) { // Exception handler checks the nmethod's implicit null checks table // only when this method returns false. #ifdef AARCH64 // AArch64 addresses should only have 0x0000 or 0xffff // in the top 16 bits but in a fault address the latter // may be reset to 0x00ff. Non-zero bits are only legitimate // when Shenandoah loads a Brooks pointer via a null oop // i.e. original address should be - BrooksPointer::byte_offset(). long loffset = (long)offset; long hi = loffset >> 48; int adj = BrooksPointer::byte_offset(); if (hi != 0) { if (hi == 0x00ffL || hi == 0xffffL) { // the top 8 bits may have been cleared // if so make sure they are set again loffset = (loffset << 8) >> 8); // if this is not 0 - Brooks pointer offset // we need an explicit null check return ((loffset + adj) != 0); } } #endif #ifdef _LP64 if (UseCompressedOops && Universe::narrow_oop_base() != NULL) { assert (Universe::heap() != NULL, "java heap should be initialized"); // The first page after heap_base is unmapped and // the 'offset' is equal to [heap_base + offset] for // narrow oop implicit null checks. uintptr_t base = (uintptr_t)Universe::narrow_oop_base(); int adj = MIN2(0, UseShenandoahGC ? BrooksPointer::byte_offset() : 0); if (((uintptr_t)(offset - adj)) >= base) { // Normalize offset for the next check. offset = (intptr_t)(pointer_delta((void*)offset, (void*)base, 1)); } } #endif #ifndef AARCH64 if (UseShenandoahGC) { if (offset == BrooksPointer::byte_offset()) { return false; } } #endif return offset < 0 || os::vm_page_size() <= offset; } Comments and or corrections welcome. regards, Andrew Dinn -----------
Having just talked to Andrew Haley about this I'll add that I'm suggesting reworking this code (see bottom of original note for a proposed alternative) not to fix any bug but just to make it clearer what is being checked and why. regards, Andrew Dinn ----------- On 26/07/17 15:40, Andrew Dinn wrote:
The Problem -----------
This is a follow-up of my review of the Shenandoah patch addressing a curiosity that was found in the fault address values passed into MacroAssembler::needs_explicit_null_check from the SEGV handler. It turns out that when we dereference address -8 (aka 0xffff_ffff_ffff_fff8 aka Brooks pointer load via null) the fault address passed into the SEGV handler in siginfo->si_addr is 0x00ff_ffff_ffff_fff8. The top 8 bits of the address are zeroed.
Why only 56 bits of address? Indeed, why are there not 48 bits of address? The relevant info can be found at
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/BABBEFA...
To summarize:
The virtual address space requires use of all 64 bits but it is in two parts with a hole in the middle i.e. addresses in the range 0x1_0000_0000_0000 to 0xFFFE_FFFF_FFFF_FFFF inclusive are not mappable. The top and bottom mappable regions each allow for 48 bits worth of vmem. So, the total available space adds up to 2 * 48 bits worth of space. However these two regions can only be mapped onto an underlying physical address space of 48 bits. So, only 48 bits worth of vmem can be in use at any given time.
There are two corresponding sets of vmem to pmem translation tables, one for use in app and kernel land (TTBR0) the other kernel land only (TTBR1).
TTRB0 maps from the low 48 bit space i.e. top 16 bits of vmem addresses are all 0, TTBR0 mappings are active in all execution levels (0-3).
TTBR1 maps from the high 48-bit space where the top 16 bits are all 1. TTBR1 mappings are only active at execution level 3. So, accessing an address in this range from user land results in an automatic SEGV.
The gap between these two spaces (i.e. where the top 16 bits are mixed) cannot be mapped via either translation table.
The crucial comment on the page linked above is
"You can enable VA tagging to exclude the top eight bits from the check"
In this case the top 8 bits of an address used by a program /don't/ actually need to be all 0s or all 1s for addressing still to work. Only the top 56 bits are assumed to be valid.
This latter option is documented here
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05...
It is enabled by writing a relevant bit in the translation table management register.
What does this mean for us? ---------------------------
It is probably the case that Linux on AArch64 is enabling VA tagging but it may not be
If VA tagging is enabled then it may be that in consequence the hardware is only passing the top 56 bits to the trap handler and, thence, to the signal handler.
Alternatively, it may be that the hardware passes the full address whatever the hardware setting and the OS trap handler is doing the masking off because it has noticed that VA tagging has been enabled.
A third possibility is that VA tagging may not be enabled but the trap handler may be doing it irrespective of the setting i.e. we may never see 0xffff.
I suspect this last case is unlikely. Anyway the upshot is we probably cannot rely on VA tagging being on or off. Nor, probably, can we rely on the top 8 bits being zeroed if it is on or being left unchanged if it is off.
Whichever of these is the case we do know that we are not using tag bits in Java. So, we should only see either 0x0000, 0x00ff or 0xffff in the top 16 bits of any offset coming into method needs_explicit_null_check.
The first case indicates an address in the valid vmem range and needs comparing to the heap base to determine whether to compare against the null check table.
The second and third cases can only arise in two ways -- either because of a Brooks pointer load via a null oop or because something tried to load a garbage address in the kernel only range.
In either of the last two cases we don't need to adjust the offset to do a heap check. This cannot be a heap address.
So, when using Shenandoah on AArch64
For the case where (top 16 bits == 0xffff) we just need to test whether (offset == -8).
For the case where (top 16 bits == 0x00ff) we can restore the top 8 bits to 0xff, on the assumption that these were the bits in the address that caused the fault i.e. offset = ((offset << 8) >> 8)] and then test whether (offset == -8).
What does this mean for the current code? -----------------------------------------
Here is the current Shenandoah repo code
bool MacroAssembler::needs_explicit_null_check(intptr_t offset) { // Exception handler checks the nmethod's implicit null checks table // only when this method returns false. #ifdef AARCH64 // AArch64 uses 48-bit addresses const uintptr_t address_bits = (uintptr_t)0xfffffffffffful; #else const uintptr_t address_bits = ~(uintptr_t)0; #endif #ifdef _LP64 if (UseCompressedOops && Universe::narrow_oop_base() != NULL) { assert (Universe::heap() != NULL, "java heap should be initialized"); // The first page after heap_base is unmapped and // the 'offset' is equal to [heap_base + offset] for // narrow oop implicit null checks. uintptr_t base = (uintptr_t)Universe::narrow_oop_base(); int adj = MIN2(0, UseShenandoahGC ? BrooksPointer::byte_offset() : 0); if ((uintptr_t)((offset - adj) & address_bits) >= base) { // Normalize offset for the next check. offset = (intptr_t)(pointer_delta((void*)offset, (void*)base, 1)); } } #endif
if (UseShenandoahGC) { if ((offset & address_bits) == (BrooksPointer::byte_offset() & address_bits)) { return false; } } return offset < 0 || os::vm_page_size() <= offset; }
To fix this I think we need to special case AArch64 processing separate from non-AArch64 processing. In the former case the Shenandoah checks can be done under #ifdef AARCH64 before the #ifdef LP64 block. This makes the check currently inserted after the #ifdef LP64 block redundant on AArch64 and hence require guarding with #ifndef AARCH64. Here is my suggested alternative:
bool MacroAssembler::needs_explicit_null_check(intptr_t offset) { // Exception handler checks the nmethod's implicit null checks table // only when this method returns false. #ifdef AARCH64 // AArch64 addresses should only have 0x0000 or 0xffff // in the top 16 bits but in a fault address the latter // may be reset to 0x00ff. Non-zero bits are only legitimate // when Shenandoah loads a Brooks pointer via a null oop // i.e. original address should be - BrooksPointer::byte_offset(). long loffset = (long)offset; long hi = loffset >> 48; int adj = BrooksPointer::byte_offset(); if (hi != 0) { if (hi == 0x00ffL || hi == 0xffffL) { // the top 8 bits may have been cleared // if so make sure they are set again loffset = (loffset << 8) >> 8); // if this is not 0 - Brooks pointer offset // we need an explicit null check return ((loffset + adj) != 0); } } #endif #ifdef _LP64 if (UseCompressedOops && Universe::narrow_oop_base() != NULL) { assert (Universe::heap() != NULL, "java heap should be initialized"); // The first page after heap_base is unmapped and // the 'offset' is equal to [heap_base + offset] for // narrow oop implicit null checks. uintptr_t base = (uintptr_t)Universe::narrow_oop_base(); int adj = MIN2(0, UseShenandoahGC ? BrooksPointer::byte_offset() : 0); if (((uintptr_t)(offset - adj)) >= base) { // Normalize offset for the next check. offset = (intptr_t)(pointer_delta((void*)offset, (void*)base, 1)); } } #endif
#ifndef AARCH64 if (UseShenandoahGC) { if (offset == BrooksPointer::byte_offset()) { return false; } } #endif return offset < 0 || os::vm_page_size() <= offset; }
Comments and or corrections welcome.
regards,
Andrew Dinn -----------
Hi Andrew, Thanks for the analysis and proposal. I find the bitwise magic a bit too magical:bool MacroAssembler::needs_explicit_null_check(intptr_t offset) {
// Exception handler checks the nmethod's implicit null checks table // only when this method returns false. #ifdef AARCH64 // AArch64 addresses should only have 0x0000 or 0xffff // in the top 16 bits but in a fault address the latter // may be reset to 0x00ff. Non-zero bits are only legitimate // when Shenandoah loads a Brooks pointer via a null oop // i.e. original address should be - BrooksPointer::byte_offset(). long loffset = (long)offset; long hi = loffset >> 48; int adj = BrooksPointer::byte_offset(); if (hi != 0) { if (hi == 0x00ffL || hi == 0xffffL) { // the top 8 bits may have been cleared // if so make sure they are set again loffset = (loffset << 8) >> 8); Why cast to long? Isn't intptr_t basically the same, but more explicitely pointer-sized? How does shifting left then right 'make sure they are set again' ? Do we want them all set? Then simply loffset |= 0xffff << 48; should do it, no? Also, it seems we want to enclose this in if (UseShenandoahGC) ?
// if this is not 0 - Brooks pointer offset // we need an explicit null check return ((loffset + adj) != 0); } }
And if not, then we fall through. Is this the intention?
#endif #ifdef _LP64
Does AArch64 define _LP64 ?
if (UseCompressedOops && Universe::narrow_oop_base() != NULL) { assert (Universe::heap() != NULL, "java heap should be initialized"); // The first page after heap_base is unmapped and // the 'offset' is equal to [heap_base + offset] for // narrow oop implicit null checks. uintptr_t base = (uintptr_t)Universe::narrow_oop_base(); int adj = MIN2(0, UseShenandoahGC ? BrooksPointer::byte_offset() : 0); if (((uintptr_t)(offset - adj)) >= base) { // Normalize offset for the next check. offset = (intptr_t)(pointer_delta((void*)offset, (void*)base, 1)); } } #endif
#ifndef AARCH64 if (UseShenandoahGC) { if (offset == BrooksPointer::byte_offset()) { return false; } } #endif return offset < 0 || os::vm_page_size() <= offset; }
Comments and or corrections welcome.
regards,
Andrew Dinn -----------
I don't much like littering shared code with #ifdef SOMEARCH... Maybe this stuff should be moved to src/cpu somewhere? Roman
On 26/07/17 17:30, Roman Kennke wrote:
Thanks for the analysis and proposal.
I find the bitwise magic a bit too magical:bool
Well personal mileage does vary ...
Why cast to long? Isn't intptr_t basically the same, but more explicitely pointer-sized?
Sure, if the left shift right shift directly on an intptr_t does the right thing then use it instead.
How does shifting left then right 'make sure they are set again' ? Do we want them all set? Then simply loffset |= 0xffff << 48; should do it, no?
Yes, we want them all set so the result becomes a proper negative 2's complement number. We know the 55th bit is 1 so left shift and arithmetic right shift by 8 will ensure bits 56-63 are set. Why not use an or? Because loading the constant would take more instructions than lsh, rsh.
Also, it seems we want to enclose this in if (UseShenandoahGC) Oops, yes! I had that in the version I started writing into the original reply but then lost it when I transcribed between posts and re-edited!
I think we actually need: if (UseShenandoahGC && (hi == 0x00ffL || hi == 0xffffL)) { ... return ((loffset + adj) != 0); } else { return true; }
And if not, then we fall through. Is this the intention?
No, see above. We only continue to the range check when we have a positive top 16 bits because this is just a normal user space pointer which requires a normal pointer comparison against the heap.
Does AArch64 define _LP64 ?
Yes, it does.
I don't much like littering shared code with #ifdef SOMEARCH... Maybe this stuff should be moved to src/cpu somewhere? Preferably, yes. In practice, however, there are a few such occurrences in shared code. My concern was not to relocate the ifdef but just to make it clearer what was happening on AArch64 and why.
regards, Andrew Dinn -----------
On 26/07/17 18:00, Andrew Dinn wrote:
Preferably, yes. In practice, however, there are a few such occurrences in shared code. My concern was not to relocate the ifdef but just to make it clearer what was happening on AArch64 and why.
:-) -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
On 26/07/17 18:02, Andrew Haley wrote:
On 26/07/17 18:00, Andrew Dinn wrote:
Preferably, yes. In practice, however, there are a few such occurrences in shared code. My concern was not to relocate the ifdef but just to make it clearer what was happening on AArch64 and why.
:-) Hmm, yes, I know. I obviously made it much clearer!
regards, Andrew Dinn -----------
participants (3)
-
Andrew Dinn
-
Andrew Haley
-
Roman Kennke