The Problem ----------- This is a follow-up of my review of the Shenandoah patch addressing a curiosity that was found in the fault address values passed into MacroAssembler::needs_explicit_null_check from the SEGV handler. It turns out that when we dereference address -8 (aka 0xffff_ffff_ffff_fff8 aka Brooks pointer load via null) the fault address passed into the SEGV handler in siginfo->si_addr is 0x00ff_ffff_ffff_fff8. The top 8 bits of the address are zeroed. Why only 56 bits of address? Indeed, why are there not 48 bits of address? The relevant info can be found at http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/BABBEFA... To summarize: The virtual address space requires use of all 64 bits but it is in two parts with a hole in the middle i.e. addresses in the range 0x1_0000_0000_0000 to 0xFFFE_FFFF_FFFF_FFFF inclusive are not mappable. The top and bottom mappable regions each allow for 48 bits worth of vmem. So, the total available space adds up to 2 * 48 bits worth of space. However these two regions can only be mapped onto an underlying physical address space of 48 bits. So, only 48 bits worth of vmem can be in use at any given time. There are two corresponding sets of vmem to pmem translation tables, one for use in app and kernel land (TTBR0) the other kernel land only (TTBR1). TTRB0 maps from the low 48 bit space i.e. top 16 bits of vmem addresses are all 0, TTBR0 mappings are active in all execution levels (0-3). TTBR1 maps from the high 48-bit space where the top 16 bits are all 1. TTBR1 mappings are only active at execution level 3. So, accessing an address in this range from user land results in an automatic SEGV. The gap between these two spaces (i.e. where the top 16 bits are mixed) cannot be mapped via either translation table. The crucial comment on the page linked above is "You can enable VA tagging to exclude the top eight bits from the check" In this case the top 8 bits of an address used by a program /don't/ actually need to be all 0s or all 1s for addressing still to work. Only the top 56 bits are assumed to be valid. This latter option is documented here http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05... It is enabled by writing a relevant bit in the translation table management register. What does this mean for us? --------------------------- It is probably the case that Linux on AArch64 is enabling VA tagging but it may not be If VA tagging is enabled then it may be that in consequence the hardware is only passing the top 56 bits to the trap handler and, thence, to the signal handler. Alternatively, it may be that the hardware passes the full address whatever the hardware setting and the OS trap handler is doing the masking off because it has noticed that VA tagging has been enabled. A third possibility is that VA tagging may not be enabled but the trap handler may be doing it irrespective of the setting i.e. we may never see 0xffff. I suspect this last case is unlikely. Anyway the upshot is we probably cannot rely on VA tagging being on or off. Nor, probably, can we rely on the top 8 bits being zeroed if it is on or being left unchanged if it is off. Whichever of these is the case we do know that we are not using tag bits in Java. So, we should only see either 0x0000, 0x00ff or 0xffff in the top 16 bits of any offset coming into method needs_explicit_null_check. The first case indicates an address in the valid vmem range and needs comparing to the heap base to determine whether to compare against the null check table. The second and third cases can only arise in two ways -- either because of a Brooks pointer load via a null oop or because something tried to load a garbage address in the kernel only range. In either of the last two cases we don't need to adjust the offset to do a heap check. This cannot be a heap address. So, when using Shenandoah on AArch64 For the case where (top 16 bits == 0xffff) we just need to test whether (offset == -8). For the case where (top 16 bits == 0x00ff) we can restore the top 8 bits to 0xff, on the assumption that these were the bits in the address that caused the fault i.e. offset = ((offset << 8) >> 8)] and then test whether (offset == -8). What does this mean for the current code? ----------------------------------------- Here is the current Shenandoah repo code bool MacroAssembler::needs_explicit_null_check(intptr_t offset) { // Exception handler checks the nmethod's implicit null checks table // only when this method returns false. #ifdef AARCH64 // AArch64 uses 48-bit addresses const uintptr_t address_bits = (uintptr_t)0xfffffffffffful; #else const uintptr_t address_bits = ~(uintptr_t)0; #endif #ifdef _LP64 if (UseCompressedOops && Universe::narrow_oop_base() != NULL) { assert (Universe::heap() != NULL, "java heap should be initialized"); // The first page after heap_base is unmapped and // the 'offset' is equal to [heap_base + offset] for // narrow oop implicit null checks. uintptr_t base = (uintptr_t)Universe::narrow_oop_base(); int adj = MIN2(0, UseShenandoahGC ? BrooksPointer::byte_offset() : 0); if ((uintptr_t)((offset - adj) & address_bits) >= base) { // Normalize offset for the next check. offset = (intptr_t)(pointer_delta((void*)offset, (void*)base, 1)); } } #endif if (UseShenandoahGC) { if ((offset & address_bits) == (BrooksPointer::byte_offset() & address_bits)) { return false; } } return offset < 0 || os::vm_page_size() <= offset; } To fix this I think we need to special case AArch64 processing separate from non-AArch64 processing. In the former case the Shenandoah checks can be done under #ifdef AARCH64 before the #ifdef LP64 block. This makes the check currently inserted after the #ifdef LP64 block redundant on AArch64 and hence require guarding with #ifndef AARCH64. Here is my suggested alternative: bool MacroAssembler::needs_explicit_null_check(intptr_t offset) { // Exception handler checks the nmethod's implicit null checks table // only when this method returns false. #ifdef AARCH64 // AArch64 addresses should only have 0x0000 or 0xffff // in the top 16 bits but in a fault address the latter // may be reset to 0x00ff. Non-zero bits are only legitimate // when Shenandoah loads a Brooks pointer via a null oop // i.e. original address should be - BrooksPointer::byte_offset(). long loffset = (long)offset; long hi = loffset >> 48; int adj = BrooksPointer::byte_offset(); if (hi != 0) { if (hi == 0x00ffL || hi == 0xffffL) { // the top 8 bits may have been cleared // if so make sure they are set again loffset = (loffset << 8) >> 8); // if this is not 0 - Brooks pointer offset // we need an explicit null check return ((loffset + adj) != 0); } } #endif #ifdef _LP64 if (UseCompressedOops && Universe::narrow_oop_base() != NULL) { assert (Universe::heap() != NULL, "java heap should be initialized"); // The first page after heap_base is unmapped and // the 'offset' is equal to [heap_base + offset] for // narrow oop implicit null checks. uintptr_t base = (uintptr_t)Universe::narrow_oop_base(); int adj = MIN2(0, UseShenandoahGC ? BrooksPointer::byte_offset() : 0); if (((uintptr_t)(offset - adj)) >= base) { // Normalize offset for the next check. offset = (intptr_t)(pointer_delta((void*)offset, (void*)base, 1)); } } #endif #ifndef AARCH64 if (UseShenandoahGC) { if (offset == BrooksPointer::byte_offset()) { return false; } } #endif return offset < 0 || os::vm_page_size() <= offset; } Comments and or corrections welcome. regards, Andrew Dinn -----------