New subject: Problem in MacroAssembler::needs_explicit_null_check

26 Jul 2017

The Problem
-----------

This is a follow-up to my review of the Shenandoah patch addressing a
curiosity that was found in the fault address values passed into
MacroAssembler::needs_explicit_null_check from the SEGV handler. It
turns out that when we dereference address -8 (aka 0xffff_ffff_ffff_fff8
aka Brooks pointer load via null) the fault address passed into the SEGV
handler in siginfo->si_addr is 0x00ff_ffff_ffff_fff8. The top 8 bits of
the address are zeroed.
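
To make the two values concrete, here is a minimal sketch of the
relationship between them. The helper clear_top_byte is hypothetical; it
only mimics the observed effect and is not taken from the kernel or from
HotSpot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mimics the observed behaviour by zeroing
// bits 63..56 of an address and leaving the low 56 bits alone.
constexpr uint64_t clear_top_byte(uint64_t addr) {
  return addr & UINT64_C(0x00ffffffffffffff);
}

// Address -8 viewed as an unsigned 64-bit value.
constexpr uint64_t brooks_via_null = static_cast<uint64_t>(INT64_C(-8));

// Checks the two values quoted in the text against each other.
inline void check_observed_fault_address() {
  assert(brooks_via_null == UINT64_C(0xfffffffffffffff8));
  assert(clear_top_byte(brooks_via_null) == UINT64_C(0x00fffffffffffff8));
}
```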

Why only 56 bits of address? Indeed, why are there not 48 bits of
address? The relevant info can be found at

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/BABBEFA...

To summarize:

The virtual address space requires use of all 64 bits but it is in two
parts with a hole in the middle i.e. addresses in the range
0x1_0000_0000_0000 to 0xFFFE_FFFF_FFFF_FFFF inclusive are not mappable.
The top and bottom mappable regions each allow for 48 bits worth of
vmem. So, the total available space adds up to 2 * 48 bits worth of
space. However these two regions can only be mapped onto an underlying
physical address space of 48 bits. So, only 48 bits worth of vmem can be
in use at any given time.

There are two corresponding sets of vmem to pmem translation tables, one
for use in both app and kernel land (TTBR0), the other for kernel land
only (TTBR1).

  TTBR0 maps from the low 48-bit space, i.e. the top 16 bits of vmem
addresses are all 0. TTBR0 mappings are active in all execution levels
(0-3).

  TTBR1 maps from the high 48-bit space where the top 16 bits are all 1.
TTBR1 mappings are only active at execution level 3. So, accessing an
address in this range from user land results in an automatic SEGV.

The gap between these two spaces (i.e. where the top 16 bits are mixed)
cannot be mapped via either translation table.
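
The split described above can be sketched as a classification on the top
16 bits of an address. This assumes 48-bit regions and no tagging; the
names VaRegion and classify_va are mine, purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, for illustration only.
enum class VaRegion { Low, High, Hole };

// Classify a 64-bit virtual address by its top 16 bits,
// assuming 48-bit regions and no VA tagging.
constexpr VaRegion classify_va(uint64_t va) {
  return (va >> 48) == 0x0000 ? VaRegion::Low    // TTBR0 range
       : (va >> 48) == 0xffff ? VaRegion::High   // TTBR1 range
       : VaRegion::Hole;                         // mixed bits: unmappable
}

// Checks the boundaries of the hole quoted in the text.
inline void check_va_regions() {
  assert(classify_va(UINT64_C(0x0000ffffffffffff)) == VaRegion::Low);
  assert(classify_va(UINT64_C(0x0001000000000000)) == VaRegion::Hole);
  assert(classify_va(UINT64_C(0xfffeffffffffffff)) == VaRegion::Hole);
  assert(classify_va(UINT64_C(0xffff000000000000)) == VaRegion::High);
}
```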

The crucial comment on the page linked above is

  "You can enable VA tagging to exclude the top eight bits from the check"

In this case the top 8 bits of an address used by a program /don't/
actually need to be all 0s or all 1s for addressing still to work. Only
the low 56 bits are assumed to be valid.

This latter option is documented here

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05...

It is enabled by writing a relevant bit in the translation table
management register.
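
As a very simplified model of what tagging changes, the range check can
be thought of as looking only at bits 55..48 and ignoring bits 63..56.
This is my reading of the linked description, not a statement of how any
particular kernel configures it, and the names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, for illustration only.
enum class TaggedRegion { Low, High, Hole };

// With the top byte excluded from the check, only bits 55..48
// decide which range an address belongs to (simplified model).
constexpr uint64_t range_select_bits(uint64_t va) {
  return (va >> 48) & 0xff;
}

constexpr TaggedRegion classify_tagged_va(uint64_t va) {
  return range_select_bits(va) == 0x00 ? TaggedRegion::Low
       : range_select_bits(va) == 0xff ? TaggedRegion::High
       : TaggedRegion::Hole;
}

// Under this model the original and the zeroed fault address
// land in the same (high) range.
inline void check_tagged_regions() {
  assert(classify_tagged_va(UINT64_C(0xfffffffffffffff8)) == TaggedRegion::High);
  assert(classify_tagged_va(UINT64_C(0x00fffffffffffff8)) == TaggedRegion::High);
  assert(classify_tagged_va(UINT64_C(0xab00000000001000)) == TaggedRegion::Low);
  assert(classify_tagged_va(UINT64_C(0x0001000000000000)) == TaggedRegion::Hole);
}
```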

What does this mean for us?
---------------------------

It is probably the case that Linux on AArch64 is enabling VA tagging but
it may not be.

If VA tagging is enabled then it may be that in consequence the hardware
is only passing the low 56 bits to the trap handler and, thence, to the
signal handler.

Alternatively, it may be that the hardware passes the full address
whatever the hardware setting and the OS trap handler is doing the
masking off because it has noticed that VA tagging has been enabled.

A third possibility is that VA tagging may not be enabled but the trap
handler may be doing it irrespective of the setting i.e. we may never
see 0xffff.

I suspect this last case is unlikely. Anyway the upshot is we probably
cannot rely on VA tagging being on or off. Nor, probably, can we rely on
the top 8 bits being zeroed if it is on or being left unchanged if it is
off.

Whichever of these is the case we do know that we are not using tag bits
in Java. So, we should only see either 0x0000, 0x00ff or 0xffff in the
top 16 bits of any offset coming into method needs_explicit_null_check.

  The first case indicates an address in the valid vmem range and needs
comparing to the heap base to determine whether to compare against the
null check table.

  The second and third cases can only arise in two ways -- either
because of a Brooks pointer load via a null oop or because something
tried to load a garbage address in the kernel only range.

In either of the last two cases we don't need to adjust the offset to do
a heap check. This cannot be a heap address.

So, when using Shenandoah on AArch64

  For the case where (top 16 bits == 0xffff) we just need to test
whether (offset == -8).

  For the case where (top 16 bits == 0x00ff) we can restore the top 8
bits to 0xff, on the assumption that these were the bits in the address
that caused the fault, i.e. offset = ((offset << 8) >> 8) using a signed
(arithmetic) right shift, and then test whether (offset == -8).
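
The two cases can be sketched in isolation as follows. This is only an
illustration of the bit manipulation, not HotSpot code: the function
names are hypothetical and -8 is hard-wired in place of the Brooks
pointer offset.

```cpp
#include <cassert>
#include <cstdint>

// Copy bit 55 into bits 63..56. The left shift is done on the
// unsigned value to avoid signed overflow; the right shift on the
// signed value is arithmetic on mainstream compilers (and is
// guaranteed to be from C++20 onwards).
inline int64_t restore_top_byte(int64_t offset) {
  return static_cast<int64_t>(static_cast<uint64_t>(offset) << 8) >> 8;
}

// True if the fault offset looks like a load from address -8,
// with or without the top 8 bits having been zeroed.
inline bool is_brooks_load_via_null(int64_t offset) {
  const uint64_t hi = static_cast<uint64_t>(offset) >> 48;
  if (hi == 0xffff) {
    return offset == -8;
  }
  if (hi == 0x00ff) {
    return restore_top_byte(offset) == -8;
  }
  return false;
}

inline void check_brooks_detection() {
  assert(is_brooks_load_via_null(-8));
  assert(is_brooks_load_via_null(INT64_C(0x00fffffffffffff8)));
  assert(!is_brooks_load_via_null(INT64_C(0x00fffffffffffff0)));
  assert(!is_brooks_load_via_null(8));
}
```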

What does this mean for the current code?
-----------------------------------------

Here is the current Shenandoah repo code

bool MacroAssembler::needs_explicit_null_check(intptr_t offset) {
  // Exception handler checks the nmethod's implicit null checks table
  // only when this method returns false.
#ifdef AARCH64
  // AArch64 uses 48-bit addresses
  const uintptr_t address_bits = (uintptr_t)0xfffffffffffful;
#else
  const uintptr_t address_bits = ~(uintptr_t)0;
#endif
#ifdef _LP64
  if (UseCompressedOops && Universe::narrow_oop_base() != NULL) {
    assert (Universe::heap() != NULL, "java heap should be initialized");
    // The first page after heap_base is unmapped and
    // the 'offset' is equal to [heap_base + offset] for
    // narrow oop implicit null checks.
    uintptr_t base = (uintptr_t)Universe::narrow_oop_base();
    int adj = MIN2(0, UseShenandoahGC ? BrooksPointer::byte_offset() : 0);
    if ((uintptr_t)((offset - adj) & address_bits) >= base) {
      // Normalize offset for the next check.
      offset = (intptr_t)(pointer_delta((void*)offset, (void*)base, 1));
    }
  }
#endif

  if (UseShenandoahGC) {
    if ((offset & address_bits) == (BrooksPointer::byte_offset() & address_bits)) {
      return false;
    }
  }
  return offset < 0 || os::vm_page_size() <= offset;
}

To fix this I think we need to special-case AArch64 processing, separate
from non-AArch64 processing. In the former case the Shenandoah checks
can be done under #ifdef AARCH64 before the #ifdef _LP64 block. This
makes the check currently inserted after the #ifdef _LP64 block redundant
on AArch64, so it requires guarding with #ifndef AARCH64. Here is my
suggested alternative:

bool MacroAssembler::needs_explicit_null_check(intptr_t offset) {
  // Exception handler checks the nmethod's implicit null checks table
  // only when this method returns false.
#ifdef AARCH64
  // AArch64 addresses should only have 0x0000 or 0xffff
  // in the top 16 bits but in a fault address the latter
  // may be reset to 0x00ff. Non-zero bits are only legitimate
  // when Shenandoah loads a Brooks pointer via a null oop
  // i.e. original address should be - BrooksPointer::byte_offset().
  long loffset = (long)offset;
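  // mask after the shift: a signed shift sign-extends, so without
  // the mask 0xffff in the top 16 bits would show up as -1
  long hi = (loffset >> 48) & 0xffffL;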
  int adj = BrooksPointer::byte_offset();
  if (hi != 0) {
    if (hi == 0x00ffL || hi == 0xffffL) {
      // the top 8 bits may have been cleared
      // if so make sure they are set again
      loffset = ((loffset << 8) >> 8);
      // if this is not the Brooks pointer offset (-8)
      // we need an explicit null check
      return (loffset != adj);
    }
  }
#endif
#ifdef _LP64
  if (UseCompressedOops && Universe::narrow_oop_base() != NULL) {
    assert (Universe::heap() != NULL, "java heap should be initialized");
    // The first page after heap_base is unmapped and
    // the 'offset' is equal to [heap_base + offset] for
    // narrow oop implicit null checks.
    uintptr_t base = (uintptr_t)Universe::narrow_oop_base();
    int adj = MIN2(0, UseShenandoahGC ? BrooksPointer::byte_offset() : 0);
    if (((uintptr_t)(offset - adj)) >= base) {
      // Normalize offset for the next check.
      offset = (intptr_t)(pointer_delta((void*)offset, (void*)base, 1));
    }
  }
#endif

#ifndef AARCH64
  if (UseShenandoahGC) {
    if (offset == BrooksPointer::byte_offset()) {
      return false;
    }
  }
#endif
  return offset < 0 || os::vm_page_size() <= offset;
}
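
One detail worth keeping in mind when extracting the top 16 bits from a
signed value: an arithmetic right shift sign-extends, so an address with
0xffff in the top 16 bits yields -1 rather than 0xffff unless the result
is masked. A minimal standalone sketch, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Top 16 bits via a signed shift only. On mainstream compilers the
// shift is arithmetic, so negative inputs come back sign-extended.
inline int64_t top16_signed(int64_t offset) {
  return offset >> 48;
}

// Top 16 bits via shift plus mask: always in the range 0x0000..0xffff.
inline int64_t top16_masked(int64_t offset) {
  return (offset >> 48) & 0xffff;
}

inline void check_top16() {
  // 0xffff_ffff_ffff_fff8: the unmasked form does not compare equal to 0xffff.
  assert(top16_signed(-8) == -1);
  assert(top16_masked(-8) == 0xffff);
  // 0x00ff_ffff_ffff_fff8 is positive, so both forms agree.
  assert(top16_signed(INT64_C(0x00fffffffffffff8)) == 0x00ff);
  assert(top16_masked(INT64_C(0x00fffffffffffff8)) == 0x00ff);
}
```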

Comments and/or corrections welcome.

regards,

Andrew Dinn