RFR: 8252500: ZGC on aarch64: Unable to allocate heap for certain Linux kernel configurations

Christoph Göttschkes cgo at openjdk.java.net
Mon Sep 7 10:02:22 UTC 2020


On Mon, 7 Sep 2020 09:11:28 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:

>> The patch introduces a new function to probe for the highest valid bit in the virtual address space for userspace
>> programs on Linux.
>>
>> I guarded the whole implementation to only probe on Linux; other platforms remain unaffected. Possibly, it would be
>> nicer to move the probing code into an OS+ARCH specific source file. But since this is only a single function, I
>> thought it would be better to put it right next to the caller and guard it with an #ifdef LINUX.
>>
>> The probing mechanism uses a combination of msync + mmap: first, msync is used to check whether the address is valid
>> (if msync succeeds, the address is valid). If msync fails, mmap is used to check whether msync failed because the
>> memory wasn't mapped, or because the address is invalid. Due to some undefined behavior (documented in the msync man
>> page), I also use a single mmap at the end if the msync approach failed before.
>>
>> I tested msync with different combinations of mappings, and also with sbrk, and it always succeeded or failed with
>> ENOMEM. I never got back any other error code.
>>
>> The specified minimum value has been chosen "randomly". The JVM terminates (unable to allocate heap) if this minimum
>> value is smaller than the requested Java heap size, so it might be better to make the minimum dependent on
>> MaxHeapSize and not a compile-time constant? I didn't want to make the minimum too big, since for aarch64 on Linux,
>> the documented minimum would be 38 (see [1]).
>>
>> I avoided MAP_FIXED_NOREPLACE because, according to the man page, it was added in Linux 4.17. There are still plenty
>> of stable kernel versions around which do not have that feature, which means we would need to implement a workaround
>> for it. Some of my test devices also have a kernel version lower than that.
>>
>> I executed the HotSpot tier1 JTreg tests on two different aarch64 devices, one with 4KB pages and 3 page table
>> levels and the other with 4KB pages and 4 page table levels. Tests passed on both devices.
>>
>> [1] https://www.kernel.org/doc/Documentation/arm64/memory.txt
>
> src/hotspot/cpu/aarch64/gc/z/zGlobals_aarch64.cpp line 156:
> 
>> 154:       max_address_bit = i;
>> 155:       break;
>> 156:     }
> 
> Is there an off-by-one error here? Taking i == 47 as an example: this means that you test base_address == '10000000
> 00000000 00000000 00000000 00000000 00000000' (in bits). That is, you test that the address range with the 48th bit set
> (128T-256T) is usable. However, when 47 is then returned to the caller, it is interpreted as if the 64T-128T range is
> usable.

Good catch. I verified your assumption by doing the following:
I returned the result of probe_valid_max_address_bit() + 1 from ZPlatformAddressOffsetBits(). I then executed a VM on a
platform with only 3 page table levels, which has fewer than DEFAULT_MAX_ADDRESS_BIT bits available in the address
space. The heap allocation succeeded, which means the current patch has an off-by-one error.

However, I would fix this in ZPlatformAddressOffsetBits(), because I interpret the result of
probe_valid_max_address_bit() to be the highest bit index of an address which can be allocated. This means that
mmap'ing the address 1 << probe_valid_max_address_bit() should succeed (if the page is not already mapped). I think
ZPlatformAddressOffsetBits() should add 1 to the result of probe_valid_max_address_bit().

-------------

PR: https://git.openjdk.java.net/jdk/pull/40



More information about the hotspot-gc-dev mailing list