RFR: 8148783: aarch64: SEGV running SpecJBB2013
Hi, Please review the following webrev http://cr.openjdk.java.net/~enevill/8148783/webrev.0/ JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8148783 The bug is explained in some detail in the JIRA issue. The problem is that the sign is not preserved in the following code from adrp(...) long offset = dest_page - pc_page; offset = (offset & ((1<<20)-1)) << 12; This generally works because the following movk overwrites bits 32..47 However on larger memory systems of 256 Gb it could happen that the PC address was 0x0000ffffXXXXXXXX in which case the falsely positive offset could wrap to 0x00010000XXXXXXXX Bit 48 does not get overwritten by the following movk, hence forming an invalid address. The solution is to use int32_t for offset instead of long, so it gets sign extended correctly when added to the pc(). All the best, Ed.
Hi, On 02/01/2016 08:33 PM, Edward Nevill wrote:
JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8148783
The bug is explained in some detail in the JIRA issue.
The problem is that the sign is not preserved in the following code from adrp(...)
long offset = dest_page - pc_page; offset = (offset & ((1<<20)-1)) << 12;
This generally works because the following movk overwrites bits 32..47
However on larger memory systems of 256 Gb it could happen that the PC address was
0x0000ffffXXXXXXXX
in which case the falsely positive offset could wrap to
0x00010000XXXXXXXX
Bit 48 does not get overwritten by the following movk, hence forming an invalid address.
The solution is to use int32_t for offset instead of long, so it gets sign extended correctly when added to the pc().
I can't accept that patch because the overflowing assignment from long to int32_t is undefined behaviour. It is also very obscure code. Can you test the patch I've appended instead? It tiptoes around the UB and should be OK. Thanks, Andrew. diff --git a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp --- a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp +++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp @@ -3980,6 +3980,14 @@ return inst_mark(); } +int64_t MacroAssembler::truncate_signed_bitfield(int64_t n, int width) { + // Left shifts of a signed integer are UB in Standard C++ but + // well-defined in GNU C++. + n <<= 64 - width; + n >>= 64 - width; + return n; +} + void MacroAssembler::adrp(Register reg1, const Address &dest, unsigned long &byte_offset) { relocInfo::relocType rtype = dest.rspec().reloc()->type(); unsigned long low_page = (unsigned long)CodeCache::low_bound() >> 12; @@ -3999,8 +4007,18 @@ _adrp(reg1, dest.target()); } else { unsigned long pc_page = (unsigned long)pc() >> 12; - long offset = dest_page - pc_page; - offset = (offset & ((1<<20)-1)) << 12; + unsigned long page_offset = dest_page - pc_page; + + // The signed offset (in 4k pages) from PC to dest page. We use a + // reference in order to avoid UB when converting from unsigned to + // signed. + long offset = reinterpret_cast<long&>(page_offset); + + // The signed offset (in bytes) from the PC to the destination + // page. We only want the 32 LSBs of the offset because the range + // of ADRP is +-2G, i.e. 32 bits. + offset = truncate_signed_bitfield(offset << 12, 32); + _adrp(reg1, pc()+offset); movk(reg1, (unsigned long)dest.target() >> 32, 32); } diff --git a/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp b/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp --- a/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp +++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp @@ -85,9 +85,10 @@ void call_VM_helper(Register oop_result, address entry_point, int number_of_arguments, bool check_exceptions = true); - // Maximum size of class area in Metaspace when compressed uint64_t use_XOR_for_compressed_class_base; + int64_t truncate_signed_bitfield(int64_t n, int width); + public: MacroAssembler(CodeBuffer* code) : Assembler(code) { use_XOR_for_compressed_class_base
On Tue, 2016-02-02 at 15:16 +0000, Andrew Haley wrote:
Hi,
On 02/01/2016 08:33 PM, Edward Nevill wrote:
JIRA Issue: https://bugs.openjdk.java.net/browse/JDK-8148783 Can you test the patch I've appended instead? It tiptoes around the UB and should be OK.
Hi, Unfortunately this still fails. I have written a small simulacrum of the problem in C below. The following is the output. ed@arm64:~/tmp/adrp$ ./adrp original_adrp: pc = 0xffff70000000, dest = 0xfffe00000000, offset = 0x90000000, addr = 0x1000000000000 original_adrp: pc = 0xfffffffff000, dest = 0xfffe00000000, offset = 0x1000, addr = 0x1000000000000 new_adrp: pc = 0xffff70000000, dest = 0xfffe00000000, offset = 0xffffffff90000000, addr = 0xffff00000000 new_adrp: pc = 0xfffffffff000, dest = 0xfffe00000000, offset = 0x1000, addr = 0x1000000000000 <<<<< HERE bit 48 set The original generated an invalid address in both cases (where offset is +ve and -ve). The new version generates the correct output when the offset is -ve, however a +ve offset still generates an address with bit 48 set. A second problem is the following code in pd_patch_instruction // movk #imm16<<32 Instruction_aarch64::patch(branch + 4, 20, 5, (uint64_t)target >> 32); offset &= (1<<20)-1; instructions = 2; This is essentially doing the same thing as the original adrp, so even when the original adrp got the instruction correct the subsequent patching broke it again. I have attached a new webrev which fixes both these issues in a much simpler manner. http://cr.openjdk.java.net/~enevill/8148783/webrev.2 The key is to construct the instructions exactly as we are using them. When we use an adrp/movk combination to construct a 48 bit address we are using the adrp to construct the bottom 32 bits (with the bottom 12 bit 0) and the movk to construction bits 32..47 overwriting any values the adrp may have put in bits 32..47 So the instruction sequence is adrp Xn, 0xXXXXAAAAA000 movk Xn, 0xAAAA00000000 Where A represents required bits of the address and XXXX represent don't care bits. The only requirement on the XXXX bits is that they must be reachable using the adrp instruction. The webrev ensures this by using bits 32..47 from the PC and bits 0..31 from the destination address. The fact that we use the XXXX bits from the PC ensures the requirement that the address is reachable and using only the bottom 32 bits of the dest ensures we only get the bits we actually want the adrp instruction to construct and not any extraneous bits in bits 48 etc. The code that does this is unsigned long adrp_target = (target & 0xffffffffUL) | (source & 0xffff00000000UL); and this is also reflected in pd_patch_instruction to calculate the adrp target there. All the best, Ed. --- adrp.c --- #include <stdio.h> void original_adrp(unsigned long pc, unsigned long dest) { unsigned long dest_page = dest >> 12; unsigned long pc_page = pc >> 12; long offset = dest_page - pc_page; offset = (offset & ((1<<20)-1)) << 12; printf("original_adrp: pc = 0x%lx, dest = 0x%lx, offset = 0x%lx, addr = 0x%lx\n", pc, dest, offset, pc+offset); } long truncate_signed_bitfield(long n, int width) { // Left shifts of a signed integer are UB in Standard C++ but // well-defined in GNU C++. n <<= 64 - width; n >>= 64 - width; return n; } void new_adrp(unsigned long pc, unsigned long dest) { unsigned long dest_page = dest >> 12; unsigned long pc_page = pc >> 12; unsigned long page_offset = dest_page - pc_page; long offset = page_offset; offset = truncate_signed_bitfield(offset << 12, 32); printf("new_adrp: pc = 0x%lx, dest = 0x%lx, offset = 0x%lx, addr = 0x% lx\n", pc, dest, offset, pc+offset); } int main(void) { original_adrp(0x0000ffff70000000, 0x0000fffe00000000); original_adrp(0x0000fffffffff000, 0x0000fffe00000000); new_adrp(0x0000ffff70000000, 0x0000fffe00000000); new_adrp(0x0000fffffffff000, 0x0000fffe00000000); } --- cut here ---
On 02/04/2016 04:46 PM, Edward Nevill wrote:
The webrev ensures this by using bits 32..47 from the PC and bits 0..31 from the destination address. The fact that we use the XXXX bits from the PC ensures the requirement that the address is reachable and using only the bottom 32 bits of the dest ensures we only get the bits we actually want the adrp instruction to construct and not any extraneous bits in bits 48 etc.
The code that does this is
unsigned long adrp_target = (target & 0xffffffffUL) | (source & 0xffff00000000UL);
and this is also reflected in pd_patch_instruction to calculate the adrp target there.
Much better, but this still is confusing. Surely you can do unsigned long target = (unsigned long)dest.target(); unsigned long adrp_target = (target & 0xffffffffUL) | ((unsigned long)pc() & 0xffff00000000UL); _adrp(reg1, (address)adrp_target); movk(reg1, target >> 32, 32); } "source" doesn't really mean anything here. OK with that change. Andrew.
participants (2)
-
Andrew Haley
-
Edward Nevill