RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base
Lutz Schmidt
lucy at openjdk.java.net
Wed Feb 17 16:58:40 UTC 2021
On Tue, 16 Feb 2021 20:49:49 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
>
> This can be reproduced by starting the VM with
> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
> - heap is allocated at 0x800_0000. It is small and ends at 0x8800_0000.
> - class space follows at 0x8800_0000
> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>
> In MacroAssembler::encode_klass_not_null(), there is the following section:
>
> if (base != NULL) {
> unsigned int base_h = ((unsigned long)base)>>32;
> unsigned int base_l = (unsigned int)((unsigned long)base);
> if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
> lgr_if_needed(dst, current);
> z_aih(dst, -((int)base_h)); // Base has no set bits in lower half.
> } else if ((base_h == 0) && (base_l != 0)) { (A)
> lgr_if_needed(dst, current);
> z_agfi(dst, -(int)base_l); (B)
> } else {
> load_const(Z_R0, base);
> lgr_if_needed(dst, current);
> z_sgr(dst, Z_R0);
> }
> current = dst;
> }
>
> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to substract the base from the Klass pointer; we do this by calculating the 32bit twos-complement of the base and add it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x800_0000:
>
> In the case of the crash, we have:
> base: 8800_0000
> klass pointer: 8804_1040
> 32bit two's complement of base: 7800_0000
> added to the klass pointer: 1_0004_1040
>
> So the result of the "substraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>
> This bug has been dormant; was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as crash in a CDS-specific jtreg test (JDK-8261552).
>
> ================
>
> Fix:
>
> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>
> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>
> ----
>
> Tests:
>
> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>
> I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
>
> I used this method to test various combinations:
> - narrow klass pointer base > 0 < 4g + ccs end < 4g (we hit our branch doing AFI)
> - narrow klass pointer base > 0 < 4g + ccs end > 4g (we hit the fallback doing SGR with r0)
> - narrow klass pointer base = 0 (we dont do anything)
>
> (would this override-feature be useful? We could do better testing).
>
> Thanks, Thomas
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3645:
> 3643: }
> 3644: #endif
> 3645:
I do not like the cross-dependency to metaspace.hpp just for the sake of checking an artificial restriction on Klass pointers. And by the way, you could do the check with one test:
z_oihf(current, 0);
z_brc(Assembler::bcondZero, ok);
z_oihf() does modify the contents of register current, but it writes back the same value.
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3657:
> 3655: } else {
> 3656: load_const(Z_R0, base);
> 3657: lgr_if_needed(dst, current);
What would you think of a more general rework like this? The comments in the code should explain the intentions/assumptions/conclusions.
// Klass oop manipulations if compressed.
void MacroAssembler::encode_klass_not_null(Register dst, Register src) {
Register current = (src != noreg) ? src : dst; // Klass is in dst if no src provided. (dst == src) also possible.
address base = CompressedKlassPointers::base();
int shift = CompressedKlassPointers::shift();
bool need_zero_extend = false;
assert(UseCompressedClassPointers, "only for compressed klass ptrs");
BLOCK_COMMENT("cKlass encoder {");
#ifdef ASSERT
Label ok;
z_tmll(current, KlassAlignmentInBytes-1); // Check alignment.
z_brc(Assembler::bcondAllZero, ok);
// The plain disassembler does not recognize illtrap. It instead displays
// a 32-bit value. Issueing two illtraps assures the disassembler finds
// the proper beginning of the next instruction.
z_illtrap(0xee);
z_illtrap(0xee);
bind(ok);
#endif
// Scale down the incoming klass pointer first.
// We then can be sure we calculate an offset that fits into 32 bit.
// More generally speaking: all subsequent calculations are purely 32-bit.
if (shift != 0) {
assert (LogKlassAlignmentInBytes == shift, "decode alg wrong");
z_srlg(dst, current, shift);
need_zero_extend = true;
current = dst;
}
if (base != NULL) {
// Use scaled-down base address parts to match scaled-down klass pointer.
unsigned int base_h = ((unsigned long)base)>>(32+shift);
unsigned int base_l = (unsigned int)(((unsigned long)base)>>shift);
// General considerations:
// - when calculating (current_h - base_h), all digits must cancel (become 0).
// Otherwise, we would end up with a compressed klass pointer which doesn't
// fit into 32-bit.
// - Only bit#33 of the difference could potentially be non-zero. For that
// to happen, (current_l < base_l) must hold. In this case, the subtraction
// will create a borrow out of bit#32, nicely killing bit#33.
// - With the above, we only need to consider current_l and base_l to
// calculate the result.
// - Both values are treated as unsigned. The unsigned subtraction is
// replaced by adding (unsigned) the 2's complement of the subtrahend.
if (base_l == 0) {
// - By theory, the calculation to be performed here (current_h - base_h) MUST
// cancel all high-word bits. Otherwise, we would end up with an offset
// (i.e. compressed klass pointer) that does not fit into 32 bit.
// - current_l remains unchanged.
// - Therefore, we can replace all calculation with just a
// zero-extending load 32 to 64 bit.
// - Even that can be replaced with a conditional load if dst != current.
// (this is a local view. The shift step may have requested zero-extension).
} else {
// To begin with, we may need to copy and/or zero-extend the register operand.
// We have to calculate (current_l - base_l). Because there is no unsigend
// subtract instruction with immediate operand, we add the 2's complement of base_l.
if (need_zero_extend) {
z_llgfr(dst, current);
need_zero_extend = false;
} else {
llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
}
current = dst;
z_alfi(dst, -(int)base_l);
}
if (need_zero_extend) {
// We must zero-extend the calculated result. It may have some leftover bits in
// the hi-word because we only did optimized calculations.
z_llgfr(dst, current);
} else {
llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
}
BLOCK_COMMENT("} cKlass encoder");
}
-------------
PR: https://git.openjdk.java.net/jdk/pull/2595
More information about the hotspot-dev
mailing list