RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base

Wed Feb 17 16:58:40 UTC 2021

On Tue, 16 Feb 2021 20:49:49 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
> 
> This can be reproduced by starting the VM with
> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
> - heap is allocated at 0x800_0000. It is small and ends at 0x8800_0000.
> - class space follows at 0x8800_0000
> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
> 
> In MacroAssembler::encode_klass_not_null(), there is the following section:
> 
>   if (base != NULL) {
>     unsigned int base_h = ((unsigned long)base)>>32;
>     unsigned int base_l = (unsigned int)((unsigned long)base);
>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>       lgr_if_needed(dst, current);
>       z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
>     } else if ((base_h == 0) && (base_l != 0)) {   (A)
>       lgr_if_needed(dst, current);                
>       z_agfi(dst, -(int)base_l);                   (B)
>     } else {
>       load_const(Z_R0, base);
>       lgr_if_needed(dst, current);
>       z_sgr(dst, Z_R0);
>     }
>     current = dst;
>   }
> 
> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to substract the base from the Klass pointer; we do this by calculating the 32bit twos-complement of the base and add it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x800_0000:
> 
> In the case of the crash, we have:
> base: 				  8800_0000
> klass pointer: 			  8804_1040
> 32bit two's complement of base:   7800_0000
> added to the klass pointer: 	1_0004_1040
> 
> So the result of the "substraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
> 
> This bug has been dormant; was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as crash in a CDS-specific jtreg test (JDK-8261552).
> 
> ================
> 
> Fix:
> 
> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
> 
> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
> 
> ----
> 
> Tests:
> 
> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
> 
> I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base). 
> 
> I used this method to test various combinations: 
> - narrow klass pointer base > 0 < 4g + ccs end < 4g  (we hit our branch doing AFI)
> - narrow klass pointer base > 0 < 4g + ccs end > 4g  (we hit the fallback doing SGR with r0)
> - narrow klass pointer base = 0                      (we dont do anything)
> 
> (would this override-feature be useful? We could do better testing).
> 
> Thanks, Thomas

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3645:

> 3643:   }
> 3644: #endif
> 3645: 

I do not like the cross-dependency to metaspace.hpp just for the sake of checking an artificial restriction on Klass pointers. And by the way, you could do the check with one test:
  z_oihf(current, 0);
  z_brc(Assembler::bcondZero, ok);

z_oihf() does modify the contents of register current, but it writes back the same value.

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3657:

> 3655:     } else {
> 3656:       load_const(Z_R0, base);
> 3657:       lgr_if_needed(dst, current);

What would you think of a more general rework like this? The comments in the code should explain the intentions/assumptions/conclusions.

// Klass oop manipulations if compressed.
void MacroAssembler::encode_klass_not_null(Register dst, Register src) {
  Register current = (src != noreg) ? src : dst; // Klass is in dst if no src provided. (dst == src) also possible.
  address  base    = CompressedKlassPointers::base();
  int      shift   = CompressedKlassPointers::shift();
  bool     need_zero_extend = false;
  assert(UseCompressedClassPointers, "only for compressed klass ptrs");

  BLOCK_COMMENT("cKlass encoder {");

#ifdef ASSERT
  Label ok;
  z_tmll(current, KlassAlignmentInBytes-1); // Check alignment.
  z_brc(Assembler::bcondAllZero, ok);
  // The plain disassembler does not recognize illtrap. It instead displays
  // a 32-bit value. Issueing two illtraps assures the disassembler finds
  // the proper beginning of the next instruction.
  z_illtrap(0xee);
  z_illtrap(0xee);
  bind(ok);
#endif

  // Scale down the incoming klass pointer first.
  // We then can be sure we calculate an offset that fits into 32 bit.
  // More generally speaking: all subsequent calculations are purely 32-bit.
  if (shift != 0) {
    assert (LogKlassAlignmentInBytes == shift, "decode alg wrong");
    z_srlg(dst, current, shift);
    need_zero_extend = true;
    current = dst;
  }

  if (base != NULL) {
    // Use scaled-down base address parts to match scaled-down klass pointer.
    unsigned int base_h = ((unsigned long)base)>>(32+shift);
    unsigned int base_l = (unsigned int)(((unsigned long)base)>>shift);

    // General considerations:
    //  - when calculating (current_h - base_h), all digits must cancel (become 0).
    //    Otherwise, we would end up with a compressed klass pointer which doesn't
    //    fit into 32-bit.
    //  - Only bit#33 of the difference could potentially be non-zero. For that
    //    to happen, (current_l < base_l) must hold. In this case, the subtraction
    //    will create a borrow out of bit#32, nicely killing bit#33.
    //  - With the above, we only need to consider current_l and base_l to
    //    calculate the result.
    //  - Both values are treated as unsigned. The unsigned subtraction is
    //    replaced by adding (unsigned) the 2's complement of the subtrahend.

    if (base_l == 0) {
      //  - By theory, the calculation to be performed here (current_h - base_h) MUST
      //    cancel all high-word bits. Otherwise, we would end up with an offset
      //    (i.e. compressed klass pointer) that does not fit into 32 bit.
      //  - current_l remains unchanged.
      //  - Therefore, we can replace all calculation with just a
      //    zero-extending load 32 to 64 bit.
      //  - Even that can be replaced with a conditional load if dst != current.
      //    (this is a local view. The shift step may have requested zero-extension).
    } else {
      // To begin with, we may need to copy and/or zero-extend the register operand.
      // We have to calculate (current_l - base_l). Because there is no unsigend
      // subtract instruction with immediate operand, we add the 2's complement of base_l.
      if (need_zero_extend) {
        z_llgfr(dst, current);
        need_zero_extend = false;
      } else {
        llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
      }
      current = dst;
      z_alfi(dst, -(int)base_l);
    }

  if (need_zero_extend) {
    // We must zero-extend the calculated result. It may have some leftover bits in
    // the hi-word because we only did optimized calculations.
    z_llgfr(dst, current);
  } else {
    llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
  }

  BLOCK_COMMENT("} cKlass encoder");
}

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595