RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base

Thu Feb 18 09:14:46 UTC 2021

On Thu, 18 Feb 2021 05:37:18 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3657:
>> 
>>> 3655:     } else {
>>> 3656:       load_const(Z_R0, base);
>>> 3657:       lgr_if_needed(dst, current);
>> 
>> What would you think of a more general rework like this? The comments in the code should explain the intentions/assumptions/conclusions.
>> 
>> // Klass oop manipulations if compressed.
>> void MacroAssembler::encode_klass_not_null(Register dst, Register src) {
>>   Register current = (src != noreg) ? src : dst; // Klass is in dst if no src provided. (dst == src) also possible.
>>   address  base    = CompressedKlassPointers::base();
>>   int      shift   = CompressedKlassPointers::shift();
>>   bool     need_zero_extend = false;
>>   assert(UseCompressedClassPointers, "only for compressed klass ptrs");
>> 
>>   BLOCK_COMMENT("cKlass encoder {");
>> 
>> #ifdef ASSERT
>>   Label ok;
>>   z_tmll(current, KlassAlignmentInBytes-1); // Check alignment.
>>   z_brc(Assembler::bcondAllZero, ok);
>>   // The plain disassembler does not recognize illtrap. It instead displays
>>   // a 32-bit value. Issueing two illtraps assures the disassembler finds
>>   // the proper beginning of the next instruction.
>>   z_illtrap(0xee);
>>   z_illtrap(0xee);
>>   bind(ok);
>> #endif
>> 
>>   // Scale down the incoming klass pointer first.
>>   // We then can be sure we calculate an offset that fits into 32 bit.
>>   // More generally speaking: all subsequent calculations are purely 32-bit.
>>   if (shift != 0) {
>>     assert (LogKlassAlignmentInBytes == shift, "decode alg wrong");
>>     z_srlg(dst, current, shift);
>>     need_zero_extend = true;
>>     current = dst;
>>   }
>> 
>>   if (base != NULL) {
>>     // Use scaled-down base address parts to match scaled-down klass pointer.
>>     unsigned int base_h = ((unsigned long)base)>>(32+shift);
>>     unsigned int base_l = (unsigned int)(((unsigned long)base)>>shift);
>> 
>>     // General considerations:
>>     //  - when calculating (current_h - base_h), all digits must cancel (become 0).
>>     //    Otherwise, we would end up with a compressed klass pointer which doesn't
>>     //    fit into 32-bit.
>>     //  - Only bit#33 of the difference could potentially be non-zero. For that
>>     //    to happen, (current_l < base_l) must hold. In this case, the subtraction
>>     //    will create a borrow out of bit#32, nicely killing bit#33.
>>     //  - With the above, we only need to consider current_l and base_l to
>>     //    calculate the result.
>>     //  - Both values are treated as unsigned. The unsigned subtraction is
>>     //    replaced by adding (unsigned) the 2's complement of the subtrahend.
>> 
>>     if (base_l == 0) {
>>       //  - By theory, the calculation to be performed here (current_h - base_h) MUST
>>       //    cancel all high-word bits. Otherwise, we would end up with an offset
>>       //    (i.e. compressed klass pointer) that does not fit into 32 bit.
>>       //  - current_l remains unchanged.
>>       //  - Therefore, we can replace all calculation with just a
>>       //    zero-extending load 32 to 64 bit.
>>       //  - Even that can be replaced with a conditional load if dst != current.
>>       //    (this is a local view. The shift step may have requested zero-extension).
>>     } else {
>>       // To begin with, we may need to copy and/or zero-extend the register operand.
>>       // We have to calculate (current_l - base_l). Because there is no unsigend
>>       // subtract instruction with immediate operand, we add the 2's complement of base_l.
>>       if (need_zero_extend) {
>>         z_llgfr(dst, current);
>>         need_zero_extend = false;
>>       } else {
>>         llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
>>       }
>>       current = dst;
>>       z_alfi(dst, -(int)base_l);
>>     }
>> 
>>   if (need_zero_extend) {
>>     // We must zero-extend the calculated result. It may have some leftover bits in
>>     // the hi-word because we only did optimized calculations.
>>     z_llgfr(dst, current);
>>   } else {
>>     llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
>>   }
>> 
>>   BLOCK_COMMENT("} cKlass encoder");
>> }
>
> Looks nice and elegant. 
> 
> But as said offlist, I dislike the fact that this hard codes the limitation to 32bit for the narrow klass pointer range.
> 
> That restriction is artificial and we may just want to drop it. E.g. one recurring idea I have is to drop the duality in metaspace between non-class- and class-metaspace, and just store everything in class space. That would save quite a bit of memory (less overhead) and make the metaspace coding quite a bit simpler. However, in that case it could be that we exceed the current 3g limit and may even exceed 32bit. Since add+shift for decoding is universally done on all platforms at least if CDS is on, this should work out of the box. Unless of course the platforms hard-code the 32bit limitation into their encoding schemes.

I don't see how you want to overcome the 32-bit limit for compressed pointers. This whole "compression" thing is based on the "trick" to store an offset instead of the full address. Depending on the object alignment requirement, this affords you 32 GB (8-byte alignment) or 64 GB (16-byte alignment) of addressable (or should I say offset-able) space. That's quite a bit. 

You use pointer compression to save space, and for nothing else. Space savings have to be so significant that they outweigh the added effort for encoding and decoding. With just some shift and add, the effort is limited, though noticeable. If you would make compressed pointers 40 bits wide (5 bytes), encoding and decoding would impose more effort. What's even worse, you then would have entities with a size not native to any processor. Just imagine you have to atomically store such a value.

I my opinion, wider compressed pointers will have to wait until we have 128-bit pointers.

Back to code:
In the code suggested above, you could make use of the Metaspace::class_space_end() function. If the class space end address, shifted right, fits into 32 bit, need_zero_extend may remain false. Your choice.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595