[master] RFR: Load Klass* from header, C2 implementation

Wed Feb 9 17:10:47 UTC 2022

On Wed, 9 Feb 2022 15:21:08 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> This implements loading the compressed Klass* from the object header in C2.
>> This is quite a hack: normally we should change the load offset everywhere in C2 from 8 (current klass_offset_in_bytes) to 0 (the mark word offset). However, offset = 0 has different meaning in C2, it means no offset, and throws off all sorts of stuff in C2. Realistically, in the long run, we want to change to loading from offset 4 (upper half of the header), but we currently need to deal with locking, and thus load from offset 0.
>> 
>> I fake it: I pretend to load from offset 4 everywhere in C2 ideal graph, but then load from offset 0 and check the mark bits instead. We can't leave the offset in ideal graph at 8 because that is going to conflict with loads from first field as soon as we drop the Klass*.
>> 
>> We also need to ensure that C2 knows that the condition flags are clobbered (cr), and that src and dst are not the same register.
>> 
>> Testing:
>>  - [x] tier1 (x86_64, aarch64)
>>  - [x] tier2 (x86_64, aarch64)
>>  - [x] tier3 (x86_64, aarch64)
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3716:
> 
>> 3714: 
>> 3715:   bind(slow);
>> 3716:   enter();
> 
> Why `enter()` and `leave()` are removed? I thought those manage the frame pointer to allow stack walk? IOW, it is not enough to just save the `lr`, we also need to manage `rfp`?

enter() saves/restores lr and rfp, and then overrides rfp with current sp. leave() reverts that. C2 doesn't seem to like messing with rfp, but we still need to preserve lr at this point, so that subroutine calls can find their way back. The stub does its own enter/leave around the slow-path call.

> src/hotspot/cpu/x86/x86_64.ad line 12268:
> 
>> 12266: //instruct compN_mem_imm_klass(rFlagsRegU cr, memory mem, immNKlass src)
>> 12267: //%{
>> 12268: //  match(Set cr (CmpN src (LoadNKlass mem)));
> 
> I think you can do `predicate(false)` to disable rules, while still compiling `ins_encode`, thus capturing refactoring errors there. No biggie if you want to comment out the rule.

Ok, good suggestion. I will change it.

> src/hotspot/share/oops/oop.hpp line 310:
> 
>> 308:   // for code generation
>> 309:   static int mark_offset_in_bytes()      { return offset_of(oopDesc, _mark); }
>> 310:   static int nklass_offset_in_bytes()    {
> 
> Feels like it should be the other way around: always ask for `klass_offset_in_bytes()`. Return Lilliput offset in `_LP64`, return the usual `offset_of(oopDesc, _metadata._klass)`. Saves the headache of replacing `klass_offset_in_bytes` to `nklass_offset_in_bytes` everywhere?

Yes, I can do that as soon as the Klass* is eliminated. Currently we still need to deal with both. See below.

> src/hotspot/share/opto/macro.cpp line 1676:
> 
>> 1674:   rawmem = make_store(control, rawmem, object, oopDesc::mark_offset_in_bytes(), mark_node, TypeX_X->basic_type());
>> 1675:   rawmem = make_store(control, rawmem, object, oopDesc::nklass_offset_in_bytes(), klass_node, T_METADATA);
>> 1676:   rawmem = make_store(control, rawmem, object, oopDesc::klass_offset_in_bytes(), klass_node, T_METADATA);
> 
> This looks patently weird. What do we store to, `nklass` or `klass`? Why both here?

The Klass* will be eliminated in the subsequent PR. Until then, we have to keep both.

-------------

PR: https://git.openjdk.java.net/lilliput/pull/29