[master] RFR: Load Klass* from header in interpreter (x86) [v5]

Tue Oct 12 12:40:16 UTC 2021

On Wed, 6 Oct 2021 16:48:11 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> This implements loading the compressed Klass* from the object header, instead of the Klass* field in the x86 interpreter. It does the fast-path (unlocked object) in assembly, and calls into the runtime to deal with locked objects.
>> 
>> This is a proof-of-concept. It is not entirely clear if we can even call into runtime from a couple of places where load_klass() is called, especially in C2, C1 and some stubs. OTOH, we may end up not having to do any of this if we come up with a way to avoid displaced headers altogether.
>> 
>> Testing:
>>  - [x] tier1 (x86_32,x86_64)
>>  - [x] tier2 (x86_32,x86_64)
>>  - [x] hotspot_gc (x86_32,x86_64)
>
> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - Merge branch 'master' into klass-from-header-interpreter
>  - Emit null-checks before load_klass with correct offset
>  - Use constant instead of literal in xor-test-sequence
>  - Update comment about xorb/xorq encoding
>  - Load Klass* from header in interpreter (x86)

Looks okay for the experimental code.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4554:

> 4552:   // NOTE: While it would seem nice to use xorb instead (for which we don't have an encoding in our assembler),
> 4553:   // the encoding for xorq uses the signed version (0x81/6) of xor, which encodes as compact as xorb would,
> 4554:   // and does't make a difference performance-wise.

I can give you `xorb`, if you want, in a subsequent PR.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4597:

> 4595:     null_check(src, oopDesc::klass_offset_in_bytes());
> 4596:   }
> 4597:   movptr(dst, Address(src, oopDesc::klass_offset_in_bytes()));

Do we not need to do any fixups for x86_32?

src/hotspot/cpu/x86/macroAssembler_x86.hpp line 342:

> 340: 
> 341:   // oop manipulations
> 342:   void load_klass(Register dst, Register src, Register tmp, bool null_check_src);

I think if you define `bool null_check_src = false`, then the significant part of the changes would go away. It might not be a good idea when upstreaming this, but it would probably simplify the merges quite a bit.

src/hotspot/cpu/x86/templateTable_x86.cpp line 4185:

> 4183:   Register tmp_load_klass = LP64_ONLY(rscratch1) NOT_LP64(noreg);
> 4184:   __ load_klass(rdx, rdx, tmp_load_klass, false);
> 4185:   __ jmp(resolved);

Have you run into troubles with `jmpb` short branch here?

-------------

Marked as reviewed by shade (Committer).

PR: https://git.openjdk.java.net/lilliput/pull/15