[master] RFR: JDK-8325104: Lilliput: Shrink Classpointers [v3]

Mon Mar 25 16:30:51 UTC 2024

On Mon, 25 Mar 2024 14:51:14 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Hi,
>> 
>> I wanted to get input on the following improvement for Lilliput. Testing is still ongoing, but things look really good, so this patch is hopefully near its final form (barring any objections from reviewers, of course).
>> 
>> Note: I have a companion patch prepared for upstream, minus the markword changes. I will attempt to get that one upstream quickly in order to not have a large delta between upstream and lilliput, especially in Metaspace.
>> 
>> ## High-Level Overview
>> 
>> (for a short sequence of slides, please see https://github.com/tstuefe/fosdem24/blob/master/classpointers-and-liliput.pdf - these accompanied a talk we held at FOSDEM 24).
>> 
>> We want to reduce the bit size of narrow Klass to free up bits in the MarkWord. 
>> 
>> We cannot just reduce the Klass encoding range size (well, we could, and maybe we will later, but for now we decided not to). We instead increase the alignment Klass is stored at, and use that alignment shadow to store other information.
>> 
>> In other words, this patch changes the narrow Klass Pointer to a Klass ID, since now (almost) every value in its value range points to a different class. Therefore, we use the value range of nKlass much more efficiently.
>> 
>> We then use the newly freed bits in the MarkWord to restore the iHash to 31 bits: 
>> 
>> 
>> [ 22-bit nKlass | 31-bit iHash | 4 free bits | age | fwd | lck ]
>> 
>> nKlass gets reduced to 22 bits. Identity hash gets re-inflated to 31 bits. Preceding iHash are now 4 unused bits. Rest is unchanged.
>> 
>> (Note: I originally wanted to swap iHash and nKlass such that either of them could be loaded with a 32-bit load, but I found that tricky since C2 seems to rely on the nKlass offset in the MarkWord being > 0.)
>> 
>> ## nKlass reduction:
>> 
>> The reduction in nKlass size is made by only storing them at 10-bit aligned addresses. That alignment (1KB) works well in practice since Klass - although variable-sized - typically is between 512 bytes and 1KB in size. Outliers are possible, but the size distribution is bell-curvish [1], so far-away outliers are very rare. 
>> 
>> To not lose memory to alignment waste, metaspace is reshaped to handle arbitrarily aligned allocations efficiently. Basically, we allow the non-Klass arena of a class loader to steal the alignment waste storage from the class arena. So, alignment waste blocks are filled with non-Klass metadata. That works very well in practice since non-Klass metadata is numerous and fine-granular compared to the big Klass bloc...
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits:
> 
>  - Roman feedback - small stuff
>  - Merge branch 'master' into Smaller-ClassPointers
>  - revert COH archive generation
>  - Remove files that accidentally slipped in
>  - Merge branch 'master' into Smaller-ClassPointers
>  - Merge
>  - Merge commit 'c1281e6b45ed167df69d29a6039d81854c145ae6~1' into Smaller-ClassPointers
>  - Fix Typo
>  - Better CDS arch generation
>  - Fix error in COH archive generation
>  - ... and 10 more: https://git.openjdk.org/lilliput/compare/b2fcfb73...1260f2d6

src/hotspot/share/oops/oop.hpp line 345:

> 343:       constexpr int load_shift = markWord::klass_load_shift;
> 344:       STATIC_ASSERT(load_shift % 8 == 0);
> 345:       return mark_offset_in_bytes() + load_shift / 8;

Isn't this broken for big-endian machines? There, adding `load_shift / 8` to the mark offset moves toward the less significant bytes, so the load would miss the klass bits. The follow-up question then is: should we really be reading the klass pointer with 32-bit loads? If we load the entire 64-bit "object header" and then shift with `klass_shift`, we wouldn't have to think about endianness, right? Do we keep the 32-bit load because we don't want to mess with C2?
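
To make the endianness concern concrete, here is a minimal C++ sketch; it is not the HotSpot code. It assumes the mark word layout quoted above (22-bit nKlass in the top bits, hence a `klass_shift` of 42), assumes `klass_load_shift` is 32, and takes the mark word to sit at offset 0 of the header. All function names are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed constants, derived from the quoted layout; not taken from the patch.
constexpr int klass_shift      = 42;  // 64 bits minus 22 nKlass bits
constexpr int klass_load_shift = 32;  // assumption: klass bits live in the upper 32-bit half

// Runtime probe: true if the lowest-addressed byte is the least significant one.
inline bool is_little_endian() {
  const std::uint32_t one = 1;
  unsigned char first;
  std::memcpy(&first, &one, 1);
  return first == 1;
}

// Endian-neutral variant: load the whole 64-bit header, then shift.
inline std::uint32_t nklass_via_64bit_load(const void* header) {
  std::uint64_t mark;
  std::memcpy(&mark, header, sizeof(mark));
  return static_cast<std::uint32_t>(mark >> klass_shift);
}

// Byte offset of the 32-bit half that holds the klass bits.
// Little-endian: load_shift / 8, as in the reviewed snippet.
// Big-endian: the most significant half sits at the lowest address.
inline std::size_t klass_half_offset(bool little_endian) {
  const std::size_t le_offset = klass_load_shift / 8;
  return little_endian
      ? le_offset
      : sizeof(std::uint64_t) - sizeof(std::uint32_t) - le_offset;
}

// 32-bit load variant: correct only if the caller passes the right offset.
inline std::uint32_t nklass_via_32bit_load(const void* header, std::size_t offset) {
  std::uint32_t half;
  std::memcpy(&half, static_cast<const char*>(header) + offset, sizeof(half));
  return half >> (klass_shift - klass_load_shift);
}
```

With these assumptions, the 32-bit load needs offset 4 on little-endian but offset 0 on big-endian, whereas the 64-bit load plus shift gives the same result on both without any offset logic.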

-------------

PR Review Comment: https://git.openjdk.org/lilliput/pull/128#discussion_r1537875026