[master] RFR: Smaller class pointers [v3]

Tue Dec 14 16:35:42 UTC 2021

On Wed, 8 Dec 2021 18:47:47 GMT, John R Rose <jrose at openjdk.org> wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   feedback Ioi, fix CDS archive size estimate
>
> Also, FTR, there are other ways to keep cache density without the extra add/xor step of smearing M-shifted bits.  It would require deeper cuts, but it is worth thinking again of the thing you already mentioned, where a Klass block is split into a fixed-sized "near Klass" and an indirectly addressed "far Klass".  The near Klass would be sized as a small number of cache blocks, to keep Klass metadata dense in the data cache.  There is a sensitive performance tradeoff, though, since you want parts of the Klass which are accessed frequently (dynamic type checks, vtables, GC metadata) to be in the near Klass.  Since v-tables are variable in size, this means some v-table entries (probably) get favored over others.  Tricky business.  I think Coleen looked into this many years ago and may have further insights.
> 
> Independently, a possible future benefit of a near-Klass/far-Klass split would be that we could consider building in a one-to-many relation across the linking indirection.  That could support a tightly integrated "species" concept, where an object's header can point to a near-Klass which represents an individuated species of its class.  In theory that could help represent specialized types like ArrayList<int> as distinct from the erased class ArrayList.

Hi @rose00,

many thanks for the friendly encouragement, it is nice to see the work appreciated!

First off, would you be okay with the patch going in in its current form? I'm aware that it's not the perfect solution, just a first step. But your suggestions all require more long-term work for which I lack the time right now (also, vacations coming up). I would like to get this patch in, to enable @rkennke to start using the just slightly narrower narrow Klass pointers.

With this patch, we don't commit to any strategic direction, not really. The changes are tame and some of them even make sense outside of the Lilliput context, so I will bring them upstream. There is nothing we could not change later if we want to.

About your ideas. This is only a partial answer, unfortunately; I lack the time to go deeper and do them full justice. But I enjoyed reading your proposals, many gems in there. It is apparent that you have thought about this for a long time :-)

(It is a pity that just at that moment the MLs failed. We may want to continue the discussion outside of the PR).

1) Hyper-aligning Klass locations and the danger of diminished cache efficiency

This is a good point I did not think much about before. But before I implement a solution, I'd like to make sure that this is really a problem - also because I would like to see a negative before building a positive, for comparison. Do all Klass accesses concentrate on the first 64 bytes of Klass? Have we already ordered members in Klass by hotness? I would have thought we have not, and that accesses are more evenly distributed - after all, a Klass usually covers at least 8 cache lines. I'm not sure here. Access patterns with statistical distributions for Klass members would certainly help, also in deciding whether to split anything off, and what.

In comparison to TLAB, I would have thought that TLAB access is much more concentrated (more loads happening sequentially targeting the TLAB), but that Klass access is more spread out and interspersed with other loads/stores.

Solution-wise, I like your wiggle-idea (wiggling the Klass address around) to mitigate the lockstep problem. Just to be sure I understand you correctly, instead of an allocation cadence like this: 0->0x200->0x400->0x600, you propose one like this: 0->0x240->0x480->0x6c0?
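
To make the lockstep concern concrete, here is a minimal sketch of how the two cadences would map onto cache sets. The cache geometry (64-byte lines, 64 sets) is an assumption for illustration, not a measurement of any particular CPU, and the helper names are made up.

```python
LINE_SIZE = 64   # assumed cache line size in bytes
NUM_SETS = 64    # assumed number of L1 sets (e.g. 32 KiB, 8-way)

def cache_set(addr):
    """Set index an address maps to under the assumed geometry."""
    return (addr // LINE_SIZE) % NUM_SETS

def sets_touched(stride, count=64):
    """Distinct sets hit by the first cache line of `count` Klass slots."""
    return {cache_set(i * stride) for i in range(count)}

aligned = sets_touched(0x200)   # cadence 0, 0x200, 0x400, 0x600, ...
wiggled = sets_touched(0x240)   # cadence 0, 0x240, 0x480, 0x6c0, ...

print(len(aligned), len(wiggled))   # prints "8 64"
```

Under these assumptions the 0x200 cadence puts the start of every Klass into only 8 of the 64 sets, while the 0x240 cadence spreads them over all 64. Whether that matters in practice depends on how concentrated the accesses really are.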

My only fear would be that this would increase code size for all the fragments which decode Klass, and that, again, could have a negative effect on performance. At least I cannot predict the effect in my head.
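
A rough sketch of what the extra decode work could look like, assuming the wiggle is implemented as one additional shifted add. The base address, the shift, and the function names are hypothetical; this is my reading of the idea, not necessarily what was proposed.

```python
KLASS_BASE = 0x8_0000_0000   # hypothetical encoding base
SHIFT = 9                    # assumed 0x200-byte slot size

def decode_plain(nk):
    # One shift and one add: base + nk * 0x200
    return KLASS_BASE + (nk << SHIFT)

def decode_wiggled(nk):
    # One extra shift and add: base + nk * 0x240,
    # since nk * 0x240 == (nk << 9) + (nk << 6)
    return KLASS_BASE + (nk << SHIFT) + (nk << 6)

for nk in range(4):
    print(hex(decode_plain(nk) - KLASS_BASE),
          hex(decode_wiggled(nk) - KLASS_BASE))
```

The additional shift and add would show up at every site that decodes a narrow Klass pointer, which is where the code size worry comes from.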

Reducing Klass size would certainly be A Good Thing, from various standpoints. For one, it would reduce metaspace overhead, especially if we manage to make the size homogeneous, and especially if it's close to but under a power-of-two size. Maybe by splitting off vtable and itable. I thought maybe Klass could hold a small inline vtable/itable and just branch to separate structures if the inline ones are exhausted.

2) About coloring bits, I may just have misunderstood you. Are you thinking of hiding the klass id inside the coloring bits of an oop? How would that work with narrow oops? Would you not lose the bits every time you store a narrow oop?
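
To illustrate the question, a small sketch of the round trip I have in mind. The bit positions are purely hypothetical (color in the top 4 bits of a 64-bit oop, zero-based narrow oops with a shift of 3) and do not describe any existing encoding.

```python
HEAP_BASE = 0x0              # assumed zero-based narrow oops
OOP_SHIFT = 3                # assumed 8-byte object alignment
COLOR_SHIFT = 60             # hypothetical: color lives in the top 4 bits
COLOR_MASK = 0xF << COLOR_SHIFT

def color(oop, bits):
    """Return the oop with its (hypothetical) color bits set to `bits`."""
    return (oop & ~COLOR_MASK) | (bits << COLOR_SHIFT)

def encode_narrow(oop):
    """Compress to 32 bits; anything above the narrow range is dropped."""
    return ((oop - HEAP_BASE) >> OOP_SHIFT) & 0xFFFF_FFFF

def decode_narrow(narrow):
    return HEAP_BASE + (narrow << OOP_SHIFT)

addr = 0x1234_5678
colored = color(addr, 0b1010)
roundtrip = decode_narrow(encode_narrow(colored))

print(hex(colored), hex(roundtrip))
```

In this sketch the address survives the store as a narrow oop, but the color bits do not, so a klass id kept there would have to be recovered some other way.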

Thanks, Thomas

-------------

PR: https://git.openjdk.java.net/lilliput/pull/13