32 bits header and alignment slack

Mon Mar 11 20:57:59 UTC 2024

On 4 Mar 2024, at 5:10, Kennke, Roman wrote:

>> … Thinking a little more about it.
>> The alignment gap is not necessary at the end of the object, a j.l.Long with a 32 bits header has a 32 bits gap in between the mark/class word and the field containing the value.
>
> This is a very good point, and a nice improvement! And should not be very difficult to implement, either.
>
> Thanks for suggesting that!

This is starting to feel a little like the parallel task of hunting around for a good place to put the “inflated klass” field (if we ever do such a thing).

The “inflated identity hash” field is different, since it is optional.  As an optional thing, it wants to go at the end of the object with a size increase.  (Or before the beginning, yuck.)  And then placing that field in a fragmentation-loss space is a win, since there’s no size increase.
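To make the fragmentation-loss point concrete, here is a toy layout calculator.  It is only a sketch under stated assumptions: a 4-byte hash slot, 8-byte object alignment, and objects with exactly one field whose size is a power of two.  The class and method names are invented for illustration and are not HotSpot code.

```java
final class HashSlotPlacement {
    // Assumptions for this sketch, not values taken from the VM.
    static final long OBJECT_ALIGNMENT = 8;
    static final long HASH_BYTES = 4;

    // Round offset up to a multiple of alignment (alignment must be a power of two).
    static long alignUp(long offset, long alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    // Size of a one-field object with no hash slot.
    static long sizeWithoutHash(long headerBytes, long fieldBytes) {
        long fieldOffset = alignUp(headerBytes, fieldBytes);
        return alignUp(fieldOffset + fieldBytes, OBJECT_ALIGNMENT);
    }

    // Size of the same object once a hash slot has been placed.
    static long sizeWithHash(long headerBytes, long fieldBytes) {
        long fieldOffset = alignUp(headerBytes, fieldBytes);
        // First try the gap between the header and the field.
        long slotInGap = alignUp(headerBytes, HASH_BYTES);
        if (slotInGap + HASH_BYTES <= fieldOffset) {
            return sizeWithoutHash(headerBytes, fieldBytes);
        }
        // Otherwise put the slot after the field; this may or may not grow the object.
        long slotAtEnd = alignUp(fieldOffset + fieldBytes, HASH_BYTES);
        return alignUp(slotAtEnd + HASH_BYTES, OBJECT_ALIGNMENT);
    }
}
```

In this model a Long-like object (4-byte header, 8-byte field) stays at 16 bytes with or without the hash slot, while an Integer-like object (4-byte header, 4-byte field) grows from 8 to 16 bytes.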

BUT ALSO, if the object is large enough, preallocating an injected “inflated identity hash” field is just as good, within some epsilon.  (So that’s the suggestion that comes from comparing the problem with the problem of inflated klass fields.)  What epsilon, exactly?  Well, that’s a configuration parameter to the VM.  The larger the minimum object size (for preallocated IHC), the smaller the epsilon.  The meaningful scale here is a few cache lines.

Given all of the above, an object might have two possible layouts, IF it is (a) small and (b) already tightly packed.  But other objects would have just one layout, a layout which includes an inflated-IHC field.  A bit in the Klass would say whether its objects have (in fact) two layouts.  A bit in the header would select the second layout, if relevant.

Bonus Leyden tactic:  If a training run shows that some classes have frequent uses of IHCs, then a CDS archive could mention that fact, and the JVM could inject IHC fields even if they would otherwise be classified as too small and too dense.
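The layout rule and the training-run override above can be sketched as a small decision function.  This is an illustration only: the threshold value, the flag names, and the class itself are hypothetical, and the real decision would live in the VM's field layout code.

```java
final class IhcLayoutPolicy {
    // Hypothetical default for "a few cache lines", assuming 64-byte lines.
    static final long DEFAULT_MIN_PREALLOC_BYTES = 2 * 64;

    // True if every instance of the class gets an IHC field up front (one layout).
    static boolean injectsHashField(long sizeBytes, boolean hasSlackForHash,
                                    long minPreallocBytes, boolean hotInTraining) {
        boolean small = sizeBytes < minPreallocBytes;
        boolean tightlyPacked = !hasSlackForHash;
        // Only small, dense classes skip the field, unless a training run says
        // their identity hashes are used frequently.
        return !(small && tightlyPacked) || hotInTraining;
    }

    // The per-Klass bit: do instances of this class come in two layouts?
    static boolean klassHasTwoLayouts(long sizeBytes, boolean hasSlackForHash,
                                      long minPreallocBytes, boolean hotInTraining) {
        return !injectsHashField(sizeBytes, hasSlackForHash, minPreallocBytes, hotInTraining);
    }

    // The per-object header bit only matters when the Klass bit is set.
    static boolean usesExpandedLayout(boolean klassHasTwoLayouts, boolean headerBit) {
        return klassHasTwoLayouts && headerBit;
    }
}
```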

Do preallocated IHC fields really matter?  Do they justify their static cost?  Yes, they might matter if IHCs are important for some classes in some workload.  In that case, the clever derivation of IHC from new-space pointers would be less of a bottleneck.  After all, if you ask an object 3 times for its IHC, and the derivation cost is X, you pay 3X instead of X plus two cache reads.  And X is probably somewhat greater than a cache read, especially if it creates a strong result.
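The 3X versus X-plus-two-reads comparison generalizes to any number of requests.  A minimal sketch of that arithmetic follows; the cost units are arbitrary and the method names are made up for this example.

```java
final class IhcCostModel {
    // No stored hash: every request pays the full derivation cost.
    static double recomputeEveryTime(int requests, double derivationCost) {
        return requests * derivationCost;
    }

    // Preallocated field: derive once, then each later request is a cache read.
    static double deriveOnceThenRead(int requests, double derivationCost,
                                     double cacheReadCost) {
        if (requests == 0) {
            return 0.0;
        }
        return derivationCost + (requests - 1) * cacheReadCost;
    }
}
```

With an assumed derivation cost of 2 units and a cache read of 1 unit, three requests cost 6 units when recomputed and 4 units with a stored field.  The stored field wins whenever the derivation costs more than a cache read and the hash is requested more than once.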

BTW, the “clever derivation” should be more clever than just adding the VA of the object into some salt.  (You read the salt from a cache, so it’s still at least as expensive as a cache read of the finished result.)  Weak IHCs are something we try to avoid in HotSpot.  I recommend at least a multiply or two to mix salt with VA, with non-linear operations like XOR or variable ROR.  An object with a preallocated IHC field would perform the mixing once, making the tradeoffs easier.
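As an illustration of "a multiply or two" combined with XOR steps, here is a minimal sketch that mixes a salt with an address.  It borrows the constants and shift amounts of the widely published SplitMix64 finalizer as a stand-in.  It is not the author's hash and not what HotSpot does, it uses fixed xor-shifts rather than a variable rotate, and the class and method names are hypothetical.

```java
final class IdentityHashMix {
    // Multipliers from the SplitMix64 finalizer, used here purely as an example.
    private static final long M1 = 0xBF58476D1CE4E5B9L;
    private static final long M2 = 0x94D049BB133111EBL;

    // Mix a virtual address with a salt. Every step is invertible, so for a
    // fixed salt, distinct addresses give distinct 64-bit results.
    static long mix64(long address, long salt) {
        long x = address ^ salt;
        x ^= x >>> 30;
        x *= M1;
        x ^= x >>> 27;
        x *= M2;
        x ^= x >>> 31;
        return x;
    }

    // Keep the top 31 bits so the result is a non-negative int.
    // The 31-bit width is an assumption of this sketch.
    static int identityHash(long address, long salt) {
        return (int) (mix64(address, salt) >>> 33);
    }
}
```

An object with a preallocated field would call something like identityHash once and store the result; an object without one would repeat the mix on every request.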

And FTR, I have been experimenting with some strong mixing algorithms that run in about a nanosecond but produce results that pass RNG stress tests.  They can also be progressively detuned to trade strength for speed.  One such hash is used in this experiment:

https://github.com/openjdk/jdk/commit/aff6e45224e3241fd22321a8effb760ebe388bc0#diff-0a958b47af7932e08b14e833e8daa3a2c2b23db33c02effab4422f1ddcd82a97

There are others in the literature as well.  My main point there would be, let’s use something better than, “oh, I think I’ll XOR the high 15 bits now…” or “let’s see what Objects::hash does…”  Today’s hardware multipliers change the game significantly, relative to such 1980s techniques.