Reducing class pointer size useful?

Fri Sep 10 11:11:56 UTC 2021

Hi,

Would it be of use for Lilliput to shrink the class pointer size below 32
bits? I did not follow the discussions closely, so I am not sure where the
current thinking stands.

If yes, maybe we could reduce the pointer size not only by reducing the
encoding range but by using larger alignments.

We encode with add-and-shift, as we do with compressed oops. Traditionally
the shift was 3, since sizeof(void*) is the alignment requirement for
metaspace allocations. This shift was used to enlarge the coverage of class
pointer encoding from 4GB to 32GB (KlassEncodingMetaspaceMax). But we never
used this to my knowledge since we limit class space size to 3GB at most.
And nobody needs 32GB class space anyway. So there was never a reason to
cover more than (3GB + <cds size>). Unless I missed something, the shift
has been useless. In fact, we recently removed the shift if CDS is on
(JDK-8265705) to solve an unrelated aarch64 issue, and nothing bad happened.

But we could use the shift, not to enlarge the encoding range but to reduce
the class pointer size. And we could use a larger shift value. For example,
let's say we shift 8 bits. Then cut off those bits and reduce the class
pointer to 24 bits.
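
A minimal sketch of that arithmetic, not HotSpot code: the names kShift,
kNarrowBits, encode_klass and decode_klass are made up for illustration. It
assumes the encoding base is itself 256-byte aligned and that everything we
encode lives within 4GB of it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants for illustration only; not HotSpot identifiers.
constexpr unsigned kShift      = 8;                        // 256-byte alignment
constexpr unsigned kNarrowBits = 24;                       // width of the encoded pointer
constexpr uint64_t kAlignment  = uint64_t{1} << kShift;
constexpr uint64_t kRange      = uint64_t{1} << (kNarrowBits + kShift);  // 4GB

// Encode: subtract the base, then drop the low bits that the alignment
// guarantees to be zero.
inline uint32_t encode_klass(uint64_t base, uint64_t addr) {
  assert(addr >= base && addr - base < kRange);  // inside the encoding range
  assert((addr - base) % kAlignment == 0);       // 256-byte aligned
  return static_cast<uint32_t>((addr - base) >> kShift);
}

// Decode: shift back up and add the base.
inline uint64_t decode_klass(uint64_t base, uint32_t narrow) {
  return base + (static_cast<uint64_t>(narrow) << kShift);
}
```

With an (arbitrary) base of 0x800000000, the address base + 768 encodes to 3,
and 3 decodes back to base + 768. The last aligned address in the 4GB range
encodes to 0xFFFFFF, so it still fits into 24 bits.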

The resulting alignment would be 256 bytes. Applied to all metaspace
allocations such an alignment would be prohibitively expensive, since most
allocations are very small. But if we apply this larger alignment to the
class space only and leave the rest of the metaspace alone, it is not so bad.
Before JEP 387, using different alignments would have been difficult to
implement, but metaspace coding is much more modular now, and using
different alignments for the different regions can be done.

So we apply the larger alignment only to Klass structures. Klass structures
are large, so the relative loss due to alignment would matter less. They
are variable-sized, but sizes are clustered between ~512 bytes and ~1K. They
can get much larger than that, but that is rare. Alignment loss would be
between 0 and 255 bytes, let's say 127 on average. For a typical larger app
of 10000 classes, this would waste ~1.2MB. Whether that is acceptable depends
on what positive effect the smaller compressed class pointer has on project
Lilliput.
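
A rough way to sanity-check the waste estimate is to simulate it. The sketch
below sums up the padding needed to round Klass-sized blocks up to the
alignment. The size distribution is an assumption (word-sized multiples,
spread uniformly between 512 and 1024 bytes), not measured data, and
align_up and simulate_waste are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Round n up to the next multiple of alignment (must be a power of two).
constexpr uint64_t align_up(uint64_t n, uint64_t alignment) {
  return (n + alignment - 1) & ~(alignment - 1);
}

// Returns the total padding, in bytes, if num_classes blocks are each
// rounded up to a multiple of `alignment`.
inline uint64_t simulate_waste(size_t num_classes, uint64_t alignment,
                               unsigned seed) {
  std::mt19937 rng(seed);
  // Assumed distribution: 64..128 words of 8 bytes, i.e. 512..1024 bytes.
  std::uniform_int_distribution<uint64_t> words(64, 128);
  uint64_t waste = 0;
  for (size_t i = 0; i < num_classes; i++) {
    const uint64_t size = words(rng) * 8;
    waste += align_up(size, alignment) - size;  // padding behind this block
  }
  return waste;
}
```

Under these assumptions, simulate_waste(10000, 256, seed) should come out in
the region of 1.2MB. A real measurement would have to use the actual Klass
size histogram of the application.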

---

One could argue that using an 8-bit-shifted class pointer means it stops
being a pointer and becomes an index into a table of 256-byte slots,
populated with variable-sized Klass structures. With Klass sizes clustered
between 512 bytes and 1K, each Klass would occupy 2..4 slots on average. The
24-bit pointer is enough to address 16 million slots, hence on average 4..8
million Klass structures, still covering a 4GB total range.

We could further slim down the class pointer if we agree on a lower maximum
number of classes. E.g. with 22 bits, we could address 4 million slots and
house about 1..2 million classes, still allowing for a maximum encoding range
of 1GB.

We could play around with these variables. E.g. a larger shift of 10 bits
(1KB alignment) would mean most Klass structures occupy just one slot. We
would have to live with a somewhat higher alignment waste of 0..1023 bytes,
but could now reduce the encoded class pointer to 20 bits, still being able
to address 1 million slots, i.e. close to 1 million classes, with the total
encoding range still covering 1GB.
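
To make it easier to play with these variables, here is a small sketch that
derives slot count, encoding range and an upper bound on the class count from
pointer width and shift. Layout and its members are hypothetical names, and
the class bound assumes every Klass has the same size, which is of course a
simplification.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not a HotSpot type.
struct Layout {
  unsigned narrow_bits;  // width of the encoded class pointer
  unsigned shift;        // log2 of the slot size (= Klass alignment)

  constexpr uint64_t slot_size() const { return uint64_t{1} << shift; }
  constexpr uint64_t num_slots() const { return uint64_t{1} << narrow_bits; }
  constexpr uint64_t range_bytes() const {
    return uint64_t{1} << (narrow_bits + shift);
  }
  // Slots taken by one Klass of the given size, rounded up to whole slots.
  constexpr uint64_t slots_per_klass(uint64_t klass_size) const {
    return (klass_size + slot_size() - 1) >> shift;
  }
  // Upper bound on the number of classes if every Klass had this size.
  constexpr uint64_t max_classes(uint64_t klass_size) const {
    return num_slots() / slots_per_klass(klass_size);
  }
};
```

For Layout{24, 8} this gives a 4GB range and between 4M (all Klass 1K) and 8M
(all Klass 512 bytes) classes. For Layout{20, 10} it gives a 1GB range and 1M
classes as long as a Klass fits into a single 1KB slot.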

---

I think this approach is a variant of the
Klass-structures-in-a-table-and-store-the-index approach, but it allows for
those rare Klass structures to be larger than a single table slot and it
has a much larger max. cap on the number of classes than if we were just to
limit the encoding range. To me this matters somewhat because I have seen
production installations where the number of classes was in the low 100,000s.
I don't think the 8192 limit cited in the Lilliput Wiki is practical.

If I am right, this approach should not require a lot of changes:
- we would need to modify metaspace to use separate alignments for the class
space
- we may have to fix class pointer encoding for the various platforms if they
don't work with larger shifts out of the box, or are inefficient. E.g. on
x64, we use LEAQ to encode pointers, and LEAQ allows for a max. shift of 3,
so for shift=8 we may need to use separate add and shift.
- CDS may need some work too, since the Klass structures in the CDS region
need to be aligned to the larger alignment as well.

Hope I did not make some gross miscalculation somewhere, but that's my
idea. What do you think?

Thanks, Thomas