Reducing class pointer size useful?

Tue Sep 21 05:03:39 UTC 2021

Hi Roman,

On Mon, Sep 20, 2021 at 4:56 PM Roman Kennke <rkennke at redhat.com> wrote:

> Hi Thomas,
>
> This is very useful! As soon as the PR is ready, I would be happy to
> merge it to enable further experimentation. It would be good if we could
> make the number-of-bits-per-Klass* configurable, even for the header
> layout, so that we can trade some Klass* bits if we need them. Currently
> I think we should be good with 24bits.
>

It's already configurable as a compile-time constant:
https://github.com/openjdk/lilliput/blob/42fd0204a9fb427f3a93f595866640cb0481d3b5/src/hotspot/share/utilities/globalDefinitions.hpp#L537
.

>
> Also, 4G encoding range seems excessive for the vast majority of
> applications, can we trade some encoding range for smaller alignment at
> the same number-of-bits too?
>

I think yes. Easily down to 2 or 1G. Below that some fiddling is needed
since the range needs to encompass both class space and those CDS regions
which contain Klasses.
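
As a rough illustration of that trade-off (a back-of-the-envelope sketch in Python, not code from the prototype; the function name is made up): the number of bits a narrow Klass pointer needs is log2 of the encoding range minus the shift. At a fixed 22 bits, each halving of the range therefore allows one bit less alignment.

```python
GIB = 1 << 30

def klass_pointer_bits(encoding_range_bytes, shift):
    """Bits needed to number every (1 << shift)-byte slot in the range."""
    slots = encoding_range_bytes >> shift
    return (slots - 1).bit_length()

# Same pointer width, smaller range, smaller alignment.
for range_gib, shift in [(4, 10), (2, 9), (1, 8)]:
    bits = klass_pointer_bits(range_gib * GIB, shift)
    print(f"{range_gib}G range, {1 << shift}-byte alignment: {bits} bits")
```

With the full 4G range and an 8-bit shift, the same formula gives the 24 bits mentioned above.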

>
> Thanks!
> Roman
>

I'll prepare the patch.

Cheers, Thomas

>
> > Hi,
> >
> > I built a prototype following my idea and tested it a bit.
> >
> > For the prototype, I changed metaspace so that the class space portion
> > can run with a different alignment. I had to rewrite arena guard handling
> > (split out as a separate patch for main, [1]) and fix code generation for
> > x64 klass pointer decoding for shift values >3. It's very likely that this
> > has to be done for at least aarch64 and s390 too.
> >
> > I used a 10-bit shift value (1K alignment), which seems excessive at first
> > glance but looks like a sweet spot (that, or 9 bits) since the vast
> > majority of Klass structures seem to hover around 600-700 bytes. Note that
> > apart from the 10-bit shift, I did not reduce the encoding range but
> > hard-coded it to 4G. So we can still cover a 4G encoding range.
> >
> > All this means that in my prototype compressed class pointers do not exceed
> > 22 bits in size (32 bits for the 4G range minus the 10-bit shift).
> >
> > My modifications to lilliput live in a Draft PR in the lilliput repo [2],
> > but it is still a work in progress.
> >
> > ============
> >
> > 1) I ran a test where I loaded 40000 classes [3] (actually ~43000 including
> > the JDK itself).
> >
> > VM args:
> > - `-Xshare:off` because CDS does not yet work with the modified alignment
> > - `-XX:+AlwaysPreTouch` to stabilize RSS somewhat
> > - `-Xmx512m -Xms512m` to limit heap size and reduce its effect on RSS
> > - `-XX:CompressedClassSpaceSize=1g` kind of pointless, it's the default
> > - `-XX:+UnlockDiagnosticVMOptions
> > -XX:CompressedClassSpaceBaseAddress=0xabcde000000` to test non-base-NULL
> > non-shift-0 encoding.
> >
> > Program args:
> > `--num-generations=1 --num-loaders=1 --num-classes=40000`
> > Only one loader sequentially loading stuff, so minimal per-loader overhead,
> > which would otherwise have obscured the difference in memory consumption
> > due to Klass alignment.
> >
> > Results:
> >
> > 3-bit alignment (base):
> >
> > ```
> > [0.058s][info][metaspace] Compressed class space mapped at:
> > 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
> >
> > [0.058s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow
> > klass shift: 3, Narrow klass range: 0x40000000
> > ```
> >
> > RSS: 1,26..1,28 GB
> >
> > Metaspace footprint:
> >    Non-class space:      400,00 MB reserved,     399,12 MB (>99%) committed,  50 nodes.
> >        Class space:        1,00 GB reserved,      31,00 MB (  3%) committed,  1 nodes.
> >               Both:        1,39 GB reserved,     430,12 MB ( 30%) committed.
> >
> > ------------
> >
> > 10-bit alignment:
> >
> > ```
> > [0.064s][info][metaspace] Compressed class space mapped at:
> > 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
> > [0.064s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow
> > klass shift: 10, Narrow klass range: 0x40000000
> > ```
> >
> > RSS: 1,24..1,27 GB
> >
> > Metaspace footprint:
> >    Non-class space:      400,00 MB reserved,     398,44 MB (>99%) committed,  50 nodes.
> >        Class space:        1,00 GB reserved,      42,38 MB (  4%) committed,  1 nodes.
> >               Both:        1,39 GB reserved,     440,81 MB ( 31%) committed.
> >
> > Class space increase compared with base: 11.38 MB
> > Average alignment loss per Klass compared with base: 277 bytes
> >
> > -----------
> >
> > Interpretation:
> >
> > As expected, with 10 bits class space consumption went up somewhat, by
> > about 11 MB. Non-class space stayed stable since it was still using
> > standard metaspace alignment.
> > RSS wobbled too much for this small difference in class space size to be
> > noticeable above the background noise (despite +AlwaysPreTouch).
> >
> > ============
> >
> > 2) Since the first test used classes artificially generated by me and
> > therefore may be skewed, I also did a simple test with the Springboot
> > petclinic. I started the petclinic and measured after it came up. At that
> > point, the petclinic loaded about 15000 classes. I used the same VM
> > arguments as (1).
> >
> > Results:
> >
> > 3 bit alignment (base):
> >
> > [0.059s][info][metaspace] Compressed class space mapped at:
> > 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
> >
> > [0.059s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow
> > klass shift: 3, Narrow klass range: 0x40000000
> >
> >
> > RSS: 841-866 MB
> >
> > Metaspace footprint:
> >
> >    Non-class space:       64,00 MB reserved,      62,56 MB ( 98%) committed,  8 nodes.
> >        Class space:        1,00 GB reserved,       9,38 MB ( <1%) committed,  1 nodes.
> >               Both:        1,06 GB reserved,      71,94 MB (  7%) committed.
> >
> >
> > ------
> >
> > 10 bit alignment:
> >
> > [0.060s][info][metaspace] Compressed class space mapped at:
> > 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
> >
> > [0.060s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow
> > klass shift: 10, Narrow klass range: 0x40000000
> >
> > RSS: 849-850 MB
> > Metaspace footprint:
> >    Non-class space:       64,00 MB reserved,      62,62 MB ( 98%) committed,  8 nodes.
> >        Class space:        1,00 GB reserved,      15,69 MB (  2%) committed,  1 nodes.
> >               Both:        1,06 GB reserved,      78,31 MB (  7%) committed.
> >
> > Class space increase compared with base: 6.31 MB
> > Average alignment loss per Klass compared with base: 441 bytes
> >
> > ------
> >
> > 9 bit alignment (to see if avg alignment loss changes significantly with
> > 512 bytes alignment):
> >
> > [0.059s][info][metaspace] Compressed class space mapped at:
> > 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
> >
> > [0.059s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow
> > klass shift: 9, Narrow klass range: 0x40000000
> >
> > RSS: 836-861 MB
> > Metaspace footprint:
> >    Non-class space:       64,00 MB reserved,      62,62 MB ( 98%) committed,  8 nodes.
> >        Class space:        1,00 GB reserved,      12,88 MB (  1%) committed,  1 nodes.
> >               Both:        1,06 GB reserved,      75,50 MB (  7%) committed.
> >
> > Class space increase compared with base: 3.5 MB
> > Average alignment loss per Klass compared with base: 245 bytes
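
For anyone wanting to re-check the "average alignment loss" figures: they are simply the increase in committed class space divided by the number of loaded classes. A minimal sketch in Python (not part of the test setup; the function name is made up, MB is taken as 1024*1024 bytes, and the class counts are the approximate ones quoted above, so the results only match the reported values to within about a byte):

```python
MIB = 1024 * 1024

def avg_alignment_loss(base_mib, test_mib, num_classes):
    """Extra committed class space per loaded class, in bytes."""
    return (test_mib - base_mib) * MIB / num_classes

# (label, class space committed at shift 3, at larger shift, approx. class count)
runs = [
    ("synthetic, 10-bit", 31.00, 42.38, 43000),
    ("petclinic, 10-bit", 9.38, 15.69, 15000),
    ("petclinic, 9-bit", 9.38, 12.88, 15000),
]
for name, base, test, n in runs:
    print(f"{name}: about {avg_alignment_loss(base, test, n):.0f} bytes per Klass")
```
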
> >
> > -----------
> >
> > Interpretation:
> >
> > Mirroring the results from (1), class space consumption went up a bit, but
> > the increase was not noticeable above RSS wobbling from run to run.
> > Reducing alignment from 10 to 9 bits reduces the average loss per Klass. So
> > the jury may still be out on which alignment is more effective, but the
> > differences are small.
> >
> > ============
> >
> > Conclusion:
> >
> > I think this approach makes sense. The increased class space size should
> > easily be recoverable by reduced heap size if Lilliput is successful in
> > doing that.
> >
> > In my opinion, this approach allows us to easily investigate other forms of
> > object headers. It allows Klass structures to stay variable-sized and to
> > be huge if they want to be huge; we do not have to artificially limit the
> > number of classes, we do not have to split classes into near and far
> > classes, and we do not have to split Klass into several components. If we
> > want, we can do all that later, but for now, it could just work with the
> > existing hotspot and only minor modifications.
> >
> > I also think that keeping Klass in metaspace has a number of advantages:
> > - fast, arena-style allocation
> > - despite being in an arena, we get free-block management
> > - we can continue to use the memory reclamation mechanism of metaspace on
> > class unloading
> > - we have monitoring tools in place
> >
> > Basically, I feel that if we invented a different scheme to store Klass
> > structures, we would eventually re-invent most of that.
> >
> > If you guys think this is a good avenue to investigate further, the next
> > steps would be:
> >
> > - Make Klass pointer encoding work on all 64-bit platforms with arbitrary
> > bases and shifts, and maybe optimize it further. For now this only works on
> > x64.
> > - Fix CDS to work with the new alignment
> >
> > Thanks, Thomas
> >
> > [1] https://github.com/openjdk/jdk/pull/5518
> > [2] https://github.com/openjdk/lilliput/pull/13
> > [3] https://github.com/tstuefe/ojdk-repros/blob/master/repros8/src/main/java/de/stuefe/repros/metaspace/InterleavedLoaders.java
> >
> >
> > On Fri, Sep 10, 2021 at 1:11 PM Thomas Stüfe <thomas.stuefe at gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> Would it be of use for Lilliput to shrink the class pointer size below 32
> >> bits? I did not closely follow the discussions, so I am not sure where
> >> the current thinking goes.
> >>
> >> If yes, maybe we could reduce the pointer size not only by reducing the
> >> encoding range but by using larger alignments.
> >>
> >> We encode with add-and-shift, as we do with compressed oops. Traditionally
> >> the shift was 3, since sizeof(void*) is the alignment requirement for
> >> metaspace allocations. This shift was used to enlarge the coverage of class
> >> pointer encoding from 4GB to 32GB (KlassEncodingMetaspaceMax). But we never
> >> used this to my knowledge since we limit class space size to 3GB at most.
> >> And nobody needs 32GB class space anyway. So there was never a reason to
> >> cover more than (3GB + <cds size>). Unless I missed something, the shift
> >> had been useless. In fact, we recently removed the shift if CDS is on
> >> (JDK-8265705) to solve an unrelated aarch64 issue, and nothing bad happened.
> >>
> >> But we could use the shift, not to enlarge the encoding range but to
> >> reduce the class pointer size. And we could use a larger shift value. For
> >> example, let's say we shift 8 bits. Then cut off those bits and reduce the
> >> class pointer to 24 bits.
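
To make the add-and-shift scheme concrete, here is a small Python model of it. This is only an illustrative sketch, not HotSpot code: the function names are invented, the base address is the example value used in the test runs in this thread, and the 8-bit shift with a 24-bit narrow pointer is the example from the paragraph above.

```python
BASE = 0xabcde000000   # example encoding base (assumption, taken from the test runs)
SHIFT = 8              # 256-byte alignment
NARROW_BITS = 24       # width of the narrow Klass pointer

def encode(addr, base=BASE, shift=SHIFT, bits=NARROW_BITS):
    """Subtract the base, then drop the low (always-zero) alignment bits."""
    offset = addr - base
    if offset < 0 or offset % (1 << shift) != 0:
        raise ValueError("address below base or not aligned to the shift")
    narrow = offset >> shift
    if narrow >= (1 << bits):
        raise ValueError("address outside the encoding range")
    return narrow

def decode(narrow, base=BASE, shift=SHIFT):
    """Inverse operation: shift left, then add the base."""
    return base + (narrow << shift)

klass_addr = BASE + 5 * 256          # a Klass sitting in the sixth 256-byte slot
narrow = encode(klass_addr)
print(narrow, hex(decode(narrow)))
```

The last encodable slot starts at `BASE + 4G - 256`, so 24 bits with an 8-bit shift still span the full 4G range.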
> >>
> >> The resulting alignment would be 256 bytes. Applied to all metaspace
> >> allocations such an alignment would be prohibitively expensive, since most
> >> allocations are very small. But if we apply this larger alignment to the
> >> class space only and leave the rest of the metaspace alone, it is not so
> >> bad. Before JEP 387, using different alignments would have been difficult
> >> to implement, but metaspace coding is much more modular now, and using
> >> different alignments for the different regions can be done.
> >>
> >> So we apply the larger alignment only to Klass structures. Klass
> >> structures are large, so the relative loss due to alignment would matter
> >> less. They are variable-sized but sizes are clustered between ~512 bytes
> >> and ~1K. They can get much larger than that, but that is rare. Alignment
> >> loss would be between 0-255 bytes, let's say on average 127. For a typical
> >> larger app of 10000 classes, this would waste ~1.2MB. Whether that is
> >> acceptable depends on what positive effect the smaller compressed class
> >> pointer has on project Lilliput.
> >>
> >> ---
> >>
> >> One could argue that using an 8-bit shifted class pointer means it stops
> >> being a pointer and becomes an index into a table of 256-byte slots,
> >> populated with variable-sized Klass structures. With Klass sizes clustered
> >> between 512 bytes..1K, each Klass would populate 2..4 slots on average. The
> >> 24-bit pointer is enough to address 16mio slots, hence on average 4..8
> >> million Klass structures, still covering a 4G total range.
> >>
> >> We could further slim down the class pointer if we agree on a lower
> >> maximum number of classes. E.g. with 22 bits, we could address 4mio slots
> >> and house about 500k...1mio classes, still allowing for a maximum encoding
> >> range of 1G.
> >>
> >> We could play around with these variables. E.g. a larger shift of 10 bits
> >> - 1KB alignment - would mean most Klass structures occupy just one slot. We
> >> would have to live with a somewhat higher alignment waste of 0...1023
> >> bytes, but could now reduce the encoded class pointer to 20 bits, still
> >> being able to address 1 mio slots resp. close to 1mio classes, with the
> >> total encoding range still covering 1GB.
> >>
> >> ---
> >>
> >> I think this approach is a variant of the
> >> Klass-structures-in-a-table-and-store-the-index approach, but it allows for
> >> those rare Klass structures to be larger than a single table slot and it
> >> has a much larger max. cap on the number of classes than if we were just to
> >> limit the encoding range. To me this matters somewhat because I have seen
> >> production installations where the number of classes was in the low
> >> 100000s. I don't think the 8192 limit cited in the Lilliput Wiki is
> >> practical.
> >>
> >> If I am right this approach should not require a lot of changes:
> >> - we would need to modify metaspace to use separate alignments for the
> >> class space
> >> - we may have to fix class pointer encoding for the various platforms if
> >> they don't work with larger shifts out of the box, or are inefficient. E.g.
> >> on x64, we use LEAQ to encode pointers, and LEAQ allows for a max. shift of
> >> 3, so for shift=8 we may need to use separate add and shift.
> >> - CDS may need some work too, since the Klass structures in the CDS region
> >> need to be aligned to the larger alignment as well.
> >>
> >> Hope I did not make some gross miscalculation somewhere, but that's my
> >> idea. What do you think?
> >>
> >> Thanks, Thomas
> >>
> >
>
>