Reducing class pointer size useful?

Mon Sep 20 14:56:29 UTC 2021

Hi Thomas,

This is very useful! As soon as the PR is ready, I would be happy to 
merge it to enable further experimentation. It would be good if we could 
make the number-of-bits-per-Klass* configurable, even for the header 
layout, so that we can trade some Klass* bits if we need them. Currently 
I think we should be good with 24 bits.

Also, a 4G encoding range seems excessive for the vast majority of 
applications. Can we trade some encoding range for smaller alignment at 
the same number of bits, too?
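
To make the trade-off concrete, here is a minimal sketch of the arithmetic, assuming the encodable range is simply 2^bits slots times the alignment. `shift_for` is a made-up helper for illustration, not anything in HotSpot.

```python
def shift_for(bits: int, encoding_range: int) -> int:
    """Smallest shift such that `bits` narrow-Klass bits cover `encoding_range` bytes."""
    shift = 0
    while (1 << (bits + shift)) < encoding_range:
        shift += 1
    return shift

GB = 1 << 30

# Keep the pointer at 22 bits and shrink the range: the required alignment drops.
for rng in (4 * GB, 2 * GB, 1 * GB, GB // 2):
    s = shift_for(22, rng)
    print(f"range {rng >> 20:5d} MB -> shift {s:2d} (alignment {1 << s} bytes)")
```

Under that assumption, each halving of the encoding range buys one bit less of alignment at the same pointer width.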

Thanks!
Roman

> Hi,
> 
> I built a prototype following my idea and tested it a bit.
> 
> For the prototype, I changed metaspace such that the class space portion
> could run with a different alignment. I had to rewrite arena guard handling
> (split out as a separate patch for main, [1]) and fix code generation for x64
> klass pointer decoding for shift values >3. It's very likely that this has
> to be done for at least aarch64 and s390 too.
> 
> I used a 10-bit shift value (1K), which seems excessive at first glance but
> looks like a sweet spot (that, or 9 bits) since the vast majority of Klass
> structures seem to hover around 600-700 bytes. Note that beyond the 10-bit
> shift, I did not reduce the encoding range further but hard-coded it to be
> 4G. So we can still cover a 4G encoding range.
> 
> All this means that in my prototype compressed class pointers do not exceed
> 22 bits in size.
> 
> My modifications to lilliput live in a Draft PR in the lilliput repo [2],
> but it is still a work in progress.
> 
> ============
> 
> 1) I ran a test where I loaded 40000 classes [3] (actually ~43000 including
> the JDK itself).
> 
> VM args:
> - `-Xshare:off` because CDS does not yet work with the modified alignment
> - `-XX:+AlwaysPreTouch` to stabilize RSS somewhat
> - `-Xmx512m -Xms512m` to limit heap size and reduce its effect on RSS
> - `-XX:CompressedClassSpaceSize=1g` kind of pointless, it's the default
> - `-XX:+UnlockDiagnosticVMOptions
> -XX:CompressedClassSpaceBaseAddress=0xabcde000000` to test non-base-NULL
> non-shift-0 encoding.
> 
> Program args:
> `--num-generations=1 --num-loaders=1 --num-classes=40000`
> Only one loader, loading classes sequentially, so per-loader overhead is
> minimal; otherwise it would have obscured the difference in memory
> consumption due to Klass alignment.
> 
> Results:
> 
> 3-bit alignment (base):
> 
> ```
> [0.058s][info][metaspace] Compressed class space mapped at:
> 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
> 
> [0.058s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow
> klass shift: 3, Narrow klass range: 0x40000000
> ```
> 
> RSS: 1,26..1,28 GB
> 
> Metaspace footprint:
>    Non-class space:      400,00 MB reserved,     399,12 MB (>99%) committed,
>   50 nodes.
>        Class space:        1,00 GB reserved,      31,00 MB (  3%) committed,
>   1 nodes.
>               Both:        1,39 GB reserved,     430,12 MB ( 30%) committed.
> 
> ------------
> 
> 10-bit alignment:
> 
> ```
> [0.064s][info][metaspace] Compressed class space mapped at:
> 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
> [0.064s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow
> klass shift: 10, Narrow klass range: 0x40000000
> ```
> 
> RSS: 1,24..1,27 GB
> 
> Metaspace footprint:
>    Non-class space:      400,00 MB reserved,     398,44 MB (>99%) committed,
>   50 nodes.
>        Class space:        1,00 GB reserved,      42,38 MB (  4%) committed,
>   1 nodes.
>               Both:        1,39 GB reserved,     440,81 MB ( 31%) committed.
> 
> Class space increase compared with base: 11.38 MB
> Average alignment loss per Klass compared with base: 277 bytes
> 
> -----------
> 
> Interpretation:
> 
> As expected, with 10 bits, class space consumption went up somewhat, by
> about 11 MB. Non-class space stayed stable since it was still using
> standard metaspace alignment.
> RSS wobbled too much for this small difference in class space size to be
> noticeable above the background noise (despite +AlwaysPreTouch).
> 
> ============
> 
> 2) Since the first test used classes artificially generated by me and
> therefore may be skewed, I also did a simple test with the Spring Boot
> petclinic. I started the petclinic and measured after it came up. At that
> point, the petclinic had loaded about 15000 classes. I used the same VM
> arguments as in (1).
> 
> Results:
> 
> 3 bit alignment (base):
> 
> [0.059s][info][metaspace] Compressed class space mapped at:
> 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
> 
> [0.059s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow
> klass shift: 3, Narrow klass range: 0x40000000
> 
> 
> RSS: 841-866 MB
> 
> Metaspace footprint:
> 
>    Non-class space:       64,00 MB reserved,      62,56 MB ( 98%) committed,
>   8 nodes.
>        Class space:        1,00 GB reserved,       9,38 MB ( <1%) committed,
>   1 nodes.
>               Both:        1,06 GB reserved,      71,94 MB (  7%) committed.
> 
> 
> ------
> 
> 10 bit alignment:
> 
> [0.060s][info][metaspace] Compressed class space mapped at:
> 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
> 
> [0.060s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow
> klass shift: 10, Narrow klass range: 0x40000000
> 
> RSS: 849-850 MB
> Metaspace footprint:
>    Non-class space:       64,00 MB reserved,      62,62 MB ( 98%) committed,
>   8 nodes.
>        Class space:        1,00 GB reserved,      15,69 MB (  2%) committed,
>   1 nodes.
>               Both:        1,06 GB reserved,      78,31 MB (  7%) committed.
> 
> Class space increase compared with base: 6.31 MB
> Average alignment loss per Klass compared with base: 441 bytes
> 
> ------
> 
> 9 bit alignment (to see if avg alignment loss changes significantly with
> 512 bytes alignment):
> 
> [0.059s][info][metaspace] Compressed class space mapped at:
> 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
> 
> [0.059s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow
> klass shift: 9, Narrow klass range: 0x40000000
> 
> RSS: 836-861 MB
> Metaspace footprint:
>    Non-class space:       64,00 MB reserved,      62,62 MB ( 98%) committed,
>   8 nodes.
>        Class space:        1,00 GB reserved,      12,88 MB (  1%) committed,
>   1 nodes.
>               Both:        1,06 GB reserved,      75,50 MB (  7%) committed.
> 
> Class space increase compared with base: 3.5 MB
> Average alignment loss per Klass compared with base: 245 bytes
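
The per-Klass figures above can be reproduced from the reported class space numbers. This is a rough sketch that assumes the loss is simply the increase in committed class space divided by the number of loaded classes. The class counts are the approximate ones given in the mail (~43000 and about 15000), so the results only match to within a byte or so.

```python
MB = 1024 * 1024

def avg_loss_per_klass(base_mb, aligned_mb, num_classes):
    """Extra committed class space per loaded class, in bytes."""
    return (aligned_mb - base_mb) * MB / num_classes

# (label, committed class space at shift 3, committed at larger shift, approx. class count)
runs = [
    ("synthetic, 10-bit", 31.00, 42.38, 43_000),
    ("petclinic, 10-bit", 9.38, 15.69, 15_000),
    ("petclinic, 9-bit", 9.38, 12.88, 15_000),
]

for name, base_mb, aligned_mb, classes in runs:
    loss = avg_loss_per_klass(base_mb, aligned_mb, classes)
    print(f"{name}: ~{loss:.0f} bytes per Klass")
```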
> 
> -----------
> 
> Interpretation:
> 
> Mirroring the results from (1), class space consumption went up a bit, but
> the increase was not noticeable above RSS wobbling from run to run.
> Reducing alignment from 10 to 9 bits reduces the average loss per Klass. So
> the jury may still be out on which alignment is more effective, but the
> differences are small.
> 
> ============
> 
> Conclusion:
> 
> I think this approach makes sense. The increased class space size should
> easily be recoverable by reduced heap size if Lilliput is successful in
> doing that.
> 
> In my opinion, this approach allows us to easily investigate other forms of
> object headers. It allows for Klass structures to stay variable-sized, to
> be huge if they want to be huge, we do not have to artificially limit the
> number of classes, we do not have to split classes into near- and far
> classes, and do not have to split Klass into several components. If we
> want, we can do all that later, but for now, it could just work with the
> existing hotspot and only minor modifications.
> 
> I also think that keeping Klass in metaspace has a number of advantages:
> - fast, arena-style allocation
> - despite being in an arena, we get free-block management
> - we can continue to use the memory reclamation mechanism of metaspace on
> class unloading
> - we have monitoring tools in place
> 
> Basically, I feel that if we invented a different scheme to store Klass
> structures, we would eventually re-invent most of that.
> 
> If you guys think this is a good avenue to investigate further, the next
> steps would be:
> 
> - make Klass pointer encoding work on all 64-bit platforms with arbitrary
> bases and shifts, and maybe optimize it further. For now this only works on
> x64.
> - Fix CDS to work with the new alignment
> 
> Thanks, Thomas
> 
> [1] https://github.com/openjdk/jdk/pull/5518
> [2] https://github.com/openjdk/lilliput/pull/13
> [3]
> https://github.com/tstuefe/ojdk-repros/blob/master/repros8/src/main/java/de/stuefe/repros/metaspace/InterleavedLoaders.java
> 
> 
> On Fri, Sep 10, 2021 at 1:11 PM Thomas Stüfe <thomas.stuefe at gmail.com>
> wrote:
> 
>> Hi,
>>
>> Would it be of use for Lilliput to shrink the class pointer size below 32
>> bits? I did not closely follow the discussions, so I am not sure
>> where the current thinking goes.
>>
>> If yes, maybe we could reduce the pointer size not only by reducing the
>> encoding range but by using larger alignments.
>>
>> We encode with add-and-shift, as we do with compressed oops. Traditionally
>> the shift was 3, since sizeof(void*) is the alignment requirement for
>> metaspace allocations. This shift was used to enlarge the coverage of class
>> pointer encoding from 4GB to 32GB (KlassEncodingMetaspaceMax). But we never
>> used this to my knowledge since we limit class space size to 3GB at most.
>> And nobody needs 32GB class space anyway. So there was never a reason to
>> cover more than (3GB + <cds size>). Unless I missed something, the shift
>> had been useless. In fact, we recently removed the shift if CDS is on
>> (JDK-8265705) to solve an unrelated aarch64 issue, and nothing bad happened.
>>
>> But we could use the shift, not to enlarge the encoding range but to
>> reduce the class pointer size. And we could use a larger shift value. For
>> example, let's say we shift 8 bits. Then cut off those bits and reduce the
>> class pointer to 24 bits.
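
A minimal sketch of the add-and-shift scheme with an 8-bit shift. The function names and the base address are hypothetical and are not HotSpot's actual identifiers.

```python
def encode_klass(addr: int, base: int, shift: int) -> int:
    """Narrow value = (address - base) >> shift; requires an aligned address."""
    offset = addr - base
    if offset < 0 or offset & ((1 << shift) - 1):
        raise ValueError("address is below base or not aligned to 1 << shift")
    return offset >> shift

def decode_klass(narrow: int, base: int, shift: int) -> int:
    """Address = base + (narrow << shift)."""
    return base + (narrow << shift)

BASE = 0x800000000            # hypothetical encoding base
addr = BASE + 0x12345600      # hypothetical Klass address, 256-byte aligned

narrow = encode_klass(addr, BASE, 8)
assert decode_klass(narrow, BASE, 8) == addr

# The last 256-byte-aligned address inside a 4G range still fits in 24 bits.
last = encode_klass(BASE + (1 << 32) - 256, BASE, 8)
print(hex(narrow), last.bit_length())
```

The alignment check is the important part: the low 8 bits can only be dropped because every Klass address is known to have them zero.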
>>
>> The resulting alignment would be 256 bytes. Applied to all metaspace
>> allocations such an alignment would be prohibitively expensive, since most
>> allocations are very small. But if we apply this larger alignment to the
>> class space only, leave the rest of the metaspace alone, it is not so bad.
>> Before JEP 387, using different alignments would have been difficult to
>> implement, but metaspace coding is much more modular now, and using
>> different alignments for the different regions can be done.
>>
>> So we apply the larger alignment only to Klass structures. Klass
>> structures are large, so the relative loss due to alignment would matter
>> less. They are variable-sized, but sizes are clustered between ~512 bytes
>> and ~1K. They can get much larger than that, but that is rare. Alignment
>> loss would be between 0 and 255 bytes, let's say 127 on average. For a
>> typical larger app of 10000 classes, this would waste ~1.2MB. Whether that
>> is acceptable depends on what positive effect the smaller compressed class
>> pointer has on project Lilliput.
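
The estimate can be checked with a few lines. The uniform spread of Klass sizes between 512 and 1023 bytes is purely an assumption for illustration, since real size distributions will differ.

```python
def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of a power-of-two alignment."""
    return (size + alignment - 1) & ~(alignment - 1)

def padding(size: int, alignment: int) -> int:
    """Bytes wasted when an allocation of `size` is aligned to `alignment`."""
    return align_up(size, alignment) - size

ALIGNMENT = 256
sizes = range(512, 1024)      # assumed: every size in this range equally likely

avg = sum(padding(s, ALIGNMENT) for s in sizes) / len(sizes)
total = avg * 10_000          # 10000 classes, as in the mail

print(f"average padding: {avg} bytes, total: {total / 2**20:.2f} MB")
```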
>>
>> ---
>>
>> One could argue that using an 8-bit shifted class pointer means it stops
>> being a pointer and becomes an index into a table of 256-byte slots,
>> populated with variable-sized Klass structures. With Klass sizes clustered
>> between 512 bytes..1K, each Klass would populate 2..4 slots on average. The
>> 24-bit pointer is enough to address 16 million slots, hence on average 4..8
>> million Klass structures, still covering a 4G total range.
>>
>> We could further slim down the class pointer if we agree on a lower
>> maximum number of classes. E.g. with 22 bits, we could address 4 million
>> slots and house about 500k...1 million classes, still allowing for a
>> maximum encoding range of 1G.
>>
>> We could play around with these variables. E.g. a larger shift of 10 bits
>> (1KB alignment) would mean most Klass structures occupy just one slot. We
>> would have to live with a somewhat higher alignment waste of 0...1023
>> bytes, but could now reduce the encoded class pointer to 20 bits, still
>> being able to address 1 million slots, i.e. close to 1 million classes,
>> with the total encoding range still covering 1GB.
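
The slot view lends itself to a small calculator. This is only a sketch: `capacity` is a hypothetical helper, and it assumes every Klass has the same size and occupies a whole number of slots.

```python
import math

def capacity(bits: int, shift: int, klass_size: int):
    """Return (slots, encoding range in bytes, classes that fit) for a given layout."""
    slots = 1 << bits
    slot_size = 1 << shift
    slots_per_klass = math.ceil(klass_size / slot_size)
    return slots, slots << shift, slots // slots_per_klass

GB = 1 << 30

# 24-bit pointer, 256-byte slots, Klass sizes at both ends of the 512..1K cluster
for size in (512, 1024):
    slots, rng, classes = capacity(24, 8, size)
    print(f"24/8,  Klass {size:4d} B: {slots} slots, {rng // GB}G range, {classes} classes")

# 20-bit pointer, 1K slots, one slot per Klass
slots, rng, classes = capacity(20, 10, 1024)
print(f"20/10, Klass 1024 B: {slots} slots, {rng // GB}G range, {classes} classes")
```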
>>
>> ---
>>
>> I think this approach is a variant of the
>> Klass-structures-in-a-table-and-store-the-index approach, but it allows for
>> those rare Klass structures to be larger than a single table slot and it
>> has a much larger max. cap on the number of classes than if we were just to
>> limit the encoding range. To me this matters somewhat because I have seen
>> production installations where the number of classes was in the low
>> 100,000s. I don't think the 8192 limit cited in the Lilliput Wiki is
>> practical.
>>
>> If I am right this approach should not require a lot of changes:
>> - we would need to modify metaspace to use separate alignments for the
>> class space
>> - may have to fix class pointer encoding for the various platforms if they
>> don't work with larger shifts out of the box, or are inefficient. E.g. on
>> x64, we use LEAQ to encode pointers, and LEAQ allows for a max. shift of 3,
>> so for shift=8 we may need to use separate add and shift.
>> - CDS may need some work too, since the Klass structures in the CDS region
>> need to be aligned to the larger alignment as well.
>>
>> Hope I did not make some gross miscalculation somewhere, but that's my
>> idea. What do you think?
>>
>> Thanks, Thomas
>>
>