Reducing class pointer size useful?

Sun Sep 19 06:40:10 UTC 2021

Hi,

I built a prototype following my idea and tested it a bit.

For the prototype, I changed metaspace so that the class space portion can
run with a different alignment. I had to rewrite arena guard handling
(split out as a separate patch for main, [1]) and fix x64 code generation
for klass pointer decoding with shift values >3. Very likely the same has
to be done for at least aarch64 and s390.
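
To illustrate the add-and-shift scheme the decoding relies on, here is a
minimal Python sketch. It is not HotSpot code: the function names are made
up for illustration, the base address is the one passed via
CompressedClassSpaceBaseAddress in the test runs below, and the shift is
the prototype's 10 bits.

```python
# Sketch of add-and-shift klass pointer encoding (illustrative, not HotSpot code).
BASE = 0x00000ABCDE000000   # class space base used in the test runs in this mail
SHIFT = 10                  # 1 KB alignment, as in the prototype


def encode_klass(addr: int, base: int = BASE, shift: int = SHIFT) -> int:
    """Turn a Klass address into a narrow klass value."""
    offset = addr - base
    if offset < 0 or offset & ((1 << shift) - 1):
        raise ValueError("address below base or not aligned to 1 << shift")
    return offset >> shift


def decode_klass(narrow: int, base: int = BASE, shift: int = SHIFT) -> int:
    """Inverse of encode_klass: shift first, then add the base."""
    return base + (narrow << shift)
```

With a shift >3 the decode can no longer be folded into a single scaled
address computation on x64, which is why the shift and the add are shown as
separate steps here.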

I used a 10-bit shift (1K alignment), which seems excessive at first glance
but looks like a sweet spot (that, or 9 bits), since the vast majority of
Klass structures seem to hover around 600-700 bytes. Note that beyond the
10-bit shift I did not reduce the encoding range further but hard-coded it
to 4G, so we can still cover a 4G encoding range.

All this means that in my prototype compressed class pointers do not exceed
22 bits in size.
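
The 22 bits follow directly from the encoding range and the shift. A small
sketch of the arithmetic (`narrow_klass_bits` is a hypothetical helper name,
not something that exists in hotspot):

```python
GB = 1 << 30


def narrow_klass_bits(encoding_range: int, shift: int) -> int:
    """Bits needed to number every (1 << shift)-aligned slot in the range."""
    slots = encoding_range >> shift
    return (slots - 1).bit_length()


print(narrow_klass_bits(4 * GB, 10))  # prototype: 4G range, 10-bit shift -> 22
print(narrow_klass_bits(4 * GB, 3))   # same range, traditional shift of 3 -> 29
```

The same helper reproduces the numbers from the original proposal quoted
below, e.g. 24 bits for a 4G range with an 8-bit shift.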

My modifications to lilliput live in a Draft PR in the lilliput repo [2],
but it is still a work in progress.

============

1) I ran a test where I loaded 40000 classes [3] (actually ~43000 including
the JDK itself).

VM args:
- `-Xshare:off` because CDS does not yet work with the modified alignment
- `-XX:+AlwaysPreTouch` to stabilize RSS somewhat
- `-Xmx512m -Xms512m` to limit heap size and reduce its effect on RSS
- `-XX:CompressedClassSpaceSize=1g` kind of pointless, it's the default
- `-XX:+UnlockDiagnosticVMOptions -XX:CompressedClassSpaceBaseAddress=0xabcde000000`
  to test non-base-NULL, non-shift-0 encoding.

Program args:
`--num-generations=1 --num-loaders=1 --num-classes=40000`
Only one loader, loading classes sequentially, so per-loader overhead is
minimal. That overhead would otherwise have obscured the difference in
memory consumption due to Klass alignment.

Results:

3-bit alignment (base):

```
[0.058s][info][metaspace] Compressed class space mapped at: 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
[0.058s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow klass shift: 3, Narrow klass range: 0x40000000
```

RSS: 1,26..1,28 GB

Metaspace footprint:
  Non-class space:      400,00 MB reserved,     399,12 MB (>99%) committed, 50 nodes.
      Class space:        1,00 GB reserved,      31,00 MB (  3%) committed, 1 nodes.
             Both:        1,39 GB reserved,     430,12 MB ( 30%) committed.

------------

10-bit alignment:

```
[0.064s][info][metaspace] Compressed class space mapped at: 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
[0.064s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow klass shift: 10, Narrow klass range: 0x40000000
```

RSS: 1,24..1,27 GB

Metaspace footprint:
  Non-class space:      400,00 MB reserved,     398,44 MB (>99%) committed, 50 nodes.
      Class space:        1,00 GB reserved,      42,38 MB (  4%) committed, 1 nodes.
             Both:        1,39 GB reserved,     440,81 MB ( 31%) committed.

Class space increase compared with base: 11.38 MB
Average alignment loss per Klass compared with base: 277 bytes
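
For reference, the average loss is just the class space increase divided by
the number of loaded classes. A small sketch of that calculation, assuming
the MB values in the footprint report are binary megabytes (1024 * 1024
bytes) and using the approximate class count of ~43000 from above; the
function name is made up for illustration:

```python
MB = 1024 * 1024  # assumption: footprint is reported in binary megabytes


def avg_alignment_loss(base_mb: float, aligned_mb: float, num_classes: int) -> float:
    """Extra committed class space per class, in bytes."""
    extra_bytes = (aligned_mb - base_mb) * MB
    return extra_bytes / num_classes


# Committed class space: 31,00 MB (3-bit) vs 42,38 MB (10-bit), ~43000 classes.
loss = avg_alignment_loss(31.00, 42.38, 43000)
print(f"{loss:.1f} bytes per Klass")
```

Since the class count is only approximate, the result matches the figure
above only to within a byte or so.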

-----------

Interpretation:

As expected, with 10 bits class space consumption went up somewhat, by
about 11 MB. Non-class space stayed stable since it was still using
standard metaspace alignment.
RSS wobbled too much for this small difference in class space size to be
noticeable above the background noise (despite +AlwaysPreTouch).

============

2) Since the first test used classes artificially generated by me and
may therefore be skewed, I also did a simple test with the Spring Boot
petclinic. I started the petclinic and measured after it came up. At that
point, the petclinic had loaded about 15000 classes. I used the same VM
arguments as in (1).

Results:

3 bit alignment (base):

[0.059s][info][metaspace] Compressed class space mapped at: 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
[0.059s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow klass shift: 3, Narrow klass range: 0x40000000

RSS: 841-866 MB

Metaspace footprint:
  Non-class space:       64,00 MB reserved,      62,56 MB ( 98%) committed, 8 nodes.
      Class space:        1,00 GB reserved,       9,38 MB ( <1%) committed, 1 nodes.
             Both:        1,06 GB reserved,      71,94 MB (  7%) committed.

------

10 bit alignment:

[0.060s][info][metaspace] Compressed class space mapped at: 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
[0.060s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow klass shift: 10, Narrow klass range: 0x40000000

RSS: 849-850 MB

Metaspace footprint:
  Non-class space:       64,00 MB reserved,      62,62 MB ( 98%) committed, 8 nodes.
      Class space:        1,00 GB reserved,      15,69 MB (  2%) committed, 1 nodes.
             Both:        1,06 GB reserved,      78,31 MB (  7%) committed.

Class space increase compared with base: 6.31 MB
Average alignment loss per Klass compared with base: 441 bytes

------

9 bit alignment (to see if avg alignment loss changes significantly with
512 bytes alignment):

[0.059s][info][metaspace] Compressed class space mapped at: 0x00000abcde000000-0x00000abd1e000000, reserved size: 1073741824
[0.059s][info][metaspace] Narrow klass base: 0x00000abcde000000, Narrow klass shift: 9, Narrow klass range: 0x40000000

RSS: 836-861 MB

Metaspace footprint:
  Non-class space:       64,00 MB reserved,      62,62 MB ( 98%) committed, 8 nodes.
      Class space:        1,00 GB reserved,      12,88 MB (  1%) committed, 1 nodes.
             Both:        1,06 GB reserved,      75,50 MB (  7%) committed.

Class space increase compared with base: 3.5 MB
Average alignment loss per Klass compared with base: 245 bytes

-----------

Interpretation:

Mirroring the results from (1), class space consumption went up a bit, but
the increase was not noticeable above the run-to-run RSS wobble.
Reducing the alignment from 10 to 9 bits reduces the average loss per
Klass. The jury is still out on which alignment is more effective, but the
differences are small.
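
To get a feeling for why the smaller alignment loses less, one can model
the padding per Klass as the distance to the next alignment boundary. The
sketch below does that for a handful of Klass sizes. Note that these sizes
are hypothetical values picked for illustration, not measured data, and the
helper names are made up.

```python
def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of a power-of-two alignment."""
    return (size + alignment - 1) & ~(alignment - 1)


def avg_padding(sizes, shift: int) -> float:
    """Average padding in bytes when each size is aligned to 1 << shift."""
    alignment = 1 << shift
    return sum(align_up(s, alignment) - s for s in sizes) / len(sizes)


# Hypothetical Klass sizes in bytes, NOT taken from the test runs.
sizes = [480, 600, 650, 700, 1100]
for shift in (3, 9, 10):
    print(f"shift {shift}: avg padding {avg_padding(sizes, shift):.1f} bytes")
```

Under this model the 9-bit padding can never exceed the 10-bit padding,
but how big the gap is depends entirely on the real size distribution.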

============

Conclusion:

I think this approach makes sense. The increased class space size should
easily be recoverable by reduced heap size if Lilliput is successful in
doing that.

In my opinion, this approach allows us to easily investigate other forms of
object headers. Klass structures can stay variable-sized and can be huge if
they need to be. We do not have to artificially limit the number of
classes, split classes into near and far classes, or split Klass into
several components. If we want, we can do all that later, but for now this
could just work with the existing hotspot and only minor modifications.

I also think that keeping Klass in metaspace has a number of advantages:
- fast, arena-style allocation
- despite being in an arena, we get free-block management
- we can continue to use the memory reclamation mechanism of metaspace on
class unloading
- we have monitoring tools in place

Basically, I feel that if we invented a different scheme to store Klass
structures, we would eventually re-invent most of that.

If you guys think this is a good avenue to investigate further, the next
steps would be:

- Make Klass pointer encoding work on all 64-bit platforms with arbitrary
  bases and shifts, and maybe optimize it further. For now this only works
  on x64.
- Fix CDS to work with the new alignment.

Thanks, Thomas

[1] https://github.com/openjdk/jdk/pull/5518
[2] https://github.com/openjdk/lilliput/pull/13
[3]
https://github.com/tstuefe/ojdk-repros/blob/master/repros8/src/main/java/de/stuefe/repros/metaspace/InterleavedLoaders.java

On Fri, Sep 10, 2021 at 1:11 PM Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:

> Hi,
>
> Would it be of use for Lilliput to shrink the class pointer size below 32
> bits? I did not closely follow the discussions. Therefore I am not sure
> where the current thinking goes.
>
> If yes, maybe we could reduce the pointer size not only by reducing the
> encoding range but by using larger alignments.
>
> We encode with add-and-shift, as we do with compressed oops. Traditionally
> the shift was 3, since sizeof(void*) is the alignment requirement for
> metaspace allocations. This shift was used to enlarge the coverage of class
> pointer encoding from 4GB to 32GB (KlassEncodingMetaspaceMax). But we never
> used this to my knowledge since we limit class space size to 3GB at most.
> And nobody needs 32GB class space anyway. So there was never a reason to
> cover more than (3GB + <cds size>). Unless I missed something, the shift
> had been useless. In fact, we recently removed the shift if CDS is on
> (JDK-8265705) to solve an unrelated aarch64 issue, and nothing bad happened.
>
> But we could use the shift, not to enlarge the encoding range but to
> reduce the class pointer size. And we could use a larger shift value. For
> example, let's say we shift 8 bits. Then cut off those bits and reduce the
> class pointer to 24 bits.
>
> The resulting alignment would be 256 bytes. Applied to all metaspace
> allocations such an alignment would be prohibitively expensive, since most
> allocations are very small. But if we apply this larger alignment to the
> class space only, leave the rest of the metaspace alone, it is not so bad.
> Before JEP 387, using different alignments would have been difficult to
> implement, but metaspace coding is much more modular now, and using
> different alignments for the different regions can be done.
>
> So we apply the larger alignment only to Klass structures. Klass
> structures are large, and the relative loss due to alignment would matter
> less. They are variable-sized but sizes are clustered between ~512 bytes
> and ~1K. They can get much larger than that, but that is rare. Alignment
> loss would be between 0-255 bytes, let's say on average 127. For a typical
> larger app of 10000 classes, this would waste ~1.2MB. Whether that is
> acceptable depends on what positive effect the smaller compressed class
> pointer has on project Lilliput.
>
> ---
>
> One could argue that using an 8-bit shifted class pointer means it stops
> being a pointer and becomes an index into a table of 256-byte-slots,
> populated with variable-sized Klass structures. With Klass sizes clustered
> between 512 bytes..1K each Klass would populate 2..4 slots on average. The
> 24-bit pointer is enough to address 16mio slots, hence on average 4..8
> million Klass structures, still covering a 4G total range.
>
> We could further slim down the class pointer if we agree on a lower
> maximum number of classes. E.g. with 22 bits, we could address 4mio slots
> and house about 500k...1mio classes, still allowing for a maximum encoding
> range of 1G.
>
> We could play around with these variables. E.g. a larger shift of 10 bits
> - 1KB alignment - would mean most Klass structures occupy just one slot. We
> would have to live with a somewhat higher alignment waste of 0...1023, but
> could now reduce the encoded class pointer to 20 bits, still being able to
> address 1 mio slots resp. close to 1mio classes, with the total encoding
> range still covering 1GB.
>
> ---
>
> I think this approach is a variant of the
> Klass-structures-in-a-table-and-store-the-index approach, but it allows for
> those rare Klass structures to be larger than a single table slot and it
> has a much larger max. cap on the number of classes than if we were just to
> limit the encoding range. To me this matters somewhat because I have seen
> production installations where the number of classes was in the low
> 100000s.
> I don't think the 8192 limit cited in the Lilliput Wiki is practical.
>
> If I am right this approach should not require a lot of changes:
> - we would need to modify metaspace to use separate alignments for the
> class space
> - we may have to fix class pointer encoding for the various platforms if
> it does not work with larger shifts out of the box, or is inefficient.
> E.g. on x64, we use LEAQ to encode pointers, and LEAQ allows for a max.
> shift of 3, so for shift=8 we may need to use separate add and shift.
> - CDS may need some work too, since the Klass structures in the CDS region
> need to be aligned to the larger alignment as well.
>
> Hope I did not make some gross miscalculation somewhere, but that's my
> idea. What do you think?
>
> Thanks, Thomas
>