Far classes

Tue Jun 18 12:23:30 UTC 2024

Hi,

Roman reminded me that we don't have a fallback plan yet for running over
the number of classes representable with an nKlass (be it 22 bits or
smaller). Therefore I would like to air my brain a bit and discuss the
current ideas.

We want to have "near" and "far" classes. The first N classes, N depending
on the bit size of an nKlass, are "near" classes. The Klass* for a
near-class object is derived directly from the nKlass in its MW.

Classes loaded after that would be "far" classes. Where to store Klass* for
far-class objects? Probably as part of the Object. But for completeness's
sake let's look at alternatives first, even if they sound stupid:

1) An Object-to-Klass* hashmap. We would pay at least 16 bytes per entry,
plus some overhead, and lookup gets expensive once the map degenerates.
Growing and rehashing would be non-trivial. A GC would have to
remove/reinsert mapping entries for every object it moves.
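
To make the cost of 1) concrete, here is a minimal sketch of such a side
table. The names (FarKlassTable, on_object_moved) are hypothetical and a
std::unordered_map stands in for whatever map the VM would really use; the
point is only the remove/reinsert a moving GC would owe for every relocated
far-class object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct Klass { const char* name; };  // stand-in for the real Klass

// Hypothetical side table: object address -> Klass*.
// Key + value alone are 16 bytes on a 64-bit platform; the map's own
// node and bucket overhead comes on top of that.
class FarKlassTable {
  std::unordered_map<uintptr_t, const Klass*> map_;

public:
  void register_object(uintptr_t obj, const Klass* k) { map_[obj] = k; }

  const Klass* lookup(uintptr_t obj) const {
    auto it = map_.find(obj);
    return it == map_.end() ? nullptr : it->second;
  }

  // What a moving GC would have to do for each far-class object it moves.
  void on_object_moved(uintptr_t from, uintptr_t to) {
    auto it = map_.find(from);
    assert(it != map_.end());
    const Klass* k = it->second;
    map_.erase(it);
    map_.emplace(to, k);
  }

  std::size_t size() const { return map_.size(); }
};
```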

2) A "shadow-heap" - a mostly uncommitted address range mirroring the heap,
containing a Klass* at each position where the MW of a far-class object is
located in the real heap. Lookup would be fast. The GC would have to update
those shadow locations when moving objects. Uncommitting unused shadow heap
pages after evacuation is non-trivial. Process vsize can become a problem
for very large heaps. Footprint rises steeply with a rising number of
far-class objects due to page granularity. Worst case, we double the heap
size footprint.
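
A rough sketch of the address arithmetic behind 2), and of why page
granularity hurts. ShadowHeap and worst_case_shadow_commit are made-up
names, and the worst case assumes every far-class object lands on its own,
otherwise untouched, shadow page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: two reserved ranges of equal size, the real heap
// and its shadow. Only shadow pages that are actually written get committed.
struct ShadowHeap {
  uintptr_t heap_base;
  uintptr_t shadow_base;
  std::size_t size;  // size of each range in bytes

  // The Klass* slot for an object sits at the same offset in the shadow
  // range as the object's MW does in the real heap.
  uintptr_t shadow_slot_for(uintptr_t obj) const {
    assert(obj >= heap_base && obj - heap_base < size);
    return shadow_base + (obj - heap_base);
  }
};

// Worst-case committed shadow bytes: one page per far-class object,
// capped at the heap size (which is the "double the footprint" case).
inline uint64_t worst_case_shadow_commit(uint64_t far_objects,
                                         uint64_t heap_bytes,
                                         uint64_t page_size) {
  uint64_t touched = far_objects * page_size;
  return touched < heap_bytes ? touched : heap_bytes;
}
```

For example, 1000 far-class objects scattered over a 1 GB heap with 4 KB
pages could commit up to about 4 MB of shadow memory to hold 8000 bytes of
Klass*.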

Both sound unappealing, and would become infeasible if the number of
far-class objects rises.

So: 3) Store Klass* in the object:

We dedicate one bit in the nKlass for "is-far-class". For far classes, we
store the Klass* at the end of the object. Then we encode the offset of the
Klass* slot in the remaining nKlass bits.

That depends on the maximum object size. How large does an object get? I
found no limit in the specs. However, the size of an object depends on its
members, and we have a UTF-8 CP entry per member, and the number of CP
entries is limited to 2^16. So, an object cannot have more than 65535
members (a bit less, actually). Therefore, I think it cannot be larger than
64k heap words.

To encode this, we need 16 bits, plus the additional "is-far-class" bit. So,
with this technique, we could reduce the nKlass size to 17 bits. The cost
would be +8 bytes per far-class object, but only if we store a raw Klass*
instead of some form of nKlass. Storing a raw Klass* would mean the Klass
does not have to live in the class space, and we can stop worrying about
class space size. Storing a trailing 32-bit nKlass instead would mean we
have a chance of just filling the alignment gap before the next object, and
not paying for a size increase at all.
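
As a sketch of the 17-bit variant with a raw trailing Klass*: the bit
positions, the near-class lookup table and all function names below are
assumptions for illustration, not a proposal for the actual layout. The
object is modelled as an array of 64-bit heap words.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Klass { const char* name; };  // stand-in for the real Klass

static_assert(sizeof(Klass*) == sizeof(uint64_t),
              "sketch assumes 64-bit pointers and heap words");

// Assumed 17-bit layout: bit 16 = is-far-class, bits 0..15 = either the
// near-class ID or the offset (in heap words) of the trailing Klass* slot.
constexpr uint32_t kOffsetBits = 16;
constexpr uint32_t kFarBit     = 1u << kOffsetBits;
constexpr uint32_t kLowMask    = kFarBit - 1;

inline bool is_far(uint32_t nk) { return (nk & kFarBit) != 0; }

inline uint32_t encode_near(uint32_t class_id) {
  assert(class_id <= kLowMask);
  return class_id;
}

inline uint32_t encode_far(uint32_t slot_offset_words) {
  assert(slot_offset_words <= kLowMask);
  return kFarBit | slot_offset_words;
}

// near_table is a hypothetical ID -> Klass* table for near classes.
inline const Klass* klass_of(const uint64_t* obj, uint32_t nk,
                             const Klass* const* near_table) {
  if (!is_far(nk)) {
    return near_table[nk & kLowMask];
  }
  // Far class: read the raw Klass* stored at the end of the object.
  const Klass* k;
  std::memcpy(&k, obj + (nk & kLowMask), sizeof(k));
  return k;
}
```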

We could even get down to 16 bits for the MW-stored nKlass, if we agree on
aligning the Klass* slot trailing the object to 16 bytes. In that case, we
can encode the Klass* slot offset with 15 bits and have the "is-far-class"
as the 16th bit. Then, we could extract the nKlass from the MW with a
16-bit move. This would cost us: On average, another four bytes of overhead
per far-class object, and a halved value range for near class IDs.
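
The 16-bit variant, again only as a sketch under assumed bit positions
(bit 15 as the far flag, bits 0..14 as the slot offset in 16-byte units);
the helper names are invented. It also shows where the average four bytes
come from: an 8-byte-aligned object size is either already 16-byte aligned
(0 bytes of padding) or needs 8 bytes of padding before the slot.

```cpp
#include <cassert>
#include <cstdint>

// Assumed 16-bit layout: bit 15 = is-far-class, bits 0..14 = near-class ID
// or Klass* slot offset in units of 16 bytes.
constexpr uint16_t kFarBit16   = 0x8000;
constexpr uint16_t kLowMask16  = 0x7FFF;
constexpr uint32_t kSlotAlign  = 16;
constexpr uint32_t kMaxNearId  = kLowMask16;  // 2^15 near-class IDs remain

inline bool is_far16(uint16_t nk) { return (nk & kFarBit16) != 0; }

// Byte offset of the trailing Klass* slot for an object whose payload
// ends at size_bytes: round up to the next 16-byte boundary.
inline uint32_t klass_slot_offset_bytes(uint32_t size_bytes) {
  return (size_bytes + kSlotAlign - 1) & ~(kSlotAlign - 1);
}

inline uint16_t encode_far16(uint32_t slot_offset_bytes) {
  assert(slot_offset_bytes % kSlotAlign == 0);
  assert(slot_offset_bytes / kSlotAlign <= kLowMask16);
  return static_cast<uint16_t>(kFarBit16 | (slot_offset_bytes / kSlotAlign));
}

inline uint32_t far_slot_offset_bytes(uint16_t nk) {
  assert(is_far16(nk));
  return static_cast<uint32_t>(nk & kLowMask16) * kSlotAlign;
}
```

The largest encodable slot offset is 32767 * 16 bytes, just under the
512 KB that 64k heap words of 8 bytes amount to.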

Did I make any thinking errors? Overlook something? Does any of this make
sense?

Thank you for reading, and Cheers,

Thomas