Far classes
John Rose
john.r.rose at oracle.com
Fri Jul 12 23:44:02 UTC 2024
On 12 Jul 2024, at 6:46, Thomas Stüfe wrote:
> Hi John,
>
> Sorry for the late response, I have been busy with other Lilliput aspects.
>
> Thanks again for your great ideas!
It’s a pleasure.
> Let's see if I get this right. The base idea is to use joint encoding to
> take the sting out of losing half of the whole nKlass value range for near
> classes with the "is-far-class" signal bit; and the fact (obvious now but
> it did not occur to me) that we can just insert the Klass* into *every*
> large class, near or far, since the overhead paid depends on instance size.
Yes, and your example is correct. I would prefer to make one change
in the joint encoding, because I think it would lead to faster decoding.
See inline.
> Let's say I have 16-bit nKlass - (I need a different name now for the datum
> in the mark word since in case of a far class, this is not a nKlass ID.
> klassWord?) - a 16-bit klassWord, and reserve 8 bits for the
> offset-into-klass.
The header bits are a union, a bit-encoded selection between far
and near: if near, which near class, and if far, where the far
pointer is. Maybe a good name for that is klassSelector or
klassCode or klassLocator.
> In that model:
> a) the largest offset I can represent is 0xFF; counting in words and adding
> 1 since I don't need offset 0 (it's the header), the farthest I can point at
> is word offset 256
> a) any (near and far) class that is larger than 256 words will have an
> empty Klass* slot injected at word offset 256
> b) a far class with a small instance size will have a Klass* slot injected
> and populated at the end of the object
> c) a far class whose instance size is > 256 will populate the Klass* slot
> prepared by (a)
> d) all far classes will set their klassWord to: lower 8 bits zero, higher 8
> bits the capped-at-256 Klass* slot offset
> e) all near classes will set their klassWord to their nKlass ID as we do
> today
> f) No near class can have a nKlass ID with all eight lower bits zero -
> aligned to 256 - which reduces the number of valid nKlass IDs to (64k -
> 256)
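To make rules a) to f) concrete, here is a minimal C++ sketch of the layout and klassWord encoding as described in the quote. It assumes 8-byte words, word 0 holding the header, and the "+1" convention from a) (the high byte stores the slot offset minus one). All names (`klass_slot_offset`, `injected_words`, `encode_far`, and so on) are hypothetical, not HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of rules a) to f); these names are not HotSpot APIs.
// Offsets count 8-byte words from the object start; word 0 is the header.
constexpr std::size_t kSlotCap = 256;  // 0xFF stored in the high byte, plus 1

// a), b), c): word offset of the Klass* slot for an object of size_words
// words (header included, before any slot is added). Small objects get the
// slot appended at the end; larger ones use the slot injected at word 256.
constexpr std::size_t klass_slot_offset(std::size_t size_words) {
  return size_words < kSlotCap ? size_words : kSlotCap;
}

// Words added per instance: a far class always gets a slot, while a near
// class pays only when it is larger than 256 words.
constexpr std::size_t injected_words(std::size_t size_words, bool is_far) {
  return (is_far || size_words > kSlotCap) ? 1 : 0;
}

// d): far klassWord = low byte zero, high byte = slot offset minus 1.
inline std::uint16_t encode_far(std::size_t slot_offset) {
  assert(slot_offset >= 1 && slot_offset <= kSlotCap);
  return static_cast<std::uint16_t>((slot_offset - 1) << 8);
}

// A code with a zero low byte denotes a far class.
inline bool is_far_code(std::uint16_t code) { return (code & 0xFF) == 0; }

// Recover the word offset of the Klass* slot from a far code.
inline std::size_t far_slot_offset(std::uint16_t code) {
  return (static_cast<std::size_t>(code) >> 8) + 1;
}

// f): near IDs must have a nonzero low byte; count how many remain.
inline std::size_t count_near_ids() {
  std::size_t n = 0;
  for (std::uint32_t c = 0; c <= 0xFFFF; ++c) {
    if (!is_far_code(static_cast<std::uint16_t>(c))) ++n;
  }
  return n;
}
```

For example, a far class of 10 words gets its Klass* slot at word 10 and the klassWord 0x0900, while any class over 256 words carries the slot at word 256 (klassWord 0xFF00 if it is far). Exactly 256 of the 65536 codes have a zero low byte, which leaves the 65280 near IDs mentioned in f).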
I would change d) in order to get faster decoding logic. (You might
have misread my suggested get_klass method?)
For a far class, the UPPER 8 bits should be zero and the LOWER 8 bits
should hold the word-scaled offset.
Here are the two decoding methods for the two design choices:
Klass* get_klass_1(uint16_t klassCode) {
  if ((klassCode & ((1<<8)-1)) == 0) {   // LOWER bits zero
    size_t wordOffset = klassCode >> 8;  //EXTRA SHIFT
    return ((Klass**)this)[wordOffset];
  } else {
    size_t nKlass = klassCode;           //WASTED ENCODINGS
    return NEAR_CLASSES[nKlass];
  }
}
Klass* get_klass_2(uint16_t klassCode) {
  if ((klassCode & 0xFF00) == 0) {       // UPPER bits zero (avoids shifting a negative int)
    size_t wordOffset = klassCode;
    return ((Klass**)this)[wordOffset];
  } else {
    size_t nKlass = klassCode - (1<<8);  // dense: no holes in NEAR_CLASSES
    return NEAR_CLASSES[nKlass];
  }
}
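A runnable C++ sketch of the second design, pairing get_klass_2 with matching encoders. `Klass`, `near_klasses`, `encode_far` and `encode_near` are hypothetical stand-ins; the object is simulated as an array of pointer-sized words, and the near table is a tiny array of Klass structs reached by dead reckoning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for HotSpot's Klass; only identity matters here.
struct Klass { const char* name; };

// Hypothetical dead-reckoning table of near Klass structs (kept tiny).
Klass near_klasses[4] = {{"A"}, {"B"}, {"C"}, {"D"}};

constexpr unsigned kFarCodes = 1u << 8;  // codes 0..255 select a far class

// Far: the code is the word offset of the in-object Klass* slot.
std::uint16_t encode_far(std::size_t word_offset) {
  assert(word_offset >= 1 && word_offset < kFarCodes);  // word 0 is the header
  return static_cast<std::uint16_t>(word_offset);
}

// Near: the code is nKlass biased by 256, so every code >= 256 is a valid ID.
std::uint16_t encode_near(std::size_t nklass) {
  assert(nklass < (1u << 16) - kFarCodes);
  return static_cast<std::uint16_t>(nklass + kFarCodes);
}

// get_klass_2 as a free function: `obj` stands in for `this`,
// viewed as an array of pointer-sized words.
Klass* get_klass_2(Klass* const* obj, std::uint16_t code) {
  if ((code & 0xFF00u) == 0) {  // upper 8 bits zero: far class
    return obj[code];           // Klass* stored inside the object
  }
  return &near_klasses[code - kFarCodes];  // near class
}
```

Because the near code is biased by 256 rather than skipping codes with a zero low byte, nKlass values run densely from 0 to 65279; get_klass_1, by contrast, indexes by the raw code and leaves a hole at every multiple of 256.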
The second design choice might also enable a compact, branch-free decoding:
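Klass* get_klass_3(uint16_t klassCode) {
  bool is_far = klassCode < (1<<8);  // 8 upper bits zero?
  Klass** local_base = (Klass**) this;
  // Biased so that table_base[256] is NEAR_CLASSES[0]; computable once, but
  // formally out of bounds in ISO C++ (real code would bias a uintptr_t).
  Klass** table_base = NEAR_CLASSES - (1<<8);
  Klass** base = is_far ? local_base : table_base;  //CMOV
  return base[klassCode];
}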
But that assumes a pointer-array NEAR_CLASSES, which you are not
considering at present, AFAIK. The “clever idea” is that the
global pointer-array and the local “this” can both be viewed
as having the same type, array of Klass* pointers. The
problem with that is probably that lookups through that
global array would introduce delays, compared to “dead
reckoning” in an appropriately scaled array of near-klass
structs. But it might be worth doing the experiment.
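The "same type" trick can be sketched in C++ as below: one conditional select between the object's own address and a table base biased down by 256 slots, followed by a single indexed load. `near_table`, `k_a`, `k_b` and the other names are hypothetical. The bias is applied to a `uintptr_t` because forming `near_table - 256` as a pointer is undefined behavior in ISO C++; whether the select actually compiles to a CMOV depends on the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Klass { const char* name; };  // hypothetical stand-in

// Hypothetical pointer table for near classes (the layout get_klass_3 assumes).
Klass k_a{"A"}, k_b{"B"};
Klass* near_table[2] = {&k_a, &k_b};

constexpr unsigned kFarCodes = 1u << 8;  // codes below 256 are far

// Branch-free decode: choose a base address, then do one indexed load.
Klass* get_klass_3(Klass* const* obj, std::uint16_t code) {
  const bool is_far = code < kFarCodes;  // upper 8 bits zero
  const std::uintptr_t local_base = reinterpret_cast<std::uintptr_t>(obj);
  // Bias the table base down by 256 slots so code 256 reaches near_table[0].
  const std::uintptr_t table_base =
      reinterpret_cast<std::uintptr_t>(near_table) - kFarCodes * sizeof(Klass*);
  const std::uintptr_t base = is_far ? local_base : table_base;  // select, no branch needed
  return *reinterpret_cast<Klass* const*>(base + code * sizeof(Klass*));
}
```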
An advantage would be that, at the cost of the indexing array
NEAR_CLASSES (one extra pointer per nKlass, chased on every
lookup), you could get (a) better density of the actual
Klass structs, and (b) less D$ (data cache) false sharing,
because of a wider variety of nKlass base addresses.
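A back-of-the-envelope C++ sketch of that trade-off. It assumes 8-byte pointers and the second encoding's 65280 near IDs, which puts a fully populated NEAR_CLASSES table at 522,240 bytes (510 KiB). The Klass sizes and stride used with it are made-up numbers, only to show how packing behind an index table can beat a fixed stride; `strided_bytes` and `packed_bytes` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Cost of the NEAR_CLASSES pointer table under the second encoding:
// 65536 codes minus the 256 reserved for far classes, 8-byte pointers assumed.
constexpr std::size_t kNearIds = (std::size_t{1} << 16) - (std::size_t{1} << 8);
constexpr std::size_t kTableBytes = kNearIds * 8;
static_assert(kTableBytes == 522240, "65280 eight-byte pointers");

// Dead reckoning: every Klass occupies a fixed stride that must fit the largest.
std::size_t strided_bytes(const std::vector<std::size_t>& sizes, std::size_t stride) {
  for (std::size_t s : sizes) assert(s <= stride);
  return sizes.size() * stride;
}

// Index table: structs packed back to back, plus one table pointer each.
std::size_t packed_bytes(const std::vector<std::size_t>& sizes) {
  const std::size_t payload = std::accumulate(sizes.begin(), sizes.end(), std::size_t{0});
  return payload + sizes.size() * 8;
}
```

With hypothetical sizes {480, 520, 1200, 640} and a 1280-byte stride, the strided layout takes 5120 bytes against 2872 packed (2840 of payload plus four table pointers).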
More information about the lilliput-dev mailing list