Far classes

Fri Jul 12 23:44:02 UTC 2024

On 12 Jul 2024, at 6:46, Thomas Stüfe wrote:

> Hi John,
>
> Sorry for the late response, I have been busy with other Lilliput aspects.
>
> Thanks again for your great ideas!

It’s a pleasure.

> Let's see if I get this right. The base idea is to use joint encoding to
> take the sting out of losing half of the whole nKlass value range for near
> classes with the "is-far-class" signal bit; and the fact (obvious now but
> it did not occur to me) that we can just insert the Klass* into *every*
> large class, near or far, since the overhead paid depends on instance size.

Yes, and your example is correct.  I would prefer to make one change
in the joint encoding, because I think it would lead to faster decoding.
See inline.

> Let's say I have 16-bit nKlass - (I need a different name now for the datum
> in the mark word since in case of a far class, this is not a nKlass ID.
> klassWord?) - a 16-bit klassWord, and reserve 8 bits for the
> offset-into-klass.

The header bits are a union, a bit-encoded selection between far
and near: if near, which near class, and if far, where the far
pointer is.  Maybe a good name for that is klassSelector or
klassCode or klassLocator.

> In that model:
> a) the largest offset I can represent is 0xFF; counting in words and adding
> 1 since I don't need offset 0 (it's the header), the farthest I can point at
> is word offset 256
> a) any (near and far) class that is larger than 256 words will have an
> empty Klass* slot injected at word offset 256
> b) a far class with a small instance size will have a Klass* slot injected
> and populated at the end of the object
> c) a far class whose instance size is > 256 will populate the Klass* slot
> prepared by (a)
> d) all far classes will set their klassWord to: lower 8 bits zero, higher 8
> bits the capped-at-256 Klass* slot offset
> e) all near classes will set their klassWord to their nKlass ID as we do
> today
> f) No near class can have a nKlass ID with all eight lower bits zero -
> aligned to 8 bit - which reduces the number of valid nKlass IDs to (64k -
> 256)

I would change d) in order to get faster decoding logic.  (You might
have misread my suggested get_klass method?)

For a far class, the UPPER 8 bits should be zero and the LOWER should
be the word-scaled offset.
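
Spelled out as code, that encoding could look roughly like the sketch
below.  The names (kFarBits, encode_far, encode_near, is_far, near_id)
are made up for illustration and are not existing HotSpot or Lilliput
code; the sketch assumes the 16-bit klassCode with 8 bits reserved for
the offset, as in your example.

```cpp
#include <cstdint>

// Hypothetical constants: 16-bit klassCode, 8 bits reserved for the far offset.
constexpr unsigned kFarBits   = 8;
constexpr uint32_t kFarLimit  = 1u << kFarBits;           // 256 far codes
constexpr uint32_t kNearCount = (1u << 16) - kFarLimit;   // remaining near IDs

// Far class: upper 8 bits zero, lower 8 bits hold the word offset of the
// Klass* slot. Offset 0 is the header itself, so useful offsets are 1..255.
constexpr uint16_t encode_far(uint32_t wordOffset) {
  return static_cast<uint16_t>(wordOffset);
}

// Near class: IDs 0..kNearCount-1 map to the contiguous codes 256..65535.
constexpr uint16_t encode_near(uint32_t nearId) {
  return static_cast<uint16_t>(nearId + kFarLimit);
}

// The far/near test is a single unsigned compare.
constexpr bool is_far(uint16_t klassCode) { return klassCode < kFarLimit; }

// Inverse of encode_near; only meaningful when !is_far(klassCode).
constexpr uint32_t near_id(uint16_t klassCode) { return klassCode - kFarLimit; }
```

Under these assumptions the near IDs stay dense (no holes at every
multiple of 256), and the count of usable near IDs is still 64k - 256.
Without your +1 adjustment, the farthest slot this form can name is
word offset 255.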

Here are the two decoding methods for the two design choices:

Klass* get_klass_1(uint16_t klassCode) {
  if ((klassCode & ((1<<8)-1)) == 0) {  // LOWER bits zero
    size_t wordOffset = klassCode >> 8;  //EXTRA SHIFT
    return ((Klass**)this)[wordOffset];
  } else {
    size_t nKlass = klassCode;  //WASTED ENCODINGS
    return NEAR_CLASSES[nKlass];
  }
}

Klass* get_klass_2(uint16_t klassCode) {
  if ((klassCode & ~((1<<8)-1)) == 0) {  // UPPER bits zero
    size_t wordOffset = klassCode;
    return ((Klass**)this)[wordOffset];
  } else {
    size_t nKlass = klassCode - (1<<8);
    return NEAR_CLASSES[nKlass];
  }
}

The second might also enable a compact flow-free decoding:

Klass* get_klass_3(uint16_t klassCode) {
  bool is_far = klassCode < (1<<8);  // 8 upper bits zero?
  Klass** far_base = (Klass**) this;  // far: slot inside the object
  Klass** const near_base = &NEAR_CLASSES[- (1<<8)];  // biased base
  Klass** base = is_far ? far_base : near_base;  //CMOV
  return base[klassCode];
}

But that assumes a pointer-array NEAR_CLASSES, which you are not
considering at present, AFAIK.  The “clever idea” is that the
global pointer-array and the local “this” can both be viewed
as having the same type, array of Klass* pointers.  The
problem with that is probably that lookups through that
global array would introduce delays, compared to “dead
reckoning” in an appropriately scaled array of near-klass
structs.  But it might be worth doing the experiment.
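
If someone wants to try that experiment outside the VM first, here is a
minimal standalone sketch.  Everything in it is an assumption for
illustration: the stand-in Klass struct, the table g_near_by_code, and
the helper get_klass_select are hypothetical names, and the table is
indexed directly by klassCode (leaving its first 256 entries unused)
so that no pointer is formed in front of the array.

```cpp
#include <cstdint>

struct Klass { int id; };                 // stand-in for the real Klass

constexpr uint32_t kFarCodes = 1u << 8;   // codes 0..255 select an in-object slot

// Hypothetical global pointer table, indexed by the raw klassCode.
// Entries 0..kFarCodes-1 are never read; that wastes 256 slots but
// keeps the indexing well-defined without a biased base pointer.
static Klass* g_near_by_code[1u << 16];

// 'obj' is the object viewed as an array of word-sized Klass* slots.
inline Klass* get_klass_select(Klass** obj, uint16_t klassCode) {
  bool is_far = klassCode < kFarCodes;           // upper 8 bits zero?
  Klass** base = is_far ? obj : g_near_by_code;  // select, a candidate for CMOV
  return base[klassCode];                        // one load on either path
}
```

Whether a compiler actually turns the select into a CMOV, and whether
the extra load through the table costs more than it saves, is exactly
what the experiment would have to measure.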

An advantage would be that, at the cost of the indexing array
NEAR_CLASSES (one extra pointer per nKlass, to be chased
on every lookup), you can get (a) better density of the actual
Klass structs, and (b) less D$ false sharing, because of
a wider variety of nKlass base addresses.