[master] RFR: JDK-8325104: Lilliput: Shrink Classpointers [v3]

Wed Mar 27 11:23:36 UTC 2024

On Tue, 26 Mar 2024 14:55:04 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> I still think it is odd that we report a klass offset of 4 on big-endian machines. If we ever really try to read the klass at (obj + klass_offset_in_bytes()) on those machines, we will not get the klass bits; instead we will get some of the other bits in the object header.
>> 
>> My question was more whether it really was a good idea to load the klass with 4-byte loads, given that it seems safer (more platform-independent) to stick with 8-byte loads. When we use 8-byte loads and shifts, we don't have to think about endianness.
>> 
>> I glanced at the C2 code to see how klass_offset_in_bytes is used. It looks like in most places where `klass_offset_in_bytes()` is used, it serves as a sort of token to identify that the load (obj + offset) should be interpreted as a load of the klass. We don't actually use that offset for the load / decode of the klass. Instead, we perform an 8-byte load from the start of the object, with an associated 8-byte shift operation.
>> 
>> So, maybe my concern here is more that the name "klass_offset_in_bytes" is a bit misleading, as it sounds like this is the offset where the klass bits are located. Code using klass_offset_in_bytes() really needs to be written with the understanding that this is not necessarily the case.
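The endianness point above can be sketched in a few lines of C++. This is a minimal illustration, not HotSpot code: it assumes (for illustration only) that the 32-bit narrow klass occupies the upper 32 bits of a 64-bit header word, and every name below is hypothetical. An 8-byte load followed by a shift yields the klass bits on any host, while a 4-byte load only works if the byte offset matches the host's byte order (4 on little-endian, 0 on big-endian).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption (illustrative only): narrow klass lives in the upper 32 bits
// of the 64-bit header. All names here are hypothetical, not HotSpot APIs.
constexpr int kNarrowKlassShift = 32;

// Endian-independent: load the whole 8-byte header, then shift.
uint32_t klass_via_8byte_load(const unsigned char* obj) {
  uint64_t header;
  std::memcpy(&header, obj, sizeof(header));  // native-endian 8-byte load
  return static_cast<uint32_t>(header >> kNarrowKlassShift);
}

// Endian-dependent: load 4 bytes at a fixed byte offset into the header.
uint32_t klass_via_4byte_load(const unsigned char* obj, int offset) {
  uint32_t bits;
  std::memcpy(&bits, obj + offset, sizeof(bits));
  return bits;
}

// Byte offset at which the upper 32 bits of a uint64_t sit on this host:
// 4 on little-endian, 0 on big-endian.
int upper_half_offset() {
  const uint64_t probe = uint64_t{1} << 32;  // only the upper half is nonzero
  unsigned char bytes[sizeof(probe)];
  std::memcpy(bytes, &probe, sizeof(probe));
  return bytes[4] == 1 ? 4 : 0;
}
```

With this layout, a fixed "klass offset" of 4 is only correct on little-endian hosts, whereas the shift-based variant needs no byte-order reasoning at all.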
>
> The trouble with C2 is that, currently, loading the Klass* (pre-lilliput) has its own memory slice, and is totally immutable. With Lilliput, we currently load the Klass* from the mark-word, and need to deal with header displacement in ObjectMonitors. This currently happens in the backend and is totally opaque to C2 IR.
> 
> However, it all *could* be done in C2 IR, even more so when we get the OMWorld stuff. Then the loads would happen on their true memory slice (the mark-word) and we would do the shifting/masking in IR, too. But it is questionable whether we want to do so. The fact that Load(N)Klass can currently be treated as immutable makes the node freely moving in the ideal graph. Wiring it up in the same slice as mark-word loads means that we would have to re-load the Klass* after anything that could potentially touch the mark-word, especially safepoints, calls, etc., even though we know that the actual Klass* portion of the mark-word is still, in fact, immutable. Therefore I believe it is better to keep the LoadKlass stuff on its own memory slice, with its own offset, even if we never actually load from that offset. offset=4 is a good choice for that, because it would never clash with a true offset of a field. Long-term, something like offset=1 or 2 may be better: when we want to do 4-byte headers, fields may start at offset=4.
> 
> Also note the weird dance that we need to do in C2 .ad files to figure out the true offset. :-/
> 
> We should run this by some C2 experts to figure out the best way to deal with all that.
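The constraint on the sentinel offset described above can be stated as a small predicate. This is a hedged sketch, not C2 code: `is_safe_klass_token` is a hypothetical helper, and it assumes that the token must lie strictly inside the object header (so no field can ever start there) and must differ from the mark word's own offset 0.

```cpp
#include <cassert>

// Hypothetical helper (not a HotSpot/C2 API): a sentinel "klass offset" can
// serve as an alias-class token only if no real field can live there, i.e. it
// lies inside the header, and it differs from the mark word at offset 0.
constexpr bool is_safe_klass_token(int token_offset, int header_size_in_bytes) {
  return token_offset > 0 && token_offset < header_size_in_bytes;
}

// With 8-byte headers, offset 4 cannot clash with a field.
static_assert(is_safe_klass_token(4, 8), "offset 4 is inside an 8-byte header");
// With 4-byte headers, fields may start at offset 4, so it no longer works.
static_assert(!is_safe_klass_token(4, 4), "offset 4 may be a field offset");
// Offsets 1 and 2 stay inside even a 4-byte header.
static_assert(is_safe_klass_token(1, 4) && is_safe_klass_token(2, 4),
              "offsets 1 and 2 remain safe");
```

The predicate only captures the "never clashes with a field" argument; it says nothing about how C2 actually assigns memory slices.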

Thanks for explaining some of the background for this. It sounds like a good idea to get a C2 expert to look at this.

-------------

PR Review Comment: https://git.openjdk.org/lilliput/pull/128#discussion_r1540919324