[External] : RE: Question on the inline type flattening decision

Wed Jul 12 07:22:23 UTC 2023

Hi John, Fred,

Thanks for the detailed explanation of the flattening policy, which really makes sense!

To summarize, maybe we could change the condition to something like:

```
// Size check first; atomicity/volatility only restrict non-final fields.
if (!too_big_to_flatten &&
    (!(too_atomic_to_flatten || too_volatile_to_flatten) || fieldinfo.access_flags().is_final())) {
  ...
}
```
That is:
1. For fields whose type is the Q-type of a non-atomic value klass, both final and non-final fields can be flattened if the size check passes.
2. For fields whose type is an atomic value klass (the default), or for volatile fields, only final fields are flattened, again provided the size check passes.
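The two orderings can be compared mechanically. Below is a minimal, self-contained C++ sketch (not HotSpot code): the four booleans in the hypothetical `FieldFacts` struct stand in for `too_big_to_flatten`, `too_atomic_to_flatten`, `too_volatile_to_flatten` and `fieldinfo.access_flags().is_final()`, and the helper names are made up for illustration. It enumerates all 16 input combinations and compares the proposal with the condition from Fred's message quoted below.

```cpp
#include <cassert>

// Hypothetical stand-ins for the inputs of the real flattening decision.
struct FieldFacts {
  bool too_big;       // too_big_to_flatten
  bool too_atomic;    // too_atomic_to_flatten
  bool too_volatile;  // too_volatile_to_flatten
  bool is_final;      // fieldinfo.access_flags().is_final()
};

// Proposed ordering: size check first, then atomicity/volatility unless final.
bool proposed_condition(const FieldFacts& f) {
  return !f.too_big &&
         (!(f.too_atomic || f.too_volatile) || f.is_final);
}

// Condition from Fred's message: atomicity matters only for non-final
// fields, size matters for all fields.
bool original_condition(const FieldFacts& f) {
  return !((!f.is_final && (f.too_atomic || f.too_volatile)) || f.too_big);
}

// Exhaustively compares the two predicates over all 16 input combinations.
bool conditions_agree() {
  for (int bits = 0; bits < 16; ++bits) {
    FieldFacts f{(bits & 1) != 0, (bits & 2) != 0, (bits & 4) != 0,
                 (bits & 8) != 0};
    if (proposed_condition(f) != original_condition(f)) return false;
  }
  return true;
}
```

By De Morgan's laws both reduce to `!too_big && (is_final || !(too_atomic || too_volatile))`, so `conditions_agree()` returns `true`: the proposal reorders the checks without changing which fields get flattened.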

The main change is promoting the flat size check to the first stage; the other checks are applied only when this condition holds.
Is this right?

Please correct me if anything is not right! Thanks!

Best Regards,
Xiaohong

-----Original Message-----
From: John Rose <john.r.rose at oracle.com> 
Sent: Wednesday, July 12, 2023 2:31 AM
To: Frederic Parain <frederic.parain at oracle.com>
Cc: Xiaohong Gong <Xiaohong.Gong at arm.com>; valhalla-dev at openjdk.org; nd <nd at arm.com>
Subject: Re: [External] : RE: Question on the inline type flattening decision

On 10 Jul 2023, at 8:09, Frederic Parain wrote:

> Hi Xiaohong,
>
>
> Field flattening has two major side effects: atomicity and size.

Yes!  Well put.

Here’s some more fine print:

Atomicity of a value class will be something that its class declaration can opt out of.  For a class that is non-atomic, (I think) both final and non-final instance fields of its flattenable type (the null-excluding type, aka the “val type” or Q-type) can use the same policy.

For a value class which is atomic (and that is the default), it will not be possible (until the day we have efficient hardware transactional memory, or HTM) to flatten fields of that type if they are mutable.
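To make the mutable-field hazard concrete, here is a deterministic, single-threaded C++ sketch that replays one bad interleaving by hand. It assumes a hypothetical two-word value `Range` with the invariant `lo <= hi`, flattened into a container (`Holder`) as two separate 64-bit words that are written one at a time; whether real stores tear depends on the hardware and on the code the JIT emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-field value; every value ever written satisfies lo <= hi.
struct Range {
  std::int64_t lo;
  std::int64_t hi;
};

// A container with the Range flattened into it: two independent words.
struct Holder {
  Range r;
};

// Replays one interleaving: a writer replaces {0, 1} with {10, 11},
// but a reader runs between the writer's two word-sized stores.
Range racy_read_during_update() {
  Holder h{{0, 1}};
  const Range next{10, 11};
  h.r.lo = next.lo;  // writer: first store
  Range seen = h.r;  // reader: copies the flattened value mid-update
  h.r.hi = next.hi;  // writer: second store
  return seen;
}

// A torn value is one that breaks the invariant no writer ever violated.
bool is_torn(const Range& r) { return r.lo > r.hi; }
```

`racy_read_during_update()` returns `{10, 1}`, a value no thread ever wrote. An atomic value class promises such a mix can never be observed, which is why mutable fields of its type cannot simply be laid out as plain words; a non-atomic class opts in to this behavior.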

(There’s even more fine print for nullable reference types from value classes, if the VM ever tries to inline nullable types, but that’s way in the future and will not be user visible.)

>
> Final fields are not subject to atomicity issues because they are immutable after their initialization.

So the current policy treats final instance fields as flattenable, which means it treats them as immutable.  This is 99.99% correct.

There is a technical debt here concerning what sorts of indeterminate behavior are allowed in the 0.01% case, where (a) the constructor for the object containing the flattenable field allows “this” to escape and another thread picks it up, and (b) the other thread makes a racing read of the flattenable field at just the wrong moment.  Here’s the debt:  Either we do not flatten the field (at least when we know, statically, that this bad thing can happen) or else we somehow delay the racing read until a safe moment (by means of a mutex protocol of some sort), or (yet again) we somehow detune the JVMS to allow atomic value classes to race, if their containers are so rude as to allow concurrent reads through escaped “this” pointers.  I think the most practical option is the first, which means, sadly, the 99.99% correct policy for final fields needs reconsideration.  But maybe I’ve missed some fortunate aspect in the current policy, that allows it to avoid the 0.01% error.
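The 0.01% case, and why declining to flatten avoids it, can be replayed deterministically. This is a single-threaded C++ sketch, not JVM code: the hypothetical `on_escape` callback plays the role of another thread that picked up the escaped `this` and reads the final field at the worst moment. `FlatOwner` stores a two-word `Point` inline; `BufferedOwner` stores a pointer to a fully built heap buffer, so a racing reader sees either no value yet or a complete one. The helpers `observe_flat` and `buffered_reader_saw_null` are illustrative, and the memory fences a real multi-threaded version would need are deliberately left out.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical two-word value type.
struct Point {
  long x;
  long y;
};

// Final field flattened inline: the constructor fills it one word at a time.
struct FlatOwner {
  Point p{0, 0};  // all-zero default, as in a freshly allocated object
  FlatOwner(Point v, const std::function<void(const FlatOwner&)>& on_escape) {
    p.x = v.x;
    on_escape(*this);  // "this" escapes; a racing reader looks now
    p.y = v.y;
  }
};

// Final field kept behind a pointer to a fully built, immutable buffer.
struct BufferedOwner {
  std::shared_ptr<const Point> p;  // null until the constructor stores it
  BufferedOwner(Point v,
                const std::function<void(const BufferedOwner&)>& on_escape) {
    auto buf = std::make_shared<Point>(v);  // built before publication
    on_escape(*this);  // racing reader sees null: no value, never a torn one
    p = buf;
  }
};

// What a racing reader observes in the flattened layout.
Point observe_flat() {
  Point seen{-1, -1};
  FlatOwner o({1, 2}, [&](const FlatOwner& self) { seen = self.p; });
  return seen;  // {1, 0}: half-initialized, a value nobody constructed
}

// True if the racing reader saw "no value yet" and the final state is intact.
bool buffered_reader_saw_null() {
  bool saw_null = false;
  BufferedOwner o({1, 2}, [&](const BufferedOwner& self) {
    saw_null = (self.p == nullptr);
  });
  return saw_null && o.p->x == 1 && o.p->y == 2;
}
```

This is the first option from the paragraph above in miniature: keeping the field behind a pointer removes the half-written intermediate state, at the cost of an indirection.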

(It’s corner cases like this that make JVM design exceedingly difficult.  Most language specifications and runtimes don’t bother to track all the details to this level, but Java does.)

>
> Both final and non-final fields have an impact on the object size, and potentially on cache behavior.

This is true for instance fields, because there is an indefinite number of instances containing them.  It is not really true for static fields, and that’s a distinction that can affect flattening policy.
The existing policy never flattens static fields; all the code quoted in this thread is for non-static fields.
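A back-of-the-envelope version of that distinction, with assumed symbols: a value payload of $S$ bytes, a reference of $P$ bytes, and $N$ live instances of the containing class. Counting only the container itself (alignment padding and any separate heap buffers left out), flattening changes the footprint by

```latex
\Delta_{\text{instance}} = N\,(S - P) \qquad \Delta_{\text{static}} = S - P
```

For example, with $S = 16$, $P = 4$ and $N = 10^6$, the instance field grows the heap by about 12 MB, while the static field costs 12 bytes once.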

<digression topic=“flattening static fields”>

There would be zero benefit, and some harm, to flattening static final fields.  The harm is to startup time, when that is dominated by interpreter performance. The JIT doesn’t care either way; it’s a compile-time constant.

Non-final statics are also very different from non-final instance fields, so it is reasonable to use a different policy for them as well.  Since statics are inherently shared across threads, maybe the atomicity issue is more strongly felt; maybe.  Or maybe that’s why we have “volatile”, to mark fields where we really care about that.  The current policy makes all static reads and writes fully race-free, at the cost of heap-buffering each stored value.

In any case, it is good that the flattening policy code in the Valhalla VM has separate branches for static and non-static fields.

But I have sometimes wondered if it would be a good idea to have the VM buffer flattened static non-finals secretly in length-one arrays, and tell the getstatic and putstatic opcodes to go look there for their payloads.  It would be a little wasteful, but not much.  The array references would be rooted immutably in the class mirror object, just as if the fields were plain references.  Unlike other heap buffers, a length-one array creates a mutable variable for a value.  But it would make such non-final statics much more racy.  Maybe it’s better to tilt to the side of non-raciness, which is what the current policy does.
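A rough C++ model of the two static-field strategies, with every name hypothetical (`ClassMirror`, `getstatic_*`, `putstatic_*`) and the GC replaced by a vector that simply keeps old buffers alive. In the buffered strategy, described above as the current policy, each store copies the value into a fresh immutable buffer and publishes it with one atomic pointer store, so readers always see a whole value. In the length-one array strategy, the mirror holds a fixed reference to a single mutable cell, so a store overwrites the payload in place with no allocation, but a concurrent reader could observe a half-written value.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Point {
  long x;
  long y;
};

// Hypothetical stand-in for a class mirror holding one static Point field.
struct ClassMirror {
  // Strategy A (buffered): atomically swapped pointer to an immutable copy.
  std::atomic<const Point*> buffered{nullptr};
  // Keeps old buffers alive in place of a GC (single writer assumed).
  std::vector<std::unique_ptr<const Point>> retained;

  // Strategy B (length-one array): fixed reference to one mutable cell.
  const std::unique_ptr<Point[]> cell{new Point[1]{{0, 0}}};
};

void putstatic_buffered(ClassMirror& m, Point v) {
  m.retained.push_back(std::make_unique<Point>(v));  // fresh buffer per store
  m.buffered.store(m.retained.back().get(), std::memory_order_release);
}

Point getstatic_buffered(const ClassMirror& m) {
  const Point* p = m.buffered.load(std::memory_order_acquire);
  return p ? *p : Point{0, 0};  // unset static reads as the default value
}

void putstatic_array(ClassMirror& m, Point v) {
  m.cell[0] = v;  // in place, no allocation; not atomic across both words
}

Point getstatic_array(const ClassMirror& m) { return m.cell[0]; }
```

The trade mirrors the one in the paragraph above: per-store allocation buys race-free reads, while the array cell saves the allocation but lets non-final statics race.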

None of these musings should be taken as a call to consider flattening statics directly in the normal container for static fields, which is the Class mirror object for the class declaring the statics.  This is a wild and tricky tactic, which would probably become unmanageable (for several reasons) if we tried to wedge flattened fields into the poor Class mirror.  Class mirrors are weird enough already.

</digression>

> Bigger objects are less likely to fit in data caches, and bigger 
> distances between fields would require more cache lines and more cache 
> misses to read them. This issue is not significant when accessing fields of a single object, but it can become dominant when accessing fields of objects stored in a flat array.

It’s an interesting tradeoff:  Indirections almost certainly depart to new cache lines, while if you pile up enough size in flattened variables, then you start departing the cache line just to get to the other side of a single object.  (HW prefetchers often favor contiguous block accesses, which keeps flattening favorable for longer.)
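A small C++ model of how element size drives cache-line traffic in a flat array, assuming 64-byte cache lines, a densely packed array that starts on a line boundary, a field that never straddles a line, and a scan that reads one field at a fixed offset in every element. `lines_touched` is a hypothetical helper, and hardware prefetching is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

constexpr std::size_t kLineSize = 64;  // assumed cache line size in bytes

// Counts the distinct cache lines read when scanning one field (at
// field_offset) across n elements of elem_size bytes in a flat array.
// Assumes the field does not straddle a line boundary.
std::size_t lines_touched(std::size_t n, std::size_t elem_size,
                          std::size_t field_offset) {
  std::set<std::size_t> lines;
  for (std::size_t i = 0; i < n; ++i) {
    lines.insert((i * elem_size + field_offset) / kLineSize);
  }
  return lines.size();
}
```

For 64 elements, reading the first field touches 16 lines when elements are 16 bytes, 32 lines at 32 bytes, and 64 lines from 64 bytes upward: once an element fills a line, every element read costs a line of its own, just as an indirection would.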

Also, and semi-independently, memory traffic correlates with cache line traffic.  So if your workload is very flat and cache-line-local, but it loads a bunch of useless bits in every flat object, those useless bits will have a similar effect to (prefetchable) cache line departures.
This can happen even if all the objects fit in one cache line, if the alternative was to have two objects fit in each cache line, in a different organization of the data.  The enabling condition there is that an object might have “hot” and “cold” fields, in which case flattening the “cold” fields will incur a tax (in data cache traffic) on access to the “hot” fields, because loading a cold field you don’t need into a full data cache will displace some other object’s hot field which you do need.

The way I like to think about this latter effect is to envision, on the one hand, everything flattened as much as possible, with no pointers or headers loaded into the cache, but maybe with some “garbage” bits mixed into the flat data.  (“Garbage” bits are, for example, bits which are zero 99% of the time.)  And on the other hand, everything indirected through pointers, which means every non-garbage data reference has to thread through a pointer and jump past a header (loading those items into the cache, AND making a non-prefetchable load), but also enjoying the freedom from loading flattened garbage data.  You can flatten too much, if what’s flattened into your containers has low entropy, and eventually it can cause enough data cache traffic that you wish for your pointers back, so you can refrain from loading cold garbage at the other end of some of those pointers.
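A worked example of the hot/cold effect, with assumed numbers: each element has an 8-byte hot field and 24 bytes of cold data, cache lines are 64 bytes, and a scan reads only the hot field of $N$ densely packed elements. Fully flattened, each element costs 32 bytes of line traffic; with the cold part moved behind a plain 8-byte reference, it costs 16 bytes (the cold objects themselves, which the scan never visits, are not loaded):

```latex
\text{lines}_{\text{flat}} = \frac{N\,(8 + 24)}{64} = \frac{N}{2},
\qquad
\text{lines}_{\text{split}} = \frac{N\,(8 + 8)}{64} = \frac{N}{4}
```

Under these assumptions the fully flattened layout moves twice as much data through the cache for the same useful work, even though it never chases a pointer.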

>
> So, in theory, the flattening test should be:
>
>   if (!((!fieldinfo.access_flags().is_final() &&
>          (too_atomic_to_flatten || too_volatile_to_flatten))
>         || too_big_to_flatten)) {
>
> Atomicity constraints are considered only for non-final fields, and 
> size constraints are considered for all fields.
>
>
> That being said, we have always been more aggressive in the flattening 
> of final fields because it was beneficial to C2.

Yes.  There’s a debt to pay here, though.  It might be that we end up rethinking this policy, regarding non-static final fields, for the two cases of declared-atomic (the default) and declared-non-atomic (the racy power-user option).

— John