[lworld] Handling of missing ValueTypes attributes

Thu Jul 19 22:13:42 UTC 2018

On Jul 13, 2018, at 2:45 AM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> Hi Karen,
> 
> On 11.07.2018 21:42, Karen Kinnear wrote:
>>     2a. flattening in containers - flattenable fields and array - require check of value type vs.
>> ACTUAL loaded type
>>           - this is easy, we preload types
> 
> Yes, no issues here.
> 
>>     2b. JIT scalarization of field access - must be an ACTUAL value type and must be flattenable
>>        This will only work for fields that the JIT’d caller believes are value types, the declarer
>> believes are value types and the declarer does an ACTUAL check
>>        Need caller-callee agreement for a JIT’d caller.
> 
> I don't understand what you mean by caller/callee in the context of a field access?
> Also, I'm not sure what you mean by "scalarization of field access"?

Here's an obtuse answer, although I'm probably missing the
acuteness of your point:

(caller : callee : method)  ::  (accessing object : containing object : field)

The representation choices, and the negotiation of a shared understanding
of those choices, are similar on the two sides of the "::".

A resolved field consists of an offset (small integer) into its container,
plus an indication of its type (if necessary).

Scalarization of field access means that the resolved field is loaded
or stored as contiguous subfields.  A non-scalarized value-type field
is secretly stored as a buffer pointer.  Resolution of an accessing
class's field access involves a check to determine the true status
of the field in the containing class.  To avoid crashes, we must either
throw an error if resolution detects a mismatch, or prepare the
accessing class to access the field in its correct format.  In LW1
we take the former option, which is very strict but fine for initial
experimentation.
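
To make the two options concrete, here is a minimal sketch in plain
Java.  It is a model, not HotSpot code: FieldAccessResolver, Format,
and both method names are hypothetical, and LinkageError merely stands
in for whatever error the JVM actually raises on a mismatch.

```java
// Hypothetical model of resolving a field access; not actual JVM code.
final class FieldAccessResolver {
    // The two broad storage choices for a value-type field.
    enum Format { INLINE, BUFFERED_REF }

    // Strict option (the LW1 choice described above): a disagreement
    // between the accessing class and the containing class is an error.
    static Format resolveStrict(Format accessorView, Format declarerView) {
        if (accessorView != declarerView) {
            // LinkageError is an assumption; the real error type may differ.
            throw new LinkageError("accessor expects " + accessorView
                    + " but field is stored as " + declarerView);
        }
        return declarerView;
    }

    // Lenient option: the accessor is prepared to use whatever format
    // the containing class actually chose.
    static Format resolveAdaptive(Format accessorView, Format declarerView) {
        return declarerView;
    }
}
```

In both variants the containing class's view is the "true status";
the only difference is what happens to an accessor that disagrees.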

The rest of this note is squarely post-LW1; here goes:

Several different formats can be conceived of for value type fields:

- inline-struct:  a contiguous block of memory directly inside the container, aligned and/or padded
- inline-exploded:  components puzzled in wherever they fit in the containing object (or registers)
- word-wise-exploded:  same as inline-struct, but with successive word images separately register-allocated (used by ABIs)
- object-like-ref:  a reference in the container (e.g., compressed oop), indirecting to a hidden buffer node in the shared heap
- thread-local-ref:  a reference in a thread-confined variable (register), indirected to a thread-local heap node
- ref-to-federated:  a reference in the container, indirecting to a hidden buffer area contiguous with the containing object
- off-heap-ref:  a reference in the container, indirecting to a hidden buffer area not in the main heap
- inline-with-souvenir-ref: components stored inline, plus optional (nullable) reference to a shareable copy

The techniques separate broadly into a choice between inlined components
vs. (hidden) references to buffered values.  There are other choices after that.

IR graphs of course mainly use the exploded format, allocating component
values independently to registers, stack, and/or nowhere.  The exploded
format is also good for out-of-line calling sequences.

The souvenir format uses the most storage, but is very helpful in IR graphs,
since it facilitates conversion between exploded and buffered forms.
It is also sometimes used in calling sequences, for the same reason.

In memory, the exploded format minimizes fragmentation overhead by
treating sub-fields exactly like fields, reordering them so that bytes are
next to bytes, longs next to longs, etc., without regard for the boundaries
of distinct values stored within the same container.  The trade-off
is that the fields must be separately located in the container, which
means the CP cache must be equipped with multiple offsets, perhaps
one for each value type field.  It also makes a hash of the Unsafe API.
But if we want maximum packing, we could work the details, without
changing any higher-level APIs or specifications.
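
As an illustration of the packing and of why multiple offsets are
needed, here is a small sketch.  ExplodedLayout and assignOffsets are
made-up names, and largest-first placement is just one simple policy,
assumed here for the example; a real layout engine would also deal
with headers, oops, and alignment rules.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of exploded field layout; not HotSpot's algorithm.
final class ExplodedLayout {
    // sizes[i] is the byte size of sub-field i, listed in declaration
    // order across all value fields of the container.  Returns the
    // offset assigned to each sub-field, placing larger sub-fields
    // first so that longs sit next to longs and bytes next to bytes.
    static int[] assignOffsets(int[] sizes) {
        Integer[] order = new Integer[sizes.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Stable sort by descending size.
        Arrays.sort(order, Comparator.comparingInt((Integer i) -> -sizes[i]));
        int[] offsets = new int[sizes.length];
        int next = 0;
        for (int i : order) {
            offsets[i] = next;
            next += sizes[i];
        }
        return offsets;
    }
}
```

For a container with two fields p and q of a value type { byte b;
long l; }, the sub-fields p.b, p.l, q.b, q.l have sizes {1, 8, 1, 8}
and get offsets {16, 0, 17, 8}: 18 bytes in all, where two padded
inline structs would take 32.  Note that p's two components are no
longer adjacent, which is exactly why one offset per field is not
enough.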

The exploded format also has potential for highly packed flat arrays,
since fragmentation overheads are multiplied by the length of the array,
and thus potentially worth reducing.  A two-tiered flat array would
distribute blocks of objects tightly packed into cache lines, packing
each block the same way, but as tightly as possible, as if the block
were an object containing a fixed number of values.  Indexing would
compute first the block (using a divide) and then the index within
the block (using a modulo).  Individual fields would be picked up
at varying computed offsets within the block.
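
A sketch of that index arithmetic, under assumptions chosen only for
the example: the value type is { long id; byte tag; }, a block is one
64-byte cache line, and seven values fit per block (7*8 + 7*1 = 63
bytes, one byte of padding).  All names are hypothetical.

```java
// Hypothetical two-tiered flat array addressing; not actual JVM code.
final class BlockedFlatArray {
    static final int BLOCK_LEN = 7;      // values per block (assumed)
    static final int BLOCK_BYTES = 64;   // one cache line (assumed)
    static final int ID_BYTES = Long.BYTES;
    static final int TAG_BYTES = Byte.BYTES;
    // Inside a block, all ids come first, then all tags.
    static final int TAGS_BASE = BLOCK_LEN * ID_BYTES;  // 56

    // Byte offset of the 'id' component of element 'index'.
    static long idOffset(int index) {
        int block = index / BLOCK_LEN;   // which block (divide)
        int slot = index % BLOCK_LEN;    // position in block (modulo)
        return (long) block * BLOCK_BYTES + (long) slot * ID_BYTES;
    }

    // Byte offset of the 'tag' component of element 'index'.
    static long tagOffset(int index) {
        int block = index / BLOCK_LEN;
        int slot = index % BLOCK_LEN;
        return (long) block * BLOCK_BYTES + TAGS_BASE + (long) slot * TAG_BYTES;
    }
}
```

With these numbers each element costs a bit over 9 bytes instead of
the 16 a padded inline struct would need, at the price of a divide
and a modulo by a non-power-of-two on every access.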

We aren't considering ref-based storage other than object-like
and thread-local, but I put them there for the record.  The GC
might be able to eliminate headers and retain locality with a
federated format.  It could do this secretly and at its own
discretion, like we do today for thread-locals.  Perhaps
it would federate when the buffered value is unique to
its container.  But the complexity would be high, and
historically we've had plenty of trouble, and enough payoff,
implementing the simpler techniques.

Any ref-based format has the physical potential to hold a null,
or to be part of a reference cycle.  The JVM should (a) enforce
invariants that exclude such things, and (b) be robust if a bug
breaks such an invariant.  Probably value types will be able
to contain cycles, but *only* through explicit Java references.
Consider a buffered value that has an Object field that just
happens to refer back to the same value.  There's no way to
exclude such cycles systematically.

Nulls can be systematically excluded from "secret" references,
by simply taking special action when a null is observed, before
a bytecode can get hold of it.  Two special actions are relevant:
(1) Throw NPE, and (2) substitute a default value.  The first
is useful when legacy code might have sent a null, and we
decide that the user model must exclude this with an exception.
The second is also useful with legacy code, in the more unlikely
case where we decide to substitute the default value for null.
(This is sometimes a requested feature, but not necessarily
one we should agree to pay for.)  The second is useful for
bootstrapping, sometimes:  If a hidden reference is part of
an object's layout, and the object is initialized to all-zero-bits,
then we want to bootstrap the hidden reference to a non-zero
pointer to its buffered default value.  But if we can't reliably
do that in all cases, then any read of that field must be prepared
to zero-check the field and substitute the missing default.
Perhaps we can avoid this in all cases, but I think we
currently use this trick, to simplify object initialization.
We might also need it to bootstrap the defaultvalue bytecode
itself, by lazily creating the canonical buffered default
the first time it is actually used.
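
The read-side check could look roughly like the following sketch.
HiddenRef, NullPolicy, and the Supplier-based default factory are all
invented for illustration; the real logic would live in the
interpreter and JIT, not in Java code.

```java
import java.util.function.Supplier;

// Hypothetical model of reading a "secret" reference to a buffered value.
final class HiddenRef<V> {
    enum NullPolicy { THROW_NPE, SUBSTITUTE_DEFAULT }

    private final Supplier<V> defaultFactory;
    private V canonicalDefault;  // created lazily, on first use

    HiddenRef(Supplier<V> defaultFactory) {
        this.defaultFactory = defaultFactory;
    }

    // Called on every read of the hidden reference, before any
    // bytecode can observe the raw (possibly null) pointer.
    V read(V raw, NullPolicy policy) {
        if (raw != null) {
            return raw;
        }
        if (policy == NullPolicy.THROW_NPE) {
            throw new NullPointerException("null in value-type field");
        }
        return canonicalDefault();
    }

    // Lazily bootstraps the single shared buffered default.
    private synchronized V canonicalDefault() {
        if (canonicalDefault == null) {
            canonicalDefault = defaultFactory.get();
        }
        return canonicalDefault;
    }
}
```

The zero-check costs a compare and branch on each read, which is the
price of leaving all-zero-bits object initialization untouched.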

> If compiled code loads a value type from a field (no matter if the field is flattenable or not), it
> will scalarize the value type if it's not null (i.e., pass it on in registers or on the stack). This
> relies on the fact that at compile time, the field type is loaded if it's a value type.

This is a case where the accessing/using/calling code wants a value,
and intends to immediately explode it into components.  In that case,
if the container/used/callee object contains a reference, any null must
be handled by (1) or (2) above, depending on user model.  If the
null is supposed to be impossible, the JVM should probably still
fail gracefully if the bad thing appears.

If the caller and callee differ on value-ness, then we have a tug-of-war
between opinions.  For starters, we should do what's convenient if the
problem arises, and document it.  Those are the points where we need
to elevate the first implementation to a robust user model (post-LW1).

> 
>>     2c. JIT calling convention - scalarization of arguments
>>        Need either the caller-callee in agreement if both compiled OR
>>        For caller calls by reference, adapter that can scalarize arguments it knows are ACTUAL value
>> types
>>        Today adaptor is created at callee link time, so we explicitly load types in local methods in
>> the ValueTypes attribute so they can be scalarized
> 
> Yes, but we don't support that for LW1 because there are still lots of other issues to sort out
> before we can re-enable -XX:+ValueTypePassFieldsAsArgs (for example, the issue with lambda forms and
> linkTo* calls).

The LFs and linkTo calls will have to use buffered values uniformly.
That implies that the linkTo calls will have to be given the ability to
scalarize on the fly.  This probably implies pre-generated adapters.
Post-LW1 (or intra-LW1 at best) if we don't have it now.  I don't know
a better approach, though I think Roland and I were on the verge
of something in this discussion:

http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-May/004273.html

One idea would be to expand ref-based calling sequences in place
by exploding into additional argument positions, obtained by sneaking
more space from the caller's top-of-stack area.

We could really use a handshake which allows a callee to borrow more
space in the caller's argument list, but secretly pay it back on return.
This is necessary for tail-calls also:  The chain of tail calls can require
an unbounded amount of extra TOS to store parameters of the various
tail calls, all without the cooperation of the original non-tail caller of the
tail-call chain.  A good problem to think about when not otherwise occupied.

> 
>>      2d. JIT returning a value type
>>         I do not know our plans for value type return optimizations.
> 
> The plan is to re-enable -XX:+ValueTypeReturnedAsFields for lworld once we have sorted out the
> calling convention issues.

+100

> 
>>         The adaptor for returns are stored off of the return type, so they know the ACTUAL value.
> 
> Returns do not use any adapters but we do some kind of handshaking between the caller and the callee
> to make sure that they agree on the type (see page 26/27 of [1]).
> 
>>         In general we can check caller-callee consistency so we can be in agreement about whether a
>> type is a value type.
>>         The exception is the JavaCalls::call_helper path used by Reflection, jni (and others internally)
>>             - I assume we will always return a reference here (I have not studied the details yet,
>> so I don’t know where that is handled)
> 
> I'm not sure about the reflection/jni case but the short answer would be "we don't support
> scalarization of the return value for LW1". And I don't see any problems with the current code.

Like the linkTo* methods, JNI access will have to adapt exploded (scalarized)
arguments and return values to buffered (ref) ones.

If we can solve the TOS-borrowing problem (described above), we can
arrange some pretty reasonable adapters that can shift between ref-based
and exploded formats.

For bonus points, consider designing a calling sequence which includes
souvenirs.  They are useful, and the exercise might simplify the adapter
logic:  When going from references-only to exploded-with-souvenirs,
you don't change any reference arguments at all, just push in the
exploded components *after* them in the argument list.  (This works
fine because of the way we allocate calling sequences left-to-right.)

Going from exploded-with-souvenirs to references-only is a no-op
(actually, a tail-call) since the callee can just ignore the extra trailing
arguments.  Actually, any null souvenirs would have to be fixed up
to hold non-null references to buffered values, just like in local IR.

Thus, one calling sequence subsumes the other.  This might make
LF-based calls (using linkTo*) easier to generate, since they would
just send exploded-with-souvenir arguments, and let the callee
decide independently whether to ignore or use the components.
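
Here is a toy model of the two conversions, with argument lists as
Java lists and a stand-in value class Point { int x, y }.  The names
SouvenirAdapter, explode, and toRefs are hypothetical, and real
adapters would of course shuffle registers and stack slots rather
than build lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the souvenir calling sequence; not JVM code.
final class SouvenirAdapter {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // references-only -> exploded-with-souvenirs: the references stay
    // where they are; components are pushed after them.
    static List<Object> explode(List<Point> refs) {
        List<Object> args = new ArrayList<>(refs);
        for (Point p : refs) {
            args.add(p.x);
            args.add(p.y);
        }
        return args;
    }

    // exploded-with-souvenirs -> references-only: ignore the trailing
    // components, except to re-buffer any null souvenir.
    static List<Point> toRefs(List<Object> args, int refCount) {
        List<Point> refs = new ArrayList<>();
        for (int i = 0; i < refCount; i++) {
            Point souvenir = (Point) args.get(i);
            if (souvenir == null) {
                int base = refCount + 2 * i;  // this value's components
                souvenir = new Point((Integer) args.get(base),
                                     (Integer) args.get(base + 1));
            }
            refs.add(souvenir);
        }
        return refs;
    }
}
```

When every souvenir is non-null, toRefs just returns the leading
references unchanged, which is the "no-op" case; the allocation
happens only on the null-souvenir fix-up path.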

As Roland pointed out, one hard part about on-the-fly explosion
is borrowing the extra TOS from the caller, but in the case of LFs
that could be hardwired, since the full size is known before the call.

HTH

— John

> 
>> Details:
>> 1. MethodHandles - invocation and field access always goes through LinkResolver at this point.
>> There are two exceptions here:
>>     - one is when the MethodHandle creation does NOT pass in the calling class information
>>       - in that case there is no check for caller-callee consistency, we need to look at this
>> independently
>>     - one is invokespecial indirect superclass (ACC_SUPER) which performs selection in the java code.
>>        - That is a rathole I won’t follow here - we should fix that anyway - multiple potential
>> approaches.
>> 
>> 2. Reflection:
>>    optimized reflection generates bytecodes, so goes through bytecode path, so goes through
>> LinkResolver.
>>    initial reflection calls JavaCalls::call->JavaCalls::call_helper
>> 
>> 3. JNI:
>>    also goes through JavaCalls::call_helper
>> 
>> JavaCalls::call_helper calls call_stub to invoke the entry_point which is:
>>    normally: method->from_interpreted_entry
>>    debug: method->interpreter_entry
>> 
>> For argument passing, my assumption is that we are ok with the JavaCalls::call_helper path because
>> it always passes by reference
>> and uses the callee adapter from interpreter which knows the declared value types that can be
>> scalarized. So the same adaptor that works for
>> interpreted code works for call_helper where the caller always assumes everything is a reference and
>> passes by reference.
>> 
>> JIT folks - does this work in practice?
> 
> Yes, that seems reasonable but it's very hard to figure this all out by code inspection. I think
> what we need is more tests to find bugs and/or gain confidence in our current design/implementation.
> 
> That said, current JIT optimizations do not rely on the value types attribute but of course there
> might be bugs or implicit assumptions that do not hold.
> 
> However, Ioi's optimization (8206140) relies on the fact that an interpreted callee always knows
> when it's returning null for a value type (and can then deoptimize the compiled caller). It seems
> that the attribute consistency checks cannot guarantee that but I need to take a closer look.
> 
> My take on this is to defer all optimizations that rely on the consistency checks to after we got
> these right (and it's okay if that is after LW1).
> 
> Best regards,
> Tobias
> 
> [1] http://cr.openjdk.java.net/~thartmann/talks/2018-Value_Types_Compiler_Offsite.pdf