[lworld] Handling of missing ValueTypes attributes

Fri Jul 20 13:42:19 UTC 2018

Hi John,

thanks for providing more details here!

On 20.07.2018 00:13, John Rose wrote:
>> I don't understand what you mean by caller/callee in the context of a field access?
>> Also, I'm not sure what you mean by "scalarization of field access"?
> 
> Here's an obtuse answer, although I'm probably missing the
> acuteness of your point:
> 
> (caller : callee : method)  ::  (accessing object : containing object : field)
> 
> The representation choices, and the negotiation of a shared understanding
> of those choices, are similar on the two sides of the "::".

I was just confused by the caller/callee terms in a scenario where there is not necessarily a call
involved. Your clarification helped, thanks!

> A resolved field consists of an offset (small integer) into its container,
> plus an indication of its type (if necessary).
> 
> Scalarization of field access means that the resolved field is loaded
> or stored as contiguous subfields.  A non-scalarized value-type field
> is secretly stored as a buffer pointer.  Resolution of an accessing
> class's field access involves a check to determine the true status
> of the field in the containing class.  To avoid crashes, we must either
> throw an error if resolution detects a mismatch, or prepare the
> accessing class to access the field in its correct format.  In LW1
> we take the former option, which is very strict but fine for initial
> experimentation.

Right, I've tried to make sense of this from the perspective of the JIT doing something special for
"scalarization of field access", but as you made clear, this is just about implementing flattened
fields (basically no difference between JIT and interpreter).

For consistency checking, we need to make sure that the JIT is not missing any checks.
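
To make the strict LW1 option concrete, here is a minimal sketch of the resolution-time check in
plain Java. It is not HotSpot code: the class FieldLayoutCheck, the Format enum, the field table and
the choice of IncompatibleClassChangeError are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical model: a resolved field is an offset plus the format its container chose.
final class FieldLayoutCheck {
    enum Format { FLATTENED, BUFFERED_REF }

    record ResolvedField(String name, int offset, Format format) {}

    // Stand-in for the containing class's field table (names and offsets are made up).
    static final Map<String, ResolvedField> CONTAINER_FIELDS = Map.of(
            "point", new ResolvedField("point", 16, Format.FLATTENED),
            "label", new ResolvedField("label", 24, Format.BUFFERED_REF));

    // Strict option: if the accessing class expects a different format than the
    // container actually uses, fail at resolution instead of misreading memory.
    static ResolvedField resolve(String name, Format expectedByAccessor) {
        ResolvedField field = CONTAINER_FIELDS.get(name);
        if (field == null) {
            throw new NoSuchFieldError(name);
        }
        if (field.format() != expectedByAccessor) {
            throw new IncompatibleClassChangeError(
                    name + ": accessor expects " + expectedByAccessor
                    + " but container uses " + field.format());
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(resolve("point", Format.FLATTENED));
        try {
            resolve("label", Format.FLATTENED);
        } catch (IncompatibleClassChangeError e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The post-LW1 alternative would replace the second throw with logic that prepares the accessor to use
the container's format.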

> The rest of this note is squarely post-LW1; here goes:
> 
> Several different formats can be conceived of for value type fields:
> 
> - inline-struct:  a contiguous block of memory directly inside the container, aligned and/or padded
> - inline-exploded:  components puzzled in wherever they fit in the containing object (or registers)
> - word-wise-exploded:  same as inline-struct, but with successive word images separately register-allocated (used by ABIs)
> - object-like-ref:  a reference in the container (e.g., compressed oop), indirecting to a hidden buffer node in the shared heap
> - thread-local-ref:  a reference in a thread-confined variable (register), indirected to a thread-local heap node
> - ref-to-federated:  a reference in the container, indirecting to a hidden buffer area contiguous with the containing object
> - off-heap-ref:  a reference in the container, indirecting to a hidden buffer area not in the main heap
> - inline-with-souvenir-ref: components stored inline, plus optional (nullable) reference to a shareable copy
> 
> The techniques separate broadly into a choice between inlined components
> vs. (hidden) references to buffered values.  There are other choices after that.
> 
> IR graphs of course mainly use the exploded format, allocating component
> values independently to registers, stack, and/or nowhere.  The exploded
> format is also good for out-of-line calling sequences.
> 
> The souvenir format uses the most storage, but is very helpful in IR graphs,
> since it facilitates conversion between exploded and buffered forms.
> It is also sometimes used in calling sequences, for the same reason.
> 
> In memory, the exploded format minimizes fragmentation overhead by
> treating sub-fields exactly like fields, reordering them so that bytes are
> next to bytes, longs next to longs, etc., without regard for the boundaries
> of distinct values stored within the same container.  The trade-off
> is that the fields must be separately located in the container, which
> means the CP cache must be equipped with multiple offsets, perhaps
> one for each value type field.  It also makes hash of the Unsafe API.
> But if we want maximum packing, we could work the details, without
> changing any higher-level APIs or specifications.
> 
> The exploded format also has potential for highly packed flat arrays,
> since fragmentation overheads are multiplied by the length of the array,
> and thus potentially worth reducing.  A two-tiered flat array would
> distribute blocks of objects tightly packed into cache lines, packing
> each block the same way, but as tightly as possible, as if the block
> were an object containing a fixed number of values.  Indexing would
> compute first the block (using a divide) and then the index within
> the block (using a modulo).  Individual fields would be picked up
> at varying computed offsets within the block.
> 
> We aren't considering ref-based storage other than object-like
> and thread-local, but I put them there for the record.  The GC
> might be able to eliminate headers and retain locality with a
> federated format.  It could do this secretly and at its own
> discretion, like we do today for thread-locals.  Perhaps
> it would federate when the buffered value is unique to
> its container.  But the complexity would be high, and
> historically we've had plenty of trouble, and enough payoff,
> implementing the simpler techniques.
> 
> Any ref-based format has the physical potential to hold a null,
> or to be part of a reference cycle.  The JVM should (a) enforce
> invariants that exclude such things, and (b) be robust if a bug
> breaks such an invariant.  Probably value types will be able
> to contain cycles, but *only* through explicit Java references.
> Consider a buffered value that has an Object field that just
> happens to refer back to the same value.  There's no way to
> exclude such cycles systematically.
>
> Nulls can be systematically excluded from "secret" references,
> by simply taking special action when a null is observed, before
> a bytecode can get hold of it.  Two special actions are relevant:
> (1) Throw NPE, and (2) substitute a defaultvalue.  The first
> is useful when legacy code might have sent a null, and we
> decide that the user model must exclude this with an exception.
> The second is also useful with legacy code, in the more unlikely
> case where we decide to substitute the default value for null.
> (This is sometimes a requested feature, but not necessarily
> one we should agree to pay for.)  The second is useful for
> bootstrapping, sometimes:  If a hidden reference is part of
> an object's layout, and the object is initialized to all-zero-bits,
> then we want to bootstrap the hidden reference to a non-zero
> pointer to its buffered default value.  But if we can't reliably
> do that in all cases, then any read of that field must be prepared
> to zero-check the field and substitute the missing default.
> Perhaps we can avoid this in all cases, but I think we
> currently use this trick, to simplify object initialization.
> We might also need it to bootstrap the defaultvalue bytecode
> itself, by lazily creating the canonical buffered default
> the first time it is actually used.

Thanks for this great summary, very helpful!
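
The two-tiered flat array indexing can be sketched as follows. This is only an illustration on top
of a ByteBuffer, with assumptions that are not from your mail: a value made of one long and one
byte, a 64-byte block (one cache line), and the class name BlockedFlatArray. Seven such values need
7 * (8 + 1) = 63 bytes, so they fit in one block, whereas an inline-struct padded to 16 bytes would
only fit four.

```java
import java.nio.ByteBuffer;
import java.util.Objects;

// Hypothetical packed array of (long id, byte tag) values, laid out block by block.
final class BlockedFlatArray {
    static final int BLOCK_BYTES = 64; // assumed cache line size
    static final int PER_BLOCK = 7;    // 7 * (8 + 1) = 63 bytes fit in one block
    static final int LONG_BASE = 0;    // the 7 longs occupy bytes 0..55 of a block
    static final int BYTE_BASE = 56;   // the 7 bytes occupy bytes 56..62 of a block

    private final ByteBuffer storage;
    private final int length;

    BlockedFlatArray(int length) {
        this.length = length;
        int blocks = (length + PER_BLOCK - 1) / PER_BLOCK; // round up
        this.storage = ByteBuffer.allocate(blocks * BLOCK_BYTES);
    }

    // Block via divide, slot within the block via modulo, then a per-field offset.
    static int longOffset(int index) {
        return (index / PER_BLOCK) * BLOCK_BYTES + LONG_BASE + (index % PER_BLOCK) * Long.BYTES;
    }

    static int byteOffset(int index) {
        return (index / PER_BLOCK) * BLOCK_BYTES + BYTE_BASE + (index % PER_BLOCK);
    }

    void set(int index, long id, byte tag) {
        Objects.checkIndex(index, length);
        storage.putLong(longOffset(index), id);
        storage.put(byteOffset(index), tag);
    }

    long id(int index) {
        Objects.checkIndex(index, length);
        return storage.getLong(longOffset(index));
    }

    byte tag(int index) {
        Objects.checkIndex(index, length);
        return storage.get(byteOffset(index));
    }

    public static void main(String[] args) {
        BlockedFlatArray array = new BlockedFlatArray(10);
        array.set(8, 42L, (byte) 7);
        System.out.println(array.id(8) + " " + array.tag(8));
    }
}
```

Because the block holds seven elements, the divide and modulo cannot be reduced to a shift and a
mask, which is part of the cost to weigh against the denser packing.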

>> If compiled code loads a value type from a field (no matter if the field is flattenable or not), it
>> will scalarize the value type if it's not null (i.e., pass it on in registers or on the stack). This
>> relies on the fact that at compile time, the field type is loaded if it's a value type.
> 
> This is a case where the accessing/using/calling code wants a value,
> and intends to immediately explode it into components.  In that case,
> if the container/used/callee object contains a reference, any null must
> be handled by (1) or (2) above, depending on user model.  If the
> null is supposed to be impossible, the JVM should probably still
> fail gracefully if the bad thing appears.

Right, currently we handle this by deoptimizing from compiled code.
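
For reference, the two special actions (1) and (2) can be sketched in plain Java as below. This
models the user-visible behavior only, not the deoptimization path, and the names HiddenRefLoad,
Point and NullPolicy are hypothetical.

```java
// Hypothetical model of loading a value through a hidden (secret) reference.
final class HiddenRefLoad {
    record Point(int x, int y) {} // stand-in for a buffered value type

    enum NullPolicy { THROW_NPE, SUBSTITUTE_DEFAULT }

    // Canonical buffered default, created lazily on first use.
    private static Point canonicalDefault;

    static synchronized Point defaultPoint() {
        if (canonicalDefault == null) {
            canonicalDefault = new Point(0, 0);
        }
        return canonicalDefault;
    }

    // Intercept a null before any bytecode can observe it.
    static Point load(Point hiddenRef, NullPolicy policy) {
        if (hiddenRef != null) {
            return hiddenRef;
        }
        if (policy == NullPolicy.THROW_NPE) {
            throw new NullPointerException("null observed in value-type field");
        }
        return defaultPoint(); // zero-check failed: substitute the missing default
    }

    public static void main(String[] args) {
        System.out.println(load(null, NullPolicy.SUBSTITUTE_DEFAULT));
        System.out.println(load(new Point(1, 2), NullPolicy.THROW_NPE));
    }
}
```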

> If the caller and callee differ on value-ness, then we have a tug-of-war
> between opinions.  For starters, we should do what's convenient if the
> problem arises, and document it.  Those are the points where we need
> to elevate the first implementation to a robust user model (post-LW1).

Yes, I agree. We need more tests for all the different cases.

> The LFs and linkTo calls will have to use buffered values uniformly.
> That implies that the linkTo calls will have to be given the ability to
> scalarize on the fly.  This probably implies pre-generated adapters.
> Post-LW1 (or intra-LW1 at best) if we don't have it now.  I don't know
> a better approach, though I think Roland and I were on the verge
> of something in this discussion:
> 
> http://mail.openjdk.java.net/pipermail/valhalla-dev/2018-May/004273.html
> 
> One idea would be to expand ref-based calling sequences in place
> by exploding into additional argument positions, obtained by sneaking
> more space from the caller's top-of-stack area.
> 
> We could really use a handshake which allows a callee to borrow more
> space in the caller's argument list, but secretly pay it back on return.
> This is necessary for tail-calls also:  The chain of tail calls can require
> an unbounded amount of extra TOS to store parameters of the various
> tail calls, all without the cooperation of the original non-tail caller of the
> tail-call chain.  A good problem to think about when not otherwise occupied.

Yes, I think there are several options that we need to evaluate carefully. Now that we've reached
reasonable stability with the LW1 prototype, I can hopefully spend some time thinking this through.

>> I'm not sure about the reflection/jni case but the short answer would be "we don't support
>> scalarization of the return value for LW1". And I don't see any problems with the current code.
> 
> Like the linkTo* methods, JNI access will have to adapt exploded (scalarized)
> arguments and return values to buffered (ref) ones.
> 
> If we can solve the TOS-borrowing problem (described above), we can
> arrange some pretty reasonable adapters that can shift between ref-based
> and exploded formats.
> 
> For bonus points, consider designing a calling sequence which includes
> souvenirs.  They are useful, and the exercise might simplify the adapter
> logic:  When going from references-only to exploded-with-souvenirs,
> you don't change any reference arguments at all, just push in the
> exploded components *after* them in the argument list.  (This works
> fine because of the way we allocate calling sequences left-to-right.)
> 
> Going from exploded-with-souvenirs to references-only is a no-op
> (actually, a tail-call) since the callee can just ignore the extra trailing
> arguments.  Actually, any null souvenirs would have to be fixed up
> to hold non-null references to buffered values, just like in local IR.
> 
> Thus, one calling sequence subsumes the other.  This might make
> LF-based calls (using linkTo*) easier to generate, since they would
> just send exploded-with-souvenir arguments, and let the callee
> decide independently whether to ignore or use the components.
> 
> As Roland pointed out, one hard part about on-the-fly explosion
> is borrowing the extra TOS from the caller, but in the case of LFs
> that could be hardwired, since the full size is known before the call.

Yes, I think that's the hard part and I'm not yet convinced that it's better / easier than the
solution of having a method with two entry points (one calling the other).
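
To check my understanding of the souvenir idea, here is a toy model of the two conversions, with an
argument list represented as a java.util.List. Everything in it (SouvenirArgs, Point, the list
representation) is an assumption for illustration; a real calling sequence would use registers and
stack slots, and borrowing the extra TOS is not modeled at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "references-only" vs. "exploded-with-souvenirs" argument lists.
final class SouvenirArgs {
    record Point(int x, int y) {} // stand-in for a buffered value type

    // references-only -> exploded-with-souvenirs:
    // leave the references where they are and push the components after them.
    static List<Object> explode(List<Point> refs) {
        List<Object> args = new ArrayList<>(refs);
        for (Point p : refs) {
            args.add(p.x());
            args.add(p.y());
        }
        return args;
    }

    // exploded-with-souvenirs -> references-only:
    // ignore the trailing components, except to re-buffer any null souvenir.
    static List<Point> toRefs(List<Object> args, int valueCount) {
        List<Point> refs = new ArrayList<>();
        for (int i = 0; i < valueCount; i++) {
            Point souvenir = (Point) args.get(i);
            if (souvenir == null) {
                int x = (Integer) args.get(valueCount + 2 * i);
                int y = (Integer) args.get(valueCount + 2 * i + 1);
                souvenir = new Point(x, y); // fix up the missing souvenir
            }
            refs.add(souvenir);
        }
        return refs;
    }

    public static void main(String[] args) {
        List<Object> exploded = explode(List.of(new Point(1, 2), new Point(3, 4)));
        exploded.set(1, null); // simulate a caller that passed no souvenir
        System.out.println(toRefs(exploded, 2));
    }
}
```

In this model the reference positions never move, which is the property that would let one calling
sequence subsume the other.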

Thanks,
Tobias