Unsafe API and explicit VT allocation

Fri Jul 20 22:02:23 UTC 2018

On Jul 20, 2018, at 2:53 AM, Roland Westrelin <rwestrel at redhat.com> wrote:
> 
> Hi John,
> 
> In the current code, Unsafe.getX on a value type argument either
> retrieves the corresponding field from the ValueTypeNode or if that
> can't work triggers an allocation:
> 
> http://hg.openjdk.java.net/valhalla/valhalla/file/d6e90a7411bb/src/hotspot/share/opto/library_call.cpp#l2356
> 
> (that code should also check that the access size matches the field
> size)

Yes, it's pretty rough code.  It should check the type loaded as well
as the size.  And ciValueKlass::field_index_by_offset is a lossy
function to use with Unsafe, which can validly request *any* offset
and *any* type.  For example, it can perform bytewise loads from
long fields, and there are legitimate uses for this.

But, for *reading* a value type, it's basically the right idea.  The
semantic model is that you allocate a buffer with a stable layout,
and then you slice a few bytes from it with a load instruction.
If you can short-circuit the allocation of the buffer by recognizing
common idioms (when the slice is identical with or contained
in an exploded component of the value) then you take the win.
Otherwise you make the buffer and do the load the slow way.
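
As a rough sketch of that reading model (plain Java, not HotSpot code):
the class ReadSliceSketch, its two fields, the 16-byte layout, and the
little-endian byte order are all assumptions made for illustration.  The
fast paths answer a load directly from an "exploded" field when the
requested slice is identical with or contained in it; anything else
falls back to materializing the buffer and loading from that.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for a value type with an exploded representation.
final class ReadSliceSketch {
    static final int X_OFFSET = 0;   // int field x occupies bytes 0..3
    static final int Y_OFFSET = 8;   // long field y occupies bytes 8..15
    static final int SIZE = 16;      // bytes 4..7 are padding (assumed layout)

    final int x;
    final long y;

    ReadSliceSketch(int x, long y) { this.x = x; this.y = y; }

    // Slow path: materialize the value into a buffer with a stable layout.
    ByteBuffer buffer() {
        ByteBuffer b = ByteBuffer.allocate(SIZE).order(ByteOrder.LITTLE_ENDIAN);
        b.putInt(X_OFFSET, x);
        b.putLong(Y_OFFSET, y);
        return b;
    }

    // Models a bytewise load at an arbitrary offset.
    byte getByte(int offset) {
        if (offset >= X_OFFSET && offset < X_OFFSET + Integer.BYTES) {
            // Fast path: the byte is contained in field x.
            return (byte) (x >>> (8 * (offset - X_OFFSET)));
        }
        if (offset >= Y_OFFSET && offset < Y_OFFSET + Long.BYTES) {
            // Fast path: the byte is contained in field y.
            return (byte) (y >>> (8 * (offset - Y_OFFSET)));
        }
        // Slice is not inside any field: buffer and load the slow way.
        return buffer().get(offset);
    }

    // Models an int-sized load at an arbitrary offset.
    int getInt(int offset) {
        if (offset == X_OFFSET) {
            return x;                    // slice is identical with field x
        }
        return buffer().getInt(offset);  // e.g. half of the long field
    }
}
```

Note that getInt(12) is a legitimate request here even though no int
field lives at offset 12; it reads the upper half of y through the
buffer.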

> Unsafe.putX has no special logic for ValueTypeNode so using it would
> crash C2 but it could be modified with similar logic. We would create a
> new ValueTypeNode with an updated field.

For *writing* a value type there is an output to be produced,
which is the updated value.  As you say, a similar trick can be
done as with reading, but there has to be a user model for
picking up the new value.  Also, for writing, de-aliasing is
crucial.  The JIT can tell if a reference to a value instance
might be used somewhere else, and can clone the buffer
storage in that case.  The current call to vt->allocate does
*not* de-alias the buffer, which would be a major bug for
writes.  Indeed, this is at the root of why value type updates
need some extra care and thought with Unsafe.
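
To make the de-aliasing point concrete, here is a minimal sketch in
plain Java.  The class DeAliasSketch, the byte[] standing in for a
buffered value, and the mayBeAliased flag (standing in for whatever the
JIT can prove about sharing) are all hypothetical.

```java
import java.util.Arrays;

final class DeAliasSketch {
    // Returns storage that is safe to write: a private copy when the
    // buffer might be visible through another reference, or the
    // original buffer when it is known to be unshared.
    static byte[] writable(byte[] buf, boolean mayBeAliased) {
        return mayBeAliased ? Arrays.copyOf(buf, buf.length) : buf;
    }

    // Models a write that produces an updated value as its output.
    static byte[] withByte(byte[] buf, boolean mayBeAliased, int offset, byte v) {
        byte[] out = writable(buf, mayBeAliased);
        out[offset] = v;
        return out;
    }
}
```

If the clone is skipped for a buffer that is in fact shared, the write
becomes visible through every other reference to the same value, which
is exactly the bug described above.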

One way to do updates is with a new suite of Unsafe API points,
withInt, withByte, etc., which produce a new value instance as
output.  That would fit in the model you are proposing; is that
what you were thinking of?

> In your example the allocation is only useless, if we can tell what
> field of the value type is updated, right?

Of course.  That would work with either my API or the one
which (I think) you are implying.

> So what is the benefit of having the allocation be explicit with a new
> unsafe rather than implicit like we have today?

Because of the problem of delivering the updated value, I don't
see how to do value updating without new Unsafe API points.
If I follow your implications, I think I have to introduce a large
number of new withX functions.  There are *many* putX functions,
because of the cross-product of types and access modes.
I suppose the withX functions would only need to be across
types, so perhaps they are kind of like a new access mode.

My proposal is different:  Repurpose the putX functions for
values by introducing two new API points, bufferValue and
unbufferValue, to bracket peek-n-poke in a private buffer.

I think it's a more conservative delta on Unsafe.  I don't think
it changes the optimization opportunities for C2, relative to
adding withX functions; it just simplifies the API surface.

It's also closer to the hardware to do it my way, since every
user of Unsafe knows about memory layouts already and
knows that values can be buffered (e.g., in flat array elements).
So why not let them use that knowledge, to drive the familiar
getX and putX functions, instead of new withX functions?
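
The two shapes can be compared in a small simulation.  This is only a
sketch: bufferValue, unbufferValue, and withInt below are ordinary
static methods written for illustration, not real Unsafe signatures,
and Point, its offsets, and the ByteBuffer used as the private buffer
are assumptions.

```java
import java.nio.ByteBuffer;

final class BracketSketch {
    // Immutable stand-in for a value type with two int fields.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static final int X_OFFSET = 0;
    static final int Y_OFFSET = 4;

    // Hypothetical bufferValue: copy the value into a private buffer.
    static ByteBuffer bufferValue(Point p) {
        ByteBuffer b = ByteBuffer.allocate(8);
        b.putInt(X_OFFSET, p.x);
        b.putInt(Y_OFFSET, p.y);
        return b;
    }

    // Hypothetical unbufferValue: read a fresh value back out.
    static Point unbufferValue(ByteBuffer b) {
        return new Point(b.getInt(X_OFFSET), b.getInt(Y_OFFSET));
    }

    // Hypothetical withInt: one buffer/unbuffer transaction per update.
    static Point withInt(Point p, int offset, int v) {
        ByteBuffer b = bufferValue(p);
        b.putInt(offset, v);
        return unbufferValue(b);
    }

    // Bracketed style: one private buffer, ordinary get/put inside it.
    static Point moveBracketed(Point p) {
        ByteBuffer b = bufferValue(p);
        b.putInt(X_OFFSET, b.getInt(X_OFFSET) + 1);
        b.putInt(Y_OFFSET, b.getInt(Y_OFFSET) + 1);
        return unbufferValue(b);
    }

    // withX style: each field update is its own transaction.
    static Point moveWith(Point p) {
        p = withInt(p, X_OFFSET, p.x + 1);
        return withInt(p, Y_OFFSET, p.y + 1);
    }
}
```

Both methods return an equivalent Point; the difference is that the
bracketed form needs only the two bracketing calls plus the existing
get/put operations, while the withX form needs one new entry point per
type.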

Also, if we push withX functions down into the JIT, then
every field update will involve a separate transaction at
the IR level (to possibly allocate and also de-alias the
memory buffer; the current implementation fails to
de-alias, which would be a major bug).  If you give the
Unsafe user control of the de-aliasing task, the JIT's job
gets simpler, and the user gets more control.

More control, simpler API, less responsibility on JIT:
That's why I think we want 2 more API points, not #types
new API points.  And if you have a way to do it with zero
more Unsafe API points, of course I'm listening.

> (I'm on PTO for 2 weeks starting tomorrow)

Enjoy; sorry I won't see you at the Summit.

— John