methods with scalarized arguments
John Rose
john.r.rose at oracle.com
Thu May 17 18:29:12 UTC 2018
On May 17, 2018, at 7:27 AM, Roland Westrelin <rwestrel at redhat.com> wrote:
>
>
> [moving to valhalla-dev]
Thanks; I'm also changing the subject line.
Folks, there are many ideas, both practical and speculative,
floating back and forth on the valhalla-spec-experts list; it's
easy to read them in the archives. Roland is right to pull
practical discussion of implementation (of something we
know we want to do now!) to this list.
> When it comes to the calling convention, there's one extra bit of
> complexity: method handles. For a method handle invoke to a method m()
> with value arguments, there will be a lambda form with a call to a
> method handle linker. If the LF is compiled as a standalone compiled
> method, the JIT has no way to know values are expected by the method
> that's behind the linker method so it passes buffered values. If method
> m() is JIT'ed so it's expecting scalarized values, the method handle
> call fails.
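For concreteness, the shape of the call in question is roughly this (a
sketch only; Point here is an ordinary final class standing in for a
value type, and all names are illustrative):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;

    public class MHValueArgSketch {
        // Stand-in for a flattenable value type.
        static final class Point {
            final int x, y;
            Point(int x, int y) { this.x = x; this.y = y; }
        }

        // The eventual target m() with a value-typed argument.
        static void m(Object a, Point p, Object b) { }

        public static void main(String[] args) throws Throwable {
            MethodHandle mh = MethodHandles.lookup().findStatic(
                    MHValueArgSketch.class, "m",
                    MethodType.methodType(void.class, Object.class, Point.class, Object.class));
            // This invoke goes through a lambda form that calls a linker method.
            // If that lambda form is compiled as a standalone method, the JIT
            // sees only the linker's generic shape and passes p as a buffered
            // (pointer-sized) value rather than as scalarized fields.
            mh.invokeExact((Object) "a", new Point(1, 2), (Object) "b");
        }
    }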
Sometimes I lump the interpreter and reflection with method handle
linkers. The MH linkers are different in that they are supposed to run
at almost full speed even when compiled, so they try hard to avoid
full permutations of the argument list. Which leads to…
Idea: Avoid full permutations between the entry points by passing the
arguments in an order in which the buffered calling sequence is an
initial sequence of the scalarized calling sequence. Examples:
  signature:  f(object, point, object)
  buffered:   f(R1, R2, R3)
  scalarized: f(R1, {R2,R4}, R3)

  signature:  f(object, complex, object)
  buffered:   f(R1, R2, R3)
  scalarized: f(R1, {F1,F2}, R3)

  signature:  f(object, vector, object)
  buffered:   f(R1, R2, R3)
  scalarized: f(R1, V1, R3)
Where R[i], F[i], V[i] are the general, floating-point, and vector registers
allocated internally to Java calling sequences.
Given these assignments, an adapter may need only limited data
movement to convert between calling conventions.
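To make the prefix property concrete, here is a toy sketch (Java, not
HotSpot code; the argument kinds, field counts, and register names are
all invented for illustration) of one way to assign registers so that
the buffered sequence stays an initial subsequence of the scalarized one:

    import java.util.ArrayList;
    import java.util.List;

    class CallingConventionSketch {
        // Illustrative argument kinds: a reference, a value with int-like
        // fields, a value with floating-point fields, a vectorizable value.
        enum Kind { REF, VALUE_INTS, VALUE_FLOATS, VALUE_VECTOR }

        // Buffered convention: every argument is one pointer-sized word in
        // the next general register.
        static List<String> buffered(Kind[] sig) {
            List<String> regs = new ArrayList<>();
            for (int i = 0; i < sig.length; i++) regs.add("R" + (i + 1));
            return regs;
        }

        // Scalarized convention: non-value arguments keep their buffered
        // slots; a value's first int-like field reuses its buffered slot,
        // extra fields spill past the buffered tail or go to F/V registers,
        // so the buffered assignment remains an initial subsequence.
        static List<String> scalarized(Kind[] sig, int[] fieldCounts) {
            List<String> regs = new ArrayList<>();
            int extraR = sig.length + 1, f = 1, v = 1;
            for (int i = 0; i < sig.length; i++) {
                int r = i + 1;  // the register this argument gets when buffered
                switch (sig[i]) {
                    case REF:
                        regs.add("R" + r);
                        break;
                    case VALUE_INTS: {
                        StringBuilder sb = new StringBuilder("{R" + r);
                        for (int j = 1; j < fieldCounts[i]; j++) sb.append(",R").append(extraR++);
                        regs.add(sb.append('}').toString());
                        break;
                    }
                    case VALUE_FLOATS: {
                        StringBuilder sb = new StringBuilder("{");
                        for (int j = 0; j < fieldCounts[i]; j++) sb.append(j == 0 ? "F" : ",F").append(f++);
                        regs.add(sb.append('}').toString());
                        break;
                    }
                    case VALUE_VECTOR:
                        regs.add("V" + v++);
                        break;
                }
            }
            return regs;
        }

        public static void main(String[] args) {
            Kind[] sig = { Kind.REF, Kind.VALUE_INTS, Kind.REF };    // f(object, point, object)
            System.out.println(buffered(sig));                        // [R1, R2, R3]
            System.out.println(scalarized(sig, new int[] {0, 2, 0})); // [R1, {R2,R4}, R3]
        }
    }

With this assignment the adapter only has to materialize (or discard)
the value's extra fields; everything the buffered caller placed is
already where the scalarized callee expects it, or is simply unused,
as with R2 in the complex example.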
This idea comes from the SPARC ABI, and also from HotSpot's handling
of JNI arguments, both of which carefully arrange for coherence between sibling
calling sequences. (In the case of SPARC, it allows a varargs function
to spill register-based arguments into a pre-allocated stack argument
area, which then becomes the va_list. Thus, varargs mode is memory
only, while the normal mode is register based, with "holes" in the argument
area in case a callee wants to play at varargs. Similar to our scalarized
vs. buffered distinction.)
> That problem is not specific to Lworld. It exists in MVT with
> __Value. We never solved it. But from previous discussions, it seems the
> way to solve that problem is for every method with value arguments to
> have 2 entry points: a scalarized values entry point and a buffered
> values entry point. In a first implementation, the buffered values entry
> point could fall back to the interpreter and the scalarized values entry
> point be eventually a JIT'ed method.
I like this. I think you are talking mainly about single-method calls,
where there is no v-table in the way. (Invokes of static, private, or
final methods, or invokespecial.) We can think about these, if it helps,
as calls to degenerate v-table hierarchies of depth 1.
> Now assuming we have to have 2 method entry points, why not use the
> buffered value entry point when one of the value arguments is null (or
> maybe null)?
Are you imagining a single nmethod with two entry points? Currently,
nmethods *do* have two entry points for distinct calling sequences.
This might add two more: <VEP, UEP> x <Buffered, Scalarized>.
I like this line of thinking very much.
> What entry point to use at a call site could be decided at
> JIT compilation time: either all arguments are statically known to be
> non null and we can go with the scalarized values entry point or we fall
> back to the buffered values entry point. Whether the code being JIT'ed
> is legacy or not doesn't factor explicitly in the decision.
*This* is a very good property. We could let the JIT dig around inside the
implementation to see if it came from a legacy class, but that would be
smelly code, I think. Better to have each method put out a bit-mask or
some other thing right alongside its descriptor, saying "this is where
scalarization happens in my descriptor". BTW, the bit-mask could have
a fixed maximum length; 32 bits is not too small. Scalarizing can be
restricted to normal-arity methods, at least for a start. Speculation:
Scalarization isn't as valuable for high-arity methods (arity >> 5).
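As a sketch of what such a fixed-width mask might look like (the
predicate isScalarizableValueType is a hypothetical stand-in for the
VM's real knowledge of which classes are value types):

    import java.lang.invoke.MethodType;

    class ScalarizationMask {
        // Hypothetical predicate; the real VM would consult the class's
        // value-type status rather than a placeholder like this.
        static boolean isScalarizableValueType(Class<?> c) {
            return false;
        }

        // Bit i is set iff argument i would be passed in scalarized form.
        static int maskFor(MethodType type) {
            int count = type.parameterCount();
            if (count > 32) return 0;   // high-arity methods stay buffered-only
            int mask = 0;
            for (int i = 0; i < count; i++) {
                if (isScalarizableValueType(type.parameterType(i))) {
                    mask |= 1 << i;
                }
            }
            return mask;
        }
    }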
Extra idea, use or toss: Arrange the buffered and scalarized entry
points of an nmethod (or adapter) with a globally fixed offset between
them. Then upgrading or downgrading a call is easy to do, even in
assembly code. For extra points, put a bitmask word in the instruction
stream, immediately before the scalarized entry point, so it is crystal
clear when there is a match or mismatch between caller and callee.
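A toy model of that layout, with the usual hedging (the fixed offset,
the record shape, and all names are invented for illustration):

    class EntryPointLayoutSketch {
        static final long SCALARIZED_ENTRY_OFFSET = 64;  // hypothetical fixed offset, in bytes

        // Model of an nmethod (or adapter): the scalarized entry sits at a
        // fixed offset past the buffered entry, and the callee's
        // scalarization mask lives in the word just before that entry.
        record Stub(long bufferedEntry, int calleeMask) {
            long scalarizedEntry() { return bufferedEntry + SCALARIZED_ENTRY_OFFSET; }
            long maskWordAddress() { return scalarizedEntry() - Integer.BYTES; }
        }

        // Caller-side (or adapter-side) choice: take the scalarized entry
        // only when caller and callee agree on where scalarization happens.
        static long chooseEntry(Stub callee, int callerMask) {
            return (callerMask == callee.calleeMask())
                    ? callee.scalarizedEntry()
                    : callee.bufferedEntry();
        }
    }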
— John