RFR/RFC calling convention change for value types in compiled code

Mon Nov 28 08:20:41 UTC 2016

Hi,

On 11/25/2016 04:30 PM, Tobias Hartmann wrote:
> [...]
>
> [The following thoughts are out of the scope of this but should be considered for future work]
>
> Below code allocates the value type 'v' three times although only one allocation is required:
>
> 1: MyValue v = createInline();
> 2: pass_to_interpreter(v);
> 3: pass_to_interpreter(v);
> 4: v = compiled_return(v); // this compiled callee method does "vreturn v;"
>
> Allocation in line 2 is required because we call interpreted code. In line 3, we allocate again because allocation in line 2 happened in the c2i adapter and the oop is not passed back to the caller. In line 4, we have to load the field values from memory, pass them to the compiled callee and re-allocate the value type on vreturn.
>
> My idea is to solve this problem by having read/write oop arguments. Like this, we could pass an additional oop argument for each VT parameter that can be NULL and set the oop in the c2i adapter or the compiled callee after allocation happened the first time.
> In line 2, we would then pass NULL as oop argument and the adapter would set the oop after allocating the VT. In line 3, we would re-use that oop and the c2i adapter would not allocate again. In line 4, we would not allocate in the callee because we receive a non-null oop.

Passing an oop to compiled code for each VT argument could help us avoid 
some allocations (even if oops are "read-only", i.e., the way the system 
currently works).

I have the following scenario in mind:

interpreted method m1(VT v1) -> compiled method m2(VT v1) -> call 
interpreted method m3(VT v1)

All three methods are passed the same VT v1 instance. In method m1(), VT 
v1 "has" an oop because m1() is interpreted.

With Roland's suggested changes, the oop for VT v1 is not available in 
method m2(), as m2 is compiled (only v1's fields are available there). 
When the interpreted method m3() is called, the oop for VT v1 is created 
again.

If we would pass both the VT oops and fields to compiled methods, we 
could avoid (re-)allocation of VT v1 when m3() is called.

Best regards,

Zoltan

>
> This would also solve the problem of how to avoid allocations when returning to compiled code. For example:
>
> MyValue get() {
>   return createInline();
> }
>
> would always allocate the VT even if we return to compiled code. We could now pass an additional oop to an uninitialized VT as argument to get() to hold the to be returned VT. The buffered VT is then initialized in the callee if the oop argument is non-NULL instead of allocating and returning a new VT (this would of course require interpreter support). The advantage is that we can defer allocation to the caller (recursively), potentially increasing optimization opportunities and avoiding allocations in long call/return chains.
>
> We should also consider passing VTs "by value" only if the field values are available as constants or have to be loaded in the callee anyway. Otherwise, we may load field values from memory just to pass them to the callee when it may be more beneficial to just pass an oop and avoid field loads (especially for large value types).
>
> Thanks,
> Tobias
>
> On 24.11.2016 15:16, Zoltán Majó wrote:
>> Hi Roland,
>>
>>
>> On 11/24/2016 11:25 AM, Roland Westrelin wrote:
>>> http://cr.openjdk.java.net/~roland/valhalla/callingconvention/webrev.00/
>>>
>>> This patch implements a new compiled code calling convention for methods
>>> that take value type arguments: instead of passing a reference to the
>>> value type as an argument, the method is passed each field of the value
>>> type in registers/stack slots.
>>>
>>> The interpreter continues to use references at calls so when going to
>>> compiled from the interpreter (i2c adapter), value type fields are
>>> loaded from the value type object and the value type object is
>>> discarded. When going from compiled code to the interpreter (c2i
>>> adapter), value type objects for every arguments need to be allocated
>>> and initialized with the field values passed by the compiler.
>> the change looks good to me in general. Please find some minor comments below:
>>
>> = src/cpu/x86/vm/sharedRuntime_x86_64.cpp
>>
>> Don't we have one T_VOID too much here?
>>
>> 561         // flattened so, for instance: T_VALUETYPE T_INT T_VALUETYPE
>> 562         // T_INT T_LONG T_VOID T_VOID T_VOID is a value type with a
>>
>>
>> 2 minor typos.
>>
>> 563         // int field an a value type field that itself has 2 fields, a
>> 564         // int and a long
>>
>> =>
>>
>> 563         // int field and a value type field that itself has 2 fields, an
>> 564         // int and a long
>>
>>
>> Maybe it's better to have a more precise condition here:
>>
>> 692     if (i != sig.length()) {
>>
>> =>
>>
>> 692     if (i < sig.length()) {
>>
>>
>> Can you maybe find more suggestive names for j and k (and maybe i) in gen_c2i_adapter_helper()?
>>
>> 766   for (int i = 0, ignored = 0, j = 0, k = 0; i < sig.length(); i++) {
>>
>> Maybe next_vt_arg_comp for j, next_arg_int for k and next_arg_comp for k? These are long variable names, I agree, but it would be helpful if the code indicated which variable represents the interpreter's and which the compiler's point of view.
>>
>> The same could be applied to gen_i2c_adapter()
>>
>> 1029   for (int i = 0, ignored = 0, k = 0; i < sig.length(); i++) {
>>
>>
>> Can we maybe change the method name to reflect that the number of arguments computed is from the point of view of the interpreter?
>>
>>   552 static int compute_total_args_passed_int(const GrowableArray<SigEntry>& sig) {
>>
>> Maybe it would be also good to change the argument name to sig_extended or sig_cc so that it's more obvious that this is the signature used by the calling convention (and not the declared signature).
>>
>> So in SharedRuntime::gen_i2c_adapter, you could change
>>
>> 1018   int total_args_passed = compute_total_args_passed(sig);
>>
>> to
>>
>> 1018   int total_args_passed_int = compute_total_args_passed_int(sig_extended);
>>
>>
>> = src/share/vm/opto/type.hpp
>>
>> 1536   const TypeTuple* const _domain_sig;     // Domain of inputs
>> 1537   const TypeTuple* const _domain_cc;  // Domain of inputs
>>
>> Could you please add a comment to explain what the difference between _domain_sig and _domain_cc is? I see that 'cc' refers to calling convention, but that was not obvious on the first sight. Or maybe you can use _domain_sig_extended instead. (Whatever you may pick, it would be great if it stayed consistent with the terms you use at other places, e.g., in the adapter generation).
>>
>> The comment in src/share/vm/opto/type.cpp describes very well what is happening, maybe you can reuse part of that text.
>>
>>
>> 1558   static const TypeFunc *make(const TypeTuple* domain, const TypeTuple* domain_cc, const TypeTuple* range);
>>
>> You could add a version of the make method that takes only one domain argument for the case domain_sig and domain_sig_extended are the same. (There are more than 30 places where the two are equal and only one place where they are not.)
>>
>>
>> 652   static const TypeTuple *make_domain(ciInstanceKlass* recv, ciSignature *sig, bool is_cc = false);
>>
>> Could you please give a more suggestive name to the 'is_cc'? You don't actually check that we have a calling convention but rather if the special calling convention for VT fields is enabled. Maybe vt_fields_as_args_enabled could work.
>>
>>
>> = src/share/vm/opto/type.cpp
>>
>> Could you please move the extra_value_fields() and collect_value_fields() functions to ciValueKlass? Both functions operate on a ciValueKlass instance and maybe we want to use them later in some other context. Maybe add a short comment summarizing what those functions do.
>>
>>
>> = src/share/vm/opto/callGenerator.cpp
>>
>>   375   // FIXME: late inlining of methods that take value type arguments is
>>   376   // broken: arguments at the call are set up so fields of value type
>>   377   // arguments are passed but code here expects a single argument per
>>   378   // value type (a ValueTypeNode) instead.
>>
>> How difficult do you think it will be to fix late inlining later on? It's fine to do the fix later -- I'm just wondering if there is anything here that would cause a large difficulty for us later on (most likely not, but better to ask).
>>
>>
>> = src/share/vm/runtime/globals.hpp
>>
>> 4183   experimental(bool, ValueTypePassFieldsAsArgs, true,                       \
>> 4184           "Pass each field as an argument at calls")                        \
>>
>> Can you please correct the indentation in the second line?
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>> Roland.