VectorBox enabling

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Mon Nov 20 17:15:37 UTC 2017


Razvan,

IMO it illustrates an important use case which should be supported (but 
isn't right now): rematerialization of scalarized vectors into boxed 
form on deoptimization.

The JVM relies heavily on safepoints in generated code, so it's infeasible
to forbid them in vector code. And calls are not the only source of
safepoints (though for calls it's still better to spill vectors to the stack
than to allocate boxes on the heap): non-counted loops also contain a
safepoint check on the backward branch.
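To make "non-counted loop" concrete, here is a minimal sketch in plain scalar Java (not Vector API code; the class and method names are illustrative only). C2 can turn an `int`-indexed loop with a constant stride and a loop-invariant bound into a counted loop. A loop whose exit test depends on a value loaded in the previous iteration, such as a linked-list walk, cannot be converted, and it typically keeps a safepoint poll on its back edge. Any vector value live across that back edge has to be describable at the safepoint.

```java
// Illustrative loop shapes; names are hypothetical and unrelated to the Vector API.
final class LoopShapes {
    // Minimal singly linked list node used by the non-counted loop below.
    static final class Node {
        final float value;
        final Node next;

        Node(float value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    // Counted-loop shape: int induction variable, constant stride,
    // loop-invariant bound, so the trip count is known on loop entry.
    static float sumArray(float[] xs) {
        float sum = 0f;
        for (int i = 0; i < xs.length; i++) {
            sum += xs[i];
        }
        return sum;
    }

    // Non-counted shape: the exit test depends on a value loaded in the
    // previous iteration, so the trip count is unknown on entry and the
    // back edge typically keeps a safepoint poll.
    static float sumList(Node head) {
        float sum = 0f;
        for (Node n = head; n != null; n = n.next) {
            sum += n.value;
        }
        return sum;
    }
}
```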

Actually, the existing rematerialization support in C2 is very efficient and
doesn't affect generated code: all metadata (the locations of scalarized
object components) is stored on the side, and the deopt handler uses it to
instantiate a box of the proper type and initialize it with up-to-date
values extracted from the context.
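The scheme described above can be modeled in a few lines of Java. This is a toy sketch, not HotSpot's implementation (in HotSpot the corresponding roles are played roughly by C2's SafePointScalarObjectNode and the ObjectValue entries in the debug info). The names `FloatBox`, `ScalarizedDescriptor` and `rematerialize` are hypothetical, the "frame" is just an array of float slots, and each lane is treated as a separate component for simplicity. Compiled code only ever touches the slots; the descriptor is consulted only when deoptimization actually happens.

```java
// Toy model only: FloatBox, ScalarizedDescriptor and rematerialize are hypothetical
// names, not HotSpot types. Frame "slots" stand in for registers/stack slots.
final class FloatBox {
    final float[] lanes;

    FloatBox(float[] lanes) {
        this.lanes = lanes;
    }
}

final class ScalarizedDescriptor {
    // Side-table entry recorded at a safepoint: the frame slot holding each
    // lane of a box that the compiled code never allocated.
    private final int[] laneSlots;

    ScalarizedDescriptor(int... laneSlots) {
        this.laneSlots = laneSlots.clone();
    }

    // Runs only on deoptimization: allocate the box and fill it with the
    // current slot values. The fast path never executes this code.
    FloatBox rematerialize(float[] frameSlots) {
        float[] lanes = new float[laneSlots.length];
        for (int lane = 0; lane < laneSlots.length; lane++) {
            lanes[lane] = frameSlots[laneSlots[lane]];
        }
        return new FloatBox(lanes);
    }
}
```

Because the descriptor lives outside the compiled code, the fast path pays nothing for it; the allocation and copies happen only when a deoptimization occurs, and they read whatever values the slots hold at that moment.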

So I wouldn't bother avoiding such cases; I'd leave them as is for a
forthcoming enhancement.

Best regards,
Vladimir Ivanov

On 11/18/17 1:52 AM, Lupusoru, Razvan A wrote:
> Hey Vladimir,
> 
> I have noticed that inlining during graph construction is not equivalent to late inlining. In the following example, I see different behavior:
>      static <S extends Vector.Shape<Vector<?, ?>>> void VecSaxpy(FloatVector.FloatSpecies<S> fspec, float[] a, int a_offset,
>                                                                  float[] b, int b_offset, float alpha) {
>          FloatVector<S> alphaVec = fspec.broadcast(alpha);
>          for (int i = 0; (i + a_offset + fspec.length()) <= a.length && (i + b_offset + fspec.length()) <= b.length; i += fspec.length()) {
>              FloatVector<S> bv = fspec.fromArray(b, i + b_offset);
>              FloatVector<S> av = fspec.fromArray(a, i + a_offset);
>              bv.add(av.mul(alphaVec)).intoArray(b, i + b_offset);
>          }
>      }
> 
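The loop above computes SAXPY (b[i] = alpha * a[i] + b[i]) one vector-length chunk at a time, with no scalar tail loop. A plain scalar Java version can serve as a correctness reference when checking the vectorized kernel. The sketch below assumes an explicit `length` parameter and the class name `ScalarSaxpy`, neither of which comes from the original benchmark, and it needs Java 9+ for `Objects.checkFromIndexSize`.

```java
import java.util.Objects;

// Scalar SAXPY reference (hypothetical helper, not part of the original benchmark):
// b[bOffset + i] = b[bOffset + i] + a[aOffset + i] * alpha for i in [0, length).
final class ScalarSaxpy {
    static void saxpy(float[] a, int aOffset, float[] b, int bOffset, float alpha, int length) {
        // Reject ranges that fall outside either array before modifying b.
        Objects.checkFromIndexSize(aOffset, length, a.length);
        Objects.checkFromIndexSize(bOffset, length, b.length);
        for (int i = 0; i < length; i++) {
            // Multiply, then add, matching bv.add(av.mul(alphaVec)) lane by lane.
            b[bOffset + i] = b[bOffset + i] + a[aOffset + i] * alpha;
        }
    }
}
```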
> When late inlining is disabled for the Vector API, running with the -XX:+DebugVectorApi flag prints the following message:
> === NOT eliminating VectorBox due to uses ===
>   196    VectorBox       ===  188 _  193 _ _ ( 49  27  91  95  195 ) [[ 197  199  200 ]]  jdk/incubator/vector/Float256Vector:NotNull:exact * ( java/lang/Object:NotNull *, vectory[8]:{float}, java/lang/Object:NotNull *, memory, memory ) !jvms: BLAS::VecSaxpy @ bci:3
>   231    CallStaticJava  ===  1029  85  197  8  9 ( 230  217  11  12  13  14  1  200  29  1  1  12  217 ) [[ 232 ]] # Static uncommon_trap(reason='speculate_class_check' action='maybe_recompile' debug_id='0')  void ( int ) C=0.000100 entry=0x00007fc30f97de20 BLAS::VecSaxpy @ bci:16 !jvms: BLAS::VecSaxpy @ bci:16
> -------------------------------
> 
> Basically, C2 fails to eliminate the VectorBox (and the corresponding allocation) because it is used by an uncommon_trap.
> 
> Interestingly enough, when late inlining is enabled, the VectorBox is eliminated as desired because no equivalent uncommon_trap is inserted. I am unsure of the reason for this mismatch in behavior. The performance difference ends up being about 40x on the test case above when tested on AVX2.
> 
> Let me know if you have any recommendations or insight into the issue noted above.
> 
> Thanks,
> Razvan
> 
> -----Original Message-----
> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
> Sent: Friday, November 17, 2017 8:20 AM
> To: Lupusoru, Razvan A <razvan.a.lupusoru at intel.com>; 'panama-dev at openjdk.java.net' <panama-dev at openjdk.java.net>
> Subject: Re: VectorBox enabling
> 
> Another thing I was curious about is delayed inlining of vector ops:
> 
> +      } else if (should_delay_vector_inlining(callee, jvms)) {
> +        assert(!delayed_forbidden, "delay should be allowed");
> +        return CallGenerator::for_late_inline(callee, cg);
> 
> +bool Compile::should_delay_vector_inlining(ciMethod* call_method, JVMState* jvms) {
> +  return call_method->is_vector_method();
> +}
> 
> Can you please elaborate on when it helps?
> 
> Best regards,
> Vladimir Ivanov
> 
> On 11/15/17 4:37 AM, Lupusoru, Razvan A wrote:
>> Hi everyone,
>>
>> VectorBox enabling is now mostly complete and appears to be functional. VectorBox can generate objects for all supported Vector types that have at least one intrinsic method. This includes GenericMask (subject to a limitation noted below). Additionally, VectorBox nodes can be removed along with their allocations when the objects do not need to be created. I have tested BLAS (saxpy, sdot) and the Sepia demo used at JavaOne, and performance has not regressed.
>>
>> Please see the attached patch; if there are no concerns, I will merge it tomorrow.
>> http://cr.openjdk.java.net/~rlupusoru/panama/webrev_vectorbox_04/
>>
>> Note that the patch contains some FIXMEs related to masks (namely, mask shape and type cannot always be recovered during intrinsification). After this patch, I will look into solving this problem, potentially by having specialized masks for each type and shape combination (as is done for species).
>>
>> The main remaining limitations of VectorBox are as follows:
>>
>> - If a VectorBox is used by any non-intrinsified calls, stores to the heap, or runtime calls via deopt, it will generate an object at the original call site. The plan is to move this to a slow path once Vector API object identities can be ignored.
>>
>> - VectorBox for GenericMask does not set the Species field. This will either be fixed in a follow-up patch, or the specialized-mask approach will be used instead.
>>
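The elimination rule implied by the first limitation can be summarized as: a box can be dropped only if every use of it consumes the vector value directly. The Java sketch below models that decision. `UseKind`, its constants, `canEliminate` and `canEliminateWithRematerialization` are hypothetical names for illustration, not identifiers from the patch. Treating a deoptimization (debug-info) use as blocking mirrors the behavior reported in this thread; with rematerialization on deopt, such uses could stop blocking elimination.

```java
import java.util.List;

// Hypothetical classification of a VectorBox's uses (not names from the patch).
enum UseKind {
    UNBOX,                 // consumed directly by an intrinsified vector operation
    NON_INTRINSIFIED_CALL, // passed as an argument to a call that was not intrinsified
    HEAP_STORE,            // stored into a field or array element
    DEOPT_STATE            // referenced from the JVM state of an uncommon trap or safepoint
}

final class VectorBoxElimination {
    // Current policy described in the thread: any use other than UNBOX
    // forces the box (and its allocation) to be materialized.
    static boolean canEliminate(List<UseKind> uses) {
        return uses.stream().allMatch(use -> use == UseKind.UNBOX);
    }

    // Possible future policy: deoptimization uses no longer block elimination,
    // because the box could be rematerialized from debug info on deopt.
    static boolean canEliminateWithRematerialization(List<UseKind> uses) {
        return uses.stream().allMatch(use -> use == UseKind.UNBOX || use == UseKind.DEOPT_STATE);
    }
}
```

Under the first policy, a single uncommon_trap use, like the one in the -XX:+DebugVectorApi output earlier in this thread, keeps the allocation alive; under the second, only real escapes (non-intrinsified calls, heap stores) would.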
>> Thanks so much!
>>
>> --Razvan
>>

