MVT-based vectors: first stab
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Wed Jun 28 16:17:50 UTC 2017
> Nice work.
>
> I notice for the factory you are chaining a sequence of findWithers (vwithfield), one for each field. I am guessing if you construct via the VCC and then unbox that might cause issues with temporary boxing?
Yes, I deliberately avoided any unnecessary boxing to make the
JIT compiler's life easier. The main question I had when starting the
experiment was whether it is possible to completely avoid/eliminate
boxing/buffering in generated code, because the absence of
boxing/buffering operations in generated code is crucial for vectors.
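
To make the boxing concern concrete, here is a minimal plain-Java sketch. It uses an ordinary reference class rather than a Q-type, and the names Long2SumSketch, sumWithAccumulatorObject and sumScalarized are hypothetical, chosen only for illustration. The first method keeps the accumulator as an object, so each iteration creates a new instance unless the JIT scalarizes it; the second shows, written by hand, the shape a fully scalarized loop would take.

```java
public class Long2SumSketch {
    // Ordinary immutable class standing in for the Long2 value-capable class.
    static final class Long2 {
        final long lo, hi;

        Long2(long lo, long hi) {
            this.lo = lo;
            this.hi = hi;
        }
    }

    // Accumulator kept as an object: a new Long2 is created on every
    // iteration unless the JIT manages to scalarize it.
    static long sumWithAccumulatorObject(Long2[] va) {
        Long2 v = new Long2(0, 0);
        for (int i = 0; i < va.length; i++) {
            Long2 vi = va[i];
            v = new Long2(v.lo + vi.lo, v.hi + vi.hi);
        }
        return v.lo + v.hi;
    }

    // Hand-scalarized equivalent: the two components live in locals,
    // so no per-iteration object is needed.
    static long sumScalarized(Long2[] va) {
        long lo = 0;
        long hi = 0;
        for (int i = 0; i < va.length; i++) {
            lo += va[i].lo;
            hi += va[i].hi;
        }
        return lo + hi;
    }

    public static void main(String[] args) {
        Long2[] va = { new Long2(1, 2), new Long2(3, 4) };
        System.out.println(sumWithAccumulatorObject(va));
        System.out.println(sumScalarized(va));
    }
}
```

Both methods compute the same result; the difference is only in whether the accumulator has to exist as an object during the loop.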
> I think it would be useful to add that functionality to ValueType (with the caveat as with findWither that the VCC constructor is bypassed, which works fine for “vanilla” VCCs that just put fields and have no constraints). If necessary we could spin specific byte code using the builder to reduce the LF and invocation ceremony.
Yes, it looks like a good candidate for inclusion. The method handle
chain is quite complex, so it'll definitely benefit from specialized
bytecode (vdefault; vwithfield DVT.f1; ...; vwithfield DVT.fn; vreturn;).
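
As a rough analogue of the wither chain, the sketch below builds a two-argument factory from per-field "wither" handles using only the standard java.lang.invoke API. It is an assumption-laden illustration, not the MVT code: the experimental ValueType.findWither is replaced by ordinary withLo/withHi instance methods, the default value is a plain constant, and the names WitherChainSketch, buildFactory and make are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class WitherChainSketch {
    // Ordinary immutable class; withLo/withHi stand in for vwithfield.
    static final class Long2 {
        static final Long2 DEFAULT = new Long2(0, 0); // stands in for vdefault
        final long lo, hi;

        private Long2(long lo, long hi) {
            this.lo = lo;
            this.hi = hi;
        }

        Long2 withLo(long v) { return new Long2(v, hi); }
        Long2 withHi(long v) { return new Long2(lo, v); }
    }

    // (long lo, long hi) -> Long2, built by chaining one wither per field.
    static final MethodHandle FACTORY = buildFactory();

    static MethodHandle buildFactory() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType witherType = MethodType.methodType(Long2.class, long.class);
            MethodHandle withLo = lookup.findVirtual(Long2.class, "withLo", witherType);
            MethodHandle withHi = lookup.findVirtual(Long2.class, "withHi", witherType);
            // Start from the default value: (long lo) -> Long2
            MethodHandle fromLo = withLo.bindTo(Long2.DEFAULT);
            // Feed that result into the next wither: (long lo, long hi) -> Long2
            return MethodHandles.collectArguments(withHi, 0, fromLo);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Convenience wrapper so callers do not have to handle Throwable.
    static Long2 make(long lo, long hi) {
        try {
            return (Long2) FACTORY.invokeExact(lo, hi);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        Long2 v = make(1L, 2L);
        System.out.println(v.lo + ", " + v.hi);
    }
}
```

With n fields the chain grows by one collectArguments step per field, which is the kind of ceremony that a single spun method could replace.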
Best regards,
Vladimir Ivanov
>
> Paul.
>
>
>
>> On 28 Jun 2017, at 06:17, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>>
>> Hi,
>>
>> I started playing with MVT-based vectors recently and want to share my experience so far.
>>
>> For the first step, I decided to avoid specific treatment of super-longs (Long2/4/8) in the JVM we have in Panama and just mark them as VCCs for now:
>> @DeriveValueType
>> final class Long2 { public final long lo, hi; }
>>
>> As a test case I chose sum of array elements:
>>
>>   long sum(QLong2[] va) {
>>     QLong2 v = QLong2.default; // (0,0)
>>     for (int i = 0; i < va.length; i++) {
>>       QLong2 vi = va[i];
>>       v = QLong2(v.lo + vi.lo, v.hi + vi.hi);
>>     }
>>     return v.lo + v.hi;
>>   }
>>
>> Here are the changes:
>> http://cr.openjdk.java.net/~vlivanov/valhalla/vectors.mvt/webrev.00/
>>
>> SUMMARY & RESULTS
>>
>> I hit some bugs along the way and had to extend the code for Q-types in some places to get decent code shape generated. (I'll shepherd the fixes in the repo separately.) I ended up with the following generated code:
>>
>> http://cr.openjdk.java.net/~vlivanov/valhalla/vectors.mvt/sumArrayL2.log
>>
>> Unfortunately, there are still 2 allocations left in the generated code (out of 5 initially). C2 can't scalarize the accumulator (v) inside the loop and has to put the updated components (lo & hi) into an allocated heap buffer on every iteration over the array (generated code [1], C2 EA output [2]).
>>
>>
>> GORY DETAILS
>>
>> hotspot/src/share/vm/opto/type.cpp
>>
>> * Type::get_typeflow_type() doesn't distinguish between TypeValueTypePtr & TypeValueType
>>
>>
>> jdk/src/java.base/share/classes/java/lang/invoke/LambdaFormBuilder.java
>>
>> * introduced some LF intrinsics, e.g. LOOP and SELECT_ALTERNATIVE;
>>
>> * had to work around control-flow-related bugs in the bytecode library (jdk.experimental.bytecode): broken JVM state after goto_() (see FIXME marks in the code);
>>
>> * fixed 2-slot types support: wrong local indexes were used
>>
>>
>> jdk/src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java
>>
>> * MHI.unboxResultHandle misses the unboxing step for Q-types
>>
>>
>> jdk/src/java.base/share/classes/java/lang/invoke/MethodHandles.java
>> jdk/src/java.base/share/classes/jdk/experimental/value/ValueType.java
>>
>> * had to add a specialization for arraylength on a Q-typed array, because C2 can't handle arraylength Object[] when a Q[] instance is on the stack;
>>
>> jdk/src/java.base/share/classes/jdk/experimental/bytecode/TypedCodeBuilder.java:
>>
>> * changes required to work around the goto_() bug
>>
>> * fixed CP patching support
>>
>>
>> jdk/src/java.base/share/classes/jdk/experimental/value/MethodHandleBuilder.java
>>
>> * fixed CP patching support
>>
>>
>> hotspot/src/share/vm/memory/metachunk.hpp
>> hotspot/src/share/vm/oops/method.hpp
>>
>> * fixed the optimized JVM build
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] http://cr.openjdk.java.net/~vlivanov/valhalla/vectors.mvt/sumArrayL2.log
>>
>> ;; B7: # B8 <- B9 top-of-loop
>>
>> mov %r9,0x10(%rsp)
>> mov 0x8(%rsp),%r11d
>>
>> ;; B8: # B16 B9 <- B6 B7 Loop: B8-B7 inner
>>
>> mov %r11d,0x8(%rsp)
>> movabs $0x7c0060420,%rsi ; {metadata('valhalla/vector/Long2$Value')}
>> ...
>> callq 0x0000000117579820 ; {runtime_call _new_instance_Java}
>>
>> ;; B9: # B7 B10 <- B8
>>
>> mov %rax,%r9
>> mov 0x10(%rsp),%r11
>> mov 0x10(%r11),%r10
>> mov 0x18(%r11),%r11
>> movslq %ebp,%r8
>> shl $0x4,%r8
>> mov (%rsp),%rcx
>> mov 0x18(%rcx,%r8,1),%rax
>> mov 0x10(%rcx,%r8,1),%r8 ;*vaload
>> add %r11,%rax ;*ladd
>> mov %rax,0x18(%r9)
>> add %r10,%r8 ;*ladd
>> mov %r8,0x10(%r9)
>> inc %ebp ;*iadd
>> cmp 0x8(%rsp),%ebp
>> jl <B7> ;*if_icmpge
>>
>> [2] Pruned EA output (full output in the log)
>> ======== Connection graph for valhalla.vector.VectorTest::sumArrayL2
>> JavaObject NoEscape(NoEscape) NSR [ 1267F 1265F 756F 754F 1581F 1579F [ 1259 1264 532 1525 ]] 1247 Allocate
>> LocalVar [ 1247P [ 1264 ]] 1259 Proj ... #5
>> LocalVar [ 1259 1247P [ 1267b 1265b 532 1525 ]] 1264 CheckCastPP ... #valuetype* valhalla/vector/Long2$Value:NotNull
>> LocalVar [ 324 1264 1247P 307P [ 756b 754b ]] 532 Phi ... #valuetype* valhalla/vector/Long2$Value:NotNull
>> LocalVar [ 1264 324 1247P 307P [ 1581b 1579b ]] 1525 Phi ... #valuetype* valhalla/vector/Long2$Value:NotNull
>>
>> JavaObject NoEscape(NoEscape) [ 1154F [ 1148 1153 ]] 1136 Allocate
>> LocalVar [ 1136P [ 1153 ]] 1148 Proj ... #5
>> LocalVar [ 1148 1136P [ 1154b ]] 1153 CheckCastPP ... #valuetype* valhalla/vector/Long2$Value:NotNull
>>
>> JavaObject NoEscape(NoEscape) [ [ 1053 ]] 1041 Allocate
>> LocalVar [ 1041P [ ]] 1053 Proj ... #5
>>
>> JavaObject NoEscape(NoEscape) [ 653F 651F [ 645 650 ]] 633 Allocate
>> LocalVar [ 633P [ 650 ]] 645 Proj ... #5
>> LocalVar [ 645 633P [ 653b 651b ]] 650 CheckCastPP ... #valuetype* valhalla/vector/Long2$Value:NotNull
>>
>> JavaObject NoEscape(NoEscape) NSR [ 756F 754F 1581F 1579F [ 319 324 532 1525 ]] 307 Allocate
>> LocalVar [ 307P [ 324 ]] 319 Proj ... #5
>> LocalVar [ 319 307P [ 532 1525 ]] 324 CheckCastPP ... #valuetype* valhalla/vector/Long2$Value:NotNull
>> LocalVar [ 324 1264 1247P 307P [ 756b 754b ]] 532 Phi ... #valuetype* valhalla/vector/Long2$Value:NotNull
>> LocalVar [ 1264 324 1247P 307P [ 1581b 1579b ]] 1525 Phi ... #valuetype* valhalla/vector/Long2$Value:NotNull
>>
>> Scalar 1153 CheckCastPP ... #valuetype* valhalla/vector/Long2$Value:NotNull
>> ++++ Eliminated: 1136 Allocate
>> Scalar 1041 Allocate ...
>> ++++ Eliminated: 1041 Allocate
>> Scalar 650 CheckCastPP ... #valuetype* valhalla/vector/Long2$Value:NotNull
>> ++++ Eliminated: 633 Allocate
>