Re: 回复:Re:[Vector]fromArray/allTrue performance
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Tue Nov 27 01:36:32 UTC 2018
> > Does the following patch fix the crash for you?
> This patch does fix the crash. Please go ahead to push it.
Ok, pushed [1].
Best regards,
Vladimir Ivanov
[1] http://hg.openjdk.java.net/panama/dev/rev/a511da8eef00
> > Proper way to fix the issue is to introduce missing support on C2 side.
>> Feel free to take care of it. (If anybody is already on it, please,
> speak up!)
> I will work on the missing support on C2 side to fix perf issue of
> Long128Vector.fromArray(), if nobody is on it.
> ------------------------------------------------------------------
> 发件人:Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> 发送时间:2018年11月22日(星期四) 02:47
> 收件人:王卓(卓仁) <zhuoren.wz at alibaba-inc.com>; panama-dev
> <panama-dev at openjdk.java.net>
> 抄 送:周经森(英通) <kingsum.kc at alibaba-inc.com>; 李三红(三红)
> <sanhong.lsh at alibaba-inc.com>
> 主 题:Re: Re:[Vector]fromArray/allTrue performance
>
> Zhuoren,
>
> > Basically, t is caused by type of a PhiNode is used but not set.
> > To reproduce the crash, we must modify JDK code. The change is pure java
> > but can cause crash in C2 compiler thread. Sorry that I cannot reproduce
> > this crash without changing JDK, because getBits is not opposed.
>
> Thanks a lot for the test case and detailed instructions!
>
> Does the following patch fix the crash for you?
>
> diff --git a/src/hotspot/share/opto/library_call.cpp
> b/src/hotspot/share/opto/library_call.cpp
> --- a/src/hotspot/share/opto/library_call.cpp
> +++ b/src/hotspot/share/opto/library_call.cpp
> @@ -8038,7 +8038,7 @@
> default: fatal("%s", type2name(elem_bt)); break;
> }
>
> - Node* operation = VectorInsertNode::make(opd, insert_val,
> idx->get_con());
> + Node* operation = _gvn.transform(VectorInsertNode::make(opd,
> insert_val, idx->get_con()));
> operation = box_vector(operation, vbox_type, elem_bt, num_elem);
> set_vector_result(operation);
>
> I'll push it on your behalf if it fixes the bug.
>
> I'm curious why unit tests don't catch it.
>
> > *Another workaround*for for Long128VectorfromArray using masks
> > This patch can fix the performance issue. With this patch,
> > VectorMatrixUpdateMaskLong.java is 10 - 15 times faster. Of cause
> > /mask0 = longSpecies.maskFromArray(AisNull0, i); /must be moved out of
> > main loop to get rid of maskFromArray performance issue.
> > Please give advice on this workaround.
>
> Proper way to fix the issue is to introduce missing support on C2 side.
> Feel free to take care of it. (If anybody is already on it, please,
> speak up!)
>
> Meanwhile you can extract Long128Vector.fromArray() into a helper method
>
> and use it instead.
>
> Best regards,
> Vladimir Ivanov
>
>
> >
> > diff -r a701d05cc2eb src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Long128Vector.java
> > --- a/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Long128Vector.java Thu Nov 15 17:18:03 2018 -0800
> > +++ b/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Long128Vector.java Tue Nov 20 16:45:18 2018 +0800
> > @@ -42,6 +42,10 @@
> >
> > static final Long128Vector ZERO = new Long128Vector();
> >
> > + static final Long128Vector ZEROONE = allocZeroOne();
> > +
> > + static final Long128Vector ONEZERO = allocOneZero();
> > +
> > static final int LENGTH = SPECIES.length();
> >
> > private final long[] vec; // Don't access directly, use getElements() instead.
> > @@ -58,6 +62,20 @@
> > vec = v;
> > }
> >
> > + static private Long128Vector allocZeroOne() {
> > + long[] zeroOneArray = new long[2];
> > + zeroOneArray[0] = 0xffffffffffffffffl;
> > + zeroOneArray[1] = 0;
> > + return new Long128Vector(zeroOneArray);
> > + }
> > +
> > + static private Long128Vector allocOneZero() {
> > + long[] oneZeroArray = new long[2];
> > + oneZeroArray[0] = 0;
> > + oneZeroArray[1] = 0xffffffffffffffffl;
> > + return new Long128Vector(oneZeroArray);
> > + }
> > +
> > @Override
> > public int length() { return LENGTH; }
> >
> > @@ -1323,7 +1341,20 @@
> > @Override
> > @ForceInline
> > public Long128Vector fromArray(long[] a, int ax, Mask<Long, Shapes.S128Bit> m) {
> > - return zero().blend(fromArray(a, ax), m);
> > + boolean[] bits = ((Long128Mask)m).getBits();
> > + if (bits[0] == false) {
> > + if (bits[1] == false) { // mask is 0 0
> > + return SPECIES.zero();
> > + } else { // mask is 0 1
> > + return SPECIES.fromArray(a, ax).and(ONEZERO);
> > + }
> > + } else {
> > + if (bits[1] == true) { // mask is 1 1
> > + return SPECIES.fromArray(a, ax);
> > + } else { // mask is 0 1
> > + return SPECIES.fromArray(a, ax).and(ZEROONE);
> > + }
> > + }
> > }
> >
> > @Override
> >
> > Regards,
> > Zhuoren
> >
> > ------------------------------------------------------------------
> > 发件人:Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> > 发送时间:2018年11月20日(星期二) 09:39
> > 收件人:王卓(卓仁) <zhuoren.wz at alibaba-inc.com>; panama-dev
> > <panama-dev at openjdk.java.net>
> > 主 题:Re: [Vector]fromArray/allTrue performance
> >
> > Zhuo, thanks for the feedback!
> >
> > Unfortunately, it seems the attachment was stripped by mail server.
> > Do you mind resending it inline?
> >
> > I'll let Intel folks comment on individual cases you refer to, but
> > overall everything marked as "Implementation limitation" is a
> > work-in-progress and will be addressed later at some point.
> >
> > Best regards,
> > Vladimir Ivanov
> >
> > On 19/11/2018 02:41, 王卓(卓仁) wrote:
> > > Hello,
> > > I am Zhuoren from Alibaba JVM team. Glad to take part in project panama.
> > > We have integrated Vector API to Alibaba JDK, and another Alibaba team is now using Vector API to optimize their applications.
> > > Here are some issues we found in our previous work and blocked future optimization, and I am searching for solutions.
> > >
> > > 1. The performance of Long128Species::fromArray(long[] a, int ax, Mask<Long, Shapes.S128Bit> m) is bad. The attached java file is a test for this API.
> > > I checked the performance issue is due to intrinsic failure.
> > > x86.ad:
> > > case Op_VectorLoadMask:
> > > if (UseSSE <= 3) { ret_value = false; }
> > > else if (vlen == 1 || vlen == 2) { ret_value = false; } // Implementation limitation
> > > else if (size_in_bits >= 256 && UseAVX < 2) { ret_value = false; } // Implementation limitation
> > > break;
> > >
> > > I wonder if there will be a fix for this issue, because this API is very important to our optimizations. The fromArray.patch is the workaround we are using. How to improve this workaround is also welcome.
> > >
> > > 2. anyTrue/allTrue on 512 bit
> > > This is another intrinsic failure:
> > > case Op_VectorTest:
> > > if (UseAVX <= 0) { ret_value = false; }
> > > else if (size_in_bits != 128 && size_in_bits != 256) { ret_value = false; } // Implementation limitation
> > > break;
> > > This also blocked some of our 512 bit optimizations, but I have no workaround for this now.
> > >
> > > Please share with me your advice, thanks!
> > >
> > > Regards,
> > > Zhuo
> > >
> >
>
More information about the panama-dev
mailing list