[vector] RFR 8221816: IndexOutOfBoundsException for fromArray/intoArray with unset mask lanes - was: RE: IndexOutOfBoundsException with unset mask lanes

Kharbas, Kishor kishor.kharbas at intel.com
Thu May 30 20:34:33 UTC 2019


Hi Joshua,

Thanks for the patch. Here is the performance of the jmh benchmark on Intel CLX.
Results with slow path are as expected - having an uncommon trap over a regular check does give performance benefit.
I am not sure why the performance gain in fast path (for 1000 case). The same is observed in ARM's case. Do you know why?

Results on Intel CLX:
                                                                       Base                 without expectTrue (patch [1])      UncommonTrap (patch [2])
  1000 fastPath                                          416.251 ± 3.086           538.024 ±  3.415                   538.871 ± 2.779
 10000 fastPath                                           29.426 ± 3.625            30.389 ±  1.389                    30.673 ± 0.354
100000 fastPath                                            2.018 ± 0.060             2.013 ±  0.005                        2.045 ± 0.063                            
  1000 fastPath + 1 slowPath                            N/A                    70.835 ± 10.902                     72.236 ± 0.797
 10000 fastPath + 1 slowPath                           N/A                      6.512 ±  0.466                      19.276 ± 1.769
100000 fastPath + 1 slowPath                          N/A                      0.756 ±  0.016                        1.955 ± 0.017

Thanks,
Kishor

> -----Original Message-----
> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
> Sent: Thursday, May 30, 2019 8:43 AM
> To: Joshua Zhu (Arm Technology China) <Joshua.Zhu at arm.com>; panama-
> dev at openjdk.java.net
> Cc: nd <nd at arm.com>; Kharbas, Kishor <kishor.kharbas at intel.com>
> Subject: Re: [vector] RFR 8221816: IndexOutOfBoundsException for
> fromArray/intoArray with unset mask lanes - was: RE:
> IndexOutOfBoundsException with unset mask lanes
> 
> Nice work, Joshua! I like how it shapes out.
> 
> Good to see some performance numbers as well.
> 
> Kishor did some experiments with masked memory operations, so I'd like to
> hear his thoughts on the patch.
> 
> Overall, I'm happy with the change as it is now. It solves the correctness issue
> without sacrificing performance in most of the cases.
> 
> The only important case which suffers is "masked loops" - when masked
> kernel is used to avoid explicit loop tail processing. Such loops will end with a
> deopt most of the time (unless loop tail is empty). So, as a next step, I'd like
> to see that case to be addressed, so such loops don't trigger uncommon trap.
> 
> In a longer term, it's worth looking into specific optimizations range checks
> for masked accesses.
> 
> Some comments about the code changes:
> 
> src/hotspot/share/opto/opaquenode.hpp:
> 
> You have some changes in ProfileBooleanNode. Why providing 0 as a
> false_count is not enough to get uncommon trap on corresponding branch?
> 
> 
> src/hotspot/share/opto/doCall.cpp:
> 
>   bool Compile::should_delay_vector_inlining(ciMethod* call_method,
> JVMState* jvms) {
> -  return UseVectorApiIntrinsics && call_method->is_vector_method();
> +  return UseVectorApiIntrinsics && call_method->is_vector_method() &&
> +         call_method->intrinsic_id() != vmIntrinsics::_ExpectTrue;
> 
> Why do you need to delay inlining for it?
> 
> 
> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-
> VectorBits.java.template:
> 
> -        if (ax >= 0 && ax + LENGTH <= a.length) {
> +        if (expectTrue(ax >= 0 && ax + LENGTH <= a.length)) {
> 
> Why don't you use VectorIntrinsics.checkIndex() here?
> 
> Best regards,
> Vladimir Ivanov
> 
> On 10/05/2019 09:41, Joshua Zhu (Arm Technology China) wrote:
> > Hi Vladimir,
> >
> >> It looks promising to introduce a variant of
> >> VectorIntrinsics.checkIndex() which is used to guard a fast path and is
> >> annotated with a JIT-compiler hint (akin to
> >> java.lang.invoke.MethodHandleImpl.profileBoolean() [1], but without
> >> profiling logic) to override bytecode profiling info, so JIT always puts an
> >> uncommon trap on the false branch.
> >
> > Thanks for your comments.
> > As you suggested, I introduced VectorIntrinsics.expectTrue() in change [2].
> > It's used as below:
> >      if (expectTrue(bool condition)) {
> >          // fast path
> >      } else {
> >          // slow path: uncommon trap
> >      }
> >
> > I also wrote a jmh case [3] to check the performance.
> > See below table for jmh test results. (In Throughput Mode, Unit: ops/ms)
> >                                      Base       without expectTrue (patch [1])  UncommonTrap
> (patch [2])
> >    1000 fastPath              318.228 ± 22.588       457.967 ± 12.622           457.328 ±
> 11.932
> >   10000 fastPath               21.991 ±  2.496        23.360 ±  0.070            24.744 ±
> 0.213
> > 100000 fastPath                1.613 ±  0.007         1.581 ±  0.031             1.631 ±  0.003
> >    1000 fastPath + 1 slowPath        N/A              57.298 ± 11.033            55.845 ±
> 0.716
> >   10000 fastPath + 1 slowPath        N/A               4.537 ±  0.536            15.164 ±
> 0.098
> > 100000 fastPath + 1 slowPath        N/A               0.577 ±  0.048             1.564 ±
> 0.005
> >
> > [1] http://cr.openjdk.java.net/~jzhu/vectorapi/8221816.OOB/webrev.01/
> > [2]
> http://cr.openjdk.java.net/~jzhu/vectorapi/8221816.OOB/uncommontrap.w
> ebrev.00/
> > [3]
> http://cr.openjdk.java.net/~jzhu/vectorapi/8221816.OOB/IntVectorJmhTest
> .java
> >
> > Please help review and feel free to share your comments.
> > Thanks.
> >
> > Best Regards,
> > Joshua
> >


More information about the panama-dev mailing list