RFR: 8252848: Optimize small primitive arrayCopy operations through partial inlining using AVX-512 masked instructions [v2]

Bhateja, Jatin jatin.bhateja at intel.com
Wed Sep 16 13:08:56 UTC 2020


Hi Nils,
I have closed pull request 144 and will open a new one for partial in-lining.

There is code overlap with PR 61 because both issues are sub-tasks of one parent JBS issue (JDK-8251871).
Separate pull requests, PR 61 and PR 144, were created for the two sub-tasks (JDK-8252847 and JDK-8252848).
To keep each patch complete on its own, some assembler routines are duplicated.

However, I expect it will be difficult to integrate them after review, since the bot may encounter merge conflicts.

Is there a way to get them reviewed in parallel as independent patches, without creating one unified patch?

Regards,
Jatin 


> -----Original Message-----
> From: hotspot-compiler-dev <hotspot-compiler-dev-retn at openjdk.java.net> On
> Behalf Of Nils Eliasson
> Sent: Tuesday, September 15, 2020 7:24 PM
> To: hotspot-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR: 8252848: Optimize small primitive arrayCopy operations
> through partial inlining using AVX-512 masked instructions [v2]
> 
> On Tue, 15 Sep 2020 10:26:04 GMT, Jatin Bhateja <jbhateja at openjdk.org>
> wrote:
> 
> >> Summary:
> >>
> >> 1) Partial in-lining technique avoids call overhead penalty for
> >>    sub-word type small array copy operations with size less than 32
> >>    bytes.
> >> 2) At runtime, a conditional check based on copy length either
> >>    calls an array-copy stub or executes an optimized instruction
> >>    sequence using AVX-512 masked instructions emitted at the call site.
> >> 3) New runtime flag ArrayCopyPartialInlineSize=0/32(default)/64 bytes
> >>    determines the maximum size for partial in-lining.
> >>
> >> Performance Results:
> >>   System                     : CascadeLake Server, Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
> >>   Micros                     : test/micro/org/openjdk/bench/java/lang/ArrayCopy*.java
> >>   ArrayCopyPartialInlineSize : 32
> >>
> >> JMH | Block Size | Baseline (ns/op) | Partial Inlining (ns/op) | Gain (Baseline / Partial Inlining)
> >> -- | -- | -- | -- | --
> >> ArrayCopyAligned.testByte | 1 | 5.417 | 2.696 | 2.009272997
> >> ArrayCopyAligned.testByte | 3 | 5.494 | 2.702 | 2.03330866
> >> ArrayCopyAligned.testByte | 5 | 5.417 | 2.637 | 2.05422829
> >> ArrayCopyAligned.testByte | 10 | 5.343 | 2.703 | 1.976692564
> >> ArrayCopyAligned.testByte | 20 | 5.837 | 2.636 | 2.214339909
> >> ArrayCopyAligned.testByte | 70 | 5.86 | 6 | 0.976666667
> >> ArrayCopyAligned.testByte | 150 | 6.766 | 6.906 | 0.979727773
> >> ArrayCopyAligned.testByte | 300 | 7.605 | 7.952 | 0.956363179
> >> ArrayCopyAligned.testByte | 600 | 11.989 | 12.007 | 0.998500874
> >> ArrayCopyAligned.testByte | 1200 | 16.447 | 16.585 | 0.991679228
> >> ArrayCopyAligned.testChar | 1 | 5.02 | 2.828 | 1.775106082
> >> ArrayCopyAligned.testChar | 3 | 5.129 | 2.762 | 1.85698769
> >> ArrayCopyAligned.testChar | 5 | 5.041 | 2.762 | 1.82512672
> >> ArrayCopyAligned.testChar | 10 | 5.716 | 2.762 | 2.069514844
> >> ArrayCopyAligned.testChar | 20 | 5.111 | 5.399 | 0.946656788
> >> ArrayCopyAligned.testChar | 70 | 6.271 | 6.242 | 1.004645947
> >> ArrayCopyAligned.testChar | 150 | 7.45 | 7.599 | 0.980392157
> >> ArrayCopyAligned.testChar | 300 | 9.904 | 10.112 | 0.97943038
> >> ArrayCopyAligned.testChar | 600 | 17.131 | 17.167 | 0.997902953
> >> ArrayCopyAligned.testChar | 1200 | 29.556 | 29.851 | 0.990117584
> >> ArrayCopyUnalignedBoth.testByte | 1 | 5.419 | 2.702 | 2.005551443
> >> ArrayCopyUnalignedBoth.testByte | 3 | 5.558 | 2.636 | 2.108497724
> >> ArrayCopyUnalignedBoth.testByte | 5 | 5.43 | 2.636 | 2.059939302
> >> ArrayCopyUnalignedBoth.testByte | 10 | 5.378 | 2.637 | 2.039438756
> >> ArrayCopyUnalignedBoth.testByte | 20 | 5.914 | 2.636 | 2.243550835
> >> ArrayCopyUnalignedBoth.testByte | 70 | 5.882 | 5.954 | 0.987907289
> >> ArrayCopyUnalignedBoth.testByte | 150 | 6.784 | 6.88 | 0.986046512
> >> ArrayCopyUnalignedBoth.testByte | 300 | 7.635 | 7.968 | 0.958207831
> >> ArrayCopyUnalignedBoth.testByte | 600 | 12.226 | 12.129 | 1.007997362
> >> ArrayCopyUnalignedBoth.testByte | 1200 | 16.992 | 20.717 | 0.820195974
> >> ArrayCopyUnalignedBoth.testChar | 1 | 5.019 | 2.828 | 1.774752475
> >> ArrayCopyUnalignedBoth.testChar | 3 | 5.163 | 2.763 | 1.868621064
> >> ArrayCopyUnalignedBoth.testChar | 5 | 5.042 | 2.827 | 1.783516095
> >> ArrayCopyUnalignedBoth.testChar | 10 | 5.718 | 2.828 | 2.021923621
> >> ArrayCopyUnalignedBoth.testChar | 20 | 5.111 | 5.404 | 0.945780903
> >> ArrayCopyUnalignedBoth.testChar | 70 | 6.367 | 6.235 | 1.02117081
> >> ArrayCopyUnalignedBoth.testChar | 150 | 7.367 | 8.269 | 0.890917886
> >> ArrayCopyUnalignedBoth.testChar | 300 | 10.358 | 10.642 | 0.973313287
> >> ArrayCopyUnalignedBoth.testChar | 600 | 20.84 | 17.522 | 1.189361945
> >> ArrayCopyUnalignedBoth.testChar | 1200 | 31.895 | 31.892 | 1.000094067
> >> ArrayCopyUnalignedDst.testByte | 1 | 5.455 | 2.637 | 2.068638604
> >> ArrayCopyUnalignedDst.testByte | 3 | 5.562 | 2.702 | 2.058475204
> >> ArrayCopyUnalignedDst.testByte | 5 | 5.427 | 2.702 | 2.008512213
> >> ArrayCopyUnalignedDst.testByte | 10 | 5.367 | 2.696 | 1.990727003
> >> ArrayCopyUnalignedDst.testByte | 20 | 5.839 | 2.637 | 2.214258627
> >> ArrayCopyUnalignedDst.testByte | 70 | 5.888 | 5.968 | 0.986595174
> >> ArrayCopyUnalignedDst.testByte | 150 | 6.785 | 6.773 | 1.001771741
> >> ArrayCopyUnalignedDst.testByte | 300 | 7.606 | 7.972 | 0.954089313
> >> ArrayCopyUnalignedDst.testByte | 600 | 11.986 | 21.195 | 0.565510734
> >> ArrayCopyUnalignedDst.testByte | 1200 | 16.54 | 16.784 | 0.985462345
> >> ArrayCopyUnalignedDst.testChar | 1 | 5.02 | 2.827 | 1.775733994
> >> ArrayCopyUnalignedDst.testChar | 3 | 5.131 | 2.762 | 1.857711803
> >> ArrayCopyUnalignedDst.testChar | 5 | 5.038 | 2.762 | 1.82404055
> >> ArrayCopyUnalignedDst.testChar | 10 | 5.718 | 2.762 | 2.070238957
> >> ArrayCopyUnalignedDst.testChar | 20 | 5.113 | 5.401 | 0.946676541
> >> ArrayCopyUnalignedDst.testChar | 70 | 6.222 | 6.214 | 1.001287416
> >> ArrayCopyUnalignedDst.testChar | 150 | 7.367 | 8.125 | 0.906707692
> >> ArrayCopyUnalignedDst.testChar | 300 | 10.204 | 10.082 | 1.012100774
> >> ArrayCopyUnalignedDst.testChar | 600 | 16.978 | 17.135 | 0.990837467
> >> ArrayCopyUnalignedDst.testChar | 1200 | 32.351 | 31.996 | 1.011095137
> >> ArrayCopyUnalignedSrc.testByte | 1 | 5.414 | 2.696 | 2.008160237
> >> ArrayCopyUnalignedSrc.testByte | 3 | 5.494 | 2.637 | 2.083428138
> >> ArrayCopyUnalignedSrc.testByte | 5 | 5.431 | 2.637 | 2.059537353
> >> ArrayCopyUnalignedSrc.testByte | 10 | 5.344 | 2.703 | 1.977062523
> >> ArrayCopyUnalignedSrc.testByte | 20 | 5.834 | 2.696 | 2.163946588
> >> ArrayCopyUnalignedSrc.testByte | 70 | 5.883 | 6.009 | 0.979031453
> >> ArrayCopyUnalignedSrc.testByte | 150 | 6.729 | 6.87 | 0.979475983
> >> ArrayCopyUnalignedSrc.testByte | 300 | 7.603 | 7.97 | 0.953952321
> >> ArrayCopyUnalignedSrc.testByte | 600 | 12.004 | 12.16 | 0.987171053
> >> ArrayCopyUnalignedSrc.testByte | 1200 | 16.534 | 16.643 | 0.9934507
> >> ArrayCopyUnalignedSrc.testChar | 1 | 5.021 | 2.762 | 1.81788559
> >> ArrayCopyUnalignedSrc.testChar | 3 | 5.13 | 2.762 | 1.857349747
> >> ArrayCopyUnalignedSrc.testChar | 5 | 5.042 | 2.827 | 1.783516095
> >> ArrayCopyUnalignedSrc.testChar | 10 | 5.726 | 2.761 | 2.073886273
> >> ArrayCopyUnalignedSrc.testChar | 20 | 5.112 | 5.401 | 0.94649139
> >> ArrayCopyUnalignedSrc.testChar | 70 | 6.113 | 6.227 | 0.981692629
> >> ArrayCopyUnalignedSrc.testChar | 150 | 7.493 | 7.888 | 0.949923935
> >> ArrayCopyUnalignedSrc.testChar | 300 | 10.234 | 10.501 | 0.97457385
> >> ArrayCopyUnalignedSrc.testChar | 600 | 17.175 | 17.142 | 1.001925096
> >> ArrayCopyUnalignedSrc.testChar | 1200 | 31.926 | 31.987 | 0.998092975
> >>
> >> Detailed Reports:
> >> Baseline   : http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_Baseline.txt
> >> WithOpt    : http://cr.openjdk.java.net/~jbhateja/8252848/JMH_results/JMH_With_PI_Opts.txt
> >
> > Jatin Bhateja has updated the pull request incrementally with one
> > additional commit since the last revision:
> >
> >   Update arraycopynode.cpp
> >
> >   Missed safety check.
> 
> This PR includes the changes for JDK-8252847. It makes it hard to review.
> 
> -------------
> 
> PR: https://git.openjdk.java.net/jdk/pull/144
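
For readers who want a concrete picture of the scheme in the quoted summary (a length check that selects either an inline masked copy or the array-copy stub), here is a minimal Java sketch. It is only an illustration of the idea, not the HotSpot implementation: the class, method, and counter names are hypothetical, the AVX-512 masked load/store is emulated with a bit mask over byte lanes, System.arraycopy stands in for the stub, and treating the threshold as inclusive is an assumption.

```java
// Illustration only: emulates the partial in-lining decision in plain Java.
// All names here are hypothetical and do not come from the patch.
final class PartialInlineSketch {
    // Mirrors the default value of ArrayCopyPartialInlineSize (32 bytes).
    static final int PARTIAL_INLINE_SIZE = 32;

    // Counters that record which path was taken (for demonstration only).
    static int inlineCopies = 0;
    static int stubCalls = 0;

    static void copyBytes(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        if (length < 0 || srcPos < 0 || dstPos < 0
                || srcPos > src.length - length || dstPos > dst.length - length) {
            throw new IndexOutOfBoundsException();
        }
        // Assumption: the threshold is inclusive ("maximum size").
        if (length <= PARTIAL_INLINE_SIZE) {
            maskedCopy(src, srcPos, dst, dstPos, length); // inline fast path
            inlineCopies++;
        } else {
            System.arraycopy(src, srcPos, dst, dstPos, length); // stands in for the stub call
            stubCalls++;
        }
    }

    // Emulates one masked vector load followed by one masked vector store.
    private static void maskedCopy(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        // One mask bit per byte lane; the low `length` bits are set.
        long mask = (1L << length) - 1;
        byte[] lanes = new byte[PARTIAL_INLINE_SIZE];
        // Masked load: only active lanes touch memory, so nothing past the
        // end of the source range is read.
        for (int lane = 0; lane < PARTIAL_INLINE_SIZE; lane++) {
            if (((mask >>> lane) & 1L) != 0) {
                lanes[lane] = src[srcPos + lane];
            }
        }
        // Masked store: only active lanes are written back.
        for (int lane = 0; lane < PARTIAL_INLINE_SIZE; lane++) {
            if (((mask >>> lane) & 1L) != 0) {
                dst[dstPos + lane] = lanes[lane];
            }
        }
    }

    public static void main(String[] args) {
        byte[] src = new byte[100];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        byte[] dst = new byte[100];
        copyBytes(src, 0, dst, 0, 20); // small copy: inline path
        copyBytes(src, 0, dst, 0, 70); // larger copy: stub path
        System.out.println("inline=" + inlineCopies + " stub=" + stubCalls);
    }
}
```

Loading every active lane before storing keeps the sketch correct when the source and destination ranges overlap, which is the behaviour a single masked load followed by a single masked store would give.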

