RFR(M): 8189113: AARCH64: StringLatin1 inflate intrinsic doesn't use prefetch instruction

Dmitrij Pochepko dmitrij.pochepko at bell-sw.com
Tue May 15 18:07:34 UTC 2018


Thank you for the review.


On 15.05.2018 20:56, Andrew Haley wrote:
> Again, before, running in L1:
>
> Benchmark                (ALL)  (size)  Mode  Cnt    Score   Error  Units
> StrInflateBench.inflate  32768       8  avgt   10   53.875 ± 0.088  ns/op
> StrInflateBench.inflate  32768      32  avgt   10   58.149 ± 0.735  ns/op
> StrInflateBench.inflate  32768     256  avgt   10  125.529 ± 0.353  ns/op
>
> After:
>
> Benchmark                (ALL)  (size)  Mode  Cnt    Score   Error  Units
> StrInflateBench.inflate  32768       8  avgt   10   50.541 ± 0.029  ns/op
> StrInflateBench.inflate  32768      32  avgt   10   55.591 ± 0.393  ns/op
> StrInflateBench.inflate  32768     256  avgt   10  108.823 ± 1.742  ns/op
>
> Before, missing L1:
>
> Benchmark                  (ALL)  (size)  Mode  Cnt    Score   Error  Units
> StrInflateBench.inflate  1000000       8  avgt   10   57.685 ± 0.225  ns/op
> StrInflateBench.inflate  1000000      32  avgt   10   90.418 ± 0.172  ns/op
> StrInflateBench.inflate  1000000     256  avgt   10  293.611 ± 1.314  ns/op
>
> After:
>
> Benchmark                  (ALL)  (size)  Mode  Cnt    Score   Error  Units
> StrInflateBench.inflate  1000000       8  avgt   10   54.611 ± 0.122  ns/op
> StrInflateBench.inflate  1000000      32  avgt   10  103.166 ± 0.757  ns/op
> StrInflateBench.inflate  1000000     256  avgt   10  237.011 ± 0.703  ns/op
>
> I don't like one thing: the very high overhead.  The fact that the timing
> is never less than 50ns, even when running inside L1, is not pleasing.
> None of this is your fault: it seems to be all of the messing about
> which happens before the intrinsic gets called.
>
> This is OK.
>
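
For a quick read of the tables above, the relative change per row can be computed directly from the reported scores. This is only a minimal sketch and not part of the patch or of StrInflateBench: the class name InflateDelta and the method percentChange are hypothetical, and the numbers are copied from the quoted JMH output.

```java
import java.util.Locale;

public class InflateDelta {
    // Relative change in percent. Scores are ns/op, so a negative
    // result means the "after" run took less time per operation.
    public static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        int[] sizes = {8, 32, 256};
        // {before, after} pairs copied from the quoted results.
        double[][] inL1 = {
            {53.875, 50.541}, {58.149, 55.591}, {125.529, 108.823}
        };
        double[][] missingL1 = {
            {57.685, 54.611}, {90.418, 103.166}, {293.611, 237.011}
        };
        for (int i = 0; i < sizes.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                "size %3d: in L1 %+.2f%%, missing L1 %+.2f%%",
                sizes[i],
                percentChange(inL1[i][0], inL1[i][1]),
                percentChange(missingL1[i][0], missingL1[i][1])));
        }
    }
}
```

A negative value means the patched intrinsic took less time per operation; a positive value means it took more.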


