RFR(M): 8189113: AARCH64: StringLatin1 inflate intrinsic doesn't use prefetch instruction
Dmitrij Pochepko
dmitrij.pochepko at bell-sw.com
Tue May 15 18:07:34 UTC 2018
Thank you for the review.
On 15.05.2018 20:56, Andrew Haley wrote:
> Again, before, running in L1:
>
> Benchmark                  (ALL)  (size)  Mode  Cnt    Score    Error  Units
> StrInflateBench.inflate    32768       8  avgt   10   53.875  ± 0.088  ns/op
> StrInflateBench.inflate    32768      32  avgt   10   58.149  ± 0.735  ns/op
> StrInflateBench.inflate    32768     256  avgt   10  125.529  ± 0.353  ns/op
>
> After:
>
> Benchmark                  (ALL)  (size)  Mode  Cnt    Score    Error  Units
> StrInflateBench.inflate    32768       8  avgt   10   50.541  ± 0.029  ns/op
> StrInflateBench.inflate    32768      32  avgt   10   55.591  ± 0.393  ns/op
> StrInflateBench.inflate    32768     256  avgt   10  108.823  ± 1.742  ns/op
>
> Before, missing L1:
>
> Benchmark                  (ALL)  (size)  Mode  Cnt    Score    Error  Units
> StrInflateBench.inflate  1000000       8  avgt   10   57.685  ± 0.225  ns/op
> StrInflateBench.inflate  1000000      32  avgt   10   90.418  ± 0.172  ns/op
> StrInflateBench.inflate  1000000     256  avgt   10  293.611  ± 1.314  ns/op
>
> After:
>
> Benchmark                  (ALL)  (size)  Mode  Cnt    Score    Error  Units
> StrInflateBench.inflate  1000000       8  avgt   10   54.611  ± 0.122  ns/op
> StrInflateBench.inflate  1000000      32  avgt   10  103.166  ± 0.757  ns/op
> StrInflateBench.inflate  1000000     256  avgt   10  237.011  ± 0.703  ns/op
>
> I don't like one thing: the very high overhead. The fact that the timing
> is never less than 50ns, even when running inside L1, is not pleasing.
> None of this is your fault: it seems to be all of the messing about
> which happens before the intrinsic gets called.
>
> This is OK.
>
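
For context on what the benchmarked operation does: inflating a Latin-1 string widens each byte to a 16-bit char. The scalar loop below is only a sketch of that semantics in plain Java, not the AArch64 code under review. The class name InflateSketch is hypothetical and chosen purely for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not the JDK's own code and not the AArch64 intrinsic.
final class InflateSketch {
    // Widen len Latin-1 bytes from src into UTF-16 chars in dst.
    static void inflate(byte[] src, int srcOff, char[] dst, int dstOff, int len) {
        for (int i = 0; i < len; i++) {
            // Mask with 0xff so bytes >= 0x80 are zero-extended, not sign-extended.
            dst[dstOff + i] = (char) (src[srcOff + i] & 0xff);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        char[] utf16 = new char[latin1.length];
        inflate(latin1, 0, utf16, 0, latin1.length);
        System.out.println(new String(utf16).equals("caf\u00e9")); // prints true
    }
}
```

A loop like this touches the source and destination strictly sequentially, which is the access pattern where a prefetch can help once the data no longer fits in L1.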