RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions
Lindenmaier, Goetz
goetz.lindenmaier at sap.com
Wed Jun 1 14:13:34 UTC 2016
Hi Michihiro,
Thanks for contributing to the ppc port!
I've been looking at your changes. Basically they look good. But I'm
not convinced that this helps the average performance, as important
lengths will regress.
We once analyzed a benchmark in which we saw 400 million copies of
short arrays with an average length of only 38 elements. There
were 200 byte array copies and 20 million long array copies.
This should have changed now that there are compact strings.
I think we had suppressed the array copy stubs and modified
PrintOptoStatistics to collect that data.
As I understand we have the following loops now:
Loop body sizes in elements:
         before     now        sizes regressing
byte     32,4,1     128,4,1    32-127
short    16,2,1     16,2,1     none
int      8,1        32,1       8-31
long     4,1        16,4,1     none
Especially for byte arrays, which are used to store
compact Strings, I doubt that this helps an average
application. For sizes 32-127 you first have a failing
compare for the '128' loop, and then step through a '4' loop
instead of a '32' one.
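As a rough illustration of the point above, the following Java sketch counts loop iterations for a given length under the two sets of byte loop body sizes from the table. It is a simplified model, not the stub code: it assumes the stub steps greedily through the loops from largest to smallest body, and it ignores alignment handling and the cost of the compares themselves. The class and method names are made up for this example.

```java
public class LoopSteps {
    // Counts loop iterations needed to copy `length` elements when stepping
    // greedily through loops with the given body sizes (largest first).
    // Simplified model (assumption): no alignment pre-loop, no compare cost.
    static int iterations(int length, int[] bodySizes) {
        int remaining = length;
        int count = 0;
        for (int size : bodySizes) {
            count += remaining / size;   // full passes through this loop
            remaining %= size;           // leftover goes to the next smaller loop
        }
        return count;
    }

    public static void main(String[] args) {
        int[] before = {32, 4, 1};   // byte loop body sizes before the change
        int[] now = {128, 4, 1};     // byte loop body sizes with the change
        for (int length : new int[] {23, 42, 99, 127, 128}) {
            System.out.printf("length %3d: before %2d iterations, now %2d iterations%n",
                    length, iterations(length, before), iterations(length, now));
        }
    }
}
```

Under this model a byte array of length 99 takes 6 iterations with the old loops (3 x 32, then 3 x 1) and 27 with the new ones (24 x 4, then 3 x 1), while length 128 drops from 4 iterations to 1.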
Further, why don't you use the new instructions in the smaller
loops, too? Like in the long '4' version?
I could not find a measurement showing the effect on jbb2015
or jvm2008. Did you measure these?
Or you could modify your benchmarks to cover some small, odd
lengths, such as 23, 42, 99 ...?
Should you also set the prefetching engine to be more aggressive,
as in the short copy loop?
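The suggestion above to measure small, odd lengths could be tried with something like the following sketch. It is a naive timing loop, not the micro benchmark from the webrev thread and not JMH, so the numbers are only a rough indication. It assumes that System.arraycopy between two distinct byte arrays ends up in the disjoint byte copy stub once the code is compiled. The class name, method name and repetition counts are made up for this example.

```java
import java.util.Arrays;

public class SmallCopyBench {
    // Copies a byte array of `length` elements `reps` times and returns the
    // elapsed wall-clock time in nanoseconds. Naive timing: no warm-up control
    // beyond what the caller does, so treat the result as a rough indication.
    static long timeCopies(int length, int reps) {
        byte[] src = new byte[length];
        byte[] dst = new byte[length];
        Arrays.fill(src, (byte) 1);
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            System.arraycopy(src, 0, dst, 0, length);
        }
        long elapsed = System.nanoTime() - start;
        // Sanity check so the copy cannot be treated as dead code.
        if (!Arrays.equals(src, dst)) {
            throw new AssertionError("copy mismatch for length " + length);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int reps = 10_000_000;                       // arbitrary choice
        int[] lengths = {23, 42, 99};                // small, odd lengths from the mail
        for (int length : lengths) {
            timeCopies(length, reps);                // warm-up pass, result ignored
            long nanos = timeCopies(length, reps);   // measured pass
            System.out.printf("length %3d: %.2f ns per copy%n",
                    length, (double) nanos / reps);
        }
    }
}
```

Running this once with and once without the patch on the same machine would give a first impression of the lengths in the 32-127 range.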
Best regards,
Goetz.
> -----Original Message-----
> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On
> Behalf Of Michihiro Horie
> Sent: Tuesday, May 31, 2016 17:37
> To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> Subject: RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs
> by using VSX instructions
>
>
> Dear all,
>
> Could you please review the following webrev?
>
> http://cr.openjdk.java.net/~mdoerr/8158232_PPC_vsx_copy/webrev.00/
>
> This change improves performance of disjoint arraycopy of byte, int, and
> long by using VSX load/store instructions.
>
> Discussion started from:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002483.html
>
> Performance improvement with micro benchmarks is shown in:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002531.html
>
> Thank you very much,
>
> Best regards,
> --
> Michihiro Horie,
> IBM Research - Tokyo