RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions

Wed Jun 1 14:13:34 UTC 2016

Hi Michihiro,

Thanks for contributing to the ppc port!

I've been looking at your changes. Basically they look good, but I'm

not convinced that this helps average performance, as some important

lengths will regress.

We once analyzed a benchmark in which we saw 400 million copies of

short arrays with an average length of only 38 elements. There

were 200 byte array copies and 20 million long array copies.

This should have changed now that there are compact strings.

I think we had suppressed the array copy stubs and modified

PrintOptoStatistics to collect that data.

As I understand we have the following loops now:

Loop body sizes in elements:

          before      now        sizes regressing
byte      32,4,1      128,4,1    32-127
short     16,2,1      16,2,1     none
int       8,1         32,1       8-31
long      4,1         16,4,1     none

Especially for byte arrays, which are used to store

compact Strings, I doubt that this helps an average

application. For sizes 32-127 you first have a failing

compare, and then step through the '4' loop instead of the '32'

one.
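
To make the concern concrete, here is a rough sketch that counts loop iterations per copy. It assumes a simplified model in which each stub is just a cascade of loops, tried from the largest body to the smallest; the class and method names are hypothetical, and the real stubs also do alignment and other work that this ignores.

```java
// Simplified model (an assumption, not the actual stub code): a copy stub is
// a cascade of loops, tried from the largest body size down to the smallest.
final class LoopModel {
    /**
     * Counts the loop iterations needed to copy `length` elements.
     * `bodySizes` must be in descending order and end with 1.
     */
    static int iterations(int length, int[] bodySizes) {
        int remaining = length;
        int count = 0;
        for (int body : bodySizes) {
            count += remaining / body;  // full iterations of this loop
            remaining %= body;          // the rest falls through to the next loop
        }
        return count;
    }

    public static void main(String[] args) {
        int[] before = {32, 4, 1};   // byte loops before the change
        int[] now = {128, 4, 1};     // byte loops with the change
        for (int len : new int[] {23, 42, 99, 127, 128}) {
            System.out.printf("byte[%d]: before=%d now=%d iterations%n",
                    len, iterations(len, before), iterations(len, now));
        }
    }
}
```

In this model a 99-element byte copy takes 6 iterations with the old loops and 27 with the new ones, while a 128-element copy drops from 4 iterations to 1. Iteration counts are only a proxy for time, so this shows where to measure rather than what the result will be.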

Further, why don't you use the new instructions in the smaller

loops, too? Like in the long '4' version?

I could not find a measurement showing the effect on jbb2015

or jvm2008.  Did you measure these?

Or could you modify your benchmarks to cover some small, odd

lengths, such as 23, 42, 99 ...?
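
Something along these lines is what I have in mind. It is only a rough timing sketch using System.arraycopy and System.nanoTime; the class name, method name and repetition count are made up for illustration, and a proper harness such as JMH would give more trustworthy numbers.

```java
import java.util.Arrays;

// Rough timing sketch for small byte array copies. Names and the repetition
// count are illustrative assumptions, not taken from the existing benchmarks.
final class SmallCopyBench {
    /** Copies a byte array of `length` elements `reps` times; returns elapsed nanoseconds. */
    static long timeCopies(int length, int reps) {
        byte[] src = new byte[length];
        byte[] dst = new byte[length];
        Arrays.fill(src, (byte) 1);
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            System.arraycopy(src, 0, dst, 0, length);  // distinct source and destination arrays
        }
        long elapsed = System.nanoTime() - start;
        // Sanity check, which also keeps the copied data in use.
        if (!Arrays.equals(src, dst)) {
            throw new AssertionError("copy mismatch for length " + length);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int reps = 10_000_000;
        for (int len : new int[] {23, 42, 99}) {
            timeCopies(len, reps);            // warm-up run so the loop gets JIT-compiled
            long ns = timeCopies(len, reps);  // measured run
            System.out.printf("len=%d: %.2f ns/copy%n", len, (double) ns / reps);
        }
    }
}
```

Running it once with and once without the change would show whether lengths such as 42 and 99 regress.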

Should you also make the prefetch engine more aggressive, as

in the short copy loop?

Best regards,

  Goetz.

> -----Original Message-----

> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On

> Behalf Of Michihiro Horie

> Sent: Tuesday, 31 May 2016 17:37

> To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net

> Subject: RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs

> by using VSX instructions

>

>

> Dear all,

>

> Could you please review the following webrev?

>

> http://cr.openjdk.java.net/~mdoerr/8158232_PPC_vsx_copy/webrev.00/

>

> This change improves performance of disjoint arraycopy of byte, int, and

> long by using VSX load/store instructions.

>

> Discussion started from:

> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002483.html

>

> Performance improvement with micro benchmarks is shown in:

> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002531.html

>

> Thank you very much,

>

> Best regards,

> --

> Michihiro Horie,

> IBM Research - Tokyo