RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions

Michihiro Horie HORIE at jp.ibm.com
Thu Jun 2 15:07:00 UTC 2016


Hi Goetz,

>You are saying you could not measure an effect?
We could not observe a big difference, but, as you point out, it is a
little difficult to get stable results with jbb2013.

>There still might be a penalty because of the additional instructions
>as the prefetching for small arrays, or a lost opportunity because of
>not unrolling.
>It would be nice to have numbers on this, but I think the effect will
>be rather small so that I’m also fine with the current solution.
>
>How did you test the new solution?
I agree; a penalty might depend on the application, but the effect will be
rather small. I also tested the latest code with micro benchmarks and
jbb2013.

Best regards,
--
Michihiro
IBM Research - Tokyo



From:	"Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
To:	Michihiro Horie/Japan/IBM at IBMJP
Cc:	"hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>,
            "ppc-aix-port-dev at openjdk.java.net"
            <ppc-aix-port-dev at openjdk.java.net>
Date:	2016/06/02 19:47
Subject:	RE: RFR(M): 8158232: PPC64: improve byte, int and long array
            copy stubs by using VSX instructions



Hi Michihiro,

thanks for the new webrev, and thanks Martin for uploading it.

> but there was no special reason on the changes in loop body sizes
Are you saying you could not measure an effect? With jbb2013 it’s hard
to get reproducible results; jvm2008 is simpler in that respect, but it
needs adaptations to run with Java 8 (or you skip the compiler benchmarks).
Also, there is now jbb2015, which is jbb2013 with some bugs fixed.

The loop body sizes are now the same with and without vsx.
This alleviates my main concerns.
There still might be a penalty from the additional instructions,
such as the prefetching for small arrays, or a lost opportunity from
not unrolling.
It would be nice to have numbers on this, but I think the effect will
be rather small so that I’m also fine with the current solution.

How did you test the new solution?

Best regards,
  Goetz.


From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
Sent: Thursday, June 02, 2016 4:07 AM
To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
Cc: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
Subject: RE: RFR(M): 8158232: PPC64: improve byte, int and long array copy
stubs by using VSX instructions



Hi Goetz,

Thank you very much for your comments, which are really helpful.

I will fix my code so that the loop body sizes in elements match the
original ones. We did measurements with SPECjbb2013, but there was no
special reason for the changes in loop body sizes; they are the result of
trial and error by a few developers.

>But I'm not convinced that this helps the average performance, as
>important lengths will regress.
:
>Especially with the byte arrays, which are used to store
>compact Strings, I would doubt that this helps an average
>application. For sizes 32-128 you first have a failing
>compare, and then step through a '4' loop instead of a '32'
>one.
Your point makes sense to me.

>Should you also set the prefetching engine more aggressive as
>in the short copy loop?
I will set up the prefetching engine as in the short copy loop, thank you.

Best regards,
--
Michihiro Horie,
IBM Research - Tokyo


From: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
To: Michihiro Horie/Japan/IBM at IBMJP, "hotspot-dev at openjdk.java.net"
<hotspot-dev at openjdk.java.net>, "ppc-aix-port-dev at openjdk.java.net"
<ppc-aix-port-dev at openjdk.java.net>
Date: 2016/06/01 23:14
Subject: RE: RFR(M): 8158232: PPC64: improve byte, int and long array copy
stubs by using VSX instructions




Hi Michihiro,

Thanks for contributing to the ppc port!

I've been looking at your changes. Basically they look good. But I'm
not convinced that this helps the average performance, as important
lengths will regress.

We once analyzed a benchmark where we saw 400 million copies of
short arrays with an average length of 38!! elements. There
were 200 byte array copies and 20 million long array copies.
This should have changed now that there are compact strings.
I think we had suppressed the array copy stubs and modified
PrintOptoStatistics to collect that data.

As I understand we have the following loops now:

Loop body sizes in elements:

         before    now        sizes regressing
byte     32,4,1    128,4,1    32-127
short    16,2,1    16,2,1     none
int      8,1       32,1       8-31
long     4,1       16,4,1     none
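Converting the element counts in the table to bytes makes the pattern easier to see. The short Python sketch below is only an illustration: the dictionary names are hypothetical and the numbers are copied from the table. Before the change, every main loop moved 32 bytes per iteration. After it, the byte, int and long main loops each move 128 bytes, while the unchanged short loop stays at 32 bytes. 128 bytes would be eight 128-bit VSX registers' worth if the new main loop uses only VSX loads and stores, but that is an assumption and is not confirmed by this thread.

```python
# Bytes moved per iteration of the largest copy loop, computed from the
# "loop body sizes in elements" table above. Illustration only.
ELEMENT_BYTES = {"byte": 1, "short": 2, "int": 4, "long": 8}

# Largest loop body in elements: (before, now).
MAIN_BODY = {
    "byte":  (32, 128),
    "short": (16, 16),
    "int":   (8, 32),
    "long":  (4, 16),
}


def main_loop_bytes(kind):
    """Return (bytes per main-loop iteration before, after) the change."""
    size = ELEMENT_BYTES[kind]
    before, now = MAIN_BODY[kind]
    return before * size, now * size


for kind in MAIN_BODY:
    before, now = main_loop_bytes(kind)
    print(f"{kind:5} {before:3} B -> {now:3} B per iteration")
```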

Especially with the byte arrays, which are used to store
compact Strings, I would doubt that this helps an average
application. For sizes 32-128 you first have a failing
compare, and then step through a '4' loop instead of a '32'
one.
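A simple trip-count model makes the failing-compare effect concrete. The Python sketch below is hypothetical and ignores alignment handling, prefetching and the differing cost of each iteration. It assumes that each stub runs its loops from the largest body size down to the smallest and that each loop consumes as many full bodies as still fit. It only compares lengths shorter than the new main-loop body, where that loop's entry compare always fails. For longer arrays, a trip count alone says little, because one 128-element iteration costs more than one 4-element iteration.

```python
# Hypothetical trip-count model of the array copy stubs, not HotSpot code.
# Loop body sizes (in elements) are copied from the table above; each stub is
# assumed to run its loops from the largest body to the smallest, each loop
# consuming as many full bodies as still fit.
LOOPS = {
    "byte":  {"before": (32, 4, 1), "now": (128, 4, 1)},
    "short": {"before": (16, 2, 1), "now": (16, 2, 1)},
    "int":   {"before": (8, 1),     "now": (32, 1)},
    "long":  {"before": (4, 1),     "now": (16, 4, 1)},
}


def loop_trips(n, bodies):
    """Return (total trips, trips per loop) for copying n elements."""
    remaining = n
    per_loop = []
    for body in bodies:
        trips = remaining // body  # 0 means this loop's entry compare fails
        per_loop.append(trips)
        remaining -= trips * body
    return sum(per_loop), per_loop


def regressing_lengths(kind):
    """Lengths that never enter the new main loop but need more trips now."""
    before, now = LOOPS[kind]["before"], LOOPS[kind]["now"]
    limit = now[0]  # shorter arrays skip the new main loop entirely
    return [n for n in range(limit)
            if loop_trips(n, now)[0] > loop_trips(n, before)[0]]


for kind in LOOPS:
    lengths = regressing_lengths(kind)
    span = f"{lengths[0]}-{lengths[-1]}" if lengths else "none"
    print(f"{kind:5} regressing: {span}")
```

Under this model, a 42-element byte array takes 5 loop trips before the change (1 x 32, 2 x 4, 2 x 1) and 12 after it (0 x 128, 10 x 4, 2 x 1). The model reproduces the "sizes regressing" column of the table, but it says nothing about actual cycle counts.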

Further, why don't you use the new instructions in the smaller
loops, too, e.g. in the long '4' loop?

I could not find a measurement showing the effect on jbb2015
or jvm2008. Did you measure these?
Or could you modify your benchmarks to use some small, odd
lengths, such as 23, 42, 99 ...?

Should you also set the prefetching engine to be more aggressive, as
in the short copy loop?

Best regards,
Goetz.

> -----Original Message-----
> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On
> Behalf Of Michihiro Horie
> Sent: Tuesday, May 31, 2016 17:37
> To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> Subject: RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs
> by using VSX instructions
>
>
> Dear all,
>
> Could you please review the following webrev?
>
> http://cr.openjdk.java.net/~mdoerr/8158232_PPC_vsx_copy/webrev.00/
>
> This change improves performance of disjoint arraycopy of byte, int, and
> long by using VSX load/store instructions.
>
> Discussion started from:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002483.html
>
> Performance improvement with micro benchmarks is shown in:
> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002531.html
>
> Thank you very much,
>
> Best regards,
> --
> Michihiro Horie,
> IBM Research - Tokyo
