RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions

Vladimir Kozlov vladimir.kozlov at oracle.com
Thu Jun 2 19:04:16 UTC 2016


Please hold off on pushing this. We are past the Feature Complete date of May 26:

http://openjdk.java.net/projects/jdk9/

All RFE (enhancement) changes should be approved before they are pushed.
Please wait until the approval process is finalized.

Regards,
Vladimir

On 6/2/16 11:48 AM, Lindenmaier, Goetz wrote:
> Ok, reviewed.
>
> Thanks for the explanations,
>
>    Goetz.
>
> *From:*Michihiro Horie [mailto:HORIE at jp.ibm.com]
> *Sent:* Thursday, June 02, 2016 5:07 PM
> *To:* Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> *Cc:* hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> *Subject:* RE: RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions
>
> Hi Goetz,
>
>>You are saying you could not measure an effect?
> We could not observe a big difference, but as you point out, it looks a little difficult to get stable results with jbb2013.
>
>>There still might be a penalty because of the additional instructions,
>>such as the prefetching for small arrays, or a lost opportunity because of
>>not unrolling.
>>It would be nice to have numbers on this, but I think the effect will
>>be rather small so that I’m also fine with the current solution.
>>
>>How did you test the new solution?
> I agree: any penalty might depend on the application, but the effect will be rather small. I also tested the latest code using micro benchmarks and jbb2013.
>
> Best regards,
> --
> Michihiro
> IBM Research - Tokyo
>
> From: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
> To: Michihiro Horie/Japan/IBM at IBMJP
> Cc: hotspot-dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net
> Date: 2016/06/02 19:47
> Subject: RE: RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions
>
> ------------------------------------------------------------------------
>
> Hi Michihiro,
>
> thanks for the new webrev, and thanks Martin for uploading it.
>
>> but there was no special reason for the changes in loop body sizes
> You are saying you could not measure an effect? With jbb2013 it’s hard
> to get reproducible results; jvm2008 is simpler in that respect. But it
> needs adaptations to run with Java 8 (or you skip the compiler benchmarks).
> Also, there is now jbb2015, which is jbb2013 with some bugs fixed.
>
> The loop body sizes are now the same with and without vsx.
> This alleviates my main concerns.
> There still might be a penalty because of the additional instructions,
> such as the prefetching for small arrays, or a lost opportunity because of
> not unrolling.
> It would be nice to have numbers on this, but I think the effect will
> be rather small so that I’m also fine with the current solution.
>
> How did you test the new solution?
>
> Best regards,
> Goetz.
>
>
> *From:* Michihiro Horie [mailto:HORIE at jp.ibm.com]
> *Sent:* Thursday, June 02, 2016 4:07 AM
> *To:* Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> *Cc:* hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> *Subject:* RE: RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions
>
> Hi Goetz,
>
> Thank you very much for your comments, which are really helpful.
>
> I will fix my code so that the loop body sizes in elements match the original ones. We did measurements using SPECjbb2013, but there was no special reason for the changes in loop body sizes. They are
> the result of trial and error with a few developers.
>
>>But I'm not convinced that this helps the average performance, as important
>>lengths will regress.
> :
>>Especially with the byte arrays, which are used to store
>>compact Strings, I would doubt that this helps an average
>>application. For sizes 32-128 you first have a failing
>>compare, and then step through a '4' loop instead of a '32'
>>one.
> Your point makes sense to me.
>
>>Should you also set the prefetching engine to be more aggressive, as
>>in the short copy loop?
> I will set up the prefetching engine as in the short copy loop, thank you.
>
> Best regards,
> --
> Michihiro Horie,
> IBM Research - Tokyo
>
> From: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
> To: Michihiro Horie/Japan/IBM at IBMJP, hotspot-dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net
> Date: 2016/06/01 23:14
> Subject: RE: RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions
>
> ------------------------------------------------------------------------
>
> Hi Michihiro,
>
> Thanks for contributing to the ppc port!
>
> I've been looking at your changes. Basically they look good. But I'm
> not convinced that this helps the average performance, as important
> lengths will regress.
>
> We once analyzed a benchmark where we saw 400 million copies of
> short arrays which had an average length of 38!! elements. There
> were 200 byte array copies and 20 million long array copies.
> This should have changed now that there are compact strings.
> I think we had suppressed the array copy stubs and modified
> PrintOptoStatistics to collect that data.
>
> As I understand we have the following loops now:
>
> Loop body sizes in elements
>
>          before    now       sizes regressing
>   byte   32,4,1    128,4,1   32-127
>   short  16,2,1    16,2,1    none
>   int    8,1       32,1      8-31
>   long   4,1       16,4,1    none
>
> Especially with the byte arrays, which are used to store
> compact Strings, I would doubt that this helps an average
> application. For sizes 32-128 you first have a failing
> compare, and then step through a '4' loop instead of a '32'
> one.
>
> Further, why don't you use the new instructions in the smaller
> loops, too? Like in the long '4' version?
>
> I could not find a measurement showing the effect on jbb2015
> or jvm2008. Did you measure these?
> Or you could modify your benchmarks to use some small, odd
> lengths, such as 23, 42, 99 ...?
>
> Should you also set the prefetching engine to be more aggressive, as
> in the short copy loop?
>
> Best regards,
> Goetz.
>
>> -----Original Message-----
>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On
>> Behalf Of Michihiro Horie
>> Sent: Tuesday, 31 May 2016 17:37
>> To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
>> Subject: RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs
>> by using VSX instructions
>>
>>
>> Dear all,
>>
>> Could you please review the following webrev?
>>
>>http://cr.openjdk.java.net/~mdoerr/8158232_PPC_vsx_copy/webrev.00/
>>
>> This change improves performance of disjoint arraycopy of byte, int, and
>> long by using VSX load/store instructions.
>>
>> Discussion started from:
>> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002483.html
>>
>> Performance improvement with micro benchmarks is shown in:
>> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002531.html
>>
>> Thank you very much,
>>
>> Best regards,
>> --
>> Michihiro Horie,
>> IBM Research - Tokyo
>
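The regression Goetz describes in his loop body size table can be illustrated with a little arithmetic. The following is only a rough sketch, not the stub code: it assumes a simplified model in which each tier of the copy loop runs as often as it can before handing the remainder to the next smaller tier, and it ignores alignment pre-loops, the guarding compares, and prefetch setup. The class name LoopSteps and the method iterations are hypothetical names chosen for this illustration.

```java
public class LoopSteps {

    // Counts how many loop iterations a tiered copy needs for `length`
    // elements, given the loop body sizes from largest to smallest
    // (for example 32, 4, 1). Simplified model: no alignment handling,
    // no cost for the failing compare before a tier that is skipped.
    static int iterations(int length, int... bodySizes) {
        int remaining = length;
        int count = 0;
        for (int size : bodySizes) {
            count += remaining / size; // full passes through this tier
            remaining %= size;         // leftover goes to the next tier
        }
        return count;
    }

    public static void main(String[] args) {
        // Byte-array lengths, including the small lengths suggested
        // in the review (23, 42, 99).
        int[] lengths = {23, 42, 99, 127, 128};
        for (int len : lengths) {
            int before = iterations(len, 32, 4, 1);  // "before" byte loops
            int now = iterations(len, 128, 4, 1);    // "now" byte loops
            System.out.printf("len=%d before=%d now=%d%n", len, before, now);
        }
    }
}
```

Under this model a byte array of length 99 takes 6 iterations with the 32,4,1 loops but 27 with 128,4,1, because the '4' loop has to cover everything below 128. Lengths below 32 are unaffected, and from 128 upward the larger loop body starts to pay off.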

