RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions

Volker Simonis volker.simonis at gmail.com
Tue Jun 14 18:18:40 UTC 2016


Hi Vladimir,

is there any news regarding the approval process?
I think this change is ppc only and shouldn't do any harm.
Or maybe we should just change the issue from "Enhancement" to
"(Performance) Bug" if that would simplify the procedure?

Thank you and best regards,
Volker


On Thu, Jun 2, 2016 at 9:04 PM, Vladimir Kozlov
<vladimir.kozlov at oracle.com> wrote:
> Please hold off on pushing this. We are past the Feature Complete date of May 26:
>
> http://openjdk.java.net/projects/jdk9/
>
> All RFE (enhancement) changes must be approved before they are pushed.
> Please wait until the approval process is finalized.
>
> Regards,
> Vladimir
>
> On 6/2/16 11:48 AM, Lindenmaier, Goetz wrote:
>>
>> Ok, reviewed.
>>
>> Thanks for the explanations,
>>
>>    Goetz.
>>
>> *From:* Michihiro Horie [mailto:HORIE at jp.ibm.com]
>> *Sent:* Thursday, June 02, 2016 5:07 PM
>> *To:* Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
>> *Cc:* hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
>> *Subject:* RE: RFR(M): 8158232: PPC64: improve byte, int and long array
>> copy stubs by using VSX instructions
>>
>> Hi Goetz,
>>
>>> You are saying you could not measure an effect?
>>
>> We could not observe a big difference, but as you point out, it seems
>> difficult to get stable results with jbb2013.
>>
>>> There still might be a penalty because of the additional instructions,
>>> such as the prefetching for small arrays, or a lost opportunity because of
>>> not unrolling.
>>> It would be nice to have numbers on this, but I think the effect will
>>> be rather small so that I’m also fine with the current solution.
>>>
>>> How did you test the new solution?
>>
>> I agree: any penalty will depend on the application, but the effect should
>> be rather small. I tested the latest code with micro benchmarks and
>> jbb2013.
>>
>> Best regards,
>> --
>> Michihiro Horie,
>> IBM Research - Tokyo
>>
>>
>> From: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
>> To: Michihiro Horie/Japan/IBM at IBMJP
>> Cc: hotspot-dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net
>> Date: 2016/06/02 19:47
>> Subject: RE: RFR(M): 8158232: PPC64: improve byte, int and long array copy
>> stubs by using VSX instructions
>>
>> Hi Michihiro,
>>
>> thanks for the new webrev, and thanks Martin for uploading it.
>>
>>> but there was no special reason for the changes in loop body sizes
>>
>> You are saying you could not measure an effect? With jbb2013 it’s hard
>> to get reproducible results; jvm2008 is simpler in that respect, but it
>> needs adaptations to run with Java 8 (or you can skip the compiler
>> benchmarks). Also, there is now jbb2015, which is jbb2013 with some bugs
>> fixed.
>>
>> The loop body sizes are now the same with and without vsx.
>> This alleviates my main concerns.
>> There still might be a penalty because of the additional instructions,
>> such as the prefetching for small arrays, or a lost opportunity because of
>> not unrolling.
>> It would be nice to have numbers on this, but I think the effect will
>> be rather small so that I’m also fine with the current solution.
>>
>> How did you test the new solution?
>>
>> Best regards,
>> Goetz.
>>
>>
>> *From:* Michihiro Horie [mailto:HORIE at jp.ibm.com]
>> *Sent:* Thursday, June 02, 2016 4:07 AM
>> *To:* Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
>> *Cc:* hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
>> *Subject:* RE: RFR(M): 8158232: PPC64: improve byte, int and long array
>> copy stubs by using VSX instructions
>>
>> Hi Goetz,
>>
>> Thank you very much for your comments, which are really helpful.
>>
>> I will change my code so that the loop body sizes in elements match the
>> original ones. We did measurements with SPECjbb2013, but there was no
>> special reason for the changes in loop body sizes. They are the result of
>> trial and error with a few developers.
>>
>>> But I'm not convinced that this helps the average performance, as
>>> important lengths will regress.
>>
>> :
>>>
>>> Especially with the byte arrays, which are used to store
>>> compact Strings, I would doubt that this helps an average
>>> application. For sizes 32-128 you first have a failing
>>> compare, and then step through a '4' loop instead of a '32'
>>> one.
>>
>> Your point makes sense to me.
>>
>>> Should you also set the prefetching engine to be more aggressive, as
>>> in the short copy loop?
>>
>> I will set up the prefetching engine as in the short copy loop, thank you.
>>
>> Best regards,
>> --
>> Michihiro Horie,
>> IBM Research - Tokyo
>>
>>
>> From: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
>> To: Michihiro Horie/Japan/IBM at IBMJP, hotspot-dev at openjdk.java.net,
>> ppc-aix-port-dev at openjdk.java.net
>> Date: 2016/06/01 23:14
>> Subject: RE: RFR(M): 8158232: PPC64: improve byte, int and long array copy
>> stubs by using VSX instructions
>>
>> Hi Michihiro,
>>
>> Thanks for contributing to the ppc port!
>>
>> I've been looking at your changes. Basically they look good. But I'm
>> not convinced that this helps the average performance, as important
>> lengths will regress.
>>
>> We once analyzed a benchmark where we saw 400 million copies of
>> short arrays which had an average length of 38!! elements. There
>> were 200 byte array copies and 20 million long array copies.
>> This should have changed now that there are compact strings.
>> I think we had suppressed the array copy stubs and modified
>> PrintOptoStatistics to collect that data.
>>
>> As I understand we have the following loops now:
>>
>> Loop body sizes in elements
>>         before    now       sizes regressing
>> byte    32,4,1    128,4,1   32-127
>> short   16,2,1    16,2,1    none
>> int     8,1       32,1      8-31
>> long    4,1       16,4,1    none
>>
>> Especially with the byte arrays, which are used to store
>> compact Strings, I would doubt that this helps an average
>> application. For sizes 32-128 you first have a failing
>> compare, and then step through a '4' loop instead of a '32'
>> one.
>>
>> Further, why don't you use the new instructions in the smaller
>> loops, too? Like in the long '4' version?
>>
>> I could not find a measurement showing the effect on jbb2015
>> or jvm2008. Did you measure these?
>> Or you could modify your benchmarks to use some small, odd
>> lengths, such as 23, 42, 99 ...?
>>
>> Should you also set the prefetching engine to be more aggressive, as
>> in the short copy loop?
>>
>> Best regards,
>> Goetz.
>>
>>> -----Original Message-----
>>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On
>>> Behalf Of Michihiro Horie
>>> Sent: Dienstag, 31. Mai 2016 17:37
>>> To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
>>> Subject: RFR(M): 8158232: PPC64: improve byte, int and long array copy
>>> stubs by using VSX instructions
>>>
>>>
>>> Dear all,
>>>
>>> Could you please review the following webrev?
>>>
>>> http://cr.openjdk.java.net/~mdoerr/8158232_PPC_vsx_copy/webrev.00/
>>>
>>> This change improves performance of disjoint arraycopy of byte, int, and
>>> long by using VSX load/store instructions.
>>>
>>> Discussion started from:
>>> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002483.html
>>>
>>> Performance improvement with micro benchmarks is shown in:
>>> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002531.html
>>>
>>> Thank you very much,
>>>
>>> Best regards,
>>> --
>>> Michihiro Horie,
>>> IBM Research - Tokyo
>>
>>
>
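
The regression argument in the quoted loop-size table (byte loops of
32,4,1 elements before versus 128,4,1 in the first webrev) can be
illustrated with a minimal sketch. It is a simplified model, not the stub
code: the class and method names are hypothetical, and it assumes each loop
simply consumes as many full bodies as fit, ignoring the alignment handling,
compare instructions and prefetching of the real stubs.

```java
final class CopyLoopSteps {
    // Counts loop iterations when `length` elements are copied by a cascade
    // of loops whose body sizes (in elements) are listed in descending order.
    // Hypothetical helper for illustration only.
    static int iterations(int length, int[] bodySizes) {
        int remaining = length;
        int steps = 0;
        for (int size : bodySizes) {
            steps += remaining / size; // full bodies executed by this loop
            remaining %= size;         // elements left for the smaller loops
        }
        return steps;
    }

    public static void main(String[] args) {
        int[] before = {32, 4, 1};  // byte loop sizes before the change
        int[] now = {128, 4, 1};    // byte loop sizes in the first webrev
        for (int len : new int[] {23, 42, 99, 100, 127, 128}) {
            System.out.printf("len=%3d before=%2d now=%2d%n",
                    len, iterations(len, before), iterations(len, now));
        }
    }
}
```

Under this model a 100-element byte copy takes 4 loop iterations with the
old sizes (3 of the '32' loop, 1 of the '4' loop) and 25 with the new ones
(all in the '4' loop), which is the effect described for lengths 32-127.
Only at 128 elements and above does the larger body start to pay off.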

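The suggestion in the thread to benchmark small, odd lengths such as 23, 42
and 99 could be tried with something like the following sketch. It is a
plain timing loop with hypothetical names, not the micro benchmark used by
the authors; a harness such as JMH would give more trustworthy numbers, and
whether a given System.arraycopy call actually reaches the disjoint
arraycopy stub depends on the JIT compiler, so treat the output as a rough
indication only.

```java
import java.util.Arrays;

final class SmallCopyBench {
    // Copies a byte array of the given length `repetitions` times and
    // returns the elapsed wall-clock time in nanoseconds.
    // Hypothetical helper for illustration only.
    static long timeCopies(int length, int repetitions) {
        byte[] src = new byte[length];
        byte[] dst = new byte[length];
        for (int i = 0; i < length; i++) {
            src[i] = (byte) i;
        }
        long start = System.nanoTime();
        for (int r = 0; r < repetitions; r++) {
            System.arraycopy(src, 0, dst, 0, length);
        }
        long elapsed = System.nanoTime() - start;
        // Sanity check so that the copy result is actually used.
        if (!Arrays.equals(src, dst)) {
            throw new AssertionError("copy mismatch for length " + length);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int repetitions = 10_000_000;
        for (int length : new int[] {23, 42, 99}) {
            timeCopies(length, repetitions); // warm-up run for the JIT
            long nanos = timeCopies(length, repetitions);
            System.out.printf("length=%d: %.2f ns/copy%n",
                    length, (double) nanos / repetitions);
        }
    }
}
```

Running it once with and once without the patch on the same machine would
show whether these lengths regress.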

More information about the ppc-aix-port-dev mailing list