RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions

Volker Simonis volker.simonis at gmail.com
Wed Jun 15 08:49:09 UTC 2016


Thanks Vladimir.

I've updated the issue in JBS accordingly.

Regards,
Volker


On Tue, Jun 14, 2016 at 8:34 PM, Vladimir Kozlov
<vladimir.kozlov at oracle.com> wrote:
> Yes, there was mail sent by Mark about the process:
>
> http://mail.openjdk.java.net/pipermail/jdk9-dev/2016-June/004443.html
>
> "  - If you own a JEP or a small enhancement that is not yet complete then
>     you can request an FC extension as follows: Update the JBS issue to
>     add a comment whose first line is "FC Extension Request".  In that
>     comment describe the remaining work to be done, the risk level, a
>     brief justification, and your best estimate of the date by which the
>     feature will be complete.  Add the label "jdk9-fc-request" to the
>     issue.
> "
>
> Regards,
> Vladimir
>
>
> On 6/14/16 11:18 AM, Volker Simonis wrote:
>>
>> Hi Vladimir,
>>
>> is there any news regarding the approval process?
>> I think this change is ppc only and shouldn't do any harm.
>> Or maybe we should just change the issue from "Enhancement" to
>> "(Performance) Bug" if that would simplify the procedure?
>>
>> Thank you and best regards,
>> Volker
>>
>>
>> On Thu, Jun 2, 2016 at 9:04 PM, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com> wrote:
>>>
>>> Please hold off on pushing this. We are past the Feature Complete date of May 26:
>>>
>>> http://openjdk.java.net/projects/jdk9/
>>>
>>> All RFE (enhancement) changes must be approved before being pushed.
>>> Please wait until the approval process is finalized.
>>>
>>> Regards,
>>> Vladimir
>>>
>>> On 6/2/16 11:48 AM, Lindenmaier, Goetz wrote:
>>>>
>>>>
>>>> Ok, reviewed.
>>>>
>>>> Thanks for the explanations,
>>>>
>>>>    Goetz.
>>>>
>>>> From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
>>>> Sent: Thursday, June 02, 2016 5:07 PM
>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
>>>> Cc: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
>>>> Subject: RE: RFR(M): 8158232: PPC64: improve byte, int and long array
>>>> copy stubs by using VSX instructions
>>>>
>>>> Hi Goetz,
>>>>
>>>>> You are saying you could not measure an effect?
>>>>
>>>>
>>>> We could not observe a big difference, but as you point out, it looks
>>>> a little difficult to get stable results with jbb2013.
>>>>
>>>>> There still might be a penalty because of the additional instructions
>>>>> such as the prefetching for small arrays, or a lost opportunity because of
>>>>> not unrolling.
>>>>> It would be nice to have numbers on this, but I think the effect will
>>>>> be rather small so that I’m also fine with the current solution.
>>>>>
>>>>> How did you test the new solution?
>>>>
>>>>
>>>> I agree, a penalty might depend on the application, but the effect will
>>>> be rather small. I also tested the latest code by using micro benchmarks
>>>> and jbb2013.
>>>>
>>>> Best regards,
>>>> --
>>>> Michi-hiro
>>>> IBM Research - Tokyo
>>>>
>>>> From: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
>>>> To: Michihiro Horie/Japan/IBM at IBMJP
>>>> Cc: hotspot-dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net
>>>> Date: 2016/06/02 19:47
>>>> Subject: RE: RFR(M): 8158232: PPC64: improve byte, int and long array
>>>> copy stubs by using VSX instructions
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> Hi Michihiro,
>>>>
>>>> thanks for the new webrev, and thanks Martin for uploading it.
>>>>
>>>>> but there was no special reason for the changes in loop body sizes
>>>>
>>>>
>>>> You are saying you could not measure an effect? With jbb2013 it’s hard
>>>> to get reproducible results; jvm2008 is simpler in that respect. But it
>>>> needs adaptations to run with Java 8 (or you skip the compiler
>>>> benchmarks).
>>>> Also, there is now jbb2015, which is jbb2013 with some bugs fixed.
>>>>
>>>> The loop body sizes are now the same with and without vsx.
>>>> This alleviates my main concerns.
>>>> There still might be a penalty because of the additional instructions
>>>> such as the prefetching for small arrays, or a lost opportunity because of
>>>> not unrolling.
>>>> It would be nice to have numbers on this, but I think the effect will
>>>> be rather small so that I’m also fine with the current solution.
>>>>
>>>> How did you test the new solution?
>>>>
>>>> Best regards,
>>>> Goetz.
>>>>
>>>>
>>>> From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
>>>> Sent: Thursday, June 02, 2016 4:07 AM
>>>> To: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
>>>> Cc: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
>>>> Subject: RE: RFR(M): 8158232: PPC64: improve byte, int and long array
>>>> copy stubs by using VSX instructions
>>>>
>>>> Hi Goetz,
>>>>
>>>> Thank you very much for your comments, which are really helpful.
>>>>
>>>> I will fix my code so that the loop body sizes in elements match the
>>>> original ones. We did measurements using SPECjbb2013, but there was no
>>>> special reason for the changes in loop body sizes. They are the result
>>>> of trial and error with a few developers.
>>>>
>>>>> But I'm not convinced that this helps the average performance, as
>>>>> important lengths will regress.
>>>>
>>>>
>>>> :
>>>>>
>>>>>
>>>>> Especially with the byte arrays, which are used to store
>>>>> compact Strings, I would doubt that this helps an average
>>>>> application. For sizes 32-128 you first have a failing
>>>>> compare, and then step through a '4' loop instead of a '32'
>>>>> one.
>>>>
>>>>
>>>> Your point makes sense to me.
>>>>
>>>>> Should you also set the prefetching engine to be more aggressive, as
>>>>> in the short copy loop?
>>>>
>>>>
>>>> I will use the prefetching engine as in the short copy loop, thank you.
>>>>
>>>> Best regards,
>>>> --
>>>> Michihiro Horie,
>>>> IBM Research - Tokyo
>>>>
>>>> From: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
>>>> To: Michihiro Horie/Japan/IBM at IBMJP, hotspot-dev at openjdk.java.net,
>>>> ppc-aix-port-dev at openjdk.java.net
>>>> Date: 2016/06/01 23:14
>>>> Subject: RE: RFR(M): 8158232: PPC64: improve byte, int and long array
>>>> copy stubs by using VSX instructions
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> Hi Michihiro,
>>>>
>>>> Thanks for contributing to the ppc port!
>>>>
>>>> I've been looking at your changes. Basically they look good. But I'm
>>>> not convinced that this helps the average performance, as important
>>>> lengths will regress.
>>>>
>>>> We once analyzed a benchmark where we saw 400 million copies of
>>>> short arrays which had an average length of only 38 elements. There
>>>> were 200 byte array copies and 20 million long array copies.
>>>> This should have changed now that there are compact strings.
>>>> I think we had suppressed the array copy stubs and modified
>>>> PrintOptoStatistics to collect that data.
>>>>
>>>> As I understand we have the following loops now:
>>>>
>>>> Loop body sizes in elements
>>>>
>>>>          before    now        sizes regressing
>>>>   byte   32,4,1    128,4,1    32-127
>>>>   short  16,2,1    16,2,1     none
>>>>   int    8,1       32,1       8-31
>>>>   long   4,1       16,4,1     none
>>>>
>>>> Especially with the byte arrays, which are used to store
>>>> compact Strings, I would doubt that this helps an average
>>>> application. For sizes 32-128 you first have a failing
>>>> compare, and then step through a '4' loop instead of a '32'
>>>> one.
>>>>
>>>> Further, why don't you use the new instructions in the smaller
>>>> loops, too? Like in the long '4' version?
>>>>
>>>> I could not find a measurement showing the effect on jbb2015
>>>> or jvm2008. Did you measure these?
>>>> Or you could modify your benchmarks for some small, odd
>>>> lengths, such as 23, 42, 99 ...?
>>>>
>>>> Should you also set the prefetching engine to be more aggressive, as
>>>> in the short copy loop?
>>>>
>>>> Best regards,
>>>> Goetz.
>>>>
>>>>> -----Original Message-----
>>>>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On
>>>>> Behalf Of Michihiro Horie
>>>>> Sent: Tuesday, May 31, 2016 17:37
>>>>> To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
>>>>> Subject: RFR(M): 8158232: PPC64: improve byte, int and long array copy
>>>>> stubs by using VSX instructions
>>>>>
>>>>>
>>>>> Dear all,
>>>>>
>>>>> Could you please review the following webrev?
>>>>>
>>>>> http://cr.openjdk.java.net/~mdoerr/8158232_PPC_vsx_copy/webrev.00/
>>>>>
>>>>> This change improves performance of disjoint arraycopy of byte, int,
>>>>> and long by using VSX load/store instructions.
>>>>>
>>>>> Discussion started from:
>>>>> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002483.html
>>>>>
>>>>> Performance improvement with micro benchmarks is shown in:
>>>>> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002531.html
>>>>>
>>>>> Thank you very much,
>>>>>
>>>>> Best regards,
>>>>> --
>>>>> Michihiro Horie,
>>>>> IBM Research - Tokyo
>>>>
>>>>
>>>>
>>>
>
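
To make the loop body size table from Goetz's review (quoted above) more concrete, here is a minimal sketch that counts loop iterations for the lengths he suggested (23, 42, 99). It is a simplified model and not the stub code: it assumes the loops run greedily from the largest body to the smallest and ignores alignment, prefetching and the extra compares. The names `iterations`, `LOOPS_BEFORE` and `LOOPS_WEBREV00` are made up for this illustration.

```python
# Loop body sizes in elements, taken from the table in the review.
LOOPS_BEFORE = {"byte": (32, 4, 1), "short": (16, 2, 1), "int": (8, 1), "long": (4, 1)}
LOOPS_WEBREV00 = {"byte": (128, 4, 1), "short": (16, 2, 1), "int": (32, 1), "long": (16, 4, 1)}


def iterations(length, body_sizes):
    """Count loop iterations needed to copy `length` elements.

    Assumption: loops are tried from the largest body to the smallest,
    and each loop copies as many full bodies as still fit.
    """
    total = 0
    remaining = length
    for size in body_sizes:
        total += remaining // size   # full bodies copied by this loop
        remaining %= size            # elements left for the smaller loops
    return total


if __name__ == "__main__":
    for kind in ("byte", "short", "int", "long"):
        for length in (23, 42, 99):
            before = iterations(length, LOOPS_BEFORE[kind])
            now = iterations(length, LOOPS_WEBREV00[kind])
            print(f"{kind:5} length={length:3} before={before:3} now={now:3}")
```

Under this model a byte copy of 99 elements needs 6 iterations with the 32,4,1 loops but 27 with the 128,4,1 loops, because the '128' loop never runs and the '4' loop does nearly all the work. Iteration counts for bodies of different sizes are not directly comparable as costs, so this only shows where the work shifts and does not replace a measurement.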

