RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions

Vladimir Kozlov vladimir.kozlov at oracle.com
Tue Jun 14 18:34:05 UTC 2016


Yes, there was a mail sent by Mark about the process:

http://mail.openjdk.java.net/pipermail/jdk9-dev/2016-June/004443.html

"  - If you own a JEP or a small enhancement that is not yet complete then
     you can request an FC extension as follows: Update the JBS issue to
     add a comment whose first line is "FC Extension Request".  In that
     comment describe the remaining work to be done, the risk level, a
     brief justification, and your best estimate of the date by which the
     feature will be complete.  Add the label "jdk9-fc-request" to the
     issue.
"

Regards,
Vladimir

On 6/14/16 11:18 AM, Volker Simonis wrote:
> Hi Vladimir,
>
> is there any news regarding the approval process?
> I think this change is ppc only and shouldn't do any harm.
> Or maybe we should just change the issue from "Enhancement" to
> "(Performance) Bug" if that would simplify the procedure?
>
> Thank you and best regards,
> Volker
>
>
> On Thu, Jun 2, 2016 at 9:04 PM, Vladimir Kozlov
> <vladimir.kozlov at oracle.com> wrote:
>> Please hold off on pushing this. We are past the Feature Complete date of May 26:
>>
>> http://openjdk.java.net/projects/jdk9/
>>
>> All RFE (enhancement) changes should be approved before push.
>> Please wait until the approval process is finalized.
>>
>> Regards,
>> Vladimir
>>
>> On 6/2/16 11:48 AM, Lindenmaier, Goetz wrote:
>>>
>>> Ok, reviewed.
>>>
>>> Thanks for explanations,
>>>
>>>    Goetz.
>>>
>>> *From:*Michihiro Horie [mailto:HORIE at jp.ibm.com]
>>> *Sent:* Thursday, June 02, 2016 5:07 PM
>>> *To:* Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
>>> *Cc:* hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
>>> *Subject:* RE: RFR(M): 8158232: PPC64: improve byte, int and long array
>>> copy stubs by using VSX instructions
>>>
>>> Hi Goetz,
>>>
>>>> You are saying you could not measure an effect?
>>>
>>> We could not observe a big difference, but as you point out, it looks a
>>> little difficult to get stable results with jbb2013.
>>>
>>>> There still might be a penalty because of the additional instructions
>>>> as the prefetching for small arrays, or a lost opportunity because of
>>>> not unrolling.
>>>> It would be nice to have numbers on this, but I think the effect will
>>>> be rather small so that I’m also fine with the current solution.
>>>>
>>>> How did you test the new solution?
>>>
>>> I agree, a penalty might depend on the applications, but the effect will
>>> be rather small. I also tested the latest code by using micro benchmarks and
>>> jbb2013.
>>>
>>> Best regards,
>>> --
>>> Michihiro Horie,
>>> IBM Research - Tokyo
>>>
>>> From: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
>>> To: Michihiro Horie/Japan/IBM at IBMJP
>>> Cc: hotspot-dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net
>>> Date: 2016/06/02 19:47
>>> Subject: RE: RFR(M): 8158232: PPC64: improve byte, int and long array copy
>>> stubs by using VSX instructions
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> Hi Michihiro,
>>>
>>> thanks for the new webrev, and thanks Martin for uploading it.
>>>
>>>> but there was no special reason for the changes in loop body sizes
>>>
>>> You are saying you could not measure an effect? With jbb2013 it’s hard
>>> to get reproducible results; jvm2008 is simpler in that respect. But it
>>> needs adaptations to run with Java 8 (or you skip the compiler benchmarks).
>>> Also, there is now jbb2015, which is jbb2013 with some bugs fixed.
>>>
>>> The loop body sizes are now the same with and without vsx.
>>> This alleviates my main concerns.
>>> There still might be a penalty because of the additional instructions
>>> as the prefetching for small arrays, or a lost opportunity because of
>>> not unrolling.
>>> It would be nice to have numbers on this, but I think the effect will
>>> be rather small so that I’m also fine with the current solution.
>>>
>>> How did you test the new solution?
>>>
>>> Best regards,
>>> Goetz.
>>>
>>>
>>> *From:*Michihiro Horie [mailto:HORIE at jp.ibm.com]
>>> *Sent:* Thursday, June 02, 2016 4:07 AM
>>> *To:* Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
>>> *Cc:* hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
>>> *Subject:* RE: RFR(M): 8158232: PPC64: improve byte, int and long array
>>> copy stubs by using VSX instructions
>>>
>>> Hi Goetz,
>>>
>>> Thank you very much for your comments, which are really helpful.
>>>
>>> I will change my code so that the loop body sizes in elements match the
>>> original ones. We did measurements using SPECjbb2013, but there was no
>>> special reason for the changes in loop body sizes. They are the result
>>> of trial and error with a few developers.
>>>
>>>> But I'm not convinced that this helps the average performance, as
>>>> important lengths will regress.
>>>
>>> :
>>>>
>>>> Especially with the byte arrays, which are used to store
>>>> compact Strings, I would doubt that this helps an average
>>>> application. For sizes 32-128 you first have a failing
>>>> compare, and then step through a '4' loop instead of a '32'
>>>> one.
>>>
>>> Your point makes sense to me.
>>>
>>>> Should you also set the prefetching engine more aggressive as
>>>> in the short copy loop?
>>>
>>> I will set up the prefetching engine as in the short copy loop, thank you.
>>>
>>> Best regards,
>>> --
>>> Michihiro Horie,
>>> IBM Research - Tokyo
>>>
>>> From: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>
>>> To: Michihiro Horie/Japan/IBM at IBMJP, hotspot-dev at openjdk.java.net,
>>> ppc-aix-port-dev at openjdk.java.net
>>> Date: 2016/06/01 23:14
>>> Subject: RE: RFR(M): 8158232: PPC64: improve byte, int and long array copy
>>> stubs by using VSX instructions
>>>
>>> ------------------------------------------------------------------------
>>>
>>> Hi Michihiro,
>>>
>>> Thanks for contributing to the ppc port!
>>>
>>> I've been looking at your changes. Basically they look good. But I'm
>>> not convinced that this helps the average performance, as important
>>> lengths will regress.
>>>
>>> We once analyzed a benchmark, where we saw 400 million copies of
>>> short arrays which had an average length of 38!! elements. There
>>> were 200 byte array copies and 20 million long array copies.
>>> This should have changed as there are compact strings, now.
>>> I think we had suppressed the array copy stubs and modified
>>> PrintOptoStatistics to collect that data.
>>>
>>> As I understand we have the following loops now:
>>>
>>> Loop body sizes in elements
>>>
>>>          before    now       sizes regressing
>>> byte     32,4,1    128,4,1   32-127
>>> short    16,2,1    16,2,1    none
>>> int      8,1       32,1      8-31
>>> long     4,1       16,4,1    none
>>>
>>> Especially with the byte arrays, which are used to store
>>> compact Strings, I would doubt that this helps an average
>>> application. For sizes 32-128 you first have a failing
>>> compare, and then step through a '4' loop instead of a '32'
>>> one.
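
Editorial sketch (not part of the original mail): the regression argument above can be illustrated with a simplified model that counts loop iterations for a given array length. The class name `LoopSteps` and the method `iterations` are hypothetical. The model assumes each loop runs greedily on whatever the previous loop left over, and it ignores alignment handling, prefetch setup and per-iteration cost, so it only indicates the trend and is not a description of the actual stub code.

```java
public class LoopSteps {
    // Count loop iterations needed to copy `length` elements when the stub
    // runs loops with the given body sizes (in elements), largest first.
    // Simplified model: ignores alignment pre-loops and prefetch setup.
    static int iterations(int length, int... bodySizes) {
        int total = 0;
        int remaining = length;
        for (int size : bodySizes) {
            total += remaining / size;   // full iterations of this loop
            remaining %= size;           // leftover goes to the next, smaller loop
        }
        return total;
    }

    public static void main(String[] args) {
        int[] lengths = {23, 42, 99, 127, 128};
        for (int n : lengths) {
            // Byte loop body sizes taken from the table: 32,4,1 before; 128,4,1 now.
            System.out.printf("byte[%d]: before=%d iterations, now=%d iterations%n",
                    n, iterations(n, 32, 4, 1), iterations(n, 128, 4, 1));
        }
    }
}
```

Under this model a 99-element byte copy takes 6 iterations with the 32,4,1 loops and 27 with the 128,4,1 loops, while a 128-element copy drops from 4 iterations to 1, which matches the 32-127 regression window listed in the table.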
>>>
>>> Further, why don't you use the new instructions in the smaller
>>> loops, too? Like in the long '4' version?
>>>
>>> I could not find a measurement showing the effect on jbb2015
>>> or jvm2008. Did you measure these?
>>> Or you could modify your benchmarks for some small, odd
>>> numbers, such as 23, 42, 99 ...?
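
Editorial sketch (not part of the original mail): a micro benchmark for small, odd lengths could look roughly like the following. The class `SmallCopyBench` and the method `timeCopies` are hypothetical names, and this is not the benchmark used in the thread. It assumes plain `System.arraycopy` between two distinct byte arrays and simple wall-clock timing with one warm-up pass; a harness such as JMH would give more trustworthy numbers.

```java
import java.util.Arrays;

public class SmallCopyBench {
    // Copy `length` bytes `reps` times between two distinct arrays and
    // return the elapsed time in nanoseconds.
    static long timeCopies(int length, int reps) {
        byte[] src = new byte[length];
        byte[] dst = new byte[length];
        Arrays.fill(src, (byte) 1);
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            System.arraycopy(src, 0, dst, 0, length);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the copies are not trivially dead code.
        if (!Arrays.equals(src, dst)) {
            throw new AssertionError("copy mismatch for length " + length);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int[] lengths = {23, 42, 99};   // the small, odd sizes suggested above
        int reps = 10_000_000;
        for (int n : lengths) {
            timeCopies(n, reps);        // warm-up pass so the JIT compiles the loop
            long ns = timeCopies(n, reps);
            System.out.printf("byte[%d]: %.2f ns/copy%n", n, (double) ns / reps);
        }
    }
}
```

Running it once on a build with the change and once without gives a rough per-copy comparison for lengths inside the 32-127 window and just below it.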
>>>
>>> Should you also set the prefetching engine more aggressive as
>>> in the short copy loop?
>>>
>>> Best regards,
>>> Goetz.
>>>
>>>> -----Original Message-----
>>>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On
>>>> Behalf Of Michihiro Horie
>>>> Sent: Tuesday, 31 May 2016 17:37
>>>> To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
>>>> Subject: RFR(M): 8158232: PPC64: improve byte, int and long array copy
>>>> stubs by using VSX instructions
>>>>
>>>>
>>>> Dear all,
>>>>
>>>> Could you please review the following webrev?
>>>>
>>>> http://cr.openjdk.java.net/~mdoerr/8158232_PPC_vsx_copy/webrev.00/
>>>>
>>>> This change improves performance of disjoint arraycopy of byte, int, and
>>>> long by using VSX load/store instructions.
>>>>
>>>> Discussion started from:
>>>> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002483.html
>>>>
>>>>
>>>> Performance improvement with micro benchmarks is shown in:
>>>> http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002531.html
>>>>
>>>>
>>>> Thank you very much,
>>>>
>>>> Best regards,
>>>> --
>>>> Michihiro Horie,
>>>> IBM Research - Tokyo
>>>
>>>
>>

