PPC64 VSX load/store instructions in stubs

Tue Apr 5 17:23:54 UTC 2016

Hi Gustavo,

thanks a lot for your contribution.

Could you please say whether you've run benchmarks and what performance
improvements you saw?

With your change, when running on POWER8 we will only use the fast
path for arrays with at least 32 elements. For smaller arrays, we will
fall back to copying only 2 elements at a time, which will be slower
than the initial version, which copied 4 at a time in that case.
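
To make the concern concrete, here is a rough model in C (not the stub
code itself). The helper copy_steps() and its tail handling (one extra
step per leftover element) are assumptions for illustration only; the
real stubs may handle the remainder differently.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not taken from the webrev: number of loop steps
 * needed to copy `elements` items when each step moves
 * `elements_per_step` items and every leftover item costs one extra
 * single-element step.
 */
static size_t copy_steps(size_t elements, size_t elements_per_step) {
    assert(elements_per_step > 0);
    return elements / elements_per_step + elements % elements_per_step;
}
```

Under this model, a 31-element array (the largest that misses the
32-element fast path) takes copy_steps(31, 4) = 10 steps when copying
4 at a time, but copy_steps(31, 2) = 16 steps when copying 2 at a time.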

Did you verify your changes on both little and big endian?

And what about unaligned memory accesses? As far as I have read,
lxvd2x/stxvd2x still work, but may be slower. I saw that there are also
instructions for aligned loads/stores. Would it make sense
(performance-wise) to use them for the cases where we can be sure that
we have aligned memory accesses?
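
As a sketch of the kind of check I have in mind, written in C rather
than in the stub's assembler: both_aligned_16() is a hypothetical
helper, and the 16-byte boundary is an assumption based on the 128-bit
width of the VSX registers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: returns true only if
 * both the source and the destination address are multiples of 16
 * bytes (assumed here to be the relevant boundary for 128-bit vector
 * loads/stores).
 */
static bool both_aligned_16(const void *src, const void *dst) {
    /* OR the addresses so a low bit set in either one fails the test. */
    return (((uintptr_t)src | (uintptr_t)dst) & (uintptr_t)15) == 0;
}
```

A stub could branch on such a test once before the main loop and pick
the aligned variant only when it holds for both addresses.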

Thank you and best regards,
Volker

On Fri, Apr 1, 2016 at 10:36 PM, Gustavo Romero
<gromero at linux.vnet.ibm.com> wrote:
> Hi Martin, Hi Volker
>
> Currently VSX load/store instructions are not being used in PPC64 stubs,
> particularly in arraycopy stubs inside generate_arraycopy_stubs() like,
> but not limited to, generate_disjoint_{byte,short,int,long}_copy.
>
> We can speed up mass copy using VSX (Vector-Scalar Extension) load/store
> instructions on processors >= POWER8, the same way it's already done for
> libc memcpy().
>
> This is an initial patch just for jshort_disjoint_arraycopy() VSX vector
> load/store:
>
> http://81.de.7a9f.ip4.static.sl-reverse.com/202539/webrev
>
> What are your thoughts on that? Is there any impediment to using VSX
> instructions in OpenJDK at the moment?
>
> Thank you.
>
> Best regards,
> Gustavo
>