[10] RFR (S): 8189177 - AARCH64: Improve _updateBytesCRC32C intrinsic
White, Derek
Derek.White at cavium.com
Tue Nov 7 22:34:30 UTC 2017
Hi Dmitry,
This looks good!
Thanks,
- Derek
> -----Original Message-----
> From: hotspot-compiler-dev
> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Dmitry Chuyko
> Sent: Thursday, November 02, 2017 5:07 PM
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Re: [10] RFR (S): 8189177 - AARCH64: Improve _updateBytesCRC32C
> intrinsic
>
> As with CRC32, I added a private
> MacroAssembler::kernel_crc32c_using_crc32c().
>
> webrev: http://cr.openjdk.java.net/~dchuyko/8189177/webrev.01/
>
> -Dmitry
>
>
> On 10/20/2017 08:45 PM, Dmitry Chuyko wrote:
> > Hello,
> >
> > Please review an improvement to the CRC32C calculation on AArch64. It is
> > done much like the change for JDK-8189176 described in [1].
> >
> > MacroAssembler::kernel_crc32c receives table registers that it does not
> > use. They can be used to make neighboring loads and CRC calculations
> > independent. Adding a prologue and an epilogue to the main by-64 loop
> > makes it applicable only from len=128, so an additional by-32 loop is
> > added for smaller lengths.
> >
> > rfe: https://bugs.openjdk.java.net/browse/JDK-8189177
> > webrev: http://cr.openjdk.java.net/~dchuyko/8189177/webrev.00/
> > benchmark:
> > http://cr.openjdk.java.net/~dchuyko/8189177/crc32c/CRC32CBench.java
> >
> > Results for T88 and A53 [2] are similar to those for the CRC32 change
> > (good), but again, splitting pair loads may slow down other CPUs, so
> > measurements on different HW are welcome.
> >
> > -Dmitry
> >
> > [1]
> > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-October/027225.html
> > [2]
> > https://bugs.openjdk.java.net/browse/JDK-8189177?focusedCommentId=14124535&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14124535
> >
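
For readers following the thread, the length dispatch described in the original request (by-64 loop with prologue and epilogue from len=128, by-32 loop for smaller lengths) can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in the webrev: the class name Crc32cDispatchSketch and the helper update() are hypothetical, the bitwise update stands in for the hardware crc32c instructions, the "loaded" index merely models a block whose loads were issued one step earlier, and the handling of the final tail bytes is assumed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and tail handling are assumptions, not the webrev code.
final class Crc32cDispatchSketch {
    // Reflected CRC-32C (Castagnoli) polynomial.
    private static final int POLY = 0x82F63B78;

    // Bitwise CRC over n bytes; a software stand-in for the crc32c instructions.
    private static int update(int crc, byte[] buf, int off, int n) {
        for (int i = off; i < off + n; i++) {
            crc ^= buf[i] & 0xFF;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc >>> 1) ^ (POLY & -(crc & 1));
            }
        }
        return crc;
    }

    static long crc32c(byte[] buf) {
        int crc = 0xFFFFFFFF;
        int off = 0;
        int len = buf.length;

        if (len >= 128) {
            // Prologue: "load" the first 64-byte block without consuming it yet.
            int loaded = off;
            off += 64;
            len -= 64;
            // Main by-64 loop: issue the next block's loads, then run the CRC
            // over the block loaded one step earlier.
            while (len >= 64) {
                int next = off;
                crc = update(crc, buf, loaded, 64);
                loaded = next;
                off += 64;
                len -= 64;
            }
            // Epilogue: consume the last loaded block.
            crc = update(crc, buf, loaded, 64);
        }
        // By-32 loop: covers lengths below 128 and the remainder of the by-64 path.
        while (len >= 32) {
            crc = update(crc, buf, off, 32);
            off += 32;
            len -= 32;
        }
        // Tail: remaining 0..31 bytes (assumed handling).
        crc = update(crc, buf, off, len);
        return (~crc) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Long.toHexString(crc32c(check)));
    }
}
```

The prologue and epilogue together account for two 64-byte blocks, which is why the pipelined path only applies from len=128. The sketch can be checked against java.util.zip.CRC32C (JDK 9 and later) for lengths on either side of the 32, 64 and 128 boundaries.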