RFR(L): 8198894: [PPC64] More generic vector CRC implementation
Doerr, Martin
martin.doerr at sap.com
Fri Mar 2 14:42:12 UTC 2018
Hi,
I just noticed that kernel_crc32_1word_vpmsum can be simplified a little more.
kernel_crc32_1word can be used for the tail so we don't need to generate it separately for the small case.
This is also better for my new implementation which leaves up to 255 bytes remaining.
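For readers following the thread, the bulk/tail split described above can be sketched in Python. This is an illustration only, not the HotSpot stub code. The 256-byte block size is an assumption inferred from "up to 255 bytes remaining", `zlib.crc32` merely stands in for the vector kernel, and the names `crc32_tail` and `crc32_bulk_then_tail` are hypothetical.

```python
import zlib

BLOCK = 256          # assumed bulk granularity (inferred from "up to 255 bytes remaining")
POLY = 0xEDB88320    # reflected CRC32 polynomial


def _make_table():
    """Build the 256-entry lookup table for byte-wise CRC32."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()


def crc32_tail(data: bytes, crc: int = 0) -> int:
    """Byte-wise table kernel; plays the role of the scalar tail routine."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32_bulk_then_tail(data: bytes, crc: int = 0) -> int:
    """Process whole blocks with the bulk kernel, then the remainder with the tail kernel."""
    bulk_len = len(data) - len(data) % BLOCK
    crc = zlib.crc32(data[:bulk_len], crc)   # stand-in for the vector kernel
    return crc32_tail(data[bulk_len:], crc)  # at most BLOCK - 1 bytes remain
```

Because the running CRC value is simply passed from the bulk kernel to the tail kernel, the same tail routine also serves inputs that are too short for any bulk processing.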
I also noticed that the unroll factor of 4096 seems to be too large. Half of it gives somewhat better performance.
I now get up to 42 GB/s.
New webrev with these 2 minor changes:
http://cr.openjdk.java.net/~mdoerr/8198894_PPC64_CRC32/webrev.01
Best regards,
Martin
From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
Sent: Friday, March 2, 2018 11:55
To: Doerr, Martin <martin.doerr at sap.com>
Cc: Lindenmaier, Goetz <goetz.lindenmaier at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; Hiroshi H Horii (HORII at jp.ibm.com) <HORII at jp.ibm.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
Subject: Re: RFR(L): 8198894: [PPC64] More generic vector CRC implementation
Hi Martin,
I double checked performance with our micro benchmark.
With this change it was 5 times faster. In addition, I did not observe any degradation with smaller lengths; performance there was almost equal.
Best regards,
--
Michihiro,
IBM Research - Tokyo
From: "Doerr, Martin" <martin.doerr at sap.com>
To: "'hotspot-compiler-dev at openjdk.java.net'" <hotspot-compiler-dev at openjdk.java.net>
Cc: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, "Hiroshi H Horii (HORII at jp.ibm.com)" <HORII at jp.ibm.com>, "Michihiro Horie (HORIE at jp.ibm.com)" <HORIE at jp.ibm.com>, Gustavo Romero <gromero at linux.vnet.ibm.com>
Date: 2018/03/02 00:49
Subject: RFR(L): 8198894: [PPC64] More generic vector CRC implementation
________________________________
Hi,
I have implemented a more generic version of the vector instruction based CRC code.
It supports CRC32C and Big Endian, too.
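As background on why one generic implementation can cover both checksums: CRC32 and CRC32C use the same algorithm and differ only in the generator polynomial, so any constants derived from it can be parameterized. The following is a minimal byte-wise Python sketch of that idea, not the vector code under review; `make_table` and `crc_bytewise` are hypothetical names.

```python
CRC32_POLY = 0xEDB88320   # reflected polynomial used by CRC32 (zlib, java.util.zip.CRC32)
CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial used by CRC32C


def make_table(poly: int) -> list:
    """Derive the 256-entry lookup table for the given reflected polynomial."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ poly if c & 1 else c >> 1
        table.append(c)
    return table


def crc_bytewise(data: bytes, table: list, crc: int = 0) -> int:
    """Generic byte-wise CRC; only the table depends on the polynomial."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = table[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


CRC32_TABLE = make_table(CRC32_POLY)
CRC32C_TABLE = make_table(CRC32C_POLY)

print(hex(crc_bytewise(b"123456789", CRC32_TABLE)))   # 0xcbf43926
print(hex(crc_bytewise(b"123456789", CRC32C_TABLE)))  # 0xe3069283
```

The two printed values are the standard check values for the ASCII string "123456789".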
The peak performance was even better for large input streams. I got almost 40 GB/s.
Some smaller lengths may be slower than with the old version.
Maybe somebody from IBM would like to double-check performance.
Please review:
http://cr.openjdk.java.net/~mdoerr/8198894_PPC64_CRC32/webrev.00/
(hoping you like math :))
Best regards,
Martin