RFR (S): JDK-8191328: Avoid unnecessary overhead in CRC32C
Dmitry Chuyko
dmitry.chuyko at bell-sw.com
Thu Nov 16 16:42:01 UTC 2017
On 11/15/2017 09:44 PM, Andrew Haley wrote:
> On 15/11/17 18:38, Vitaly Davidovich wrote:
>> On Wed, Nov 15, 2017 at 12:40 PM, Andrew Haley <aph at redhat.com> wrote:
>>> On 15/11/17 15:38, Alan Bateman wrote:
>>>> Moving the nativeOrder out of the loop makes sense but I'm curious about
>>>> the context for improving this implementation.
>>> I wonder about lifting ByteOrder.nativeOrder(). Maybe it fails to
>>> inline because the method is too large: if that happens, we really
>>> lose. I'm not seeing that, though: it seems to be inlined just fine,
>>> and has no effect.
Sure, the effect only shows up when inlining is missing. But inlining is
relatively easy to break with your tiered JIT settings (not sure about
AOT). For example, in HotSpot:
-XX:-Inline, -XX:MaxInlineLevel=0 (it would be no surprise to meet this
one in the wild), -XX:FreqInlineSize=3, -XX:InlineSmallCode=15, and so on.
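A minimal sketch of the hoisting being discussed, assuming a loop that picks a word-decoding path by native byte order. The class and method names are hypothetical, and this is not the actual java.util.zip.CRC32C code. When inlining is disabled, for example with -XX:-Inline or -XX:MaxInlineLevel=0, the first variant presumably pays for a real ByteOrder.nativeOrder() call on every iteration, while the second pays for it once.

```java
import java.nio.ByteOrder;

// Hypothetical illustration only, not the JDK's CRC32C code:
// XOR-folds 4-byte words of an array, decoding them in native byte order.
class NativeOrderHoisting {

    // Decodes a 4-byte little-endian word starting at b[i].
    static int leInt(byte[] b, int i) {
        return (b[i] & 0xFF) | (b[i + 1] & 0xFF) << 8
             | (b[i + 2] & 0xFF) << 16 | (b[i + 3] & 0xFF) << 24;
    }

    // Decodes a 4-byte big-endian word starting at b[i].
    static int beInt(byte[] b, int i) {
        return (b[i] & 0xFF) << 24 | (b[i + 1] & 0xFF) << 16
             | (b[i + 2] & 0xFF) << 8 | (b[i + 3] & 0xFF);
    }

    // Before: nativeOrder() is queried on every iteration; if the call is
    // not inlined, each pass pays for a real method call.
    static int foldPerIteration(byte[] b) {
        int acc = 0;
        for (int i = 0; i + 4 <= b.length; i += 4) {
            if (ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN) {
                acc ^= leInt(b, i);
            } else {
                acc ^= beInt(b, i);
            }
        }
        return acc;
    }

    // After: the order is read once and the loop only tests a local boolean.
    static int foldHoisted(byte[] b) {
        final boolean little = ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN;
        int acc = 0;
        for (int i = 0; i + 4 <= b.length; i += 4) {
            acc ^= little ? leInt(b, i) : beInt(b, i);
        }
        return acc;
    }
}
```

Both variants return the same value; only the number of nativeOrder() calls differs.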
>>>
>>> In any case, this patch doesn't help anything on my test hardware.
>> Is this with -Xcomp though? That can generate crap code because
>> there's no profiling information. Not that -Xcomp should be the way
>> to test peak performance IMO, but that is the setting that was used I
>> believe.
Another noticeable case is -Xint, where the absolute time of a CRC
calculation is quite long.
Here is a benchmark that is easier to experiment with (no need to build
the JDK or to turn off intrinsics):
http://cr.openjdk.java.net/~dchuyko/8191328/CRC32CAltBench.java
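The idea of sidestepping intrinsics can be sketched as a plain Java CRC32C under a different class name: HotSpot intrinsics are bound to specific JDK methods, so a copy like this presumably always runs as ordinary compiled or interpreted Java. This is a hypothetical, deliberately simple byte-at-a-time table-driven version (reflected Castagnoli polynomial 0x82F63B78), not the code from CRC32CAltBench.java.

```java
// Hypothetical plain-Java CRC32C (Castagnoli), table-driven, one byte per step.
// Not part of the JDK, so no JIT intrinsic applies to it.
class TableCrc32c {
    // Reflected Castagnoli polynomial.
    private static final int POLY = 0x82F63B78;
    private static final int[] TABLE = new int[256];

    static {
        // Precompute the CRC of every possible byte value.
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ POLY : c >>> 1;
            }
            TABLE[n] = c;
        }
    }

    // Returns the CRC32C of b[off, off + len) as an unsigned value in a long.
    static long crc32c(byte[] b, int off, int len) {
        int crc = 0xFFFFFFFF;                        // initial value
        for (int i = off; i < off + len; i++) {
            crc = (crc >>> 8) ^ TABLE[(crc ^ b[i]) & 0xFF];
        }
        return ~crc & 0xFFFFFFFFL;                   // final XOR, widened unsigned
    }
}
```

Its result can be cross-checked against java.util.zip.CRC32C; for the ASCII string "123456789" both should give the standard check value 0xE3069283.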
Some x86 results (ns/op, lower is better):
mode                    before                after
default tiered          380.957 ± 11.621      350.838 ± 5.149
-XX:MaxInlineLevel=0    656.791 ± 8.216       340.999 ± 2.686
-Xint                   36113.441 ± 197.716   26928.593 ± 133.309
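For a quick reading of the x86 numbers, the relative saving is (before - after) / before on the mean times, ignoring the ± error margins. The small sketch below (class name hypothetical) computes it and should print roughly 7.9%, 48.1% and 25.4% for default tiered, -XX:MaxInlineLevel=0 and -Xint respectively.

```java
// Relative time reduction computed from the x86 means (ns/op) in this message.
// Error margins are ignored; this is only a rough reading of the figures.
class CrcBenchDelta {
    // Fraction of time saved: (before - after) / before.
    static double reduction(double before, double after) {
        return (before - after) / before;
    }

    public static void main(String[] args) {
        String[] modes = {"default tiered", "-XX:MaxInlineLevel=0", "-Xint"};
        double[][] ns = {
            {380.957, 350.838},
            {656.791, 340.999},
            {36113.441, 26928.593},
        };
        for (int i = 0; i < modes.length; i++) {
            System.out.printf("%-22s %5.1f%%%n", modes[i], 100 * reduction(ns[i][0], ns[i][1]));
        }
    }
}
```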
-Dmitry
> Shrug; maybe. We shouldn't mess the code up for -Xcomp.
>
More information about the core-libs-dev
mailing list