RFR (M): 8134553: CRC32C implementations for Nehalem x86/amd64 & Westmere+ x86/amd64
Vladimir Kozlov
vladimir.kozlov at oracle.com
Mon Sep 14 17:17:35 UTC 2015
Nice! Let me test it.
I missed in first review - you should not assign to local when you passing parameter:
+ bool supports_clmul;
+ StubRoutines::x86::generate_CRC32C_table(supports_clmul = VM_Version::supports_clmul());
I can fix it myself before push.
Thanks,
Vladimir
On 9/8/15 11:16 AM, Wojtowicz, Tomasz wrote:
> Updated review, including all of the changes you have pointed out in an original e-mail:
> http://cr.openjdk.java.net/~mcberg/8134553/webrev.02/
>
> --
> Thank you,
>
> Tomek
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, September 02, 2015 7:06 PM
> To: Wojtowicz, Tomasz; 'hotspot-compiler-dev at openjdk.java.net'
> Subject: Re: RFR (M): 8134553: CRC32C implementations for Nehalem x86/amd64 & Westmere+ x86/amd64
>
> Nice work Tomek.
> Few things.
>
> Why there are no C1 changes?
> TW>We were interested in speeding up an intrinsic path. I believe it is a similar approach to what ARM and SPARC choose.
>
> templateInterpreter_x86_64.cpp has strange symbols in comments:
> Java� Virtual
> type�long�and�double�have
> TW>I have spotted it earlier - looks like I have uploaded an earlier revision of a file than I originally thought . Fixed.
>
> Code style comments:
> If possible can you not use namespace? You can have named enums and
> local static methods instead.
> TW>I got rid of all namespace CRC32C. Created named enum for chunk sizes.
>
> And macroAssembler_x86.cpp totally goes against our code style.
> I would suggest to move IPL_Alg4 and other local methods into
> MacroAssembler class - 'This->' looks strange here.
> I would also rename all these local methods to have prefix crc32c_.
> TW>Done.
>
> Also align parameters lines to (:
>
> + void IPL_Alg4(INOUT(Register B), uint32_t n,
> + Scratch(Register C), Scratch(Register D),
> Scratch(Register Z),
> + MacroAssembler * This) {
>
>
> + CRC32C::ProcChunk(CRC32C::HIGH, CONSTOrPreCompConstIndex[0],
> CONSTOrPreCompConstIndex[1], IsPclmulqdqSupported,
> + len, buf, crc,
> + A, B, C,
> + AXMM, BXMM, CXMM,
> + D, E, F,
> + this);
> TW>Done.
>
> Instead of Scratch(Register C) use Register tmp1.
> I am fine with IN, INOUT, OUT, Scratch macros but Nehalem() and
> Westmere() is strange to see here - I do not understand their meaning.
> TW>Nehalem and Westmere need different resources (Registers) to obtain carry less multiplication results. In order to not generate separate wrappers I thought about using single functions and then just marked registers which are specific to particular architecture with Westmere(), Nehalem() empty macros. In an effort to aid in understanding of a code. But I could get rid of them easily. Already done.
> I got rid of a IN/INOUT/OUT macros and modified Register names accordingly in/in_out/out
>
> rename CONSTOrPreCompConstInde* to use low case and _.
> TW>Done.
>
> Argument name: IsPclmulqdqSupported --> is_pclmulqdq_supported
> TW>Done
>
> stubRoutines_x86.cpp upper case name are used mostly for constants and
> enums. You have array PCLMULQDQ[] (I would change PCLMULQDQ name too -
> pclmulqdq_table) and next weird definition:
> + #undef CONST
> + static juint x;
> + #define CONST x
> + CONST = PowN[j];
> Keep it local as X_CONST if you want to highlight that it is constant
> for following expression. Simple 'CONST' name is confusing.
>> Both done
>
> Rename crc32c_IPL_Alg2Alt2Fast --> crc32c_IPL_alg2 (to you need alt fast?).
>> I just need Alt2 since Gueron's Alg2 has two alternatives. Done.
>
> Rename GenerateCRC32CTable --> generate_CRC32C_table
> TW>Done
>
> crc32c.h:
> // a) constants table generation
> (C:\Java\jdk9hs-comp\hotspot\src\cpu\x86\vm\stubRoutines_x86.cpp)
> 62
> Use hotspot/src/cpu/x86/vm/stubRoutines_x86.cpp
> TW>Done
>
> Thanks,
> Vladimir
>
> On 8/27/15 11:47 AM, Wojtowicz, Tomasz wrote:
>> I would like to contribute following change:
>>
>> Review details
>>
>> Review Title: CRC32C implementations for Nehalem x86/amd64 &
>> Westmere+ x86/amd64
>>
>> Review ID: #8134553
>>
>> Diff: http://cr.openjdk.java.net/~mcberg/8134553/webrev.01/
>> <http://cr.openjdk.java.net/%7Emcberg/8134553/webrev.01/>
>>
>> Description: Efficient use of a crc32 hardware instruction by
>> division of a problem to a predefined chunks of an increasing size and
>> further by 3 to be computed hiding instruction latencies. x86 delivers
>> up to 8x improvement vs. java library, amd64 stops at even more -> 16x.
>>
>> Performance data are attached to this
>> message for your convenience.
>>
>> No regressions has been observed on
>> hotspot/compiler x86_64.
>>
>> Link: https://bugs.openjdk.java.net/browse/JDK-8134553
>>
>> Author: Tomasz, Wojtowicz
>>
>> --
>>
>> Thank you,
>>
>> Tomek
>>
More information about the hotspot-compiler-dev
mailing list