RFR (M): 8134553: CRC32C implementations for Nehalem x86/amd64 & Westmere+ x86/amd64

Vladimir Kozlov vladimir.kozlov at oracle.com
Wed Sep 16 18:53:05 UTC 2015


On 9/16/15 11:09 AM, Wojtowicz, Tomasz wrote:
> Vladimir,
> 1. As for the jcheck reported errors. Would it be acceptable to check into the main directory of a repository an .editorconfig file (from http://editorconfig.org/ project):
> 	root = true
> 	
> 	[*]
> 	indent_style = space
> 	indent_size = 2
> 	end_of_line = lf
> 	charset = utf-8
> 	trim_trailing_whitespace = true
> 	insert_final_newline = true
> so the "work" would be done during development without any visible interaction.
> There are plugins available for download for the most of the editors currently seen in a wild.

I assume you want to put it into hotspot/ directory.
I am fine with that but it should be separate RFE and reviewed on hotspot-dev at openjdk.java.net

>
> 2. I have fixed:
> 	a) jcheck reported issues
> 	b) "assign to local when you passing parameter"
> 	c) modified CRC32C::{some named enum element} to CRC32C_{some anonymous enum element}
> 	d) 1 back merge issue which I haven't spotted earlier
> both 32/64 bit compilation with GCC is now passing (MS compilers actually never complained).
> New webrev.03:
> http://cr.openjdk.java.net/~mcberg/8134553/webrev.03/

This looks good to me.

Thanks,
Vladimir

>
> --
> Tomek
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Monday, September 14, 2015 11:18 AM
> To: hotspot-compiler-dev at openjdk.java.net
> Cc: Wojtowicz, Tomasz
> Subject: Re: RFR (M): 8134553: CRC32C implementations for Nehalem x86/amd64 & Westmere+ x86/amd64
>
> Tests found:
>
> src/cpu/x86/vm/macroAssembler_x86.cpp:8686: Trailing whitespace
> src/cpu/x86/vm/macroAssembler_x86.hpp:1330: Trailing whitespace
> src/cpu/x86/vm/stubGenerator_x86_32.cpp:3001: Trailing whitespace
> src/cpu/x86/vm/stubGenerator_x86_64.cpp:3968: Trailing whitespace
> src/cpu/x86/vm/stubRoutines_x86.cpp:136: Trailing whitespace
> src/share/vm/interpreter/interpreter.cpp:594: Carriage return (^M)
> src/share/vm/interpreter/templateInterpreter.cpp:426: Carriage return (^M)
> src/cpu/x86/vm/crc32c.h:31: Trailing whitespace
>
> Please, fix.
>
> Also build failed (make sure you build 32- and 64-bit):
>
> hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp: In member function 'void MacroAssembler::crc32c_ipl_alg2_alt2(Register,
> Register, Register, Register, Register, Register, Register, Register, Register, XMMRegister, XMMRegister, XMMRegister,
> bool)':
> hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp:9033:42: error: 'CRC32C' is not a class or namespace
>      uint32_t const_or_pre_comp_const_index[CRC32C::NUM_PRECOMPUTED_CONSTANTS];
>                                             ^
> hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp:9040:5: error: 'const_or_pre_comp_const_index' was not declared in this scope
>        const_or_pre_comp_const_index[1] = *(uint32_t *)StubRoutines::_crc32c_table_addr;
>        ^
> hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp:9049:5: error: 'const_or_pre_comp_const_index' was not declared in this scope
>        const_or_pre_comp_const_index[0] = 1;
>        ^
> hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp:9058:21: error: 'CRC32C' is not a class or namespace
>      crc32c_proc_chunk(CRC32C::HIGH, const_or_pre_comp_const_index[0], const_or_pre_comp_const_index[1], is_pclmulqdq_supported,
>                        ^
> hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp:9058:35: error: 'const_or_pre_comp_const_index' was not declared in this scope
>      crc32c_proc_chunk(CRC32C::HIGH, const_or_pre_comp_const_index[0], const_or_pre_comp_const_index[1], is_pclmulqdq_supported,
>                                      ^
> hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp:9064:21: error: 'CRC32C' is not a class or namespace
>      crc32c_proc_chunk(CRC32C::MIDDLE, const_or_pre_comp_const_index[2], const_or_pre_comp_const_index[3], is_pclmulqdq_supported,
>                        ^
> hotspot/src/cpu/x86/vm/macroAssembler_x86.cpp:9070:21: error: 'CRC32C' is not a class or namespace
>      crc32c_proc_chunk(CRC32C::LOW, const_or_pre_comp_const_index[4], const_or_pre_comp_const_index[5], is_pclmulqdq_supported,
>
> Thanks,
> Vladimir
>
> On 9/14/15 10:17 AM, Vladimir Kozlov wrote:
>> Nice! Let me test it.
>>
>> I missed in first review - you should not assign to local when you passing parameter:
>>
>> +      bool supports_clmul;
>> +      StubRoutines::x86::generate_CRC32C_table(supports_clmul =
>> + VM_Version::supports_clmul());
>>
>> I can fix it myself before push.
>>
>> Thanks,
>> Vladimir
>>
>> On 9/8/15 11:16 AM, Wojtowicz, Tomasz wrote:
>>> Updated review, including all of the changes you have pointed out in an original e-mail:
>>> http://cr.openjdk.java.net/~mcberg/8134553/webrev.02/
>>>
>>> --
>>> Thank you,
>>>
>>> Tomek
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Wednesday, September 02, 2015 7:06 PM
>>> To: Wojtowicz, Tomasz; 'hotspot-compiler-dev at openjdk.java.net'
>>> Subject: Re: RFR (M): 8134553: CRC32C implementations for Nehalem
>>> x86/amd64 & Westmere+ x86/amd64
>>>
>>> Nice work Tomek.
>>> Few things.
>>>
>>> Why there are no C1 changes?
>>> TW>We were interested in speeding up an intrinsic path. I believe it is a similar approach to what ARM and SPARC choose.
>>>
>>> templateInterpreter_x86_64.cpp has strange symbols in comments:
>>> Java  Virtual
>>> type long and double have
>>> TW>I have spotted it earlier - looks like I have uploaded an earlier revision of a file than I originally thought .
>>> Fixed.
>>>
>>> Code style comments:
>>> If possible can you not use namespace? You can have named enums and
>>> local static methods instead.
>>> TW>I got rid of all namespace CRC32C. Created named enum for chunk sizes.
>>>
>>> And macroAssembler_x86.cpp totally goes against our code style.
>>> I would suggest to move IPL_Alg4 and other local methods into
>>> MacroAssembler class - 'This->' looks strange here.
>>> I would also rename all these local methods to have prefix crc32c_.
>>> TW>Done.
>>>
>>> Also align parameters lines to (:
>>>
>>> +  void IPL_Alg4(INOUT(Register B), uint32_t n,
>>> +                Scratch(Register C), Scratch(Register D),
>>> Scratch(Register Z),
>>> +                MacroAssembler * This) {
>>>
>>>
>>> +  CRC32C::ProcChunk(CRC32C::HIGH, CONSTOrPreCompConstIndex[0],
>>> CONSTOrPreCompConstIndex[1], IsPclmulqdqSupported,
>>> +                   len, buf, crc,
>>> +                   A, B, C,
>>> +                   AXMM, BXMM, CXMM,
>>> +                   D, E, F,
>>> +                   this);
>>> TW>Done.
>>>
>>> Instead of Scratch(Register C) use Register tmp1.
>>> I am fine with IN, INOUT, OUT, Scratch macros but Nehalem() and
>>> Westmere() is strange to see here - I do not understand their meaning.
>>> TW>Nehalem and Westmere need different resources (Registers) to
>>> TW>obtain carry less multiplication results. In order to
>>> not generate separate wrappers I thought about using single functions
>>> and then just marked registers which are specific to particular
>>> architecture with Westmere(), Nehalem() empty macros. In an effort to aid in understanding of a code. But I could get rid of them easily. Already done.
>>> I got rid of a IN/INOUT/OUT macros and modified Register names
>>> accordingly in/in_out/out
>>>
>>> rename CONSTOrPreCompConstInde* to use low case and _.
>>> TW>Done.
>>>
>>> Argument name: IsPclmulqdqSupported --> is_pclmulqdq_supported
>>> TW>Done
>>>
>>> stubRoutines_x86.cpp upper case name are used mostly for constants
>>> and enums. You have array PCLMULQDQ[] (I would change PCLMULQDQ name
>>> too -
>>> pclmulqdq_table) and next weird definition:
>>> +  #undef CONST
>>> +  static juint x;
>>> +  #define CONST x
>>> +      CONST = PowN[j];
>>> Keep it local as X_CONST if you want to highlight that it is constant
>>> for following expression. Simple 'CONST' name is confusing.
>>>> Both done
>>>
>>> Rename  crc32c_IPL_Alg2Alt2Fast --> crc32c_IPL_alg2 (to you need alt fast?).
>>>> I just need Alt2 since Gueron's Alg2 has two alternatives. Done.
>>>
>>> Rename GenerateCRC32CTable --> generate_CRC32C_table
>>> TW>Done
>>>
>>> crc32c.h:
>>>     // a) constants table generation
>>> (C:\Java\jdk9hs-comp\hotspot\src\cpu\x86\vm\stubRoutines_x86.cpp)
>>>      62
>>> Use hotspot/src/cpu/x86/vm/stubRoutines_x86.cpp
>>> TW>Done
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 8/27/15 11:47 AM, Wojtowicz, Tomasz wrote:
>>>> I would like to contribute following change:
>>>>
>>>> Review details
>>>>
>>>> Review Title:      CRC32C implementations for Nehalem x86/amd64 &
>>>> Westmere+ x86/amd64
>>>>
>>>> Review ID:          #8134553
>>>>
>>>> Diff: http://cr.openjdk.java.net/~mcberg/8134553/webrev.01/
>>>> <http://cr.openjdk.java.net/%7Emcberg/8134553/webrev.01/>
>>>>
>>>> Description:        Efficient use of a crc32 hardware instruction by
>>>> division of a problem to a predefined chunks of an increasing size
>>>> and further by 3 to be computed hiding instruction latencies. x86
>>>> delivers up to 8x improvement vs. java library, amd64 stops at even more -> 16x.
>>>>
>>>>                                    Performance data are attached to
>>>> this message for your convenience.
>>>>
>>>>                                    No regressions has been observed
>>>> on hotspot/compiler x86_64.
>>>>
>>>> Link: https://bugs.openjdk.java.net/browse/JDK-8134553
>>>>
>>>> Author:                Tomasz, Wojtowicz
>>>>
>>>> --
>>>>
>>>> Thank you,
>>>>
>>>> Tomek
>>>>


More information about the hotspot-compiler-dev mailing list