RFR (L): 7088419 : Use x86 Hardware CRC32 Instruction with java.util.zip.CRC32 and java.util.zip.Adler32

Christian Thalinger christian.thalinger at oracle.com
Wed Jun 26 12:47:31 PDT 2013


On Jun 26, 2013, at 12:18 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:

> On 6/26/13 11:17 AM, Christian Thalinger wrote:
>> For Doxygen parameters you should use @param:
>> 
>> http://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdparam
>> 
>> If we have a Doxygen method comment we don't need comments in header files:
> 
> Doxygen compliance is become a burden for me. I am fine with using '/**' comments but using @param and other Doxygen's syntax is going too far for me. How many people use Doxygen with our sources?

You don't have to do it.  What I was saying is that if you already make the effort to document parameters why not use the correct keywords for it.

-- Chris

> 
> Regards,
> Vladimir
> 
>> 
>> +   // CRC32 code for java.util.zip.CRC32::updateBytes() instrinsic.
>> +   void update_byte_crc32(Register crc, Register val, Register table);
>> +   void kernel_crc32(Register crc, Register buf, Register len, Register table, Register tmp);
>> +   // Fold 128-bit data chunk
>> +   void fold_128bit_crc32(XMMRegister xcrc, XMMRegister xK, XMMRegister xtmp, Register buf, int offset);
>> +   void fold_128bit_crc32(XMMRegister xcrc, XMMRegister xK, XMMRegister xtmp, XMMRegister xbuf);
>> +   // Fold 8-bit data
>> +   void fold_8bit_crc32(Register crc, Register table, Register tmp);
>> +   void fold_8bit_crc32(XMMRegister crc, Register table, XMMRegister xtmp, Register tmp);
>> 
>> src/share/vm/prims/jvm.cpp:
>> 
>> +   PUTPROP(props, "sun.zip.clmulSupported", X86_ONLY(!UseCRC32Intrinsics && UseAVX && UseCLMUL ? "true" :) "false");
>> 
>> When you say "No changes to JDK: this is all in the compiler." does it mean this review has no JDK changes or are there no JDK changes at all?
>> 
>> src/cpu/x86/vm/templateInterpreter_x86_32.cpp:
>> 
>> I thought I've changed this already but apparently I didn't:
>> 
>>       case Interpreter::java_lang_math_exp     : entry_point = ((InterpreterGenerator*)this)->generate_math_entry(kind);     break;
>> 
>> Can we please add a local variable for (InterpreterGenerator*)this (on all architectures)?
>> 
>> src/share/vm/opto/library_call.cpp:
>> 
>> +   crc = _gvn.transform( new (C) URShiftINode(crc, intcon(8)) );
>> +   result = _gvn.transform( new (C) XorINode(crc, result) );
>> +   result = _gvn.transform( new (C) XorINode(result, M1) );
>> 
>> Can we get rid of the spaces after ( and before )?  It's probably from copy-paste.
>> 
>> -- Chris
>> 
>> On Jun 26, 2013, at 10:52 AM, David Chase <david.r.chase at oracle.com> wrote:
>> 
>>> Bug: There is a lovely instruction available on late-model Intel chips
>>> that can be used to accelerate some CRC calculations.
>>> 
>>> Fix: Enhance the compiler to have intrinsics for the CRC methods
>>> that use this instruction, where appropriate.
>>> 
>>> Testing:
>>> Hand testing on Sparc (to ensure no harm was done)
>>> and x86 (to ensure no harm was done, and that it was faster)
>>> plus with JPRT across a range of x86 (32/64, Windows/MacOS/Linux/Solaris)
>>> and other targets to be sure all was well.
>>> 
>>> This is also all based on C code that had already been well-tested.
>>> 
>>> Webrev: http://cr.openjdk.java.net/~drchase/7088419/webrev.05/
>>> No changes to JDK: this is all in the compiler.
>>> It does not interfere with the possibility of parallelizing CRC32 and Adler32,
>>> but that seemed like a tall order for a compiler intrinsic.
>>> 
>>> Guide to the webrev changes:
>>> 
>>>    - Add new instructions to assembler
>>>        - src/cpu/x86/vm/assembler_x86.hpp (declare entrypoints:
>>>          movdqa, pinsrd, pinsrq, pextrd, pextrq, vpclmulqdq)
>>>        - src/cpu/x86/vm/assembler_x86.cpp (emit the bits)
>>>    - Add new operations to macro-assembler
>>>        - src/cpu/x86/vm/macroAssembler_x86.cpp (this includes the
>>>          truly "macro" instructions update_byte_crc32,
>>>          fold_128bit_crc32, fold_8bit_crc32, kernel_crc32)
>>>        - src/cpu/x86/vm/macroAssembler_x86.hpp
>>>    - Add flag and feature checks
>>>        - src/cpu/x86/vm/globals_x86.hpp (arch-specific UseCLMUL)
>>>        - src/share/vm/runtime/globals.hpp (UseCRC32Intrinsics  flag
>>>          and default value)
>>>        - src/cpu/x86/vm/vm_version_x86.hpp (pull feature from cpu info)
>>>        - src/cpu/x86/vm/vm_version_x86.cpp (process flag, print
>>>          feature)
>>>        - src/share/vm/prims/jvm.cpp (set property for jdk lib -- is
>>>          this needed for crc?)
>>>    - Register intrinsic
>>>        - src/share/vm/classfile/vmSymbols.hpp (declare it with
>>>          signature etc)
>>>    - Write stub data (CRC constants and tables)
>>>        - src/share/vm/runtime/stubRoutines.{c,h}pp (declaration of
>>>          statics and accessors for crc32 generated method and
>>>          constants).
>>>        - src/cpu/x86/vm/stubRoutines_x86.{c,h}pp (what are
>>>          _verify_mxcsr_entry and _key_shuffle_mask_addr ? -- answer,
>>>          these are refactored from 32/64-specific files, see below.)
>>>        - src/cpu/x86/vm/stubRoutines_x86_{32,64}.{c,h}pp (code
>>>          refactored into common stubRoutines.{c,h}pp)
>>>        - src/cpu/x86/vm/stubGenerator_x86_{32,64}.cpp (write out
>>>          generated stub procedure, initialize pointer to method and
>>>          data)
>>>    - Add stub call substitution to interpreter
>>>        - src/share/vm/interpreter/templateInterpreter.cpp (declare
>>>          method entrypoints)
>>>        - src/cpu/x86/vm/interpreterGenerator_x86.hpp,
>>>          src/cpu/x86/vm/templateInterpreter_x86_{32,64}.cpp
>>>          (declaration and definition of 32/64-specific glue to call
>>>          the intrinsic stub routine)
>>>        - src/share/vm/interpreter/interpreter.cpp (enum lookup and
>>>          debugging output)
>>>        - src/share/vm/interpreter/abstractInterpreter.hpp (extend
>>>          enumeration of method kinds)
>>>    - Add to C1
>>>        - src/share/vm/c1/c1_LIR.{c,h}pp  (declare new nodes
>>>          LIR_OpUpdateCRC32)
>>>        - src/cpu/sparc/vm/c1_LIRGenerator_sparc.cpp (unimplemented
>>>          stubs for unsupported arch)
>>>        - src/cpu/sparc/vm/c1_LIRAssembler_sparc.cpp (unimplemented
>>>          stubs for unsupported arch)
>>>        - src/share/vm/c1/c1_GraphBuilder.cpp (try inline intrinsic)
>>>        - src/share/vm/c1/c1_Runtime1.cpp (stub routine looker-upper)
>>>        - src/share/vm/c1/c1_LIRGenerator.{c,h}pp (Add case to
>>>          do_Intrinsic for one-byte update)
>>>        - src/cpu/x86/vm/c1_LIRGenerator_x86.cpp (Add cases for x86 CRC
>>>          updates -- byte, array, buffer)
>>>        - src/cpu/x86/vm/c1_LIRAssembler_x86.cpp (Add emit_updatecrc32
>>>          for the the one-byte case from new node).
>>>        - src/share/vm/c1/c1_LIRAssembler.hpp (Add decl for
>>>          emit_updatecrc32)
>>>    - Add to C2
>>>        - src/share/vm/opto/runtime.{c,h}pp (new intrinsic function has
>>>          a type; add calls to generate that type).
>>>        - src/share/vm/opto/escape.cpp (add special case for escape
>>>          analysis assertion)
>>>        - src/share/vm/opto/library_call.cpp (substitution of inline
>>>          glue code to intrinsic stub call).
>>> 
>> 



More information about the hotspot-compiler-dev mailing list