review request: add intrinsics to use AES instructions

Fri Jul 13 16:50:39 PDT 2012

Please review the following webrev which adds intrinsic support to
allow some of the com/sun/crypto/provider methods to use AES
instructions when a processor supports such instructions.

http://cr.openjdk.java.net/~tdeneau/aes-intrinsics/webrev.01/

I do not have a bug number for this change but a description would be
something like the following:

   Modern x86 processors have AES instructions to accelerate AES
   encryption and decryption, but HotSpot does not have a way to
   generate such instructions. There is a way to hook in a native
   crypto library using PKCS11 and there are a few native libraries
   that support hardware AES instructions. However, these native
   PKCS11 libraries

      * do not scale well with multiple threads,
      * are not supported on all platforms (for instance, HotSpot does
        not have PKCS11 support on 64-bit Windows), and
      * can be confusing to configure.

Since this webrev adds intrinsic support for the default
com/sun/crypto/provider classes, they are supported on all platforms
and there is no additional configuration required. Measurements have
shown that they scale very well with multiple threads.

The rest of this mail describes the scope of the intrinsics and
summarizes the source file changes.

-- Tom Deneau

Scope of the Intrinsics
-----------------------
When creating a cipher the application specifies a "transformation"
consisting of "algorithm/mode/padding". For more details see
http://docs.oracle.com/javase/7/docs/api/javax/crypto/Cipher.html

   * These intrinsics kick in only when the algorithm part is "AES". A
     single block in AES is always 16 bytes and there are intrinsics
     for encrypting or decrypting a single block. These single-block
     intrinsics can work with any mode that uses AES and with any of
     the three AES key sizes (128, 192 or 256 bit).

   * A more optimized multi-block intrinsic can kick in if the
     algorithm/mode is "AES/CBC" (Cipher Block Chaining). Again all
     three AES key sizes are supported. There is no technical reason
     why we couldn't do multi-block intrinsics for the other modes
     (e.g., ECB) but I want to get some feedback from the reviewers on
     the implementation before charging off on this path.

   * The padding part is handled by Java routines outside of these
     intrinsics.
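
For reference, the kind of application code that would reach the
AES/CBC path looks roughly like the sketch below. It uses only the
standard javax.crypto API; the all-zero key and IV and the class name
AesCbcRoundTrip are placeholders for illustration, not something taken
from the webrev.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesCbcRoundTrip {
    // Runs one AES/CBC operation; mode is Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE.
    static byte[] run(int mode, byte[] key, byte[] iv, byte[] input) {
        try {
            // "algorithm/mode/padding": AES selects the algorithm, CBC the mode,
            // and PKCS5Padding the padding scheme.
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // placeholder: all-zero 128-bit key, demo only
        byte[] iv = new byte[16];  // placeholder: all-zero IV, demo only
        byte[] plain = "twenty bytes of text".getBytes(StandardCharsets.US_ASCII);

        byte[] cipherText = run(Cipher.ENCRYPT_MODE, key, iv, plain);
        byte[] roundTrip = run(Cipher.DECRYPT_MODE, key, iv, cipherText);

        System.out.println("round trip ok: " + Arrays.equals(plain, roundTrip));
    }
}
```

Nothing in the application changes to pick up the intrinsics; whether
they are used is decided by the JIT compiler and the VM flags.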

Summary of Changes
------------------
src/cpu/x86/vm/assembler_x86.cpp, hpp
   Defined the AES instructions which are used by the stub routines.

src/cpu/x86/vm/stubGenerator_x86_64.cpp,
   Actual stub code for the AES intrinsics. As described earlier there
   are both single-block and multi-block intrinsic stubs.

   Note that the stubs make use of the "expanded key" which gets
   created each time the key changes. The expanded key is used by both
   the Java code and the intrinsic AES instructions.

   The Java code stores the "expanded key" in big-endian 32-bit
   integers. The x86 AES instructions require the expanded key to be
   in little-endian 128-bit words. Hence the pshufb instructions to
   get the key into the little-endian format.
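
   The byte-order issue can be illustrated in plain Java. The sketch
   below assumes the shuffle reverses the bytes inside each 32-bit
   lane of a 128-bit round key; the actual pshufb mask is in the
   webrev and is not reproduced here. The class and method names are
   hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ExpandedKeyByteOrder {
    // Bytes as a little-endian (x86) load would see an int[] round key in memory.
    static byte[] asLittleEndianMemory(int[] words) {
        ByteBuffer buf = ByteBuffer.allocate(words.length * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (int w : words) {
            buf.putInt(w);
        }
        return buf.array();
    }

    // Assumed effect of the shuffle: reverse the four bytes of every 32-bit lane.
    static byte[] swapBytesPerWord(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i += 4) {
            out[i] = in[i + 3];
            out[i + 1] = in[i + 2];
            out[i + 2] = in[i + 1];
            out[i + 3] = in[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // One 128-bit round key held as four ints whose values encode bytes 00..0f.
        int[] roundKey = {0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f};
        byte[] memory = asLittleEndianMemory(roundKey);
        byte[] shuffled = swapBytesPerWord(memory);

        StringBuilder sb = new StringBuilder();
        for (byte b : shuffled) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        System.out.println(sb.toString().trim());
    }
}
```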

src/cpu/x86/vm/vm_version_x86.cpp, hpp
   Detect and store the AES capability bit from CPUID. A global boolean
   command line flag UseAES can be used to turn off AES even if the
   hardware supports it.

src/share/vm/classfile/vmSymbols.hpp
src/share/vm/opto/runtime.cpp, hpp
   The usual definitions of class names, method names and signatures
   for the Java methods that are being intrinsified and the signatures
   for the stubs.

src/share/vm/oops/methodOop.cpp
   Up until now, every intrinsic was replacing a routine that was
   loaded by the "default" (NULL) class loader.
   com/sun/crypto/provider is not loaded by the default class
   loader so we had to add a check here.

src/share/vm/opto/escape.cpp
   Escape analysis knows about certain stubs, but if it sees a leaf
   stub it also checks against a predefined list. So the new intrinsic
   names were added to the list.

src/share/vm/opto/library_call.cpp
src/share/vm/opto/callGenerator.cpp
src/share/vm/opto/doCall.cpp

   The main logic for building up the calls to the stubs at compile
   time, assuming the platform has a stub and the global flags have
   not turned these intrinsics off.

   A new helper routine to load a field from an object was added since
   we ended up loading fields in a few places.

   For best performance, we wanted to hook into the multi-block
   encrypt and decrypt methods such as in CipherBlockChaining.java.
   This code is not AES-specific but handles CBC mode for any
   algorithm. (The algorithm part is handled by the enclosed
   "embeddedCipher" object).

   Thus at runtime we want to do the equivalent of an instanceof check
   on embeddedCipher and either call the stub (if it is AESCrypt) or
   call the original Java code (if it is some other algorithm
   type). For CipherBlockChaining.decrypt there is a further runtime
   check that the source and destination are not the same array.
   Because each CBC block is decrypted using the previous ciphertext
   block, decrypting in place would require cloning the source
   (cipher text) first.
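
   Why the aliasing case needs special handling can be seen with a toy
   CBC implementation. The sketch below is not the provider code: it
   uses a 4-byte block and a trivial XOR "cipher" purely to show that
   a straightforward CBC decrypt reads ciphertext that an in-place
   write has already overwritten. All names are hypothetical.

```java
import java.util.Arrays;

public class CbcInPlaceDecrypt {
    static final int BLOCK = 4; // toy block size; real AES uses 16 bytes

    // Toy self-inverse "block cipher" applied per byte; stands in for AES.
    static byte toy(byte b) {
        return (byte) (b ^ 0x5A);
    }

    // CBC encrypt: C[i] = E(P[i] xor C[i-1]), with C[-1] = IV.
    static byte[] encrypt(byte[] plain, byte[] iv) {
        byte[] out = new byte[plain.length];
        for (int off = 0; off < plain.length; off += BLOCK) {
            for (int j = 0; j < BLOCK; j++) {
                byte prev = (off == 0) ? iv[j] : out[off - BLOCK + j];
                out[off + j] = toy((byte) (plain[off + j] ^ prev));
            }
        }
        return out;
    }

    // Straightforward CBC decrypt: P[i] = D(C[i]) xor C[i-1].
    // If src and dst are the same array, C[i-1] has already been overwritten.
    static void decrypt(byte[] src, byte[] iv, byte[] dst) {
        for (int off = 0; off < src.length; off += BLOCK) {
            for (int j = 0; j < BLOCK; j++) {
                byte prev = (off == 0) ? iv[j] : src[off - BLOCK + j];
                dst[off + j] = (byte) (toy(src[off + j]) ^ prev);
            }
        }
    }

    // Aliasing-safe variant: clone the source when it is also the destination.
    static void decryptSafely(byte[] src, byte[] iv, byte[] dst) {
        byte[] in = (src == dst) ? src.clone() : src;
        decrypt(in, iv, dst);
    }

    public static void main(String[] args) {
        byte[] iv = {9, 9, 9, 9};
        byte[] plain = {1, 2, 3, 4, 5, 6, 7, 8};

        byte[] naive = encrypt(plain, iv);
        decrypt(naive, iv, naive);      // in place, no clone
        byte[] safe = encrypt(plain, iv);
        decryptSafely(safe, iv, safe);  // in place, with clone

        System.out.println("naive in-place correct: " + Arrays.equals(naive, plain));
        System.out.println("cloned in-place correct: " + Arrays.equals(safe, plain));
    }
}
```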

   Vladimir added some infrastructure to generate predicated
   intrinsics to solve the above problem. A particular intrinsic need
   only specify that it is predicated and generate the particular
   guard node which, if false, will take the Java path. This
   infrastructure can be used for future intrinsics that have to make
   such a runtime choice. These changes from Vladimir are in
   callGenerator.cpp, doCall.cpp, and a small bit in library_call.cpp.
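
   At the Java level, the decision the guard makes for the CBC decrypt
   case is roughly equivalent to the sketch below. The real guard is
   built from compiler IR nodes in C++, so this is only a model of the
   logic; the interface and class names are hypothetical stand-ins for
   the embeddedCipher types.

```java
public class PredicatedDispatchSketch {
    // Hypothetical stand-ins for the cipher held in "embeddedCipher".
    interface BlockCipher {}
    static final class AesCipher implements BlockCipher {}
    static final class OtherCipher implements BlockCipher {}

    // Returns which path would be taken: "stub" when the predicate holds,
    // otherwise "java" for the original Java code.
    static String chooseDecryptPath(BlockCipher embeddedCipher, byte[] src, byte[] dst) {
        boolean isAes = embeddedCipher instanceof AesCipher; // instanceof-style check
        boolean distinctArrays = src != dst;                 // extra check for decrypt
        return (isAes && distinctArrays) ? "stub" : "java";
    }

    public static void main(String[] args) {
        byte[] a = new byte[16];
        byte[] b = new byte[16];
        System.out.println(chooseDecryptPath(new AesCipher(), a, b));
        System.out.println(chooseDecryptPath(new AesCipher(), a, a));
        System.out.println(chooseDecryptPath(new OtherCipher(), a, b));
    }
}
```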

src/share/vm/runtime/globals.hpp
   Global flags were added to:
      * turn off either AES encryption or AES decryption intrinsics separately
      * turn off the multi-block AES/CBC intrinsics.

   By default all of the above are on. These are really there for
   testing; for example, one could encrypt using Java and decrypt using
   the intrinsics.

   A UseAES flag was also added to ignore the hardware capability, as
   described above.