GCM performance and Unsafe byte array accesses

Anthony Scarpino anthony.scarpino at oracle.com
Tue Sep 1 15:42:09 UTC 2015


Hi Andrew,

Does your alignment changes affect x86 only or should this help all architectures?

In general I don't see a disadvantage and that it could be expanded to other places in crypto too. But I have think about the effects on sparc, so that would need to be tested.  Right now the sparc intrinsic does alignment checking and realigning, so it would be interesting to see if ByteArrays performed better than the intrinsic alignment.   I assume you don't have the hardware to test sparc, right?

Tony

> On Aug 27, 2015, at 10:51 AM, Andrew Haley <aph at redhat.com> wrote:
> 
> I've been looking at the performance of AES/GCM.  The profile is quite
> surprising:
> 
> samples  cum. samples  %        cum. %  symbol name
> 476009   476009        36.7033  36.7033 aescrypt_encryptBlock
> 297239   773248        22.9190  59.6224 ghash_processBlocks
> 195334   968582        15.0615  74.6839 int com.sun.crypto.provider.GCTR.doFinal(byte[], int, int, byte[], int)
> 
> I would have expected aescrypt_encryptBlock and ghash_processBlocks to
> be very high, but that GCTR.doFinal is so high is rather surprising:
> all it has to do is increment a counter, call aescrypt_encryptBlock,
> and xor the result with the plaintext.
> 
> The problem seems to be due to byte accesses in GCTR.doFinal() and
> GaloisCounterMode.update().  Earlier this year I wrote a bunch of new
> Unsafe.get/put-XX-Unaligned methods, and these are ideal for bulk
> accesses to byte arrays.  So, as an experiment I wrote some methods to
> do array accesses and used them to speed up GCM, with this result:
> 
> 492274   492274        40.8856  40.8856    13256.jo                 aescrypt_encryptBlock
> 298185   790459        24.7656  65.6512    13256.jo                 ghash_processBlocks
> 86325    876784         7.1697  72.8209    13256.jo                 int com.sun.crypto.provider.GCTR.update(byte[], int, int, byte[], int)
> 
> GCTR.update() is twice as fast as it was, and overall the performance
> of AES/GCM is 10% better.
> 
> The changes to the GCM code are quite minor:
> 
> diff -r 6940407d544a src/java.base/share/classes/com/sun/crypto/provider/GCTR.java
> --- a/src/java.base/share/classes/com/sun/crypto/provider/GCTR.java     Thu Aug 20 07:36:37 2015 -0700
> +++ b/src/java.base/share/classes/com/sun/crypto/provider/GCTR.java     Thu Aug 27 16:17:25 2015 +0100
> @@ -94,11 +97,12 @@
>         int numOfCompleteBlocks = inLen / AES_BLOCK_SIZE;
>         for (int i = 0; i < numOfCompleteBlocks; i++) {
>             aes.encryptBlock(counter, 0, encryptedCntr, 0);
> -            for (int n = 0; n < AES_BLOCK_SIZE; n++) {
> -                int index = (i * AES_BLOCK_SIZE + n);
> -                out[outOfs + index] =
> -                    (byte) ((in[inOfs + index] ^ encryptedCntr[n]));
> -            }
> +            int index = i * AES_BLOCK_SIZE;
> +            ByteArrays.putLongs(out, outOfs + index,
> +                                (ByteArrays.getLong(in, inOfs + index + 0) ^
> +                                 ByteArrays.getLong(encryptedCntr, 0)),
> +                                (ByteArrays.getLong(in, inOfs + index + 8) ^
> +                                 ByteArrays.getLong(encryptedCntr, 8)));
>             GaloisCounterMode.increment32(counter);
>         }
>         return inLen;
> diff -r 6940407d544a src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java
> --- a/src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java        Thu Aug 20 07:36:37 2015 -0700
> +++ b/src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java        Thu Aug 27 16:17:25 2015 +0100
> @@ -82,11 +82,8 @@
>             // should never happen
>             throw new ProviderException("Illegal counter block length");
>         }
> -        // start from last byte and only go over 4 bytes, i.e. total 32 bits
> -        int n = value.length - 1;
> -        while ((n >= value.length - 4) && (++value[n] == 0)) {
> -            n--;
> -        }
> +        int counter = ByteArrays.getInt(value, value.length - 4, true);
> +        ByteArrays.putInt(value, value.length - 4, counter + 1, true);
>     }
> 
>     // ivLen in bits
> 
> I've attached the full diff.
> 
> So, here's my question: there are many places over the crypto code
> base where we could take advantage of this.  Do you think it makes
> sense to make changes like this?  I can't see any major disadvantages,
> and it's a considerable performance improvement.
> 
> Andrew.
> 
> 
> <GCTR.diffs>



More information about the security-dev mailing list