[Semi-]off-line encryption

Thu Sep 29 17:25:23 UTC 2016

On 29/09/16 17:49, Michael StJohns wrote:
> On 9/29/2016 11:28 AM, Andrew Haley wrote:
>> GCM allows most of the work in an encryption to be done offline (and
>> ahead of time) by other processors, reducing latency and increasing
>> throughput.  It'd be lovely if we could do this in Java, but I can't
>> really see a way to fit this in to the platform security framework.
>> We don't want to do this eagerly, because we don't know that more data
>> will be encrypted and we don't want to speculate.
>>
>> However, if we had a hint that (say) a large stream would need to
>> encrypt a megabyte of data at some time in the future we could
>> precompute a megabyte of keystream.  Has anyone considered this?
> 
> Um.  No.   You can make this work with CTR, but you can't with GCM.  
> With CTR, you just encrypt a stream of zeroes to get an encryption 
> stream and then XOR the encryption stream later with your actual plain 
> text.  GCM (and CCM) tend to compute the integrity tag in parallel with 
> calculating the encryption stream. You'd have to still process all of 
> the plain text (or cipher text) to get the integrity tag.

The keystream doesn't depend on the plaintext or the ciphertext.  Of
course the auth tag can't be calculated before we have the ciphertext,
but the keystream can be, and the keystream is the expensive part of
the calculation.  The auth tag needs no more than one Galois field
multiplication per block, and that's really fast with current
hardware.  We could arrange it so that there is always some keystream
precalculated, and then the encryption latency would be no more than a
couple of XORs and a GF multiplication per block.
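
A minimal sketch of the keystream half of this, assuming a 96-bit
nonce.  In that case GCM's data blocks are AES-CTR starting at counter
value 2 (counter value 1 is used to mask the tag), so the keystream can
be produced ahead of time with the existing AES/CTR/NoPadding
transformation and XORed in later.  KeystreamSketch and its method
names are hypothetical, made up for illustration; they are not a
proposed or existing JDK API.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: names are illustrative only, not a JDK API.
final class KeystreamSketch {

    // Offline step: produce len bytes of the keystream GCM will use for
    // this key and 96-bit nonce, before any plaintext exists.
    static byte[] precomputeKeystream(byte[] key, byte[] nonce12, int len) {
        if (nonce12.length != 12) {
            throw new IllegalArgumentException("sketch assumes a 96-bit nonce");
        }
        byte[] counter = Arrays.copyOf(nonce12, 16); // nonce || 0x00000000
        counter[15] = 2; // counter 1 masks the tag; data blocks start at 2
        try {
            Cipher ctr = Cipher.getInstance("AES/CTR/NoPadding");
            ctr.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                     new IvParameterSpec(counter));
            // Encrypting zeroes in CTR mode returns the raw keystream.
            return ctr.doFinal(new byte[len]);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Online step: the only per-byte work left for the ciphertext is an XOR.
    static byte[] xor(byte[] data, byte[] keystream) {
        if (keystream.length < data.length) {
            throw new IllegalArgumentException("not enough keystream");
        }
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ keystream[i]);
        }
        return out;
    }

    // Reference: ordinary AES/GCM, with the trailing 16-byte tag dropped,
    // so the result can be compared with the precomputed path.
    static byte[] gcmCiphertextOnly(byte[] key, byte[] nonce12, byte[] plaintext) {
        try {
            Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
            gcm.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                     new GCMParameterSpec(128, nonce12));
            byte[] out = gcm.doFinal(plaintext);
            return Arrays.copyOf(out, plaintext.length);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the same key and nonce, xor(plaintext, precomputeKeystream(...))
should match gcmCiphertextOnly(...) byte for byte.  The sketch does not
produce the auth tag: that still needs a GHASH pass over the ciphertext
once it exists.  Precomputed keystream is also tied to one (key, nonce)
pair and must never be used for more than one message.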

Maybe there's no point if AES-NI can generate the keystream at 20ish
cycles per block and the GF multiplication can be done in parallel,
but not every target has AES-NI or some equivalent.  And I'm not sure
that every target which does have appropriate instructions can issue
the AES and the GF multiplication instructions at the same time.

Having said all of that, it may be that the overheads of
inter-processor communication would make this inefficient.

Andrew.