Just out of curiosity, what was the performance improvement?

I'm wondering whether it might be worthwhile to see if it's possible to add a plugin that uses the hardware instructions:


>>This change addresses a severe performance regression, first introduced
>>in JDK 8, triggered by the negotiation of a GCM cipher suite in the TLS
>>implementation.  This regression is a result of the poor performance of
>>the implementation of the GHASH function.
>>I first tried to eliminate just the allocations in blockMult while still
>>retaining the byte arrays.  This did not substantially increase
>>performance in my micro-benchmark.  I then replaced the 16-byte arrays
>>with longs, replaced the inner loops with direct bit fiddling on the
>>longs, eliminated data-dependent conditionals (which are generally
>>frowned upon in cryptographic algorithms due to the risk of timing
>>attacks), and split the main loop in two, one for each half of the hash
>>state.  This is the result:
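For readers following along, here is a rough sketch of the technique the quoted message describes. It is not the actual JDK patch. The class and method names (`GhashSketch`, `blockMult`, `ghash`, and the hex helpers) are hypothetical. It follows the bitwise multiplication algorithm from NIST SP 800-38D. The 128-bit block is held as two `long`s (high word = bytes 0..7). Per-bit branches are replaced with all-ones/all-zeros masks, and there is one 64-bit pass for each half of X. Removing the branches does not make the code constant-time by itself, because the JVM makes no timing guarantees. The check in `main` uses Test Case 2 from the GCM specification.

```java
public final class GhashSketch {
    // R = 11100001 || 0^120 (SP 800-38D); only the high word is nonzero.
    private static final long R = 0xE100000000000000L;

    /** Hypothetical helper: X * H in GF(2^128), GCM bit order; arrays are {high, low}. */
    static long[] blockMult(long[] x, long[] h) {
        long z0 = 0L, z1 = 0L;
        long v0 = h[0], v1 = h[1];
        // One 64-bit pass over the high half of X, then one over the low half.
        for (long word : x) {
            for (int i = 0; i < 64; i++) {
                // All ones if the current bit of X (taken MSB-first) is 1, else 0: no branch.
                long mask = word >> 63;
                z0 ^= v0 & mask;
                z1 ^= v1 & mask;
                // All ones if the bit about to be shifted out of V is 1 (reduction needed).
                long reduce = -(v1 & 1L);
                // V = V >> 1 across both words, then conditionally XOR in R.
                v1 = (v1 >>> 1) | (v0 << 63);
                v0 = (v0 >>> 1) ^ (R & reduce);
                word <<= 1;
            }
        }
        return new long[] {z0, z1};
    }

    /** Hypothetical helper: GHASH_H over pre-formatted 128-bit blocks. */
    static long[] ghash(long[] h, long[]... blocks) {
        long[] y = {0L, 0L};
        for (long[] b : blocks) {
            y = blockMult(new long[] {y[0] ^ b[0], y[1] ^ b[1]}, h);
        }
        return y;
    }

    static long[] fromHex(String hex) {
        return new long[] {
            Long.parseUnsignedLong(hex.substring(0, 16), 16),
            Long.parseUnsignedLong(hex.substring(16, 32), 16)
        };
    }

    static String toHex(long[] v) {
        return String.format("%016x%016x", v[0], v[1]);
    }

    public static void main(String[] args) {
        // GCM spec Test Case 2: all-zero AES-128 key, one all-zero plaintext block.
        long[] h = fromHex("66e94bd4ef8a2c3b884cfa59ca342b2e");
        long[] c = fromHex("0388dace60b6a392f328c2b971b2fe78");
        // Length block: len(A) = 0 bits, len(C) = 128 bits.
        long[] lens = fromHex("00000000000000000000000000000080");
        System.out.println(toHex(ghash(h, c, lens)));
    }
}
```

The mask trick (`word >> 63` and `-(v1 & 1L)`) is what replaces the two data-dependent `if`s in the textbook algorithm. Splitting the work into two 64-bit passes means each pass only shifts a single `long` to expose the next bit of X, rather than indexing into a byte array.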
