RFR: 8259065: Optimize MessageDigest.getInstance
Claes Redestad
redestad at openjdk.java.net
Wed Jan 6 01:27:09 UTC 2021
On Wed, 6 Jan 2021 01:05:35 GMT, Claes Redestad <redestad at openjdk.org> wrote:
>> By caching default constructors used in `java.security.Provider::newInstanceUtil` in a `ClassValue`, we can reduce the overhead of allocating instances in a variety of places, e.g., `MessageDigest::getInstance`, without compromising thread-safety or security.
>>
>> On the provided microbenchmark `MessageDigest.getInstance(digesterName)` improves substantially for any `digesterName` - around -90ns/op and -120B/op:
>> Benchmark                                                     (digesterName)  Mode  Cnt    Score    Error  Units
>> GetMessageDigest.getInstance                                             md5  avgt   30  293.929 ± 11.294  ns/op
>> GetMessageDigest.getInstance:·gc.alloc.rate.norm                         md5  avgt   30  424.028 ±  0.003   B/op
>> GetMessageDigest.getInstance                                           SHA-1  avgt   30  322.928 ± 16.503  ns/op
>> GetMessageDigest.getInstance:·gc.alloc.rate.norm                       SHA-1  avgt   30  688.039 ±  0.003   B/op
>> GetMessageDigest.getInstance                                         SHA-256  avgt   30  338.140 ± 13.902  ns/op
>> GetMessageDigest.getInstance:·gc.alloc.rate.norm                     SHA-256  avgt   30  640.037 ±  0.002   B/op
>> GetMessageDigest.getInstanceWithProvider                                 md5  avgt   30  312.066 ± 12.805  ns/op
>> GetMessageDigest.getInstanceWithProvider:·gc.alloc.rate.norm             md5  avgt   30  424.029 ±  0.003   B/op
>> GetMessageDigest.getInstanceWithProvider                               SHA-1  avgt   30  345.777 ± 16.669  ns/op
>> GetMessageDigest.getInstanceWithProvider:·gc.alloc.rate.norm           SHA-1  avgt   30  688.040 ±  0.003   B/op
>> GetMessageDigest.getInstanceWithProvider                             SHA-256  avgt   30  371.134 ± 18.485  ns/op
>> GetMessageDigest.getInstanceWithProvider:·gc.alloc.rate.norm         SHA-256  avgt   30  640.039 ±  0.004   B/op
>> Patch:
>> Benchmark                                                     (digesterName)  Mode  Cnt    Score    Error  Units
>> GetMessageDigest.getInstance                                             md5  avgt   30  210.629 ±  6.598  ns/op
>> GetMessageDigest.getInstance:·gc.alloc.rate.norm                         md5  avgt   30  304.021 ±  0.002   B/op
>> GetMessageDigest.getInstance                                           SHA-1  avgt   30  229.161 ±  8.158  ns/op
>> GetMessageDigest.getInstance:·gc.alloc.rate.norm                       SHA-1  avgt   30  568.030 ±  0.002   B/op
>> GetMessageDigest.getInstance                                         SHA-256  avgt   30  260.013 ± 15.032  ns/op
>> GetMessageDigest.getInstance:·gc.alloc.rate.norm                     SHA-256  avgt   30  520.030 ±  0.002   B/op
>> GetMessageDigest.getInstanceWithProvider                                 md5  avgt   30  231.928 ± 10.455  ns/op
>> GetMessageDigest.getInstanceWithProvider:·gc.alloc.rate.norm             md5  avgt   30  304.020 ±  0.002   B/op
>> GetMessageDigest.getInstanceWithProvider                               SHA-1  avgt   30  247.178 ± 11.209  ns/op
>> GetMessageDigest.getInstanceWithProvider:·gc.alloc.rate.norm           SHA-1  avgt   30  568.029 ±  0.002   B/op
>> GetMessageDigest.getInstanceWithProvider                             SHA-256  avgt   30  265.625 ± 10.465  ns/op
>> GetMessageDigest.getInstanceWithProvider:·gc.alloc.rate.norm         SHA-256  avgt   30  520.030 ±  0.003   B/op
>>
>> See: https://cl4es.github.io/2021/01/04/Investigating-MD5-Overheads.html#reflection-overheads for context.
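For context, the `ClassValue`-based caching described in the quoted text can be sketched roughly like this. This is an illustrative standalone example, not the actual JDK code; the class and method names (`ConstructorCache`, `newInstanceOf`) are made up for the sketch:

```java
import java.lang.reflect.Constructor;

public class ConstructorCache {
    // ClassValue computes its value at most once per Class (under race,
    // one winner is kept) and caches it without pinning the Class for GC.
    private static final ClassValue<Constructor<?>> DEFAULT_CTOR =
        new ClassValue<>() {
            @Override
            protected Constructor<?> computeValue(Class<?> type) {
                try {
                    return type.getDeclaredConstructor();
                } catch (NoSuchMethodException e) {
                    throw new IllegalStateException(type + " has no default constructor", e);
                }
            }
        };

    // Repeated calls skip the reflective getDeclaredConstructor lookup.
    static Object newInstanceOf(Class<?> type) throws ReflectiveOperationException {
        return DEFAULT_CTOR.get(type).newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Object first = newInstanceOf(StringBuilder.class);
        Object second = newInstanceOf(StringBuilder.class);
        // Same cached constructor, but distinct instances each call:
        System.out.println(first instanceof StringBuilder && first != second);
    }
}
```

The point is that the per-call cost drops to a cache lookup plus `Constructor::newInstance`, while instances are still created fresh each time, preserving thread-safety.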
>
> I refactored and optimized the lookup code further, getting rid of a number of bottlenecks:
>
> - Cache Constructors in Provider.Service instead of via a ClassValue.
> - Also cache the impl Class; wrap the Class and Constructor in a WeakReference if not loaded by the null class loader (many builtins will be)
> - Cache the EngineDescription in Service, avoiding a lookup on the hot path
> - We were hitting a synchronized method in ProviderConfig.getProvider(). Since the provider field is already volatile, I used the double-check idiom here to avoid synchronization on the hot path
> - ServiceKey.hashCode using Objects.hash was a cause of allocation; simplified and optimized it.
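The double-check idiom mentioned in the list above can be sketched as follows. This is an illustrative example, not the actual ProviderConfig code; the `LazyHolder` class and its field are hypothetical stand-ins:

```java
public class LazyHolder {
    // volatile guarantees that once a fully constructed value is published,
    // other threads reading it without a lock see it correctly.
    private volatile Object provider;

    Object getProvider() {
        Object p = provider;      // single volatile read on the hot path
        if (p != null) {
            return p;             // already initialized: no synchronization
        }
        synchronized (this) {
            if (provider == null) {            // re-check under the lock
                provider = new Object();       // expensive init in real code
            }
            return provider;
        }
    }

    public static void main(String[] args) {
        LazyHolder h = new LazyHolder();
        // Both calls return the same instance; only the first may lock:
        System.out.println(h.getProvider() == h.getProvider());
    }
}
```

After initialization every call completes with one volatile read and no monitor acquisition, which is what removes the bottleneck on the hot path.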
>
> Benchmark                                         (digesterName)  Mode  Cnt    Score   Error  Units
> GetMessageDigest.getInstance                                 MD5  avgt   30  143.803 ± 5.431  ns/op
> GetMessageDigest.getInstance:·gc.alloc.rate.norm             MD5  avgt   30  280.015 ± 0.001   B/op
Since much of the remaining cost is now the creation of the MessageDigest instance itself, I added a microbenchmark to measure that overhead:
Benchmark                                           (digesterName)  Mode  Cnt    Score   Error  Units
GetMessageDigest.cloneInstance                                 MD5  avgt   30  124.922 ± 5.412  ns/op
GetMessageDigest.cloneInstance:·gc.alloc.rate.norm             MD5  avgt   30  280.015 ± 0.001   B/op
That means calling `MessageDigest.getInstance(digesterName)` now adds no allocation overhead compared to cloning an existing instance - so we get almost all of the benefit without resorting to tricks such as caching and cloning an instance at call sites like the one in `UUID::nameUUIDFromBytes`. The remaining ~20ns/op difference should be negligible.
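For comparison, the call-site caching-and-cloning trick that this change makes largely unnecessary looks roughly like this. This is a hedged sketch, not the `UUID::nameUUIDFromBytes` source; the `Md5Cache` class and `md5` method are illustrative names:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Cache {
    // One template instance, created once; per-use digests are clones of it.
    private static final MessageDigest TEMPLATE;
    static {
        try {
            TEMPLATE = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static byte[] md5(byte[] input) {
        try {
            // clone() avoids both the provider lookup and sharing
            // mutable digest state between threads.
            MessageDigest md = (MessageDigest) TEMPLATE.clone();
            return md.digest(input);
        } catch (CloneNotSupportedException e) {
            // The built-in MD5 implementation is Cloneable; other
            // providers may not be, which is one downside of this trick.
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] viaClone = md5("hello".getBytes("UTF-8"));
        byte[] viaGetInstance =
            MessageDigest.getInstance("MD5").digest("hello".getBytes("UTF-8"));
        System.out.println(java.util.Arrays.equals(viaClone, viaGetInstance));
    }
}
```

With `getInstance` itself now this cheap, call sites can use the straightforward API and drop this kind of bookkeeping.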
-------------
PR: https://git.openjdk.java.net/jdk/pull/1933
More information about the security-dev mailing list