RFR: 8259498: Reduce overhead of MD5 and SHA digests
DellCliff
github.com+14116124+dellcliff at openjdk.java.net
Fri Jan 8 22:41:05 UTC 2021
On Sun, 20 Dec 2020 20:27:03 GMT, Claes Redestad <redestad at openjdk.org> wrote:
> - The MD5 intrinsics added by [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) show that the `int[] x` isn't actually needed. This also applies to the SHA intrinsics from which the MD5 intrinsic takes inspiration.
> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to make it acceptable to use inline and replace the array in MD5 wholesale. This improves performance both in the presence and the absence of the intrinsic optimization.
> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element arrays), but allocating the array lazily gets most of the speed-up in the presence of an intrinsic while being neutral in its absence.
>
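
As an inline illustration of the VarHandle point above: a minimal sketch of reading little-endian `int` words straight out of a `byte[]`, which is what makes a separate `int[]` scratch copy unnecessary. The class name `LittleEndianRead` and the helper `readIntLE` are hypothetical and are not the actual `ByteArrayAccess` code from the patch; only `MethodHandles.byteArrayViewVarHandle` is the real JDK API.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class LittleEndianRead {
    // Views a byte[] as little-endian ints (MD5 consumes its input as little-endian words).
    private static final VarHandle LE_INT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Hypothetical helper: reads one 32-bit word at the given byte offset,
    // without first copying the input into a temporary int[].
    static int readIntLE(byte[] buf, int offset) {
        return (int) LE_INT.get(buf, offset);
    }

    public static void main(String[] args) {
        byte[] block = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
        System.out.println(Integer.toHexString(readIntLE(block, 0))); // prints 4030201
        System.out.println(Integer.toHexString(readIntLE(block, 4))); // prints 8070605
    }
}
```

A digest round function could call such a helper for each word it needs, so the only per-call state left is the hash state itself.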
> Baseline:
> Benchmark (digesterName) (length) Cnt Score Error Units
> MessageDigests.digest MD5 16 15 2714.307 ± 21.133 ops/ms
> MessageDigests.digest MD5 1024 15 318.087 ± 0.637 ops/ms
> MessageDigests.digest SHA-1 16 15 1387.266 ± 40.932 ops/ms
> MessageDigests.digest SHA-1 1024 15 109.273 ± 0.149 ops/ms
> MessageDigests.digest SHA-256 16 15 995.566 ± 21.186 ops/ms
> MessageDigests.digest SHA-256 1024 15 89.104 ± 0.079 ops/ms
> MessageDigests.digest SHA-512 16 15 803.030 ± 15.722 ops/ms
> MessageDigests.digest SHA-512 1024 15 115.611 ± 0.234 ops/ms
> MessageDigests.getAndDigest MD5 16 15 2190.367 ± 97.037 ops/ms
> MessageDigests.getAndDigest MD5 1024 15 302.903 ± 1.809 ops/ms
> MessageDigests.getAndDigest SHA-1 16 15 1262.656 ± 43.751 ops/ms
> MessageDigests.getAndDigest SHA-1 1024 15 104.889 ± 3.554 ops/ms
> MessageDigests.getAndDigest SHA-256 16 15 914.541 ± 55.621 ops/ms
> MessageDigests.getAndDigest SHA-256 1024 15 85.708 ± 1.394 ops/ms
> MessageDigests.getAndDigest SHA-512 16 15 737.719 ± 53.671 ops/ms
> MessageDigests.getAndDigest SHA-512 1024 15 112.307 ± 1.950 ops/ms
>
> GC:
> MessageDigests.getAndDigest:·gc.alloc.rate.norm MD5 16 15 312.011 ± 0.005 B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-1 16 15 584.020 ± 0.006 B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-256 16 15 544.019 ± 0.016 B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-512 16 15 1056.037 ± 0.003 B/op
>
> Target:
> Benchmark (digesterName) (length) Cnt Score Error Units
> MessageDigests.digest MD5 16 15 3134.462 ± 43.685 ops/ms
> MessageDigests.digest MD5 1024 15 323.667 ± 0.633 ops/ms
> MessageDigests.digest SHA-1 16 15 1418.742 ± 38.223 ops/ms
> MessageDigests.digest SHA-1 1024 15 110.178 ± 0.788 ops/ms
> MessageDigests.digest SHA-256 16 15 1037.949 ± 21.214 ops/ms
> MessageDigests.digest SHA-256 1024 15 89.671 ± 0.228 ops/ms
> MessageDigests.digest SHA-512 16 15 812.028 ± 39.489 ops/ms
> MessageDigests.digest SHA-512 1024 15 116.738 ± 0.249 ops/ms
> MessageDigests.getAndDigest MD5 16 15 2314.379 ± 229.294 ops/ms
> MessageDigests.getAndDigest MD5 1024 15 307.835 ± 5.730 ops/ms
> MessageDigests.getAndDigest SHA-1 16 15 1326.887 ± 63.263 ops/ms
> MessageDigests.getAndDigest SHA-1 1024 15 106.611 ± 2.292 ops/ms
> MessageDigests.getAndDigest SHA-256 16 15 961.589 ± 82.052 ops/ms
> MessageDigests.getAndDigest SHA-256 1024 15 88.646 ± 0.194 ops/ms
> MessageDigests.getAndDigest SHA-512 16 15 775.417 ± 56.775 ops/ms
> MessageDigests.getAndDigest SHA-512 1024 15 112.904 ± 2.014 ops/ms
>
> GC:
> MessageDigests.getAndDigest:·gc.alloc.rate.norm MD5 16 15 232.009 ± 0.006 B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-1 16 15 584.021 ± 0.001 B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-256 16 15 272.012 ± 0.015 B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-512 16 15 400.017 ± 0.019 B/op
>
> For the `digest` micro, digesting small inputs is faster with all algorithms, with gains ranging from ~1% for SHA-512 up to ~15% for MD5. The gain stems from not allocating and reading into a temporary buffer once outside of the intrinsic. SHA-1 does not see a statistically significant gain because the intrinsic is disabled by default on my HW.
>
> For the `getAndDigest` micro, which tests `MessageDigest.getInstance(..).digest(..)`, there are similar gains with this patch. The interesting aspect here is verifying the reduction in allocations per operation when there's an active intrinsic (again, not for SHA-1). JDK-8259065 (#1933) reduced allocations on each of these by 144B/op, which means allocation pressure for SHA-512 is down two thirds, from 1200B/op to 400B/op, in this contrived test.
>
> I've verified there are no regressions in the absence of the intrinsic - which the SHA-1 numbers here help show.
Since `java.util.UUID` and `sun.security.provider.MD5` are both in `java.base`, would it make sense for `UUID` to create new instances by calling `new MD5()` directly instead of going through `java.security.MessageDigest.getInstance("MD5")`, bypassing the whole `MessageDigest` logic?
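
For context, here is a rough sketch of the public-API path in question: building a name-based (version 3) UUID via `MessageDigest.getInstance("MD5")`. The class `NameUuidSketch` and the method `nameUuid` are hypothetical names for illustration. The sketch deliberately uses only public API, since `sun.security.provider.MD5` is internal to `java.base` and not reachable from application code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class NameUuidSketch {
    // Hypothetical helper: derives a version-3 UUID from the MD5 hash of the name bytes.
    static UUID nameUuid(byte[] name) throws NoSuchAlgorithmException {
        // Provider lookup plus instance creation; this is the step the question proposes to bypass.
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] h = md.digest(name);
        h[6] &= 0x0f;  // clear the version nibble
        h[6] |= 0x30;  // set version 3
        h[8] &= 0x3f;  // clear the variant bits
        h[8] |= 0x80;  // set the IETF variant
        long msb = 0;
        long lsb = 0;
        for (int i = 0; i < 8; i++) {
            msb = (msb << 8) | (h[i] & 0xff);
        }
        for (int i = 8; i < 16; i++) {
            lsb = (lsb << 8) | (h[i] & 0xff);
        }
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] name = "example".getBytes(StandardCharsets.UTF_8);
        UUID viaDigest = nameUuid(name);
        // Both paths should agree; the comparison is a sanity check on the sketch.
        System.out.println(viaDigest.equals(UUID.nameUUIDFromBytes(name)));
        System.out.println(viaDigest.version());
    }
}
```

Every call pays for the `getInstance` lookup before any hashing happens, which is the overhead a direct `new MD5()` inside `java.base` would avoid.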
-------------
PR: https://git.openjdk.java.net/jdk/pull/1855