RFR: 8259498: Reduce overhead of MD5 and SHA digests [v2]
Valerie Peng
valeriep at openjdk.java.net
Sat Jan 16 00:25:19 UTC 2021
On Fri, 15 Jan 2021 23:36:35 GMT, Claes Redestad <redestad at openjdk.org> wrote:
>> - The MD5 intrinsics added by [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) show that the `int[] x` isn't actually needed. This also applies to the SHA intrinsics from which the MD5 intrinsic takes inspiration.
>> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to make it acceptable to use inline and replace the array in MD5 wholesale. This improves performance both in the presence and the absence of the intrinsic optimization.
>> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element arrays), but allocating the array lazily gets most of the speed-up in the presence of an intrinsic while being neutral in its absence.
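A minimal sketch of the VarHandle approach described in the list above, assuming a little-endian `int` view over a `byte[]` (MD5 reads its input as little-endian 32-bit words). `LittleEndianReads` and `readIntLE` are hypothetical names for illustration only and are not the `ByteArrayAccess` code from the patch:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class LittleEndianReads {
    // View a byte[] as little-endian ints. The index passed to get() is a byte offset.
    private static final VarHandle LE_INT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Reads four bytes starting at 'offset' as one little-endian int.
    // The VarHandle performs the bounds check itself.
    public static int readIntLE(byte[] buf, int offset) {
        return (int) LE_INT.get(buf, offset);
    }

    public static void main(String[] args) {
        byte[] block = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
        System.out.println(Integer.toHexString(readIntLE(block, 0))); // prints 4030201
        System.out.println(Integer.toHexString(readIntLE(block, 4))); // prints 8070605
    }
}
```

Reading words straight from the input this way is what makes a separate `int[]` copy of the block unnecessary in this sketch.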
>>
>> Baseline:
>> Benchmark (digesterName) (length) Cnt Score Error Units
>> MessageDigests.digest MD5 16 15 2714.307 ± 21.133 ops/ms
>> MessageDigests.digest MD5 1024 15 318.087 ± 0.637 ops/ms
>> MessageDigests.digest SHA-1 16 15 1387.266 ± 40.932 ops/ms
>> MessageDigests.digest SHA-1 1024 15 109.273 ± 0.149 ops/ms
>> MessageDigests.digest SHA-256 16 15 995.566 ± 21.186 ops/ms
>> MessageDigests.digest SHA-256 1024 15 89.104 ± 0.079 ops/ms
>> MessageDigests.digest SHA-512 16 15 803.030 ± 15.722 ops/ms
>> MessageDigests.digest SHA-512 1024 15 115.611 ± 0.234 ops/ms
>> MessageDigests.getAndDigest MD5 16 15 2190.367 ± 97.037 ops/ms
>> MessageDigests.getAndDigest MD5 1024 15 302.903 ± 1.809 ops/ms
>> MessageDigests.getAndDigest SHA-1 16 15 1262.656 ± 43.751 ops/ms
>> MessageDigests.getAndDigest SHA-1 1024 15 104.889 ± 3.554 ops/ms
>> MessageDigests.getAndDigest SHA-256 16 15 914.541 ± 55.621 ops/ms
>> MessageDigests.getAndDigest SHA-256 1024 15 85.708 ± 1.394 ops/ms
>> MessageDigests.getAndDigest SHA-512 16 15 737.719 ± 53.671 ops/ms
>> MessageDigests.getAndDigest SHA-512 1024 15 112.307 ± 1.950 ops/ms
>>
>> GC:
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm MD5 16 15 312.011 ± 0.005 B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-1 16 15 584.020 ± 0.006 B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-256 16 15 544.019 ± 0.016 B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-512 16 15 1056.037 ± 0.003 B/op
>>
>> Target:
>> Benchmark (digesterName) (length) Cnt Score Error Units
>> MessageDigests.digest MD5 16 15 3134.462 ± 43.685 ops/ms
>> MessageDigests.digest MD5 1024 15 323.667 ± 0.633 ops/ms
>> MessageDigests.digest SHA-1 16 15 1418.742 ± 38.223 ops/ms
>> MessageDigests.digest SHA-1 1024 15 110.178 ± 0.788 ops/ms
>> MessageDigests.digest SHA-256 16 15 1037.949 ± 21.214 ops/ms
>> MessageDigests.digest SHA-256 1024 15 89.671 ± 0.228 ops/ms
>> MessageDigests.digest SHA-512 16 15 812.028 ± 39.489 ops/ms
>> MessageDigests.digest SHA-512 1024 15 116.738 ± 0.249 ops/ms
>> MessageDigests.getAndDigest MD5 16 15 2314.379 ± 229.294 ops/ms
>> MessageDigests.getAndDigest MD5 1024 15 307.835 ± 5.730 ops/ms
>> MessageDigests.getAndDigest SHA-1 16 15 1326.887 ± 63.263 ops/ms
>> MessageDigests.getAndDigest SHA-1 1024 15 106.611 ± 2.292 ops/ms
>> MessageDigests.getAndDigest SHA-256 16 15 961.589 ± 82.052 ops/ms
>> MessageDigests.getAndDigest SHA-256 1024 15 88.646 ± 0.194 ops/ms
>> MessageDigests.getAndDigest SHA-512 16 15 775.417 ± 56.775 ops/ms
>> MessageDigests.getAndDigest SHA-512 1024 15 112.904 ± 2.014 ops/ms
>>
>> GC:
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm MD5 16 15 232.009 ± 0.006 B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-1 16 15 584.021 ± 0.001 B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-256 16 15 272.012 ± 0.015 B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-512 16 15 400.017 ± 0.019 B/op
>>
>> For the `digest` micro, digesting small inputs is faster with all algorithms, ranging from ~1% for SHA-512 up to ~15% for MD5. The gain stems from not allocating and reading into a temporary buffer once outside of the intrinsic. SHA-1 does not see a statistically significant gain because the intrinsic is disabled by default on my HW.
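The lazy-allocation idea behind these numbers can be sketched roughly as follows. This is a toy under stated assumptions: `LazyScratch`, `intrinsicActive`, and the byte-summing "compress" step are all hypothetical, and the boolean flag only simulates what happens in the JDK, where an active intrinsic replaces the Java method body so the allocation in that body never runs:

```java
final class LazyScratch {
    private final boolean intrinsicActive; // stand-in for "the JIT substituted an intrinsic"
    private int[] w;                       // scratch words, left null until the fallback needs them
    int allocations;                       // counts scratch allocations, for illustration only

    LazyScratch(boolean intrinsicActive) {
        this.intrinsicActive = intrinsicActive;
    }

    // Toy "compress" step: sums the unsigned bytes of the block.
    int compress(byte[] block) {
        if (intrinsicActive) {
            // Fast path works on the bytes directly and never touches the scratch array.
            int acc = 0;
            for (byte b : block) {
                acc += b & 0xff;
            }
            return acc;
        }
        // Fallback path: allocate the scratch array on first use, then reuse it.
        if (w == null) {
            w = new int[16];
            allocations++;
        }
        int acc = 0;
        for (int i = 0; i < block.length; i++) {
            w[i % w.length] = block[i] & 0xff;
            acc += w[i % w.length];
        }
        return acc;
    }

    public static void main(String[] args) {
        LazyScratch fast = new LazyScratch(true);
        LazyScratch slow = new LazyScratch(false);
        byte[] block = new byte[64];
        fast.compress(block);
        slow.compress(block);
        slow.compress(block);
        System.out.println(fast.allocations); // prints 0
        System.out.println(slow.allocations); // prints 1
    }
}
```

With the fast path active the scratch array is never allocated, and on the fallback path it is allocated once and reused, which is why the change can be neutral when no intrinsic is present.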
>>
>> For the `getAndDigest` micro, which tests `MessageDigest.getInstance(..).digest(..)`, there are similar gains with this patch. The interesting aspect here is verifying the reduction in allocations per operation when there's an active intrinsic (again, not for SHA-1). JDK-8259065 (#1933) reduced allocations on each of these by 144B/op, which means allocation pressure for SHA-512 is down two thirds, from 1200B/op to 400B/op, in this contrived test.
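For reference, the call pattern that `getAndDigest` exercises looks roughly like this. It is only a sketch of the `MessageDigest.getInstance(..).digest(..)` shape quoted above, not the JMH benchmark itself; the `GetAndDigest` class name and the all-zero 16-byte input are assumptions:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class GetAndDigest {
    // Looks up a fresh MessageDigest instance and digests the input in one call,
    // so both the provider lookup and the digest object's allocations are included.
    public static byte[] getAndDigest(String algorithm, byte[] input) {
        try {
            return MessageDigest.getInstance(algorithm).digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("algorithm not available: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        byte[] input = new byte[16];
        for (String alg : new String[] {"MD5", "SHA-1", "SHA-256", "SHA-512"}) {
            System.out.println(alg + " -> " + getAndDigest(alg, input).length + " bytes");
        }
    }
}
```

Because a new instance is created per call, any array the digest object allocates eagerly shows up directly in the per-operation allocation figures.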
>>
>> I've verified there are no regressions in the absence of the intrinsic - which the SHA-1 numbers here help show.
>
> Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision:
>
> - Copyrights
> - Merge branch 'master' into improve_md5
> - Remove unused Unsafe import
> - Harmonize MD4 impl, remove now-redundant checks from ByteArrayAccess (VHs do bounds checks, most of which will be optimized away)
> - Merge branch 'master' into improve_md5
> - Apply allocation avoiding optimizations to all SHA versions sharing structural similarities with MD5
> - Remove unused reverseBytes imports
> - Copyrights
> - Fix copy-paste error
> - Various fixes (IDE stopped IDEing..)
> - ... and 10 more: https://git.openjdk.java.net/jdk/compare/18f8493b...cafa3e49
Changes look good. Thanks.
-------------
Marked as reviewed by valeriep (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1855