RFR: 8259498: Reduce overhead of MD5 and SHA digests
- The MD5 intrinsics added by [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) show that the `int[] x` isn't actually needed. This also applies to the SHA intrinsics, from which the MD5 intrinsic takes inspiration.
- Using VarHandles we can simplify the code in `ByteArrayAccess` enough to make it acceptable to use inline and replace the array in MD5 wholesale. This improves performance both in the presence and in the absence of the intrinsic optimization.
- Doing the exact same thing in the SHA impls would be unwieldy (64+ element arrays), but allocating the array lazily gets most of the speed-up in the presence of an intrinsic while being neutral in its absence.

Baseline:

Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
MessageDigests.digest        MD5                   16   15  2714.307 ±  21.133  ops/ms
MessageDigests.digest        MD5                 1024   15   318.087 ±   0.637  ops/ms
MessageDigests.digest        SHA-1                 16   15  1387.266 ±  40.932  ops/ms
MessageDigests.digest        SHA-1               1024   15   109.273 ±   0.149  ops/ms
MessageDigests.digest        SHA-256               16   15   995.566 ±  21.186  ops/ms
MessageDigests.digest        SHA-256             1024   15    89.104 ±   0.079  ops/ms
MessageDigests.digest        SHA-512               16   15   803.030 ±  15.722  ops/ms
MessageDigests.digest        SHA-512             1024   15   115.611 ±   0.234  ops/ms
MessageDigests.getAndDigest  MD5                   16   15  2190.367 ±  97.037  ops/ms
MessageDigests.getAndDigest  MD5                 1024   15   302.903 ±   1.809  ops/ms
MessageDigests.getAndDigest  SHA-1                 16   15  1262.656 ±  43.751  ops/ms
MessageDigests.getAndDigest  SHA-1               1024   15   104.889 ±   3.554  ops/ms
MessageDigests.getAndDigest  SHA-256               16   15   914.541 ±  55.621  ops/ms
MessageDigests.getAndDigest  SHA-256             1024   15    85.708 ±   1.394  ops/ms
MessageDigests.getAndDigest  SHA-512               16   15   737.719 ±  53.671  ops/ms
MessageDigests.getAndDigest  SHA-512             1024   15   112.307 ±   1.950  ops/ms

GC:

Benchmark                                        (digesterName)  (length)  Cnt     Score   Error  Units
MessageDigests.getAndDigest:·gc.alloc.rate.norm  MD5                   16   15   312.011 ± 0.005   B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-1                 16   15   584.020 ± 0.006   B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-256               16   15   544.019 ± 0.016   B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-512               16   15  1056.037 ± 0.003   B/op

Target:

Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
MessageDigests.digest        MD5                   16   15  3134.462 ±  43.685  ops/ms
MessageDigests.digest        MD5                 1024   15   323.667 ±   0.633  ops/ms
MessageDigests.digest        SHA-1                 16   15  1418.742 ±  38.223  ops/ms
MessageDigests.digest        SHA-1               1024   15   110.178 ±   0.788  ops/ms
MessageDigests.digest        SHA-256               16   15  1037.949 ±  21.214  ops/ms
MessageDigests.digest        SHA-256             1024   15    89.671 ±   0.228  ops/ms
MessageDigests.digest        SHA-512               16   15   812.028 ±  39.489  ops/ms
MessageDigests.digest        SHA-512             1024   15   116.738 ±   0.249  ops/ms
MessageDigests.getAndDigest  MD5                   16   15  2314.379 ± 229.294  ops/ms
MessageDigests.getAndDigest  MD5                 1024   15   307.835 ±   5.730  ops/ms
MessageDigests.getAndDigest  SHA-1                 16   15  1326.887 ±  63.263  ops/ms
MessageDigests.getAndDigest  SHA-1               1024   15   106.611 ±   2.292  ops/ms
MessageDigests.getAndDigest  SHA-256               16   15   961.589 ±  82.052  ops/ms
MessageDigests.getAndDigest  SHA-256             1024   15    88.646 ±   0.194  ops/ms
MessageDigests.getAndDigest  SHA-512               16   15   775.417 ±  56.775  ops/ms
MessageDigests.getAndDigest  SHA-512             1024   15   112.904 ±   2.014  ops/ms

GC:

Benchmark                                        (digesterName)  (length)  Cnt    Score   Error  Units
MessageDigests.getAndDigest:·gc.alloc.rate.norm  MD5                   16   15  232.009 ± 0.006   B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-1                 16   15  584.021 ± 0.001   B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-256               16   15  272.012 ± 0.015   B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-512               16   15  400.017 ± 0.019   B/op

For the `digest` micro, digesting small inputs is faster with all algorithms, ranging from ~1% for SHA-512 up to ~15% for MD5. The gain stems from not allocating and reading into a temporary buffer outside of the intrinsic. SHA-1 does not see a statistically significant gain because the intrinsic is disabled by default on my HW.

For the `getAndDigest` micro - which tests `MessageDigest.getInstance(..).digest(..)` - there are similar gains with this patch. The interesting aspect here is verifying the reduction in allocations per operation when there's an active intrinsic (again, not for SHA-1). JDK-8259065 (#1933) reduced allocations on each of these by 144B/op, which means allocation pressure for SHA-512 is down two thirds, from 1200B/op to 400B/op, in this contrived test.

I've verified there are no regressions in the absence of the intrinsic - which the SHA-1 numbers here help show.

-------------

Commit messages:
 - Remove unused Unsafe import
 - Harmonize MD4 impl, remove now-redundant checks from ByteArrayAccess (VHs do bounds checks, most of which will be optimized away)
 - Merge branch 'master' into improve_md5
 - Apply allocation avoiding optimizations to all SHA versions sharing structural similarities with MD5
 - Remove unused reverseBytes imports
 - Copyrights
 - Fix copy-paste error
 - Various fixes (IDE stopped IDEing..)
 - Add imports
 - mismatched parens
 - ... and 8 more: https://git.openjdk.java.net/jdk/compare/090bd3af...e1c943c5

Changes: https://git.openjdk.java.net/jdk/pull/1855/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1855&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8259498
  Stats: 649 lines in 8 files changed: 83 ins; 344 del; 222 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1855.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1855/head:pull/1855

PR: https://git.openjdk.java.net/jdk/pull/1855
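[Editor's sketch] The `ByteArrayAccess` rewrite described above replaces `Unsafe`-based access with byte-array view VarHandles. A minimal standalone illustration of that idiom (class name and structure here are illustrative, not the actual JDK code) for a little-endian `int` load from a `byte[]`:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class VarHandleAccessSketch {
    // A little-endian int view over a byte[]; the same idiom the patch
    // applies in ByteArrayAccess.
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    static int b2iLittle(byte[] in, int ofs) {
        // The explicit cast is required: VarHandle.get is signature-polymorphic.
        return (int) INT_LE.get(in, ofs);
    }

    public static void main(String[] args) {
        byte[] buf = {0x78, 0x56, 0x34, 0x12};
        // Little-endian: the low byte comes first, so this reads 0x12345678
        System.out.println(Integer.toHexString(b2iLittle(buf, 0)));
    }
}
```

Unlike `Unsafe`, the view VarHandle bounds-checks every access, which is what makes it acceptable to use inline without explicit precondition checks.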
On Sun, 20 Dec 2020 20:27:03 GMT, Claes Redestad <redestad@openjdk.org> wrote:
Since `java.util.UUID` and `sun.security.provider.MD5` are both in `java.base`, would it make sense to create new instances by calling `new MD5()` instead of `java.security.MessageDigest.getInstance("MD5")`, bypassing the whole MessageDigest logic?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
On Tue, 5 Jan 2021 21:51:51 GMT, DellCliff <github.com+14116124+DellCliff@openjdk.org> wrote:
Since `java.util.UUID` and `sun.security.provider.MD5` are both in `java.base`, would it make sense to create new instances by calling `new MD5()` instead of `java.security.MessageDigest.getInstance("MD5")`, bypassing the whole MessageDigest logic?
Are you sure you're not ending up paying more by using a VarHandle, having to cast, and making a varargs call `(long) LONG_ARRAY_HANDLE.get(buf, ofs);`, instead of creating a ByteBuffer once via `ByteBuffer.wrap(buffer).order(ByteOrder.nativeOrder()).asLongBuffer()`?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
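[Editor's sketch] The one-time `ByteBuffer` view setup suggested here can be sketched as follows — a standalone illustration, using `LITTLE_ENDIAN` instead of `nativeOrder()` so the result is deterministic on any platform:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferViewSketch {
    // Wrap the byte[] once and read longs through the view, as an
    // alternative to per-element VarHandle access.
    static long firstLong(byte[] buffer) {
        return ByteBuffer.wrap(buffer)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asLongBuffer()
                .get(0);
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[16];
        buffer[0] = 0x2A; // low byte of the first little-endian long
        System.out.println(firstLong(buffer)); // prints 42
    }
}
```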
On Tue, 5 Jan 2021 23:08:43 GMT, DellCliff <github.com+14116124+DellCliff@openjdk.org> wrote:
Since `java.util.UUID` and `sun.security.provider.MD5` are both in `java.base`, would it make sense to create new instances by calling `new MD5()` instead of `java.security.MessageDigest.getInstance("MD5")` and bypassing the whole MessageDigest logic?
Are you sure you're not ending up paying more using a VarHandle and having to cast and using a var args call `(long) LONG_ARRAY_HANDLE.get(buf, ofs);` instead of creating a ByteBuffer once via `ByteBuffer.wrap(buffer).order(ByteOrder.nativeOrder()).asLongBuffer()`?
Hitting up `new MD5()` directly could be a great idea. I expect this would be just as fast as the cache+clone (if not faster), but I'm a bit worried we'd be short-circuiting the ability to install an alternative MD5 provider (which may or may not be a thing we must support..), but it's worth exploring.

Comparing performance of this against a `ByteBuffer` impl is on my TODO. The `VarHandle` gets heavily inlined and optimized here, though, with performance in my tests similar to the `Unsafe` use in `ByteArrayAccess`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
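[Editor's sketch] The cache+clone approach mentioned above can be sketched like this (a hypothetical standalone version with illustrative names; the actual UUID code differs):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5CloneCacheSketch {
    // A cached template digest: clone() hands out a fresh copy without
    // repeating the provider lookup on every call.
    private static final MessageDigest MD5_TEMPLATE;
    static {
        try {
            MD5_TEMPLATE = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 support is required in every Java platform", e);
        }
    }

    static byte[] md5(byte[] input) {
        try {
            MessageDigest md = (MessageDigest) MD5_TEMPLATE.clone();
            return md.digest(input);
        } catch (CloneNotSupportedException e) {
            throw new AssertionError("the JDK's MD5 implementation is cloneable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5(new byte[0]).length); // an MD5 digest is 16 bytes
    }
}
```

Cloning sidesteps provider lookup but, as noted above, a hardcoded `new MD5()` would also sidestep the ability to install an alternative provider.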
On Wed, 6 Jan 2021 00:41:29 GMT, Claes Redestad <redestad@openjdk.org> wrote:
Hitting up `new MD5()` directly could be a great idea. I expect this would be just as fast as the cache+clone (if not faster), but I'm a bit worried we'd be short-circuiting the ability to install an alternative MD5 provider (which may or may not be a thing we must support..), but it's worth exploring.
Comparing performance of this against a `ByteBuffer` impl is on my TODO. The `VarHandle` gets heavily inlined and optimized here, though, with performance in my tests similar to the `Unsafe` use in `ByteArrayAccess`.
I've identified a number of optimizations to the plumbing behind `MessageDigest.getDigest(..)` over in #1933 that remove 80-90% of the throughput overhead and all the allocation overhead compared to the `clone()` approach prototyped here. The remaining 20ns/op overhead might not be enough of a concern to do a point fix in `UUID::nameUUIDFromBytes`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
On Wed, 6 Jan 2021 01:27:52 GMT, Claes Redestad <redestad@openjdk.org> wrote:
I've identified a number of optimizations to the plumbing behind `MessageDigest.getDigest(..)` over in #1933 that remove 80-90% of the throughput overhead and all the allocation overhead compared to the `clone()` approach prototyped here. The remaining 20ns/op overhead might not be enough of a concern to do a point fix in `UUID::nameUUIDFromBytes`.
Removing the UUID clone cache and running the microbenchmark along with the changes in #1933:

Benchmark                                     (size)   Mode  Cnt    Score    Error   Units
UUIDBench.fromType3Bytes                       20000  thrpt   12    2.182 ±  0.090  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate        20000  thrpt   12  439.020 ± 18.241  MB/sec
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm   20000  thrpt   12  264.022 ±  0.003    B/op

The goal now is to simplify the digest code and compare alternatives.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
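[Editor's sketch] For context, the `UUIDBench.fromType3Bytes` micro exercises type-3 (name-based, MD5) UUID creation, which in plain code amounts to the following minimal usage example (not the benchmark itself):

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class Type3UuidSketch {
    public static void main(String[] args) {
        // nameUUIDFromBytes computes an MD5 digest of the name and stamps
        // the version and variant bits into the result.
        byte[] name = "example".getBytes(StandardCharsets.UTF_8);
        UUID u = UUID.nameUUIDFromBytes(name);
        System.out.println(u.version()); // 3 = name-based, MD5
        System.out.println(u.variant()); // 2 = IETF variant
    }
}
```

Every call runs an MD5 digest internally, which is why digest allocation and `getInstance` plumbing dominate this micro.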
On Thu, 7 Jan 2021 14:45:03 GMT, Claes Redestad <redestad@openjdk.org> wrote:
Removing the UUID clone cache and running the microbenchmark along with the changes in #1933:
Benchmark                                     (size)   Mode  Cnt    Score    Error   Units
UUIDBench.fromType3Bytes                       20000  thrpt   12    2.182 ±  0.090  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate        20000  thrpt   12  439.020 ± 18.241  MB/sec
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm   20000  thrpt   12  264.022 ±  0.003    B/op

The goal now is to simplify the digest code and compare alternatives.
I've run various tests and concluded that the `VarHandle`ized code is matching or improving upon the `Unsafe`-riddled code in `ByteArrayAccess`. I then went ahead and consolidated on a similar code pattern in `ByteArrayAccess` for consistency, which amounts to a good cleanup.

With MD5 intrinsics disabled, I get this baseline:

Benchmark                                     (size)   Mode  Cnt    Score   Error   Units
UUIDBench.fromType3Bytes                       20000  thrpt   12    1.245 ± 0.077  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm   20000  thrpt   12  488.042 ± 0.004    B/op

With the current patch here (not including #1933):

Benchmark                                     (size)   Mode  Cnt    Score   Error   Units
UUIDBench.fromType3Bytes                       20000  thrpt   12    1.431 ± 0.106  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm   20000  thrpt   12  408.035 ± 0.006    B/op

If I isolate the `ByteArrayAccess` changes I'm getting performance neutral or slightly better numbers compared to baseline for these tests:

Benchmark                                     (size)   Mode  Cnt    Score   Error   Units
UUIDBench.fromType3Bytes                       20000  thrpt   12    1.317 ± 0.092  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm   20000  thrpt   12  488.042 ± 0.004    B/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
On Thu, 7 Jan 2021 18:50:05 GMT, Claes Redestad <redestad@openjdk.org> wrote:
I've run various tests and concluded that the `VarHandle`ized code is matching or improving upon the `Unsafe`-riddled code in `ByteArrayAccess`. I then went ahead and consolidated on a similar code pattern in `ByteArrayAccess` for consistency, which amounts to a good cleanup.
With MD5 intrinsics disabled, I get this baseline:
Benchmark                                     (size)   Mode  Cnt    Score   Error   Units
UUIDBench.fromType3Bytes                       20000  thrpt   12    1.245 ± 0.077  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm   20000  thrpt   12  488.042 ± 0.004    B/op
With the current patch here (not including #1933):

Benchmark                                     (size)   Mode  Cnt    Score   Error   Units
UUIDBench.fromType3Bytes                       20000  thrpt   12    1.431 ± 0.106  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm   20000  thrpt   12  408.035 ± 0.006    B/op
If I isolate the `ByteArrayAccess` changes I'm getting performance neutral or slightly better numbers compared to baseline for these tests:
Benchmark                                     (size)   Mode  Cnt    Score   Error   Units
UUIDBench.fromType3Bytes                       20000  thrpt   12    1.317 ± 0.092  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm   20000  thrpt   12  488.042 ± 0.004    B/op
Thanks for the performance enhancement, I will take a look.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
On Sun, 20 Dec 2020 20:27:03 GMT, Claes Redestad <redestad@openjdk.org> wrote:
src/java.base/share/classes/sun/security/provider/ByteArrayAccess.java line 214:

Why do we remove the index checking from all methods? Isn't it safer to check here in case the caller didn't? Or is such checking already implemented inside the various methods of VarHandle?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
On Fri, 15 Jan 2021 22:54:32 GMT, Valerie Peng <valeriep@openjdk.org> wrote:
src/java.base/share/classes/sun/security/provider/ByteArrayAccess.java line 214:
Why do we remove the index checking from all methods? Isn't it safer to check here in case the caller didn't? Or is such checking already implemented inside the various methods of VarHandle?
Yes, IOOBE checking is done by the VarHandle methods, whereas the Unsafe API is unsafe and needs careful precondition checking. It doesn't seem to matter for performance (interpreted code sees some benefit from the removal). With the current usage an IOOBE is probably not observable, but there's a test that reflects into ByteArrayAccess and verifies exceptions are thrown as expected on faulty inputs. ------------- PR: https://git.openjdk.java.net/jdk/pull/1855
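The VarHandle-based accessor pattern being discussed can be sketched like this (hypothetical class and method names, not the actual ByteArrayAccess source):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Little-endian int view over a byte[]. The VarHandle performs its own
// bounds checks and throws IndexOutOfBoundsException on bad offsets, so
// no explicit precondition checks are needed, unlike with Unsafe.
final class ByteArrayLE {
    private static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    static int b2iLittle(byte[] b, int off) {
        return (int) INT_LE.get(b, off);
    }

    static void i2bLittle(int v, byte[] b, int off) {
        INT_LE.set(b, off, v);
    }
}
```

An out-of-range offset, e.g. `b2iLittle(new byte[4], 1)`, raises `IndexOutOfBoundsException` from inside the VarHandle, which is the behavior the reflection-driven test verifies.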
On Sun, 20 Dec 2020 20:27:03 GMT, Claes Redestad <redestad@openjdk.org> wrote:
test/micro/org/openjdk/bench/java/util/UUIDBench.java line 2:
1: /* 2: * Copyright (c) 2020, 2021, Oracle and/or its affiliates. All rights reserved.
nit: other files should also have this 2021 update. It seems most of them are not updated and still use 2020. ------------- PR: https://git.openjdk.java.net/jdk/pull/1855
On Fri, 15 Jan 2021 23:21:00 GMT, Valerie Peng <valeriep@openjdk.org> wrote:
Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision:
- Copyrights
- Merge branch 'master' into improve_md5
- Remove unused Unsafe import
- Harmonize MD4 impl, remove now-redundant checks from ByteArrayAccess (VHs do bounds checks, most of which will be optimized away)
- Merge branch 'master' into improve_md5
- Apply allocation avoiding optimizations to all SHA versions sharing structural similarities with MD5
- Remove unused reverseBytes imports
- Copyrights
- Fix copy-paste error
- Various fixes (IDE stopped IDEing..)
- ... and 10 more: https://git.openjdk.java.net/jdk/compare/6e03c8d3...cafa3e49
test/micro/org/openjdk/bench/java/util/UUIDBench.java line 2:
1: /* 2: * Copyright (c) 2020, 2021, Oracle and/or its affiliates. All rights reserved.
nit: other files should also have this 2021 update. It seems most of them are not updated and still use 2020.
fixed ------------- PR: https://git.openjdk.java.net/jdk/pull/1855
Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision:
- Copyrights
- Merge branch 'master' into improve_md5
- Remove unused Unsafe import
- Harmonize MD4 impl, remove now-redundant checks from ByteArrayAccess (VHs do bounds checks, most of which will be optimized away)
- Merge branch 'master' into improve_md5
- Apply allocation avoiding optimizations to all SHA versions sharing structural similarities with MD5
- Remove unused reverseBytes imports
- Copyrights
- Fix copy-paste error
- Various fixes (IDE stopped IDEing..)
- ... and 10 more: https://git.openjdk.java.net/jdk/compare/03e99844...cafa3e49
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/1855/files
- new: https://git.openjdk.java.net/jdk/pull/1855/files/e1c943c5..cafa3e49
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1855&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1855&range=00-01
Stats: 28760 lines in 1103 files changed: 16020 ins; 7214 del; 5526 mod
Patch: https://git.openjdk.java.net/jdk/pull/1855.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/1855/head:pull/1855
PR: https://git.openjdk.java.net/jdk/pull/1855
On Fri, 15 Jan 2021 23:36:35 GMT, Claes Redestad <redestad@openjdk.org> wrote:
Changes look good. Thanks. ------------- Marked as reviewed by valeriep (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1855
Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision:
- Adjust to keep reflection-driven tests from failing
- Merge branch 'master' into improve_md5
- Copyrights
- Merge branch 'master' into improve_md5
- Remove unused Unsafe import
- Harmonize MD4 impl, remove now-redundant checks from ByteArrayAccess (VHs do bounds checks, most of which will be optimized away)
- Merge branch 'master' into improve_md5
- Apply allocation avoiding optimizations to all SHA versions sharing structural similarities with MD5
- Remove unused reverseBytes imports
- Copyrights
- ... and 12 more: https://git.openjdk.java.net/jdk/compare/25fa448d...fdd2d19e
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/1855/files
- new: https://git.openjdk.java.net/jdk/pull/1855/files/cafa3e49..fdd2d19e
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1855&range=02
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1855&range=01-02
Stats: 11783 lines in 75 files changed: 1309 ins; 9196 del; 1278 mod
Patch: https://git.openjdk.java.net/jdk/pull/1855.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/1855/head:pull/1855
PR: https://git.openjdk.java.net/jdk/pull/1855
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
- Remove unused code
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/1855/files
- new: https://git.openjdk.java.net/jdk/pull/1855/files/fdd2d19e..4c2798aa
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1855&range=03
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1855&range=02-03
Stats: 16 lines in 1 file changed: 0 ins; 16 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/1855.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/1855/head:pull/1855
PR: https://git.openjdk.java.net/jdk/pull/1855
On Mon, 18 Jan 2021 13:39:04 GMT, Claes Redestad <redestad@openjdk.org> wrote:
- The MD5 intrinsics added by [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) shows that the `int[] x` isn't actually needed. This also applies to the SHA intrinsics from which the MD5 intrinsic takes inspiration - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to make it acceptable to use inline and replace the array in MD5 wholesale. This improves performance both in the presence and the absence of the intrinsic optimization. - Doing the exact same thing in the SHA impls would be unwieldy (64+ element arrays), but allocating the array lazily gets most of the speed-up in the presence of an intrinsic while being neutral in its absence.
Baseline: (digesterName) (length) Cnt Score Error Units MessageDigests.digest MD5 16 15 2714.307 ± 21.133 ops/ms MessageDigests.digest MD5 1024 15 318.087 ± 0.637 ops/ms MessageDigests.digest SHA-1 16 15 1387.266 ± 40.932 ops/ms MessageDigests.digest SHA-1 1024 15 109.273 ± 0.149 ops/ms MessageDigests.digest SHA-256 16 15 995.566 ± 21.186 ops/ms MessageDigests.digest SHA-256 1024 15 89.104 ± 0.079 ops/ms MessageDigests.digest SHA-512 16 15 803.030 ± 15.722 ops/ms MessageDigests.digest SHA-512 1024 15 115.611 ± 0.234 ops/ms MessageDigests.getAndDigest MD5 16 15 2190.367 ± 97.037 ops/ms MessageDigests.getAndDigest MD5 1024 15 302.903 ± 1.809 ops/ms MessageDigests.getAndDigest SHA-1 16 15 1262.656 ± 43.751 ops/ms MessageDigests.getAndDigest SHA-1 1024 15 104.889 ± 3.554 ops/ms MessageDigests.getAndDigest SHA-256 16 15 914.541 ± 55.621 ops/ms MessageDigests.getAndDigest SHA-256 1024 15 85.708 ± 1.394 ops/ms MessageDigests.getAndDigest SHA-512 16 15 737.719 ± 53.671 ops/ms MessageDigests.getAndDigest SHA-512 1024 15 112.307 ± 1.950 ops/ms
GC: MessageDigests.getAndDigest:·gc.alloc.rate.norm MD5 16 15 312.011 ± 0.005 B/op MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-1 16 15 584.020 ± 0.006 B/op MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-256 16 15 544.019 ± 0.016 B/op MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-512 16 15 1056.037 ± 0.003 B/op
Target: Benchmark (digesterName) (length) Cnt Score Error Units MessageDigests.digest MD5 16 15 3134.462 ± 43.685 ops/ms MessageDigests.digest MD5 1024 15 323.667 ± 0.633 ops/ms MessageDigests.digest SHA-1 16 15 1418.742 ± 38.223 ops/ms MessageDigests.digest SHA-1 1024 15 110.178 ± 0.788 ops/ms MessageDigests.digest SHA-256 16 15 1037.949 ± 21.214 ops/ms MessageDigests.digest SHA-256 1024 15 89.671 ± 0.228 ops/ms MessageDigests.digest SHA-512 16 15 812.028 ± 39.489 ops/ms MessageDigests.digest SHA-512 1024 15 116.738 ± 0.249 ops/ms MessageDigests.getAndDigest MD5 16 15 2314.379 ± 229.294 ops/ms MessageDigests.getAndDigest MD5 1024 15 307.835 ± 5.730 ops/ms MessageDigests.getAndDigest SHA-1 16 15 1326.887 ± 63.263 ops/ms MessageDigests.getAndDigest SHA-1 1024 15 106.611 ± 2.292 ops/ms MessageDigests.getAndDigest SHA-256 16 15 961.589 ± 82.052 ops/ms MessageDigests.getAndDigest SHA-256 1024 15 88.646 ± 0.194 ops/ms MessageDigests.getAndDigest SHA-512 16 15 775.417 ± 56.775 ops/ms MessageDigests.getAndDigest SHA-512 1024 15 112.904 ± 2.014 ops/ms
GC MessageDigests.getAndDigest:·gc.alloc.rate.norm MD5 16 15 232.009 ± 0.006 B/op MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-1 16 15 584.021 ± 0.001 B/op MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-256 16 15 272.012 ± 0.015 B/op MessageDigests.getAndDigest:·gc.alloc.rate.norm SHA-512 16 15 400.017 ± 0.019 B/op
For the `digest` micro digesting small inputs is faster with all algorithms, ranging from ~1% for SHA-512 up to ~15% for MD5. The gain stems from not allocating and reading into a temporary buffer once outside of the intrinsic. SHA-1 does not see a statistically gain because the intrinsic is disabled by default on my HW.
For the `getAndDigest` micro - which tests `MessageDigest.getInstance(..).digest(..)` - there are similar gains with this patch. The interesting aspect here is verifying the reduction in allocations per operation when there's an active intrinsic (again, not for SHA-1). JDK-8259065 (#1933) reduced allocations on each of these by 144 B/op, which means allocation pressure for SHA-512 is down by two thirds in this contrived test, from 1200 B/op to 400 B/op.
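The shape of that micro is essentially the following (an illustrative sketch, not the actual JMH source; the class and method names are hypothetical): each operation looks up a fresh `MessageDigest` and digests a small input, so the per-op allocation figures above include the digest instance itself, not just the temporary buffers.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Rough shape of the getAndDigest operation: instantiate and digest in one
// go, so gc.alloc.rate.norm captures both the instance and any scratch
// arrays the implementation allocates per call.
public class GetAndDigestSketch {
    static byte[] getAndDigest(String algorithm, byte[] input)
            throws NoSuchAlgorithmException {
        return MessageDigest.getInstance(algorithm).digest(input);
    }

    public static void main(String[] args) throws Exception {
        byte[] input = new byte[16];            // the 16-byte case from the tables
        byte[] md5 = getAndDigest("MD5", input);
        System.out.println(md5.length);         // MD5 yields a 16-byte digest
    }
}
```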
I've verified there are no regressions in the absence of the intrinsic, which the SHA-1 numbers here help show.
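The reason the non-intrinsic path stays neutral is the lazy-allocation approach taken in the SHA implementations. A simplified sketch (field and method names here are illustrative, not the actual `sun.security.provider` code): the temporary message-schedule array is only allocated if the pure-Java compression body actually runs, so when the intrinsic replaces that body the allocation never happens.

```java
// Illustrative sketch of lazily allocating the message-schedule array.
// In the JDK, implCompress0 is an intrinsic candidate; when the intrinsic
// fires, the Java body below never executes and the array stays null.
public class LazyScheduleSketch {
    private int[] w;  // message schedule; null until the Java path needs it

    void implCompress0(byte[] buf, int ofs) {
        if (w == null) {
            w = new int[64];  // allocated lazily, at most once per instance
        }
        // ... message expansion and the compression rounds would use w here ...
    }

    boolean scheduleAllocated() {
        return w != null;
    }
}
```

Inlining the values directly (as done for MD5's 16 words) would be unwieldy for the 64+ element SHA schedules, which is why the lazy array is the pragmatic middle ground.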
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
Remove unused code
Marked as reviewed by valeriep (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
On Sun, 20 Dec 2020 20:27:03 GMT, Claes Redestad <redestad@openjdk.org> wrote:
This pull request has now been integrated.

Changeset: 35c9da70
Author: Claes Redestad <redestad@openjdk.org>
URL: https://git.openjdk.java.net/jdk/commit/35c9da70
Stats: 655 lines in 8 files changed: 79 ins; 350 del; 226 mod

8259498: Reduce overhead of MD5 and SHA digests

Reviewed-by: valeriep

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
participants (3)
- Claes Redestad
- DellCliff
- Valerie Peng