RFR: 8259498: Reduce overhead of MD5 and SHA digests

Wed Jan 13 00:59:55 UTC 2021

On Thu, 7 Jan 2021 18:50:05 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> Removing the UUID clone cache and running the microbenchmark along with the changes in #1933:
>> 
>> Benchmark                                                  (size)   Mode  Cnt    Score    Error   Units
>> UUIDBench.fromType3Bytes                                    20000  thrpt   12    2.182 ±  0.090  ops/us
>> UUIDBench.fromType3Bytes:·gc.alloc.rate                     20000  thrpt   12  439.020 ± 18.241  MB/sec
>> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm                20000  thrpt   12  264.022 ±  0.003    B/op
>> 
>> The goal now is to simplify the digest code and compare alternatives.
>
> I've run various tests and concluded that the `VarHandle`ized code matches or improves upon the `Unsafe`-riddled code in `ByteArrayAccess`. I then went ahead and consolidated `ByteArrayAccess` to use a similar code pattern for consistency, which amounts to a good cleanup.
> 
> With MD5 intrinsics disabled, I get this baseline:
> 
> Benchmark                                                  (size)   Mode  Cnt    Score    Error   Units
> UUIDBench.fromType3Bytes                                    20000  thrpt   12    1.245 ±  0.077  ops/us
> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm                20000  thrpt   12  488.042 ±  0.004    B/op
> 
> With the current patch here (not including #1933):
> 
> Benchmark                                                  (size)   Mode  Cnt    Score    Error   Units
> UUIDBench.fromType3Bytes                                    20000  thrpt   12    1.431 ±  0.106  ops/us
> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm                20000  thrpt   12  408.035 ±  0.006    B/op
> 
> If I isolate the `ByteArrayAccess` changes, I get performance-neutral or slightly better numbers compared to the baseline for these tests:
> 
> Benchmark                                                  (size)   Mode  Cnt    Score    Error   Units
> UUIDBench.fromType3Bytes                                    20000  thrpt   12    1.317 ±  0.092  ops/us
> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm                20000  thrpt   12  488.042 ±  0.004    B/op
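
For readers unfamiliar with the `VarHandle` approach mentioned in the quoted comment, here is a minimal sketch of reading little-endian 32-bit words (the layout MD5 consumes) out of a `byte[]` through a byte-array view `VarHandle` instead of `Unsafe`. The class name `LittleEndianInts` is hypothetical, and the helper is only modeled on the shape of `b2iLittle` in `ByteArrayAccess`; this is an illustration under those assumptions, not the code from the patch.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class, not the JDK's ByteArrayAccess.
public class LittleEndianInts {
    // Views a byte[] as a sequence of little-endian ints; the index
    // coordinate passed to get() is a byte offset, not an int index.
    private static final VarHandle LE_INT =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Decodes `len` bytes of `in` starting at `inOfs` into ints in `out`
    // starting at `outOfs`. Plain get() permits unaligned offsets.
    static void b2iLittle(byte[] in, int inOfs, int[] out, int outOfs, int len) {
        int end = inOfs + len;
        while (inOfs < end) {
            out[outOfs++] = (int) LE_INT.get(in, inOfs);
            inOfs += 4;
        }
    }

    public static void main(String[] args) {
        byte[] block = {1, 0, 0, 0, 0x78, 0x56, 0x34, 0x12};
        int[] words = new int[2];
        b2iLittle(block, 0, words, 0, block.length);
        System.out.println(words[0]);                        // 1
        System.out.println(Integer.toHexString(words[1]));   // 12345678
    }
}
```

Bounds and byte-order handling are done by the `VarHandle` itself, which is what makes this pattern a safe replacement for raw `Unsafe` offset arithmetic.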

Thanks for the performance enhancement, I will take a look.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855