RFR: 8296548: Improve MD5 intrinsic for x86_64 [v2]

Vladimir Kozlov kvn at openjdk.org
Wed Nov 16 02:22:58 UTC 2022


On Tue, 15 Nov 2022 23:43:12 GMT, Yi-Fan Tsai <duke at openjdk.org> wrote:

>> The LEA instruction loads the effective address, but the MD5 intrinsic uses it to compute values rather than addresses. This usage may take more cycles than ADDs and reduce throughput.
>> 
>> This change replaces
>>     LEA:  r1 = r1 + rsi * 1 + t
>> with
>>     ADDs: r1 += t; r1 += rsi.
>> 
>> Microbenchmark evaluation shows ~40% performance improvement on Haswell, Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd gen Epyc.
>> 
>> No performance change for the same microbenchmark on Ice Lake and 3rd gen Epyc.
>> 
>> Similar results can be observed with TestMD5Intrinsics and TestMD5MultiBlockIntrinsics. There is ~15% improvement in throughput on Haswell, Broadwell, Skylake, and Cascade Lake.
>
> Yi-Fan Tsai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
> 
>  - Merge branch 'openjdk:master' into JDK-8296548
>  - 8296548: Improve MD5 intrinsic for x86_64
>    
>    The LEA instruction loads the effective address, but the MD5 intrinsic
>    uses it to compute values rather than addresses. This usage may take
>    more cycles than ADDs and reduce throughput.
>    
>    This change replaces
>        LEA:  r1 = r1 + rsi * 1 + t
>    with
>        ADDs: r1 += t; r1 += rsi.
>    
>    Microbenchmark evaluation shows ~40% performance improvement on Haswell,
>    Broadwell, Skylake, and Cascade Lake. There is ~20% improvement on 2nd
>    gen Epyc.
>    
>    No performance change for the same microbenchmark on Ice Lake and 3rd
>    gen Epyc.
>    
>    Similar results can also be observed in TestMD5Intrinsics and
>    TestMD5MultiBlockIntrinsics with a more moderate improvement, e.g. ~15%
>    improvement in throughput on Haswell.
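The replacement is safe because 32-bit addition wraps modulo 2^32 and is associative and commutative, so `r1 + rsi * 1 + t` and `r1 += t; r1 += rsi` always produce the same register value. Below is a minimal Java sketch of one MD5 round-1 step (`a = b + rotl(a + F(b,c,d) + X[k] + T, s)`, per RFC 1321) computed both ways. The class and method names (`Md5FfStepSketch`, `stepLea`, `stepAdd`) are hypothetical, and the exact order in which the real stub adds `X[k]` is an assumption. Java `int` arithmetic wraps like a 32-bit x86 register.

```java
import java.util.Random;

// Hypothetical sketch: models one MD5 round-1 step ("FF") in Java int
// arithmetic. Java ints wrap modulo 2^32, like 32-bit x86 registers, so the
// LEA-style and ADD-style sequences can be compared value for value.
// Assumption: r1 already holds a + X[k] before the LEA/ADD step.
public final class Md5FfStepSketch {

    // MD5 round-1 function F(b, c, d) = (b & c) | (~b & d),
    // written in the xor/and/xor form ((c ^ d) & b) ^ d.
    static int f(int b, int c, int d) {
        return ((c ^ d) & b) ^ d;
    }

    // LEA form: one three-operand add, r1 = r1 + rsi * 1 + t,
    // followed by the rotate and the final add of b.
    static int stepLea(int a, int b, int c, int d, int xk, int s, int t) {
        int r1 = a + xk;
        int rsi = f(b, c, d);
        r1 = r1 + rsi * 1 + t;
        return Integer.rotateLeft(r1, s) + b;
    }

    // ADD form: the same sum built from two two-operand adds,
    // r1 += t; r1 += rsi.
    static int stepAdd(int a, int b, int c, int d, int xk, int s, int t) {
        int r1 = a + xk;
        int rsi = f(b, c, d);
        r1 += t;
        r1 += rsi;
        return Integer.rotateLeft(r1, s) + b;
    }

    public static void main(String[] args) {
        // Compare both forms on pseudo-random register values and shifts.
        Random rnd = new Random(8296548L);
        for (int i = 0; i < 1_000_000; i++) {
            int a = rnd.nextInt(), b = rnd.nextInt();
            int c = rnd.nextInt(), d = rnd.nextInt();
            int xk = rnd.nextInt(), t = rnd.nextInt();
            int s = 1 + rnd.nextInt(31);
            if (stepLea(a, b, c, d, xk, s, t) != stepAdd(a, b, c, d, xk, s, t)) {
                throw new AssertionError("mismatch at iteration " + i);
            }
        }
        System.out.println("LEA and ADD forms agree on all samples");
    }
}
```

Because both forms yield identical values, the change is purely an instruction-selection choice. Any throughput difference comes from how each microarchitecture executes a three-operand LEA versus two ADDs, and a value-level sketch like this cannot measure that.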

My testing passed.

-------------

PR: https://git.openjdk.org/jdk/pull/11054

