RFR: 8247645: ChaCha20 intrinsics [v3]
Sandhya Viswanathan
sviswanathan at openjdk.org
Thu Nov 10 20:27:37 UTC 2022
On Thu, 10 Nov 2022 20:12:30 GMT, Jamil Nimeh <jnimeh at openjdk.org> wrote:
>> Jamil Nimeh has updated the pull request incrementally with one additional commit since the last revision:
>>
>> replace hi/lo word shuffles and left-right shift/or operations for vpshufd on byte-aligned rotations
>
> Using vpshufb (not vpshufd, as I mistyped in my commit message) on AVX/AVX2 for the 8-bit and 16-bit left rotations has given us modest speed gains, roughly 7-11% in the numbers below:
> Before (with intrinsics):
>
> AVX=1
> ChaCha20.encrypt         256  thrpt   40  1338667.215 ± 12012.240  ops/s
> ChaCha20.encrypt        1024  thrpt   40   453682.363 ±  2559.322  ops/s
> ChaCha20.encrypt        4096  thrpt   40   124785.645 ±   394.535  ops/s
> ChaCha20.encrypt       16384  thrpt   40    31788.969 ±    90.770  ops/s
>
> AVX=2
> ChaCha20.encrypt         256  thrpt   40  1893810.127 ± 21870.718  ops/s
> ChaCha20.encrypt        1024  thrpt   40   758024.511 ±  5414.552  ops/s
> ChaCha20.encrypt        4096  thrpt   40   224032.805 ±   935.309  ops/s
> ChaCha20.encrypt       16384  thrpt   40    58112.296 ±   498.048  ops/s
>
> After (using vpshufb):
>
> AVX=1
> Benchmark         (dataSize)   Mode  Cnt        Score       Error  Units
> ChaCha20.encrypt         256  thrpt   40  1447416.349 ± 14054.478  ops/s
> ChaCha20.encrypt        1024  thrpt   40   495844.721 ±  1949.237  ops/s
> ChaCha20.encrypt        4096  thrpt   40   138154.478 ±   411.707  ops/s
> ChaCha20.encrypt       16384  thrpt   40    35165.143 ±   110.483  ops/s
>
> AVX=2
> ChaCha20.encrypt         256  thrpt   40  2020170.211 ± 10507.466  ops/s
> ChaCha20.encrypt        1024  thrpt   40   829644.325 ±  6452.931  ops/s
> ChaCha20.encrypt        4096  thrpt   40   246066.542 ±  1052.905  ops/s
> ChaCha20.encrypt       16384  thrpt   40    64021.363 ±   468.979  ops/s
>
> This was done on the same system that the original benchmarks were done on. None of these changes affect AVX512.
>
> I'm working on a hybrid intrinsic approach to get the best of both worlds for those smaller single-part jobs.
@jnimeh Very nice work overall. I think it would be OK to integrate this PR and do the hybrid approach in a follow-on PR. Overall, your work shows a very good improvement over the baseline.
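
For readers following the thread, here is why the byte-aligned rotations lend themselves to a byte shuffle. The ChaCha20 quarter round rotates 32-bit words left by 16, 12, 8 and 7 bits; the 16-bit and 8-bit cases move whole bytes, so each one is just a fixed permutation of the four bytes in the word, which is the kind of operation vpshufb performs across a vector register. The sketch below is a scalar Java model only, not the intrinsic code from this PR: the class name, the helper shuffleBytes and the index tables ROTL8 and ROTL16 are made up for illustration, and the actual shuffle masks used by the stub are not shown in this thread.

```java
public class ByteAlignedRotate {
    // Index tables (hypothetical names). Bytes are numbered from the least
    // significant byte (0) to the most significant byte (3).
    // Output byte i is taken from input byte IDX[i].
    static final int[] ROTL8  = {3, 0, 1, 2};   // rotate left by 8 bits
    static final int[] ROTL16 = {2, 3, 0, 1};   // rotate left by 16 bits

    // General rotation: two shifts and an OR, as needed for the 12- and 7-bit cases.
    static int rotlShiftOr(int x, int n) {
        return (x << n) | (x >>> (32 - n));
    }

    // Scalar model of a per-word byte shuffle: pure byte permutation, no shifts
    // of the value as a whole and no OR of two shifted copies.
    static int shuffleBytes(int x, int[] idx) {
        int out = 0;
        for (int i = 0; i < 4; i++) {
            int b = (x >>> (8 * idx[i])) & 0xFF;  // pick the source byte
            out |= b << (8 * i);                  // place it at position i
        }
        return out;
    }

    public static void main(String[] args) {
        int x = 0x11223344;
        System.out.println(Integer.toHexString(rotlShiftOr(x, 8)));
        System.out.println(Integer.toHexString(shuffleBytes(x, ROTL8)));
        System.out.println(Integer.toHexString(rotlShiftOr(x, 16)));
        System.out.println(Integer.toHexString(shuffleBytes(x, ROTL16)));
    }
}
```

In this model the shuffle form agrees with Integer.rotateLeft for 8 and 16 bits on any input. The 12-bit and 7-bit rotations cross byte boundaries, so they cannot be expressed as a byte permutation and still need the shift/or form.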
-------------
PR: https://git.openjdk.org/jdk/pull/7702