Bounds Check Elimination with Fast-Range
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Tue Dec 3 09:32:48 UTC 2019
> but two rounds swirls them all together. Are back-to-back AES rounds
> expensive? Maybe, although that’s how the instructions are designed to
> be used, about 10 of them back to back to do real crypto.
A throughput-oriented implementation should work fine for crypto purposes,
but AESENC also looks very good on recent Intel microarchitectures from a
latency perspective (data from [1] [2]): it improved from 8/1 on
Sandy Bridge and 7/1 on Haswell to 4/1 on Skylake, and it's listed (on
uops.info [2]) as 3/1 on Ice Lake, which is on par with IMUL (while
processing twice as many bits).
And the vector variant (VAESENC) has the same latency as the scalar one
(8->7->4->3 [3]), which looks very appealing for throughput-oriented use cases.
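
To make the latency vs. throughput distinction concrete, here is a rough
back-of-the-envelope sketch. It assumes the "8/1"-style figures read as
latency / reciprocal throughput in cycles, and it uses a deliberately
simplified cost model (no other port pressure, no loads or stores). The class
and method names are made up for illustration; nothing here is HotSpot code.

```java
public final class AesRoundCost {
    // One dependent chain: every round waits for the previous result,
    // so the cost is bounded by latency alone.
    public static long dependentChainCycles(int rounds, int latency) {
        return (long) rounds * latency;
    }

    // Several independent chains interleaved: the cost is whichever bound
    // is larger, the latency of one chain or the issue rate for all rounds.
    // Simplified model, assumed for illustration only.
    public static long interleavedCycles(int rounds, int chains,
                                         int latency, int reciprocalThroughput) {
        long latencyBound = (long) rounds * latency;
        long issueBound = (long) rounds * chains * reciprocalThroughput;
        return Math.max(latencyBound, issueBound);
    }

    public static void main(String[] args) {
        int rounds = 10; // "about 10 of them back to back" from the quoted text
        String[] names = {"Sandy Bridge", "Haswell", "Skylake", "Ice Lake"};
        int[] latencies = {8, 7, 4, 3}; // latency figures quoted above
        for (int i = 0; i < names.length; i++) {
            long single = dependentChainCycles(rounds, latencies[i]);
            // Interleave as many chains as the latency can hide.
            long batch = interleavedCycles(rounds, latencies[i], latencies[i], 1);
            System.out.printf("%-12s 1 chain: %d cycles, %d chains: %d cycles%n",
                    names[i], single, latencies[i], batch);
        }
    }
}
```

Under this model a single dependent chain gets faster with each generation,
while an interleaved batch stays issue-bound at roughly one round per cycle,
which is why the latency improvement matters mostly for short, dependent uses
such as hashing.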
Best regards,
Vladimir Ivanov
[1] https://www.agner.org/optimize/instruction_tables.pdf
[2] https://uops.info/html-lat/ICL/AESENC_XMM_XMM-Measurements.html
[3] https://uops.info/html-instr/VAESENC_XMM_XMM_XMM.html