RFR: 8283694: Improve several operations on x86 [v3]

Ioi Lam iklam at openjdk.java.net
Sat Mar 26 17:45:40 UTC 2022


On Sat, 26 Mar 2022 15:17:24 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:

>> Hi, this patch improves some operations on x86_64:
>> 
>> - Variable scalar shifts from the base instruction set have performance drawbacks and should be replaced by their BMI2 counterparts where possible:
>>   + Bounded operands
>>   + Multiple uops both in fused and unfused domains
>>   + May result in flag stall since the operations have unpredictable flag output
>> 
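For context on why the BMI2 forms can stand in for the legacy shifts: Java defines a shift on an `int` to use only the low 5 bits of the count (6 bits for a `long`), and `shlx`/`sarx`/`shrx` mask the count the same way. The sketch below demonstrates only the Java-level semantics. The class and method names are hypothetical, and which instruction the JIT actually emits depends on the CPU and on this patch.

```java
// Hypothetical demo class: checks Java's shift-count masking rules.
class ShiftCountMasking {
    // int shifts use only the low 5 bits of the count.
    static boolean intShiftsMaskCount(int x, int n) {
        return (x << n) == (x << (n & 31))
            && (x >> n) == (x >> (n & 31))
            && (x >>> n) == (x >>> (n & 31));
    }

    // long shifts use only the low 6 bits of the count.
    static boolean longShiftsMaskCount(long x, int n) {
        return (x << n) == (x << (n & 63))
            && (x >> n) == (x >> (n & 63))
            && (x >>> n) == (x >>> (n & 63));
    }

    public static void main(String[] args) {
        System.out.println(1 << 33);   // count 33 is masked to 1
        System.out.println(1L << 65);  // count 65 is masked to 1
        System.out.println(intShiftsMaskCount(-12345, 37));
        System.out.println(longShiftsMaskCount(-12345L, 70));
    }
}
```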
>> - Flag-to-general-purpose-register operations currently use `cmovcc`, which requires setup and one more spare register for the constant. This can be replaced by `setcc`, which transforms the sequence:
>> 
>>         xorl dst, dst
>>         sometest
>>         movl tmp, 0x01
>>         cmovlcc dst, tmp
>> 
>>         into:
>> 
>>         sometest
>>         setbcc dst
>>         movzbl dst, dst
>> 
>> The new sequence does not need a spare register and has no drawbacks.
>> (Note: `movzx` is often elided and thus has 0 latency)
>> 
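At the Java level, the flag-to-register pattern typically arises when a comparison result is turned into an `int`. A minimal sketch of such source shapes follows. The class and method names are made up for illustration, and whether a given JIT build compiles them to a `set` instruction followed by `movzx` is an assumption that this example does not verify.

```java
// Hypothetical examples of Java code that materializes a condition as 0 or 1.
class FlagToInt {
    // Comparison result converted to an int value.
    static int lessThan(int a, int b) {
        return a < b ? 1 : 0;
    }

    // Same shape with a long operand and an equality test.
    static int isZero(long v) {
        return v == 0 ? 1 : 0;
    }

    // Accumulates the 0/1 results, a common use of the pattern.
    static int countBelow(int[] values, int limit) {
        int count = 0;
        for (int v : values) {
            count += lessThan(v, limit);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countBelow(new int[] {1, 5, 9}, 6));
        System.out.println(isZero(0L));
    }
}
```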
>> - Some small improvements:
>>   + Add memory variants to `tzcnt` and `lzcnt`
>>   + Add memory variants to `rolx` and `rorx`
>>   + Add missing `rolx` rules (note that `rolx dst, imm` is actually `rorx dst, size - imm`)
>> 
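The rotate identity in the last item can be checked at the Java level with `Integer.rotateLeft` and `Integer.rotateRight` (and their `Long` equivalents). The helper names below are hypothetical. This illustrates the arithmetic only, not the matcher rules in the patch.

```java
// Hypothetical demo class: a left rotate by imm equals a right rotate by (size - imm).
class RotateIdentity {
    static boolean holdsForInt(int x, int imm) {
        return Integer.rotateLeft(x, imm) == Integer.rotateRight(x, Integer.SIZE - imm);
    }

    static boolean holdsForLong(long x, int imm) {
        return Long.rotateLeft(x, imm) == Long.rotateRight(x, Long.SIZE - imm);
    }

    public static void main(String[] args) {
        boolean ok = true;
        // Check every rotate distance for one arbitrary bit pattern per width.
        for (int imm = 0; imm < Integer.SIZE; imm++) {
            ok &= holdsForInt(0x12345678, imm);
        }
        for (int imm = 0; imm < Long.SIZE; imm++) {
            ok &= holdsForLong(0x123456789ABCDEF0L, imm);
        }
        System.out.println(ok);
    }
}
```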
>> The speedup can be observed for variable shift instructions:
>> 
>>         Before:
>>         Benchmark               (size)  Mode  Cnt   Score   Error  Units
>>         Integers.shiftLeft         500  avgt    5   0.836 ± 0.030  us/op
>>         Integers.shiftRight        500  avgt    5   0.843 ± 0.056  us/op
>>         Integers.shiftURight       500  avgt    5   0.830 ± 0.057  us/op
>>         Longs.shiftLeft            500  avgt    5   0.827 ± 0.026  us/op
>>         Longs.shiftRight           500  avgt    5   0.828 ± 0.018  us/op
>>         Longs.shiftURight          500  avgt    5   0.829 ± 0.038  us/op
>> 
>>         After:
>>         Benchmark               (size)  Mode  Cnt   Score   Error  Units
>>         Integers.shiftLeft         500  avgt    5   0.761 ± 0.016  us/op
>>         Integers.shiftRight        500  avgt    5   0.762 ± 0.071  us/op
>>         Integers.shiftURight       500  avgt    5   0.765 ± 0.056  us/op
>>         Longs.shiftLeft            500  avgt    5   0.755 ± 0.026  us/op
>>         Longs.shiftRight           500  avgt    5   0.753 ± 0.017  us/op
>>         Longs.shiftURight          500  avgt    5   0.759 ± 0.031  us/op
>> 
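To put the two tables side by side, the relative reduction in average time can be computed per benchmark. A small sketch using the scores copied from the tables above; the error bars are ignored here, which is a simplification, and the class name is hypothetical.

```java
// Hypothetical helper: relative reduction in us/op between the two benchmark runs.
class ShiftSpeedup {
    static final String[] NAMES = {
        "Integers.shiftLeft", "Integers.shiftRight", "Integers.shiftURight",
        "Longs.shiftLeft", "Longs.shiftRight", "Longs.shiftURight"
    };
    // Scores in us/op, in the same order as NAMES.
    static final double[] BEFORE = {0.836, 0.843, 0.830, 0.827, 0.828, 0.829};
    static final double[] AFTER  = {0.761, 0.762, 0.765, 0.755, 0.753, 0.759};

    // Percentage by which the average time per operation dropped.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf("%-22s %.1f%%%n", NAMES[i],
                    reductionPercent(BEFORE[i], AFTER[i]));
        }
    }
}
```

With these inputs, every benchmark shows a reduction of roughly 8 to 10 percent in average time per operation.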
>> For `cmovcc 1, 0`, I have not been able to create a reliable microbenchmark, since the benefits mostly concern register allocation.
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   cisc

Is it possible to change the RFE title to something more specific? The current title "Improve several operations on x86" makes it difficult to look up similar issues in the future.

Also, do any of the operations apply to 32-bit x86? If not, the RFE title should say x86_64.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7968

