RFR: 8283694: Improve bit manipulation and boolean to integer conversion operations on x86_64 [v8]

Vladimir Kozlov kvn at openjdk.java.net
Fri Jun 3 15:56:44 UTC 2022


On Fri, 3 Jun 2022 10:25:57 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:

>> Hi, this patch improves some operations on x86_64:
>> 
>> - Variable-count scalar shifts from the base instruction set perform poorly and should be replaced by their BMI2 counterparts where possible:
>>   + Bounded operands
>>   + Multiple uops both in fused and unfused domains
>>   + May result in flag stall since the operations have unpredictable flag output
>> 
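For context, a minimal sketch of the Java-level operations that compile down to variable-count shifts. It relies only on the language rule that an `int` shift count is masked to its low 5 bits and a `long` shift count to its low 6 bits. The class and method names are hypothetical, and the sketch says nothing about which instruction a given JIT build actually selects.

```java
public class ShiftCountSketch {
    // Hypothetical helpers: each one is a single variable-count shift in Java source.
    static int shiftLeftInt(int value, int count) {
        return value << count;   // count is masked to its low 5 bits (count & 31)
    }

    static int shiftRightInt(int value, int count) {
        return value >> count;   // arithmetic shift, sign bit is replicated
    }

    static int shiftURightInt(int value, int count) {
        return value >>> count;  // logical shift, zero is shifted in
    }

    static long shiftLeftLong(long value, int count) {
        return value << count;   // count is masked to its low 6 bits (count & 63)
    }

    public static void main(String[] args) {
        // A count of 33 behaves like a count of 1 for int operands.
        System.out.println(shiftLeftInt(1, 33) == shiftLeftInt(1, 1));
        System.out.println(shiftRightInt(-8, 33) == shiftRightInt(-8, 1));
        System.out.println(shiftURightInt(-8, 33) == shiftURightInt(-8, 1));
        // A count of 65 behaves like a count of 1 for long operands.
        System.out.println(shiftLeftLong(1L, 65) == shiftLeftLong(1L, 1));
    }
}
```

Because the count is a runtime value, none of these shifts can be folded into an immediate form, which is the case the first bullet is concerned with.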
>> - Flag to general-purpose register operations currently use `cmovcc`, which requires setup and one more spare register for the constant. This could be replaced by `setcc`, which transforms the sequence:
>> 
>>         xorl dst, dst
>>         sometest
>>         movl tmp, 0x01
>>         cmovlcc dst, tmp
>> 
>>         into:
>> 
>>         xorl dst, dst
>>         sometest
>>         setbcc dst
>> 
>> This sequence does not need a spare register and has no drawbacks.
>> (Note: `movzx` does not work since move elision only occurs with different registers for input and output)
>> 
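As an illustration, a minimal sketch of Java source that turns a comparison result into 0 or 1, which is the kind of boolean to integer conversion discussed above. The class and method names are hypothetical. Whether a particular JIT build emits a set or a conditional move for this code is an assumption the sketch does not verify.

```java
public class BoolToIntSketch {
    // Hypothetical helper: materializes a condition as the integer 0 or 1.
    static int toInt(boolean flag) {
        return flag ? 1 : 0;
    }

    // Counts the elements below a limit by adding the 0/1 result of each comparison.
    static int countBelow(int[] values, int limit) {
        int count = 0;
        for (int v : values) {
            count += (v < limit) ? 1 : 0;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(toInt(true));
        System.out.println(countBelow(new int[] {1, 5, 3, 8}, 4));
    }
}
```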
>> - Some small improvements:
>>   + Add memory variants to `tzcnt` and `lzcnt`
>>   + Add memory variants to `rolx` and `rorx`
>>   + Add missing `rolx` rules (note that `rolx dst, imm` is actually `rorx dst, size - imm`)
>> 
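The rotate identity in the last bullet can be checked at the Java level with a minimal sketch. It uses only `Integer.rotateLeft`, `Integer.rotateRight` and their `Long` equivalents. The class and method names are hypothetical, and the sketch demonstrates the arithmetic identity only, not the instruction selection.

```java
public class RotateIdentitySketch {
    // Checks rotateLeft(value, n) == rotateRight(value, 32 - n) for every n in [0, 32).
    static boolean intIdentityHolds(int value) {
        for (int n = 0; n < Integer.SIZE; n++) {
            if (Integer.rotateLeft(value, n) != Integer.rotateRight(value, Integer.SIZE - n)) {
                return false;
            }
        }
        return true;
    }

    // Same check for 64-bit values: rotateLeft(value, n) == rotateRight(value, 64 - n).
    static boolean longIdentityHolds(long value) {
        for (int n = 0; n < Long.SIZE; n++) {
            if (Long.rotateLeft(value, n) != Long.rotateRight(value, Long.SIZE - n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(intIdentityHolds(0x12345678));
        System.out.println(longIdentityHolds(0x0123456789ABCDEFL));
    }
}
```

This is why a left rotate by an immediate needs no instruction of its own: the immediate can be rewritten as `size - imm` and handed to the right rotate.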
>> The speedup can be observed for variable shift instructions:
>> 
>>         Before:
>>         Benchmark               (size)  Mode  Cnt   Score   Error  Units
>>         Integers.shiftLeft         500  avgt    5   0.836 ± 0.030  us/op
>>         Integers.shiftRight        500  avgt    5   0.843 ± 0.056  us/op
>>         Integers.shiftURight       500  avgt    5   0.830 ± 0.057  us/op
>>         Longs.shiftLeft            500  avgt    5   0.827 ± 0.026  us/op
>>         Longs.shiftRight           500  avgt    5   0.828 ± 0.018  us/op
>>         Longs.shiftURight          500  avgt    5   0.829 ± 0.038  us/op
>> 
>>         After:
>>         Benchmark               (size)  Mode  Cnt   Score   Error  Units
>>         Integers.shiftLeft         500  avgt    5   0.761 ± 0.016  us/op
>>         Integers.shiftRight        500  avgt    5   0.762 ± 0.071  us/op
>>         Integers.shiftURight       500  avgt    5   0.765 ± 0.056  us/op
>>         Longs.shiftLeft            500  avgt    5   0.755 ± 0.026  us/op
>>         Longs.shiftRight           500  avgt    5   0.753 ± 0.017  us/op
>>         Longs.shiftURight          500  avgt    5   0.759 ± 0.031  us/op
>> 
>> For `cmovcc 1, 0`, I have not been able to create a reliable microbenchmark, since the benefits mostly concern register allocation.
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   revert conv2b

I will run testing of the latest version before sponsoring.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7968


More information about the hotspot-compiler-dev mailing list