RFR: 8283694: Improve bit manipulation and boolean to integer conversion operations on x86_64 [v7]

Quan Anh Mai duke at openjdk.java.net
Thu Jun 2 08:01:35 UTC 2022


On Sat, 16 Apr 2022 11:24:57 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:

>> Hi, this patch improves some operations on x86_64:
>> 
>> - The baseline variable scalar shifts have poor performance characteristics and should be replaced by their BMI2 counterparts where possible:
>>   + Register-bound operands (the shift count must be in `cl`)
>>   + Multiple uops in both the fused and unfused domains
>>   + Possible flag stalls, since the flag output of these operations is unpredictable
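
For readers who want to see the Java-level shape that reaches these shift rules, here is a minimal sketch. The class and method names are hypothetical, and it is only an assumption that the `Integers.shift*` and `Longs.shift*` benchmarks quoted in this thread exercise code of this form. What is certain is the language semantics: Java masks the count of an `int` shift to its low 5 bits and of a `long` shift to its low 6 bits.

```java
// Hypothetical illustration; none of these names come from the patch.
final class VariableShiftSketch {
    // int shifts: only the low 5 bits of count are used (count & 31).
    static int shiftLeft(int value, int count) {
        return value << count;
    }

    static int shiftRight(int value, int count) {
        return value >> count;   // arithmetic shift, sign-extending
    }

    static int shiftURight(int value, int count) {
        return value >>> count;  // logical shift, zero-filling
    }

    // long shifts: only the low 6 bits of count are used (count & 63).
    static long shiftLeft(long value, int count) {
        return value << count;
    }

    public static void main(String[] args) {
        System.out.println(shiftLeft(1, 33));    // count 33 is masked to 1
        System.out.println(shiftLeft(1L, 65));   // count 65 is masked to 1
        System.out.println(shiftRight(-8, 1));
        System.out.println(shiftURight(-1, 28));
    }
}
```

Because the count is a runtime value, the JIT cannot fold it into an immediate, which is the case the variable-shift rules cover. Which instruction is actually selected depends on the CPU features available to the JIT.
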
>> 
>> - Flag-to-general-purpose-register operations currently use `cmovcc`, which requires setup and one extra spare register for the constant. This could be replaced by `setcc`, which transforms the sequence:
>> 
>>         xorl dst, dst
>>         sometest
>>         movl tmp, 0x01
>>         cmovlcc dst, tmp
>> 
>>         into:
>> 
>>         xorl dst, dst
>>         sometest
>>         setbcc dst
>> 
>> This sequence does not need a spare register and has no drawbacks.
>> (Note: `movzx` does not work here, since move elision only occurs when the input and output registers are different.)
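
As an illustration of the source pattern behind a flag-to-register conversion, here is a minimal Java sketch. The names are hypothetical, and whether the JIT emits the `setcc` form for any particular method is an assumption that depends on the compiler and on how the surrounding code is optimized.

```java
// Hypothetical illustration; none of these names come from the patch.
final class BoolToIntSketch {
    // A comparison result materialized as 0 or 1 in an int.
    static int lessThan(int a, int b) {
        return a < b ? 1 : 0;
    }

    // Branch-free style counting: each element adds its 0/1 comparison result.
    static int countBelow(int[] values, int limit) {
        int count = 0;
        for (int v : values) {
            count += lessThan(v, limit);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countBelow(new int[] {1, 5, 3, 9}, 4));
    }
}
```
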
>> 
>> - Some small improvements:
>>   + Add memory variants of `tzcnt` and `lzcnt`
>>   + Add memory variants of `rolx` and `rorx`
>>   + Add missing `rolx` rules (note that `rolx dst, imm` is actually `rorx dst, size - imm`)
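
The identity behind the `rolx`/`rorx` note can be checked at the Java level. The sketch below uses hypothetical helper names and expresses a left rotation as a right rotation by `size - distance`, using only `Integer.rotateRight` and `Long.rotateRight` from the standard library.

```java
// Hypothetical illustration; none of these names come from the patch.
final class RotateSketch {
    // Rotating left by distance equals rotating right by (32 - distance).
    static int rotateLeftViaRight(int value, int distance) {
        return Integer.rotateRight(value, Integer.SIZE - distance);
    }

    // Same identity for 64-bit values, with size = 64.
    static long rotateLeftViaRight(long value, int distance) {
        return Long.rotateRight(value, Long.SIZE - distance);
    }

    public static void main(String[] args) {
        int x = 0x12345678;
        System.out.println(rotateLeftViaRight(x, 8) == Integer.rotateLeft(x, 8));
    }
}
```
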
>> 
>> The speedup can be observed for variable shift instructions:
>> 
>>         Before:
>>         Benchmark               (size)  Mode  Cnt   Score   Error  Units
>>         Integers.shiftLeft         500  avgt    5   0.836 ± 0.030  us/op
>>         Integers.shiftRight        500  avgt    5   0.843 ± 0.056  us/op
>>         Integers.shiftURight       500  avgt    5   0.830 ± 0.057  us/op
>>         Longs.shiftLeft            500  avgt    5   0.827 ± 0.026  us/op
>>         Longs.shiftRight           500  avgt    5   0.828 ± 0.018  us/op
>>         Longs.shiftURight          500  avgt    5   0.829 ± 0.038  us/op
>> 
>>         After:
>>         Benchmark               (size)  Mode  Cnt   Score   Error  Units
>>         Integers.shiftLeft         500  avgt    5   0.761 ± 0.016  us/op
>>         Integers.shiftRight        500  avgt    5   0.762 ± 0.071  us/op
>>         Integers.shiftURight       500  avgt    5   0.765 ± 0.056  us/op
>>         Longs.shiftLeft            500  avgt    5   0.755 ± 0.026  us/op
>>         Longs.shiftRight           500  avgt    5   0.753 ± 0.017  us/op
>>         Longs.shiftURight          500  avgt    5   0.759 ± 0.031  us/op
>> 
>> For `cmovcc 1, 0`, I have not been able to create a reliable microbenchmark, since the benefits mostly concern register allocation.
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits:
> 
>  - Resolve conflict
>  - ins_cost
>  - movzx is not elided with same input and output
>  - fix only the needs
>  - fix
>  - cisc
>  - delete benchmark command
>  - pipe
>  - fix, benchmarks
>  - pipe_class
>  - ... and 5 more: https://git.openjdk.java.net/jdk/compare/e5041ae3...337c0bf3

Hi, may I have a second review for this patch, please? Thank you very much.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7968


More information about the hotspot-compiler-dev mailing list