RFR: 8283694: Improve several operations on x86 [v3]
Ioi Lam
iklam at openjdk.java.net
Sat Mar 26 17:45:40 UTC 2022
On Sat, 26 Mar 2022 15:17:24 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:
>> Hi, this patch improves some operations on x86_64:
>>
>> - Base (non-BMI2) variable scalar shifts perform poorly and should be replaced by their BMI2 counterparts (`shlx`/`sarx`/`shrx`) when available, because they have:
>> + Bounded operands (the shift count must be in `cl`)
>> + Multiple uops in both the fused and unfused domains
>> + Possible flag stalls, since their flag output is unpredictable (flags are left unchanged when the masked count is zero)
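For reference, a variable shift in Java source is simply a shift whose count is only known at run time. The sketch below is a hypothetical stand-in for the `Integers`/`Longs` shift microbenchmarks; the class and method names are made up and are not taken from the benchmark sources. It also shows Java's count masking: only the low 5 bits of the count are used for `int` and the low 6 bits for `long`. Both the base `shl`/`sar`/`shr` and the BMI2 `shlx`/`sarx`/`shrx` instructions apply the same masking, so presumably either form can be selected without an extra `and`.

```java
import java.util.Arrays;

// Hypothetical stand-in for the Integers/Longs shift benchmarks (not their
// actual source): the shift counts are read from an array, so they are not
// compile-time constants and the JIT has to emit a variable-count shift.
final class VariableShiftSketch {
    static int[] shiftLeftAll(int[] values, int[] counts) {
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // Java uses only the low 5 bits of an int shift count (counts[i] & 31).
            out[i] = values[i] << counts[i];
        }
        return out;
    }

    static long[] unsignedShiftRightAll(long[] values, int[] counts) {
        long[] out = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            // For long, only the low 6 bits of the count are used (counts[i] & 63).
            out[i] = values[i] >>> counts[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // 1 << 3 == 8, 1 << 33 acts like 1 << 1 == 2, -1 << 31 == Integer.MIN_VALUE
        System.out.println(Arrays.toString(
                shiftLeftAll(new int[] {1, 1, -1}, new int[] {3, 33, 31})));
        // -1L >>> 60 == 15, 256L >>> 68 acts like 256L >>> 4 == 16
        System.out.println(Arrays.toString(
                unsignedShiftRightAll(new long[] {-1L, 256L}, new int[] {60, 68})));
    }
}
```

Whether C2 actually picks `shlx` for these loops presumably also depends on the CPU reporting BMI2 support.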
>>
>> - Moving a flag into a general-purpose register currently uses `cmovcc`, which requires setup and one more spare register for the constant. It could be replaced by `setcc`, transforming the sequence:
>>
>> xorl dst, dst
>> sometest
>> movl tmp, 0x01
>> cmovlcc dst, tmp
>>
>> into:
>>
>> sometest
>> setbcc dst
>> movzbl dst, dst
>>
>> This sequence does not need a spare register and has no drawbacks.
>> (Note: `movzx`, written `movzbl` above, is often eliminated and thus has zero latency.)
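One kind of Java expression that needs a flag moved into a general-purpose register is a comparison whose result is used as an `int`. The sketch below is an illustration under assumptions, not the patch itself: `lessThanAsInt` is a hypothetical method, and whether C2 lowers it through this particular rule depends on the surrounding code. The `viaCmov` and `viaSetcc` methods model the old and new instruction sequences step by step in plain Java, showing that they yield the same 0/1 value while only the `cmovcc` version needs the extra `tmp` register.

```java
// Models of the two flag-to-register sequences; method names are hypothetical.
final class FlagToRegisterSketch {
    // A comparison consumed as an int: the CPU flags must become 0 or 1.
    static int lessThanAsInt(int a, int b) {
        return a < b ? 1 : 0;
    }

    // Old sequence: xorl dst, dst; sometest; movl tmp, 0x01; cmovlcc dst, tmp.
    static int viaCmov(boolean flag) {
        int dst = 0;      // xorl dst, dst (must precede the test, since xor sets flags)
        int tmp = 1;      // movl tmp, 0x01 (the extra register holding the constant)
        if (flag) {
            dst = tmp;    // cmovlcc dst, tmp
        }
        return dst;
    }

    // New sequence: sometest; setbcc dst; movzbl dst, dst.
    static int viaSetcc(boolean flag) {
        byte low = (byte) (flag ? 1 : 0); // setcc writes only the low byte
        return low & 0xFF;                // movzbl zero-extends it to 32 bits
    }

    public static void main(String[] args) {
        for (int a = -2; a <= 2; a++) {
            for (int b = -2; b <= 2; b++) {
                boolean flag = a < b;
                int expected = lessThanAsInt(a, b);
                if (viaCmov(flag) != expected || viaSetcc(flag) != expected) {
                    throw new AssertionError("mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("cmov and setcc models agree");
    }
}
```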
>>
>> - Some small improvements:
>> + Add memory variants of `tzcnt` and `lzcnt`
>> + Add memory variants of `rolx` and `rorx`
>> + Add missing `rolx` rules (note that `rolx dst, imm` is actually `rorx dst, size - imm`)
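The rotate note can be sanity-checked from Java. BMI2 only provides `rorx`, so a left rotate by an immediate `k` has to be encoded as a right rotate by `size - k`. The sketch below checks that identity with the standard `Integer`/`Long` `rotateLeft` and `rotateRight` methods. The two array-reading helpers are hypothetical examples of code where a value is loaded and used once, which is the situation a memory-operand `tzcnt`/`lzcnt` targets. They make no claim about what C2 actually emits.

```java
// Checks the identity behind "rolx dst, imm is actually rorx dst, size - imm".
final class RotateSketch {
    // For int (size 32): rotating left by k equals rotating right by 32 - k.
    static boolean intIdentityHolds(int x, int k) {
        return Integer.rotateLeft(x, k) == Integer.rotateRight(x, 32 - k);
    }

    // For long (size 64): rotating left by k equals rotating right by 64 - k.
    static boolean longIdentityHolds(long x, int k) {
        return Long.rotateLeft(x, k) == Long.rotateRight(x, 64 - k);
    }

    // Hypothetical helpers: the array element is loaded and consumed once,
    // the pattern a memory-operand tzcnt/lzcnt could in principle absorb.
    static int trailingZerosAt(int[] a, int i) {
        return Integer.numberOfTrailingZeros(a[i]);
    }

    static int leadingZerosAt(long[] a, int i) {
        return Long.numberOfLeadingZeros(a[i]);
    }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 0x12345678, Integer.MIN_VALUE};
        long[] longs = {0L, 1L, -1L, 0x0123456789ABCDEFL, Long.MIN_VALUE};
        for (int x : ints) {
            for (int k = 1; k < 32; k++) {
                if (!intIdentityHolds(x, k)) throw new AssertionError("int, k=" + k);
            }
        }
        for (long x : longs) {
            for (int k = 1; k < 64; k++) {
                if (!longIdentityHolds(x, k)) throw new AssertionError("long, k=" + k);
            }
        }
        System.out.println(trailingZerosAt(new int[] {40}, 0)); // 40 = 0b101000, prints 3
        System.out.println(leadingZerosAt(new long[] {1L}, 0)); // prints 63
    }
}
```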
>>
>> The speedup is visible in the variable shift benchmarks:
>>
>> Before:
>> Benchmark             (size)  Mode  Cnt  Score   Error  Units
>> Integers.shiftLeft       500  avgt    5  0.836 ± 0.030  us/op
>> Integers.shiftRight      500  avgt    5  0.843 ± 0.056  us/op
>> Integers.shiftURight     500  avgt    5  0.830 ± 0.057  us/op
>> Longs.shiftLeft          500  avgt    5  0.827 ± 0.026  us/op
>> Longs.shiftRight         500  avgt    5  0.828 ± 0.018  us/op
>> Longs.shiftURight        500  avgt    5  0.829 ± 0.038  us/op
>>
>> After:
>> Benchmark             (size)  Mode  Cnt  Score   Error  Units
>> Integers.shiftLeft       500  avgt    5  0.761 ± 0.016  us/op
>> Integers.shiftRight      500  avgt    5  0.762 ± 0.071  us/op
>> Integers.shiftURight     500  avgt    5  0.765 ± 0.056  us/op
>> Longs.shiftLeft          500  avgt    5  0.755 ± 0.026  us/op
>> Longs.shiftRight         500  avgt    5  0.753 ± 0.017  us/op
>> Longs.shiftURight        500  avgt    5  0.759 ± 0.031  us/op
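To put the reported numbers in perspective, the snippet below computes the relative reduction in average time per operation for each row, using only the scores in the two tables (lower is better in `avgt` mode). It ignores the error column. For two rows (`Integers.shiftRight` and `Integers.shiftURight`) the before and after error intervals overlap, so the per-row figures should be read as approximate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ShiftSpeedup {
    // Average time per operation in us/op (lower is better), copied from the
    // "Before" and "After" tables; the error column is ignored here.
    static final String[] NAMES = {
        "Integers.shiftLeft", "Integers.shiftRight", "Integers.shiftURight",
        "Longs.shiftLeft", "Longs.shiftRight", "Longs.shiftURight"
    };
    static final double[] BEFORE = {0.836, 0.843, 0.830, 0.827, 0.828, 0.829};
    static final double[] AFTER  = {0.761, 0.762, 0.765, 0.755, 0.753, 0.759};

    // Percentage of time per op saved: (before - after) / before * 100.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    // Keeps the table order so the output lines up with the rows above.
    static Map<String, Double> reductions() {
        Map<String, Double> result = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            result.put(NAMES[i], reductionPercent(BEFORE[i], AFTER[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Double> e : reductions().entrySet()) {
            System.out.printf("%-22s %5.1f%%%n", e.getKey(), e.getValue());
        }
    }
}
```

By this measure the patched variable shifts take roughly 8 to 10 percent less time per operation (7.8% to 9.6% across the six rows).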
>>
>> For `cmovcc 1, 0`, I have not been able to create a reliable microbenchmark, since the benefits lie mostly in register allocation.
>>
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
> cisc
Is it possible to change the RFE title to something more specific? The current title "Improve several operations on x86" makes it difficult to look up similar issues in the future.
Also, do any of the operations apply to 32-bit x86? If not, the RFE title should say x86_64.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7968
More information about the hotspot-compiler-dev mailing list