Integrated: 8283694: Improve bit manipulation and boolean to integer conversion operations on x86_64

Quan Anh Mai duke at openjdk.java.net
Fri Jun 3 20:22:45 UTC 2022


On Sat, 26 Mar 2022 06:14:29 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:

> Hi, this patch improves some operations on x86_64:
> 
> - Legacy variable scalar shifts (`shl`, `sar`, `shr` by `cl`) have poor performance characteristics and should be replaced by their BMI2 counterparts (`shlx`, `sarx`, `shrx`) when available:
>   + Constrained operands: the shift count must be in `cl`
>   + Multiple uops in both the fused and unfused domains
>   + Possible flag stalls, since the flags they produce depend on the runtime shift count
> 
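This is a minimal sketch of the Java source shapes this bullet concerns. It assumes a C2-compiled hot path on a BMI2-capable CPU. The class and method names are hypothetical, and the code does not show which instructions the JIT actually emits. Java masks the shift count to its low 5 bits for `int` and its low 6 bits for `long`. That matches how `shlx`/`sarx`/`shrx` treat the count in hardware.

```java
// Hypothetical examples of variable-count shifts (names are illustrative).
// With BMI2, the JIT *may* select shlx/sarx/shrx for these; this code only shows the Java shapes.
class VariableShifts {
    static int shl(int x, int n)     { return x << n; }   // may become shlx (32-bit)
    static int sar(int x, int n)     { return x >> n; }   // may become sarx (32-bit)
    static int shr(int x, int n)     { return x >>> n; }  // may become shrx (32-bit)
    static long shlL(long x, int n)  { return x << n; }   // may become shlx (64-bit)

    public static void main(String[] args) {
        // int shifts use only the low 5 bits of the count: 33 & 31 == 1
        System.out.println(shl(1, 33));   // 2
        System.out.println(sar(-8, 1));   // -4 (arithmetic shift keeps the sign)
        System.out.println(shr(-8, 28));  // 15 (logical shift fills with zeros)
        // long shifts use only the low 6 bits of the count: 65 & 63 == 1
        System.out.println(shlL(1L, 65)); // 2
    }
}
```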
> - Converting a flag to a general-purpose register currently uses `cmovcc`, which requires extra setup and one more spare register for the constant. This can be replaced by `setcc`, which transforms the sequence:
> 
>         xorl dst, dst
>         sometest
>         movl tmp, 0x01
>         cmovlcc dst, tmp
> 
> into:
> 
>         xorl dst, dst
>         sometest
>         setbcc dst
> 
> The new sequence needs no spare register and has no drawbacks.
> (Note: following `setcc` with `movzx` instead of zeroing up front does not help, since move elimination only occurs when the input and output registers differ.)
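For illustration, here are Java shapes that materialize a condition as 0 or 1, which is the case the `setcc` sequence targets. This is a hedged sketch. The names are hypothetical, and whether C2 emits `setcc`, `cmovcc`, or a branch for a given site depends on the JIT and is not shown by this code.

```java
// Hypothetical boolean-to-int conversions (names are illustrative).
// Java has no direct boolean-to-int cast, so a 0/1 ternary is the usual idiom.
class BoolToInt {
    static int lessThan(int a, int b) { return a < b ? 1 : 0; }
    static int isZero(long x)        { return x == 0 ? 1 : 0; }

    // A common place where 0/1 values feed straight into arithmetic.
    static int countNegatives(int[] values) {
        int count = 0;
        for (int v : values) {
            count += v < 0 ? 1 : 0;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(lessThan(1, 2));                           // 1
        System.out.println(lessThan(2, 1));                           // 0
        System.out.println(isZero(0L));                               // 1
        System.out.println(countNegatives(new int[] {3, -1, 0, -7})); // 2
    }
}
```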
> 
> - Some small improvements:
>   + Add memory-operand variants of `tzcnt` and `lzcnt`
>   + Add memory-operand variants of `rolx` and `rorx`
>   + Add missing `rolx` rules (note that `rolx dst, imm` is actually `rorx dst, size - imm`)
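The rotation note rests on a simple identity. BMI2 provides only `rorx`, so a rotate left by `k` is expressed as a rotate right by `size - k`. The sketch below checks that identity with the standard `Integer`/`Long` rotate methods. It also shows array-element shapes where a memory-operand `tzcnt`/`lzcnt` could fold the load. The helper names are hypothetical, and load folding is up to the JIT.

```java
// Illustrative checks (names are hypothetical) for the small improvements above.
class RotateAndCount {
    // rotateLeft(x, k) == rotateRight(x, size - k); for k == 0 the distance 32/64
    // is masked to 0, so both sides return x unchanged.
    static boolean rotateIdentityHolds(int x, int k) {
        return Integer.rotateLeft(x, k) == Integer.rotateRight(x, 32 - k);
    }

    static boolean rotateIdentityHolds(long x, int k) {
        return Long.rotateLeft(x, k) == Long.rotateRight(x, 64 - k);
    }

    // Counting zeros directly on an array element: a memory-operand form
    // of tzcnt/lzcnt could fold the load here (JIT-dependent).
    static int trailingZerosAt(int[] a, int i) { return Integer.numberOfTrailingZeros(a[i]); }
    static int leadingZerosAt(long[] a, int i) { return Long.numberOfLeadingZeros(a[i]); }

    public static void main(String[] args) {
        boolean ok = true;
        for (int k = 0; k < 32; k++) ok &= rotateIdentityHolds(0x12345678, k);
        for (int k = 0; k < 64; k++) ok &= rotateIdentityHolds(0x0123456789ABCDEFL, k);
        System.out.println(ok);                                    // true
        System.out.println(trailingZerosAt(new int[] {0, 40}, 1)); // 3 (40 = 0b101000)
        System.out.println(leadingZerosAt(new long[] {1L}, 0));    // 63
    }
}
```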
> 
> A speedup can be observed for variable shift instructions (average time per operation, lower is better):
> 
>         Before:
>         Benchmark               (size)  Mode  Cnt   Score   Error  Units
>         Integers.shiftLeft         500  avgt    5   0.836 ± 0.030  us/op
>         Integers.shiftRight        500  avgt    5   0.843 ± 0.056  us/op
>         Integers.shiftURight       500  avgt    5   0.830 ± 0.057  us/op
>         Longs.shiftLeft            500  avgt    5   0.827 ± 0.026  us/op
>         Longs.shiftRight           500  avgt    5   0.828 ± 0.018  us/op
>         Longs.shiftURight          500  avgt    5   0.829 ± 0.038  us/op
> 
>         After:
>         Benchmark               (size)  Mode  Cnt   Score   Error  Units
>         Integers.shiftLeft         500  avgt    5   0.761 ± 0.016  us/op
>         Integers.shiftRight        500  avgt    5   0.762 ± 0.071  us/op
>         Integers.shiftURight       500  avgt    5   0.765 ± 0.056  us/op
>         Longs.shiftLeft            500  avgt    5   0.755 ± 0.026  us/op
>         Longs.shiftRight           500  avgt    5   0.753 ± 0.017  us/op
>         Longs.shiftURight          500  avgt    5   0.759 ± 0.031  us/op
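To put the two tables side by side, the relative reduction in time per operation can be computed from the reported scores. This quick sketch uses only the numbers above and ignores the error bounds. Two pairs of intervals overlap: Integers.shiftRight and Integers.shiftURight.

```java
// Relative change in average time per op, using the scores reported above (us/op).
class ShiftSpeedup {
    static final String[] NAMES = {
        "Integers.shiftLeft", "Integers.shiftRight", "Integers.shiftURight",
        "Longs.shiftLeft", "Longs.shiftRight", "Longs.shiftURight"
    };
    static final double[] BEFORE = {0.836, 0.843, 0.830, 0.827, 0.828, 0.829};
    static final double[] AFTER  = {0.761, 0.762, 0.765, 0.755, 0.753, 0.759};

    // Percentage reduction in time per operation: (before - after) / before * 100.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf("%-22s %5.1f%%%n", NAMES[i], reductionPercent(BEFORE[i], AFTER[i]));
        }
    }
}
```

The results range from about 7.8% (Integers.shiftURight) to about 9.6% (Integers.shiftRight) less time per operation.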
> 
> For the `cmovcc 1, 0` replacement, I have not been able to create a reliable microbenchmark, since the benefits mostly concern register allocation.
> 
> Thank you very much.

This pull request has now been integrated.

Changeset: 0b35460f
Author:    Quan Anh Mai <anhmdq99 at gmail.com>
Committer: Vladimir Kozlov <kvn at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/0b35460fa00bfdca63a311a7379819cf102dee86
Stats:     602 lines in 8 files changed: 563 ins; 4 del; 35 mod

8283694: Improve bit manipulation and boolean to integer conversion operations on x86_64

Reviewed-by: kvn, dlong

-------------

PR: https://git.openjdk.java.net/jdk/pull/7968


More information about the hotspot-compiler-dev mailing list