RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11]
Roland Westrelin
roland at openjdk.org
Mon Feb 17 15:05:17 UTC 2025
On Mon, 17 Feb 2025 10:36:52 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
> I think we should be able to see the same issue here, actually. Yes. Here is a quick benchmark below:
I observe the same:
Warmup
751 3 b TestIntMax::test1 (27 bytes)
Run
Time: 360 550 158
Warmup
1862 15 b TestIntMax::test2 (34 bytes)
Run
Time: 92 116 170
But then with this:
diff --git a/src/hotspot/cpu/x86/x86_64.ad b/src/hotspot/cpu/x86/x86_64.ad
index 8cc4a970bfd..9abda8f4178 100644
--- a/src/hotspot/cpu/x86/x86_64.ad
+++ b/src/hotspot/cpu/x86/x86_64.ad
@@ -12037,16 +12037,20 @@ instruct cmovI_reg_l(rRegI dst, rRegI src, rFlagsReg cr)
 %}
-instruct maxI_rReg(rRegI dst, rRegI src)
+instruct maxI_rReg(rRegI dst, rRegI src, rFlagsReg cr)
 %{
   match(Set dst (MaxI dst src));
+  effect(KILL cr);
   ins_cost(200);
-  expand %{
-    rFlagsReg cr;
-    compI_rReg(cr, dst, src);
-    cmovI_reg_l(dst, src, cr);
+  ins_encode %{
+    Label done;
+    __ cmpl($src$$Register, $dst$$Register);
+    __ jccb(Assembler::less, done);
+    __ mov($dst$$Register, $src$$Register);
+    __ bind(done);
   %}
+  ins_pipe(pipe_cmov_reg);
 %}
 // ============================================================================
the performance gap narrows:
Warmup
770 3 b TestIntMax::test1 (27 bytes)
Run
Time: 94 951 677
Warmup
1312 15 b TestIntMax::test2 (34 bytes)
Run
Time: 70 053 824
(The test2 number fluctuates quite a bit.) Does it ever make sense to implement `MaxI` with a conditional move, then?
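For readers following the diff, here is a minimal Java sketch of the two shapes being compared. It is only a source-level illustration: the class and method names (`MaxSketch`, `maxBranch`, `maxSelect`) are hypothetical and appear in neither the patch nor the benchmark from the thread, and C2 is free to compile either method with a branch or a conditional move regardless of how the source is written.

```java
// Hypothetical illustration only; these names do not come from the patch or the thread.
final class MaxSketch {
    // Shape of the patched maxI_rReg encoding:
    // compare, jump over the move when src < dst, otherwise copy src into dst.
    static int maxBranch(int dst, int src) {
        if (src < dst) {   // cmpl src, dst ; jccb less, done
            return dst;    // dst is left unchanged
        }
        return src;        // mov dst, src
    }

    // Shape of the original expand rule:
    // compare, then conditionally select src when dst < src (compI_rReg + cmovI_reg_l).
    static int maxSelect(int dst, int src) {
        return dst < src ? src : dst;
    }

    public static void main(String[] args) {
        int[][] pairs = { {3, 7}, {7, 3}, {-1, -1}, {Integer.MIN_VALUE, 0} };
        for (int[] p : pairs) {
            int expected = Math.max(p[0], p[1]);
            // Both shapes must agree with Math.max for every input pair.
            if (maxBranch(p[0], p[1]) != expected || maxSelect(p[0], p[1]) != expected) {
                throw new AssertionError("mismatch for " + p[0] + ", " + p[1]);
            }
        }
        System.out.println("both shapes agree with Math.max");
    }
}
```

Both methods return the same value for all inputs; the difference under discussion is purely in the generated machine code, where the branch form relies on branch prediction and the select form adds a data dependency on both operands.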
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2663379660