RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v11]

Roland Westrelin roland at openjdk.org
Mon Feb 17 15:05:17 UTC 2025


On Mon, 17 Feb 2025 10:36:52 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> I think we should be able to see the same issue here, actually. Yes. Here a quick benchmark below:

I observe the same:


Warmup
751    3    b        TestIntMax::test1 (27 bytes)
Run
Time: 360 550 158
Warmup
1862   15    b        TestIntMax::test2 (34 bytes)
Run
Time: 92 116 170


But then with this patch, which replaces the `cmov` expansion of `maxI_rReg` with a compare and a short branch:


diff --git a/src/hotspot/cpu/x86/x86_64.ad b/src/hotspot/cpu/x86/x86_64.ad
index 8cc4a970bfd..9abda8f4178 100644
--- a/src/hotspot/cpu/x86/x86_64.ad
+++ b/src/hotspot/cpu/x86/x86_64.ad
@@ -12037,16 +12037,20 @@ instruct cmovI_reg_l(rRegI dst, rRegI src, rFlagsReg cr)
 %}
 
 
-instruct maxI_rReg(rRegI dst, rRegI src)
+instruct maxI_rReg(rRegI dst, rRegI src, rFlagsReg cr)
 %{
   match(Set dst (MaxI dst src));
+  effect(KILL cr);
 
   ins_cost(200);
-  expand %{
-    rFlagsReg cr;
-    compI_rReg(cr, dst, src);
-    cmovI_reg_l(dst, src, cr);
+  ins_encode %{
+    Label done;
+    __ cmpl($src$$Register, $dst$$Register);
+    __ jccb(Assembler::less, done);
+    __ mov($dst$$Register, $src$$Register);
+    __ bind(done);
   %}
+  ins_pipe(pipe_cmov_reg);
 %}
 
 // ============================================================================


the performance gap narrows:


Warmup
770    3    b        TestIntMax::test1 (27 bytes)
Run
Time: 94 951 677
Warmup
1312   15    b        TestIntMax::test2 (34 bytes)
Run
Time: 70 053 824


(The numbers for test2 fluctuate quite a bit.) Does it ever make sense to implement `MaxI` with a conditional move then?
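
For readers who want to poke at the trade-off themselves, here is a minimal sketch. It is not the TestIntMax benchmark quoted above: the class name, the method names and the input data are all made up for illustration. It runs the same max reduction once through `Math.max` and once through an explicit branch, on an ascending array and on a random one. Whether C2 ends up emitting a `cmov`, a branch or vector code for either loop depends on the JDK build and flags, so treat the timings as a starting point for looking at the generated code, not as a result.

```java
import java.util.Random;

// Hypothetical sketch; not the TestIntMax benchmark from this thread.
public class MaxReductionSketch {
    // Written to by the timing loop so the JIT cannot drop the reductions.
    static volatile int sink;

    // Max reduction through Math.max, which C2 may represent as a MaxI node.
    static int maxWithIntrinsic(int[] a) {
        int m = Integer.MIN_VALUE;
        for (int v : a) {
            m = Math.max(m, v);
        }
        return m;
    }

    // Same reduction written with an explicit compare and branch.
    static int maxWithBranch(int[] a) {
        int m = Integer.MIN_VALUE;
        for (int v : a) {
            if (v > m) {
                m = v;
            }
        }
        return m;
    }

    // Times 1000 passes over the data with the chosen variant.
    static long timeNanos(int[] data, boolean useIntrinsic) {
        long start = System.nanoTime();
        for (int rep = 0; rep < 1_000; rep++) {
            sink = useIntrinsic ? maxWithIntrinsic(data) : maxWithBranch(data);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Ascending input: every element is a new maximum.
        int[] ascending = new int[1 << 16];
        for (int i = 0; i < ascending.length; i++) {
            ascending[i] = i;
        }
        // Random input: a new maximum becomes rare after the first few elements.
        int[] random = new Random(42).ints(1 << 16).toArray();

        // Crude warmup so both methods are likely compiled before timing.
        for (int i = 0; i < 20; i++) {
            timeNanos(ascending, true);
            timeNanos(ascending, false);
            timeNanos(random, true);
            timeNanos(random, false);
        }

        System.out.println("ascending, Math.max: " + timeNanos(ascending, true) + " ns");
        System.out.println("ascending, branch:   " + timeNanos(ascending, false) + " ns");
        System.out.println("random, Math.max:    " + timeNanos(random, true) + " ns");
        System.out.println("random, branch:      " + timeNanos(random, false) + " ns");
    }
}
```

The general intuition behind the question: a conditional move makes the result depend on both inputs, so in a reduction like this each iteration waits for the previous one, whereas a correctly predicted branch does not carry that dependency. A conditional move tends to pay off when the comparison outcome is hard to predict. A proper measurement would use JMH instead of this hand-rolled timing loop.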

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2663379660

