RFR: 8307351: (CmpI/L(AndI/L reg1 reg2)) on x86 can be optimized
Sandhya Viswanathan
sviswanathan at openjdk.org
Mon May 8 22:22:25 UTC 2023
On Fri, 21 Apr 2023 17:30:39 GMT, Tobias Hotz <duke at openjdk.org> wrote:
> This patch aims to optimize a case where an And-Node followed by a Cmp-Node would not be converted into a single "test" instruction. The Architecture Description file currently only handles the cases where the right operand of the And-Node is a constant, but not the case where both operands are registers.
> Before this patch, an "and" followed by a "test" would be emitted, so removing the "and" means 2 fewer bytes have to be emitted.
> I've attached a JMH Benchmark to demonstrate the performance improvements. Here are the numbers from my Windows 11 machine:
> Before:
>
> Benchmark                                                   Mode  Cnt   Score    Error  Units
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  26,736 ± 0,131  ns/op
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  24,305 ± 0,610  ns/op
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,052 ± 0,056  ns/op
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,355 ± 0,030  ns/op
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,587 ± 0,107  ns/op
>
> After:
>
> Benchmark                                                   Mode  Cnt   Score    Error  Units  Improvement
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  22,665 ± 0,170  ns/op  (~18%)
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  18,880 ± 0,123  ns/op  (~29%)
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,198 ± 0,126  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,427 ± 0,079  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,641 ± 0,168  ns/op  (unchanged)
>
> As you can see, the cases with a small static mask have not improved, as they were already covered by another match rule. The test with the large static mask (benchmarkStaticLargeAndCmpEqualsLong) has a shorter instruction sequence, because the value is moved to a register rather than being used directly in the instruction, but there is no measurable performance uplift here.
> I've tested my changes using the Tier1 jtreg Tests on Windows.
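For illustration, the Java-level pattern that the new match rule targets looks roughly like the sketch below. The class and method names are hypothetical and are not taken from the attached benchmark. Whether C2 actually emits a single "test" for these methods depends on the JIT and the platform, so treat this as a sketch of the source shape only.

```java
// Hypothetical example: an AND of two non-constant values whose result is
// only compared against zero. Neither operand is a compile-time constant,
// so both would typically live in registers when the method is compiled.
public class AndCmpSketch {

    // int variant: corresponds to the (CmpI (AndI src1 src2) zero) shape.
    static boolean disjointInt(int a, int b) {
        return (a & b) == 0;
    }

    // long variant: corresponds to the (CmpL (AndL src1 src2) zero) shape.
    static boolean disjointLong(long a, long b) {
        return (a & b) == 0L;
    }

    public static void main(String[] args) {
        System.out.println(disjointInt(0b1010, 0b0101));      // no shared bits
        System.out.println(disjointLong(1L << 40, 1L << 40)); // same bit set in both
    }
}
```

Because the AND result is used only for the comparison, it does not need to be materialized in a register, which is what makes a lone "test" sufficient.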
src/hotspot/cpu/x86/x86_64.ad line 12456:
> 12454: match(Set cr (CmpI (AndI src1 src2) zero));
> 12455:
> 12456: format %{ "testl $src1, $src2\t# long" %}
The format string says "# long"; it should be "# int" here, as this is an integer operation.
test/micro/org/openjdk/bench/vm/compiler/x86/AndCmpTestInstruction.java line 2:
> 1: /*
> 2: * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
Copyright year should be 2023.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13587#discussion_r1187950806
PR Review Comment: https://git.openjdk.org/jdk/pull/13587#discussion_r1187951090