RFR: 8324655: Identify integer minimum and maximum patterns created with if statements [v3]
Emanuel Peter
epeter at openjdk.org
Tue Feb 27 18:01:44 UTC 2024
On Tue, 27 Feb 2024 17:54:53 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:
>> @merykitty I am thinking of a case like this:
>>
>> a = <something very expensive>
>> b = <something cheap>
>> x = (a < b) ? a : b;
>>
>> If in most cases we take `b` (because it is larger), then we might speculatively assign `x = b`, before we have finished computing `a`. That way we can already continue (speculatively) with `x = b`, while `a` is still computing. If the speculation is wrong, then the CPU flushes the pipeline.
>>
>> If this is converted to `max/min`, then we need to wait for `a` to be computed.
>>
>> @merykitty @jaskarth does this make sense as an example for a potential performance regression?
>
> @eme64 I've designed this benchmark, which has a cheap common path and an expensive rare path:
>
> private static final Random RANDOM = new Random();
> @Benchmark
> public void testCheapCommon(Blackhole blackhole) {
> for (int i = 0; i < 300; i++) {
> int a = i & 1; // Cheap
> int b = RANDOM.nextInt(33); // Expensive
> int c = b >= a ? a : b;
> blackhole.consume(c + 20);
> }
> }
>
> The condition `b >= a` should be true in the majority of cases, selecting `a`. I made sure that C2 was creating a MinI here, and it was. Here are the results I got:
>
> Benchmark Mode Cnt Score Error Units
> IfMinMax.testCheapCommon avgt 15 1127.771 ± 10.838 ns/op // Baseline
> IfMinMax.testCheapCommon avgt 15 1104.754 ± 4.097 ns/op // Patch
>
> It seems that there isn't a regression here, and is in fact marginally better (but this may just be noise). Is this what you had in mind for the benchmark?
>
>> I'm asking @jaskarth if it is easier to transform a CMove into a Min/Max instead of trying to look for matching Phis.
>
> @merykitty I did a bit of experimentation with the current benchmarks, and it seems that in many cases the `min/max` patterns aren't turned into `CMove`s, as the `CMove` heuristic is generally quite conservative. Since we're interested in a subset of `CMove`s that are simpler, I think it would be better to group this optimization with the others that operate on CFG diamonds, to find more cases that could be optimized. Though, in general I think this group of optimizations should be refactored to apply to both CFG diamonds and `CMove`s, since a `CMove` is a sort of CFG diamond in itself- but I think this cleanup should be done in a future RFE.
>
>> While you are handling this, following Identity transforms can be added to integer and long Max/Min.
>
> @jatin-bhateja I think this is a good idea, I can take a look at this in a later RFE as suggested.
@jaskarth
> I've designed this benchmark
Nice. Can you also post the generated assembly for Baseline/Patch?
I'm just worried that there is some method call, or something else that does not get cleanly inlined and could mess with the benchmark.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17574#issuecomment-1967305832
More information about the hotspot-compiler-dev
mailing list