RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v5]

Thu Dec 5 09:22:41 UTC 2024

On Wed, 4 Dec 2024 15:45:54 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> Hi @rwestrel this looks very interesting!
>> 
>> Which benchmarks are you referring to?
>> 
>> I just gave it a quick skim, will come back to this later again.
>
>> Which benchmarks are you referring to?
> 
> The one mentioned in the bug: https://github.com/openjdk/jdk/compare/master...mcimadamore:jdk:manual_mismatch_bench?expand=1

@rwestrel it would be nice to see a plot like this, with the benchmark results:
X-axis: increasing loop iterations
Y-axis: time

Similar to what I did here: https://github.com/openjdk/jdk/pull/22070
![image](https://github.com/user-attachments/assets/f62c5800-874c-4d29-9fc7-b46f077a1034)

You could go over loop sizes 500-2000 in steps of 100, just to get a rough sense of whether your constant threshold of `1000` is about right.
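
To make the suggested sweep concrete, here is a minimal sketch of what I have in mind. It is not a JMH benchmark and not the benchmark from the bug: the class name `ShortLoopSweep` and its helpers are made up for illustration, and a plain `byte[]` with a `long` loop variable stands in for the MemorySegment access. A real measurement should use JMH, but the shape of the sweep would be the same.

```java
import java.util.Arrays;

// Hypothetical sketch: sweep loop sizes 500..2000 in steps of 100 and
// print "size <TAB> ns per call", ready to plot (X: iterations, Y: time).
public class ShortLoopSweep {

    // Loop sizes from min to max (inclusive) in fixed steps.
    public static int[] sizes(int min, int max, int step) {
        int count = (max - min) / step + 1;
        int[] result = new int[count];
        for (int i = 0; i < count; i++) {
            result[i] = min + i * step;
        }
        return result;
    }

    // A short running loop with a long induction variable.
    // Assumption: this only stands in for the MemorySegment access pattern.
    public static long sum(byte[] data, long limit) {
        long total = 0;
        for (long i = 0; i < limit; i++) {
            total += data[(int) i];
        }
        return total;
    }

    // Crude timing: average nanoseconds per call over reps calls.
    public static long nanosPerCall(byte[] data, int size, int reps) {
        long sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            sink += sum(data, size);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.println("unlikely"); // keeps the result alive
        }
        return elapsed / reps;
    }

    public static void main(String[] args) {
        byte[] data = new byte[2000];
        Arrays.fill(data, (byte) 1);
        for (int size : sizes(500, 2000, 100)) {
            nanosPerCall(data, size, 20_000); // warm-up so the loop gets compiled
            long ns = nanosPerCall(data, size, 20_000);
            System.out.println(size + "\t" + ns);
        }
    }
}
```

If the threshold is well chosen, I would expect the curve to show no visible jump when the loop size crosses it.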

Maybe you can even extend the benchmark I wrote there, with MemorySegment cases. That would be useful also for the other efforts where we are working on short running loops:

[JDK-8307084](https://bugs.openjdk.org/browse/JDK-8307084): C2: Vector atomic post loop is not executed for some small trip counts

[JDK-8344085](https://bugs.openjdk.org/browse/JDK-8344085): C2 SuperWord: improve vectorization for small loop iteration count

I just linked these two issues with this RFE on JBS.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2519722350