RFR: 8342692: C2: MemorySegment API slow with short running loops [v2]

Tobias Hartmann thartmann at openjdk.org
Wed Oct 23 12:01:10 UTC 2024


On Tue, 22 Oct 2024 11:53:33 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> To optimize a long counted loop and long range checks in a long or int
>> counted loop, the loop is turned into a loop nest. When the loop has
>> few iterations, the overhead of having an outer loop whose backedge is
>> never taken has a measurable cost. Furthermore, creating the loop
>> nest usually causes one iteration of the loop to be peeled so
>> predicates can be set up. If the loop is short running, then it's an
>> extra iteration that's run with range checks (compared to an int
>> counted loop with int range checks).
>> 
>> This change doesn't create a loop nest when:
>> 
>> 1- it can be determined statically at loop nest creation time that the
>>    loop runs for a short enough number of iterations
>>   
>> 2- profiling reports that the loop runs for no more than ShortLoopIter
>>    iterations (1000 by default).
>>   
>> For 2-, a guard is added which is implemented as yet another predicate.
>> 
>> While this change is in principle simple, I ran into a few
>> implementation issues:
>> 
>> - while C2 has a way to compute the number of iterations of an int
>>   counted loop, it doesn't have that for long counted loops. The
>>   existing logic for int counted loops promotes values to long to
>>   avoid overflows. I reworked it so it now works for both long and int
>>   counted loops.
>> 
>> - I added a new deoptimization reason (Reason_short_running_loop) for
>>   the new predicate. Given the number of iterations is narrowed down
>>   by the predicate, the limit of the loop after transformation is a
>>   cast node that's control dependent on the short running loop
>>   predicate. Because once the counted loop is transformed, it is
>>   likely that range check predicates will be inserted and they will
>>   depend on the limit, the short running loop predicate has to be the
>>   one that's further away from the loop entry. It is also possible
>>   that the limit before transformation depends on a predicate
>>   (TestShortRunningLongCountedLoopPredicatesClone is an example). We
>>   can then have new predicates inserted after the transformation that
>>   depend on the casted limit, which itself depends on old predicates
>>   added before the transformation. To solve this circular dependency,
>>   parse and assert predicates are cloned between the old predicates
>>   and the loop head. The cloned short running loop parse predicate is
>>   the one that's used to insert the short running loop predicate.
>> 
>> - In the case of a long counted loop, the loop is transformed into a
>>   regular loop with a ...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:
> 
>  - Merge branch 'master' into JDK-8342692
>  - more
>  - more
>  - more
>  - more
>  - more
>  - fix & test

I haven't looked at it yet but submitted some quick testing. The build on Mac AArch64 fails:

[2024-10-23T11:56:28,256Z] /System/Volumes/Data/mesos/work_dir/slaves/7a20d425-e769-4142-b5c1-e3cc2d88e03e-S37429/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/6cb59bf8-52bf-4698-bc7c-7bac27fa71af/runs/66922795-8c84-4b35-8612-4d25564e6c23/workspace/open/src/hotspot/share/opto/loopTransform.cpp:2069:69: error: format specifies type 'long' but the argument has type 'julong' (aka 'unsigned long long') [-Werror,-Wformat]
[2024-10-23T11:56:28,256Z]       tty->print("Unroll %d(%2ld) ", loop_head->unrolled_count()*2, loop_head->trip_count());
[2024-10-23T11:56:28,256Z]                             ~~~~                                    ^~~~~~~~~~~~~~~~~~~~~~~
[2024-10-23T11:56:28,256Z]                             %2llu
[2024-10-23T11:56:28,256Z] /System/Volumes/Data/mesos/work_dir/slaves/7a20d425-e769-4142-b5c1-e3cc2d88e03e-S37429/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/6cb59bf8-52bf-4698-bc7c-7bac27fa71af/runs/66922795-8c84-4b35-8612-4d25564e6c23/workspace/open/src/hotspot/share/opto/loopTransform.cpp:2322:35: error: format specifies type 'long' but the argument has type 'julong' (aka 'unsigned long long') [-Werror,-Wformat]
[2024-10-23T11:56:28,256Z]     tty->print("MaxUnroll  %ld ", cl->trip_count());
[2024-10-23T11:56:28,256Z]                            ~~~    ^~~~~~~~~~~~~~~~
[2024-10-23T11:56:28,256Z]                            %llu
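
Both diagnostics are format/argument mismatches: `%ld` expects a `long`, while `trip_count()` returns a `julong` here. The `%llu` that clang suggests only matches on platforms where `julong` is `unsigned long long`, so a width-specific format macro is probably the safer fix. Below is a minimal standalone sketch, assuming `julong` is a 64-bit unsigned type. The `julong_t` typedef and the helper names are hypothetical stand-ins, and the standard `PRIu64` macro is used in place of HotSpot's own format macros (I believe `JULONG_FORMAT` is the usual choice there, but I have not checked it against this patch):

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for HotSpot's julong; assumed to be 64-bit unsigned.
typedef uint64_t julong_t;

// Same text as the failing "Unroll %d(%2ld) " call, but with a specifier
// that matches a 64-bit unsigned argument on every platform.
std::string format_unroll(int unrolled_count, julong_t trip_count) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "Unroll %d(%2" PRIu64 ") ",
                unrolled_count * 2, trip_count);
  return std::string(buf);
}

// Same text as the failing "MaxUnroll  %ld " call.
std::string format_max_unroll(julong_t trip_count) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "MaxUnroll  %" PRIu64 " ", trip_count);
  return std::string(buf);
}
```

The sketch formats into a buffer rather than calling `tty->print`, so it can be compiled outside the JDK tree. The format strings are the part that would carry over.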

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2431891426


More information about the hotspot-compiler-dev mailing list