RFR: 8326962: C2 SuperWord: cache VPointer
Emanuel Peter
epeter at openjdk.org
Tue Apr 2 15:50:10 UTC 2024
On Tue, 2 Apr 2024 14:03:09 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:
>> This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361).
>>
>> Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores.
>>
>> I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time.
>>
>> There are now only a few cases where we cannot use the cached `VPointer`:
>> - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop.
>> - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way).
>>
>> This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient.
>>
>> **Benchmarking SuperWord Compile Time**
>>
>> I use the same benchmark from https://github.com/openjdk/jdk/pull/18532.
>>
>> On master:
>>
>> C2 Compile Time: 56.816 s
>> IdealLoop: 56.604 s
>> AutoVectorize: 56.192 s
>>
>>
>> With this patch:
>>
>> C2 Compile Time: 49.719 s
>> IdealLoop: 49.509 s
>> AutoVectorize: 49.106 s
>>
>>
>> This saves us about `7 sec`, which is significant. I will have to see what it effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant.
>
> src/hotspot/share/opto/vectorization.hpp line 456:
>
>> 454: // Submodule of VLoopAnalyzer.
>> 455: // We compute and cache the VPointer for every load and store.
>> 456: class VLoopPointers : public StackObj {
>
> Nit: Should we call this `VLoopVPointers` to make the link to `VPointers` and not just some pointers?
Hmm. This looks like a general naming question now. My idea is that the `V` at the beginning of the types just is kind of a "namespace", to say that all types are used for `Vectorization`.
But I guess here we can do it just so everybody knows we are dealing with `VPointers`. I'll make an exception, but don't want to see `V`'s littered everywhere ;)
> src/hotspot/share/opto/vectorization.hpp line 462:
>
>> 460: const VLoopBody& _body;
>> 461:
>> 462: // Array of cached pointers
>
> Maybe make a note that we allocate/cache them lazily upon request.
It is not lazy, they are allocated and cached in `compute_and_cache`. Like all other `VLoopAnalyzer` submodules. Maybe I missed your point 😅
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548137659
PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548139526
More information about the hotspot-compiler-dev
mailing list