RFR: 8326962: C2 SuperWord: cache VPointer

Emanuel Peter epeter at openjdk.org
Tue Apr 2 13:11:11 UTC 2024


On Tue, 2 Apr 2024 09:04:45 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361).
> 
> Parsing `VPointer` currently happens all over SuperWord, often in quadratic loops where we compare all loads/stores with each other.
> 
> I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time.
> 
> There are now only a few cases where we cannot use the cached `VPointer`:
> - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop.
> - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way).
> 
> This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient.
> 
> **Benchmarking SuperWord Compile Time**
> 
> I use the same benchmark from https://github.com/openjdk/jdk/pull/18532.
> 
> On master:
> 
>     C2 Compile Time:       56.816 s
>          IdealLoop:            56.604 s
>            AutoVectorize:      56.192 s
> 
> 
> With this patch:
> 
>     C2 Compile Time:       49.719 s
>          IdealLoop:            49.509 s
>            AutoVectorize:      49.106 s
> 
> 
> This saves us about `7 sec` (roughly 12% of the C2 compile time), which is significant. I will have to see what effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant.

src/hotspot/share/opto/superword.cpp line 600:

> 598:       MemNode* s2 = memops.at(j)->as_Mem();
> 599:       if (isomorphic(s1, s2)) {
> 600:         const VPointer& p2 = get_pointer(s2);

Note: a classic example of a quadratic loop, where we compare "all-to-all" memops and would otherwise parse the pointer subgraph repeatedly.
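
To illustrate the idea outside of HotSpot, here is a minimal sketch in plain C++. All names (`ParsedPointer`, `parse_pointer`, `PointerCache`, `count_same_base_pairs`) are hypothetical stand-ins and not the actual `VPointer` / `VLoopPointers` code. The expensive parse runs once per memop when the cache is built, and the quadratic loop only does constant-time lookups.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a parsed pointer (not the HotSpot VPointer).
struct ParsedPointer {
  int base;
  int offset;
};

// Counts how often the "expensive" parse runs.
static int parse_count = 0;

// Stand-in for walking the pointer subgraph of a memop (assumed expensive).
ParsedPointer parse_pointer(int memop_id) {
  ++parse_count;
  return ParsedPointer{memop_id / 4, (memop_id % 4) * 4};
}

// Parses every memop exactly once, then serves constant-time lookups.
class PointerCache {
public:
  explicit PointerCache(const std::vector<int>& memops) {
    _pointers.reserve(memops.size());
    for (int id : memops) {
      _pointers.push_back(parse_pointer(id));
    }
  }

  const ParsedPointer& get(std::size_t index) const { return _pointers[index]; }

private:
  std::vector<ParsedPointer> _pointers;
};

// Quadratic "all-to-all" comparison that uses only cached pointers.
int count_same_base_pairs(const PointerCache& cache, std::size_t n) {
  int pairs = 0;
  for (std::size_t i = 0; i < n; i++) {
    const ParsedPointer& p1 = cache.get(i);
    for (std::size_t j = i + 1; j < n; j++) {
      const ParsedPointer& p2 = cache.get(j);
      if (p1.base == p2.base) { ++pairs; }
    }
  }
  return pairs;
}
```

In this sketch, the number of parses grows linearly with the number of memops, even though the number of comparisons still grows quadratically.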

src/hotspot/share/opto/vectorization.cpp line 194:

> 192: 
> 193:   uint bytes = number_of_pointers * sizeof(VPointer);
> 194:   _pointers = (VPointer*)_arena->Amalloc(bytes);

Note: I wish I could use `GrowableArray` here. But I have a `StackObj` that is `NONCOPYABLE`. I thus have to directly construct the `VPointer` into the array, and cannot construct it outside and pass it in. Someday, I hope that `GrowableArray` allows appending with the move-constructor, or something similar.

For now: I simply allocate my own memory, and use placement-new to construct the `VPointer`s directly into that memory.
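
The general technique can be sketched in standard C++ as follows. This is only an illustration under assumptions: `Pointer` and `PointerArray` are hypothetical names, and `std::malloc` / `std::free` stand in for the arena, which in HotSpot is released in bulk rather than per allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>

// Hypothetical non-copyable element type (stand-in for a NONCOPYABLE class).
class Pointer {
public:
  explicit Pointer(int id) : _id(id) {}
  Pointer(const Pointer&) = delete;
  Pointer& operator=(const Pointer&) = delete;
  int id() const { return _id; }

private:
  int _id;
};

static_assert(!std::is_copy_constructible<Pointer>::value,
              "Pointer cannot be constructed elsewhere and copied in");

// Owns raw memory and constructs each element directly in place.
class PointerArray {
public:
  explicit PointerArray(std::size_t count)
    : _size(count),
      _data(static_cast<Pointer*>(std::malloc(count * sizeof(Pointer)))) {
    if (count > 0 && _data == nullptr) { throw std::bad_alloc(); }
    for (std::size_t i = 0; i < _size; i++) {
      // Placement-new: run the constructor on already allocated memory.
      ::new (static_cast<void*>(_data + i)) Pointer(static_cast<int>(i) * 10);
    }
  }

  ~PointerArray() {
    // Destroy elements explicitly, in reverse order, then free the memory.
    for (std::size_t i = _size; i > 0; i--) {
      _data[i - 1].~Pointer();
    }
    std::free(_data);
  }

  PointerArray(const PointerArray&) = delete;
  PointerArray& operator=(const PointerArray&) = delete;

  const Pointer& at(std::size_t i) const { return _data[i]; }
  std::size_t size() const { return _size; }

private:
  std::size_t _size;
  Pointer* _data;
};
```

Because every element is constructed in its final location, no copy or move constructor is ever needed.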

src/hotspot/share/opto/vectorization.cpp line 268:

> 266:         if (n1->is_Load() && n2->is_Load()) { continue; }
> 267: 
> 268:         const VPointer& p2 = _pointers.get(n2);

Note: another quadratic loop where we would otherwise repeatedly parse the pointers.

src/hotspot/share/opto/vectorization.cpp line 788:

> 786:   tty->print_cr(" + scale(%4d) * iv]", _scale);
> 787: }
> 788: #endif

Note: improved printing a bit for the `POINTERS` tag of `TraceAutoVectorization`.

src/hotspot/share/opto/vectorization.cpp line 1496:

> 1494:   }
> 1495: }
> 1496: 

Note: moved it up so we can use it anywhere in `vectorization.cpp`.

src/hotspot/share/opto/vectorization.hpp line 726:

> 724: 
> 725:   // Comparable?
> 726:   bool invar_equals(const VPointer& q) const {

Note: had to make some things `const` here, so that I can pass around `const VPointer&`, which I get from `_pointers.get(n)` / `get_pointer(n)`.
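
A small sketch of why the `const` qualifier is needed, using hypothetical types (`Pointer`, `same_invar`) rather than the real `VPointer`: a member function can only be called through a `const` reference if it is itself declared `const`.

```cpp
#include <cassert>

// Hypothetical stand-in for a cached pointer.
struct Pointer {
  int invar;
  int scale;

  // Declared const, so it is callable through a const Pointer&.
  // Without the trailing const, the call in same_invar would not compile.
  bool invar_equals(const Pointer& q) const { return invar == q.invar; }
};

// Both arguments are const references, as returned by a read-only cache lookup.
bool same_invar(const Pointer& p1, const Pointer& p2) {
  return p1.invar_equals(p2);
}
```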

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547524530
PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547529998
PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547530691
PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547531986
PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547532679
PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547534538

