RFR: 8326962: C2 SuperWord: cache VPointer [v2]

Emanuel Peter epeter at openjdk.org
Tue Apr 2 16:01:25 UTC 2024


> This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361).
> 
> Parsing `VPointer` currently happens all over SuperWord, often in quadratic loops where we compare all loads/stores with each other.
> 
> I propose to cache the `VPointer`s, so that we can do a constant-time cache lookup rather than parsing the pointer subgraph every time.
> 
> There are now only a few cases where we cannot use the cached `VPointer`:
> - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop.
> - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way).
> 
> This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient.
> 
> **Benchmarking SuperWord Compile Time**
> 
> I use the same benchmark from https://github.com/openjdk/jdk/pull/18532.
> 
> On master:
> 
>     C2 Compile Time:       56.816 s
>          IdealLoop:            56.604 s
>            AutoVectorize:      56.192 s
> 
> 
> With this patch:
> 
>     C2 Compile Time:       49.719 s
>          IdealLoop:            49.509 s
>            AutoVectorize:      49.106 s
> 
> 
> This saves us about `7 sec` (roughly 12% of the C2 compile time), which is significant. I will have to see what effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant.
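
The caching idea from the description can be illustrated with a small standalone sketch. It is not the code from the PR: `ToyMemOp`, `ToyPointer`, `ToyPointerCache`, `parse_pointer` and `count_adjacent_pairs` are hypothetical names made up for this example, and the "parse" is a trivial copy standing in for the walk over the pointer subgraph. The sketch only shows that, with a cache, an all-with-all comparison over n memops needs n parses instead of one parse per comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a load/store in the loop body.
struct ToyMemOp {
  int base;    // id of the base object
  int scale;   // multiplier of the induction variable
  int offset;  // constant byte offset
};

// Hypothetical stand-in for a parsed pointer (not HotSpot's VPointer).
struct ToyPointer {
  int base;
  int scale;
  int offset;
};

// Counts how often the (assumed expensive) parse is executed.
static int parse_count = 0;

// Stands in for parsing the pointer subgraph of a memop.
ToyPointer parse_pointer(const ToyMemOp& op) {
  ++parse_count;
  return ToyPointer{op.base, op.scale, op.offset};
}

// Parses every memop exactly once, then answers lookups by index
// in constant time.
class ToyPointerCache {
 public:
  explicit ToyPointerCache(const std::vector<ToyMemOp>& ops) {
    _pointers.reserve(ops.size());
    for (const ToyMemOp& op : ops) {
      _pointers.push_back(parse_pointer(op));
    }
  }

  std::size_t size() const { return _pointers.size(); }

  const ToyPointer& get(std::size_t idx) const {
    assert(idx < _pointers.size());
    return _pointers[idx];
  }

 private:
  std::vector<ToyPointer> _pointers;
};

// Quadratic all-with-all comparison that only uses cached pointers.
// Counts ordered pairs (a, b) where b directly follows a in memory.
int count_adjacent_pairs(const ToyPointerCache& cache, int elem_size) {
  int pairs = 0;
  for (std::size_t i = 0; i < cache.size(); i++) {
    for (std::size_t j = 0; j < cache.size(); j++) {
      if (i == j) { continue; }
      const ToyPointer& a = cache.get(i);
      const ToyPointer& b = cache.get(j);
      if (a.base == b.base && a.scale == b.scale &&
          b.offset - a.offset == elem_size) {
        pairs++;
      }
    }
  }
  return pairs;
}
```

In this toy version the cache is indexed by the position of the memop in the input vector; how the actual patch keys its cache is not shown here.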

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  pointer -> vpointer

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18577/files
  - new: https://git.openjdk.org/jdk/pull/18577/files/d5ef5e45..386f2ca1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18577&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18577&range=00-01

  Stats: 54 lines in 4 files changed: 0 ins; 0 del; 54 mod
  Patch: https://git.openjdk.org/jdk/pull/18577.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18577/head:pull/18577

PR: https://git.openjdk.org/jdk/pull/18577

