Re: [Heads-up] JDK-8308994: C2: Re-implement experimental post loop vectorization
Emanuel Peter
emanuel.peter at oracle.com
Tue May 30 07:45:23 UTC 2023
Hi Pengfei,
It is great to hear that you are spending time on SuperWord and auto-vectorization in HotSpot. I agree with your assessment that SuperWord is currently unnecessarily convoluted and carries a good deal of legacy code. It would be nice if we could make the code more modular and extensible for future improvements.
Is there a chance that we could see the draft already?
I am also thinking about extending SuperWord in the future. I am currently trying to remove as much dead code and fix as many bugs as possible to clear the way. I still have to see how much time I can spend on extensions. You can find some of my ideas here (towards the end of the PR description):
https://github.com/openjdk/jdk/pull/14096
It would be good to coordinate a bit so that we can ensure our plans fit together.
Best regards,
Emanuel
________________________________
From: Pengfei Li <Pengfei.Li at arm.com>
Sent: Monday, 29 May 2023 05:12
To: hotspot-compiler-dev at openjdk.java.net <hotspot-compiler-dev at openjdk.java.net>
Cc: epeter at openjdk.org <epeter at openjdk.org>; Bhateja, Jatin <jatin.bhateja at intel.com>; nd <nd at arm.com>
Subject: [Heads-up] JDK-8308994: C2: Re-implement experimental post loop vectorization
Hi,
I'm writing to let you know that I just filed "JDK-8308994: C2: Re-implement experimental post loop vectorization".
[BACKGROUND]
The current post loop vectorization in the C2 compiler has a long history. It was first implemented in JDK-8153998 in 2016 as an experimental feature to support x86 AVX-512 vector masks. Due to insufficient maintenance, it was broken for a very long time. Last year, I took over JDK-8183390 to fix and re-enable this feature. Several issues were fixed and AArch64 SVE vector mask support was added in the meantime. We (Arm) proposed to make post loop vectorization non-experimental in future JDK releases. So early this year (2023), we tested it extensively but found more problems.
[PROBLEMS]
Problems include stability, maintainability and performance.
1) Stability issues
Multiple C2 crash and miscompilation issues were filed in JBS, including JDK-8301657, JDK-8301904, JDK-8301944, JDK-8304774, JDK-8308949 and perhaps more.
2) Maintainability issue
The original implementation was based on multi-versioned post loops, and its logic was mixed into SuperWord. But the algorithm for post loop vectorization is actually *not* SLP. As more and more new features are added to SuperWord, the legacy code for post loop vectorization is becoming more and more difficult to maintain.
3) Performance issue
Post loop vectorization was expected to improve performance for vectorizable loops with small iteration counts. But JMH tests show that it doesn't. A main reason is that the vector masked post loop is skipped (not executed) if the loop trip count is small, due to the zero-trip guard of the main loop. That's a major defect of the current multi-versioning framework. (See JDK-8307084 for more details.)
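To illustrate the performance issue, here is a minimal sketch in plain Java that models the control flow, under stated assumptions. It is not C2 code. The class name, the constants VLEN and UNROLL, and the exact loop shape (pre-loop omitted, masked post loop reachable only through the main loop's zero-trip guard) are hypothetical simplifications of the behavior described above; the real loop structure in C2 may differ.

```java
// Hypothetical model of the multi-versioned loop layout; not actual C2 code.
class ZeroTripGuardSketch {
    static final int VLEN = 8;                     // assumed number of vector lanes
    static final int UNROLL = 4;                   // assumed unroll factor of the main loop
    static final int MAIN_STEP = VLEN * UNROLL;    // elements per main loop iteration

    // Returns {main, maskedPost, scalarPost}: how many of the n elements
    // each loop version processes in this simplified model.
    static int[] split(int n) {
        int i = 0;
        int main = 0, masked = 0, scalar = 0;
        if (n - i >= MAIN_STEP) {                  // zero-trip guard of the main loop
            while (n - i >= MAIN_STEP) {           // vectorized and unrolled main loop
                i += MAIN_STEP;
                main += MAIN_STEP;
            }
            while (i < n) {                        // vector masked post loop (assumed to sit behind the guard)
                int lanes = Math.min(VLEN, n - i);
                i += lanes;
                masked += lanes;
            }
        }
        while (i < n) {                            // scalar post loop, always reachable
            i++;
            scalar++;
        }
        return new int[] { main, masked, scalar };
    }

    public static void main(String[] args) {
        for (int n : new int[] { 20, 35, 40 }) {
            int[] s = split(n);
            System.out.println("n=" + n + " main=" + s[0] + " masked=" + s[1] + " scalar=" + s[2]);
        }
    }
}
```

In this model, any trip count below MAIN_STEP fails the guard and falls through to the scalar post loop, so the masked post loop never runs in exactly the small-iteration cases it was meant to help.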
[ACTIONS]
For better stability, maintainability and performance, we now propose to deprecate the current multi-versioning framework and completely re-implement the experimental post loop vectorization, for both x86 AVX-512 and AArch64 SVE. Our new proposal is to add a standalone ideal loop phase (outside SuperWord) that applies the vector mask transformation directly to the original scalar post loop.
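As a rough illustration of what a vector mask transformation of a scalar post loop means, here is a minimal sketch in plain Java. It only emulates mask semantics with a boolean array; it is not the draft patch and not how C2 represents the loop internally. The class name, method names, the lane count VLEN and the example loop body a[i] = b[i] + c[i] are all assumptions chosen for illustration.

```java
// Hypothetical scalar emulation of a masked post loop; not the actual patch.
class MaskedPostLoopSketch {
    static final int VLEN = 8;   // assumed number of vector lanes

    // The original scalar post loop.
    static void scalarPost(int[] a, int[] b, int[] c, int start, int end) {
        for (int i = start; i < end; i++) {
            a[i] = b[i] + c[i];
        }
    }

    // The same loop after an emulated mask transformation: each iteration
    // covers VLEN lanes, and a mask disables the lanes at or beyond 'end'.
    static void maskedPost(int[] a, int[] b, int[] c, int start, int end) {
        for (int i = start; i < end; i += VLEN) {
            boolean[] mask = new boolean[VLEN];
            for (int lane = 0; lane < VLEN; lane++) {
                mask[lane] = i + lane < end;       // lane is active only while in range
            }
            for (int lane = 0; lane < VLEN; lane++) {
                if (mask[lane]) {                  // masked load, add and store
                    a[i + lane] = b[i + lane] + c[i + lane];
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] b = { 1, 2, 3, 4, 5 };
        int[] c = { 10, 20, 30, 40, 50 };
        int[] a = new int[5];
        maskedPost(a, b, c, 0, 5);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

On real hardware the inner lane loops would correspond to single masked vector instructions, with the mask held in an AVX-512 opmask register or an SVE predicate register. Because inactive lanes are never accessed, no scalar tail is needed after the masked loop.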
We have been working on this internally for a while and have so far finished a draft patch. I will post the patch for review as soon as it passes all tests and is polished enough.
--
Thanks,
Pengfei