RFR: 8350988: Consolidate Identity of self-inverse operations
Hannes Greule
hgreule at openjdk.org
Mon Mar 3 14:06:02 UTC 2025
On Mon, 3 Mar 2025 09:20:11 GMT, Damon Fenacci <dfenacci at openjdk.org> wrote:
> I'm not totally sure I fully get what you mean here: does this optimization hinder vectorization in some cases? Does this result in a slowdown? (BTW do you have benchmark results?) Should we possibly try to detect this early and avoid simplifying?
What happens basically comes down to this check: https://github.com/openjdk/jdk/blob/885338b5f38ed05d8b91efc0178b371f2f89310e/src/hotspot/share/opto/superword.cpp#L1759
Without my change, `_num_work_vecs` is 3 (I assume; I didn't debug that part), as we have one load and two reverse bytes operations. `_num_reductions` is 1, the xor. With my change, by the time we reach this check, `_num_work_vecs` is 1 (that part I checked with the debugger), as only the load is left. So superword does not consider vectorization to be profitable.
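To make the node counts concrete, here is a minimal sketch of the loop shape being discussed. It is an assumption-laden illustration, not the benchmark itself: the class name `DoubledReverseBytesSketch`, the `int[]` element type, and the method bodies are hypothetical; only the method names mirror the benchmark names in the results.

```java
// Illustrative sketch only; the element type and loop bodies are assumptions.
public class DoubledReverseBytesSketch {

    // One load, two reverseBytes operations, and one xor reduction per element.
    // After the identity folding, the two reverseBytes calls cancel out and
    // only the load and the xor reduction remain.
    public static int doubleReverse(int[] data) {
        int acc = 0;
        for (int i = 0; i < data.length; i++) {
            acc ^= Integer.reverseBytes(Integer.reverseBytes(data[i]));
        }
        return acc;
    }

    // The manually folded form: just the load and the xor reduction.
    public static int folded(int[] data) {
        int acc = 0;
        for (int i = 0; i < data.length; i++) {
            acc ^= data[i];
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] data = new int[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = i * 0x9E3779B9; // arbitrary, overflow-wrapping fill pattern
        }
        // Both variants must compute the same value, since reverseBytes is self-inverse.
        System.out.println(doubleReverse(data) == folded(data));
    }
}
```

Both methods are semantically identical, which is why the folding is valid; the difference is only in how many vectorizable non-reduction nodes the loop body still contains when superword looks at it.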
My benchmark code: https://gist.github.com/SirYwell/a76578dc5f3c10cd08b768a3bd39a988
Results on my machine (Ryzen 9 3900X):
mainline:

Benchmark                           Mode  Cnt     Score     Error   Units
DoubledReverseBytes.doubleReverse  thrpt    3  3287,042 ± 398,656  ops/ms
DoubledReverseBytes.folded         thrpt    3   418,627 ±  20,797  ops/ms

this PR:

Benchmark                           Mode  Cnt     Score     Error   Units
DoubledReverseBytes.doubleReverse  thrpt    3   419,369 ±  24,974  ops/ms
DoubledReverseBytes.folded         thrpt    3   415,469 ±  88,714  ops/ms
On mainline, `doubleReverse` is almost 8x faster than `folded` because it gets vectorized; with my change, that speedup is gone.
I don't think this should block this change. Detecting such situations also seems like a rather complicated workaround.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23851#issuecomment-2694504151