RFR: 8332163: C2 SuperWord: refactor PacksetGraph and SuperWord::output into VTransformGraph [v11]
Christian Hagedorn
chagedorn at openjdk.org
Thu Jul 4 09:35:35 UTC 2024
On Wed, 3 Jul 2024 14:21:03 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> The original PR was [here](https://github.com/openjdk/jdk/pull/19261), it got too chaotic.
>>
>> I added some extra tests for this in: https://github.com/openjdk/jdk/pull/19558
>> I extracted some refactorings to: https://github.com/openjdk/jdk/pull/19573
>>
>> We used to have:
>> - `PacksetGraph`: this detects cycles introduces by packs, and schedules/reorders the memops.
>> - `SuperWord::apply_vectorization`: creates `VectorNodes` directly from the `PackSet`.
>>
>> In my blog, I have published lots of ideas for SuperWord / AutoVectorization improvements:
>> https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html
>>
>> Many ideas are based on the "VectorTransform IR": cost-model, if-conversion, direct widening of scalars to vectors, additional optimizations/features with shuffle/pack/extract, handling more reduction patterns, etc.
>>
>> I now decided to name it `VTransform`, which is essencially a graph `VtransformGraph` of nodes `VTransformNodes` that resemble the C2 Node on purpose, because the `VTransform` models the C2 graph after vectorization. We can now model the transformation from scalar-loop to vectorized-loop without modifying the C2 graph yet.
>>
>> The new code has these steps:
>> - Given the `PackSet` from `SuperWord`, we create a `VTransformGraph` with `SuperWordVTransformBuilder`.
>> - [Not yet: all sorts of optimizations / checks on the `VTransformGraph`, in future RFE's]
>> - We then schedule the `VTransformGraph`, and check for cycles.
>> - Once we are ready to commit to vectorization, we call `VTransformGraph::apply_vectorization` which lets each individual `VTransformNode::apply` generate the new vectorized C2 nodes.
>>
>> **Testing**
>>
>> Regression testing passed.
>>
>> Performance testing: no significant change in performance (as expected).
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> rm newline
Looks already very good now, thanks for the updates and the offline discussion! I like having the new files which better separates the code.
I only have a few other mostly minor comments.
src/hotspot/share/opto/superword.cpp line 1866:
> 1864: _mem_ref_for_main_loop_alignment,
> 1865: _aw_for_main_loop_alignment
> 1866: NOT_PRODUCT( COMMA trace)
Suggestion:
NOT_PRODUCT(COMMA trace)
src/hotspot/share/opto/superword.cpp line 2652:
> 2650: DEBUG_ONLY( \
> 2651: if (_trace._align_vector) { \
> 2652: tty->print(" " #node ": "); \
Suggestion:
if (_trace._align_vector) { \
tty->print(" " #node ": "); \
src/hotspot/share/opto/superword.cpp line 2820:
> 2818: #ifdef ASSERT
> 2819: if (_trace._align_vector) {
> 2820: tty->print_cr("\nVLoopTransformGraph::adjust_pre_loop_limit_to_align_main_loop_vectors:");
Suggestion:
tty->print_cr("\nVLoopTransform::adjust_pre_loop_limit_to_align_main_loop_vectors:");
src/hotspot/share/opto/superword.cpp line 3036:
> 3034: bool SuperWord::same_generation(Node* a, Node* b) const {
> 3035: return a != nullptr && b != nullptr && _clone_map.same_gen(a->_idx, b->_idx);
> 3036: }
Just noticed this by coincidence: This method is dead and can be removed. Can probably also be done in the scope of this PR instead of doing an extra RFE just for that.
src/hotspot/share/opto/superwordVTransformBuilder.cpp line 180:
> 178:
> 179: // Either get existing vtnode vector input (when input is a pack), or else make a
> 180: // new one vector vtnode for the input (e.g. for Replicate or PopulateIndex).
Suggestion:
// Either get the existing vtnode vector input (when input is a pack), or else make a
// new vector vtnode for the input (e.g. for Replicate or PopulateIndex).
src/hotspot/share/opto/superwordVTransformBuilder.hpp line 48:
> 46: _vtransform(vtransform)
> 47: {
> 48: assert(_vtransform.is_empty(), "constructor is passed an empty vtransform");
How about naming this `has_graph()`?
src/hotspot/share/opto/vtransform.cpp line 79:
> 77: if (post_visited.test(use->_idx)) { continue; }
> 78: if (pre_visited.test(use->_idx)) {
> 79: // Circle detected!
The Circle! Cycle?
Suggestion:
// Cycle detected!
src/hotspot/share/opto/vtransform.cpp line 84:
> 82: // and discover that use is also pre_visited but not post_visited. Thus, use
> 83: // lies on that path from "root" to vtn, and the edge (vtn, use) closes a
> 84: // circle.
Suggestion:
// cycle.
src/hotspot/share/opto/vtransform.cpp line 97:
> 95: _schedule.at_put_grow(rpo_idx--, vtn); // assign rpo_idx
> 96: }
> 97:
Suggestion:
src/hotspot/share/opto/vtransform.cpp line 253:
> 251: }
> 252:
> 253: VTransformApplyResult VTransformBoolVectorNode::apply(const VLoopAnalyzer& vloop_analyzer, const GrowableArray<Node*>& vnode_idx_to_transformed_node) const {
Maybe wrap this long line:
Suggestion:
VTransformApplyResult VTransformBoolVectorNode::apply(const VLoopAnalyzer& vloop_analyzer,
const GrowableArray<Node*>& vnode_idx_to_transformed_node) const {
Same at other apply methods.
src/hotspot/share/opto/vtransform.hpp line 33:
> 31: // - Models the transformation of the scalar loop to vectorized loop:
> 32: // It is a "C2 subgraph" -> "C2 subgraph" mapping.
> 33: // - The VTransform contains a graph (VTransformGraph), which consists
Suggestion:
// - The VTransform contains a graph (VTransformGraph), which consists of
src/hotspot/share/opto/vtransform.hpp line 46:
> 44: //
> 45: // - Schedule:
> 46: // - Compute linearization of the graph, into an order that respects
To make it explicit:
Suggestion:
// - Compute linearization of the VTransformGraph, into an order that respects
src/hotspot/share/opto/vtransform.hpp line 99:
> 97: uint vector_length() const { return _vector_length; }
> 98: uint vector_width() const { return _vector_width; }
> 99: NOT_PRODUCT( void trace(VTransformNode* vtn) const; )
I suggest to be explicit for parameters:
Suggestion:
NOT_PRODUCT( void trace(VTransformNode* vtnode) const; )
src/hotspot/share/opto/vtransform.hpp line 103:
> 101:
> 102: #ifndef PRODUCT
> 103: // Convenience method for tracing flags.
Suggestion:
// Convenience class for tracing flags.
src/hotspot/share/opto/vtransform.hpp line 233:
> 231: CountedLoopNode* cl() const { return _vloop.cl(); }
> 232: int iv_stride() const { return cl()->stride_con(); }
> 233: bool in_bb(const Node* n) const { return _vloop.in_bb(n); }
Seems to be dead now and can be removed
Suggestion:
src/hotspot/share/opto/vtransform.hpp line 266:
> 264: _req(req),
> 265: _in(vtransform.arena(), req, req, nullptr),
> 266: _out(vtransform.arena(), 4, 0, nullptr)
Just noticed this here and totally unrelated, but there are a lot of places where we create a `GrowableArray` without a filler. So, we just pass `nullptr` or 0 (latter should probably be replaced by `nullptr`). Maybe `GrowableArray` should provide a constructor without a filler. Might be worth to do in an RFE at some point.
src/hotspot/share/opto/vtransform.hpp line 336:
> 334: VTransformNode(vtransform, n->req()), _node(n) {}
> 335: Node* node() const { return _node; }
> 336: virtual VTransformScalarNode* isa_Scalar() override { return this; }
General comment about vtnode classes: The IDE suggests here and at other places that:
Clang-Tidy: 'virtual' is redundant since the function is already declared 'override'
So, you could get rid of `virtual` wherever you have `override`.
src/hotspot/share/opto/vtransform.hpp line 345:
> 343: // Wrapper node for nodes outside the loop that are inputs to nodes in the loop.
> 344: // Since we want the loop-internal nodes to be able to reference all inputs as vtnodes,
> 345: // we must wrap the inputs that are outside the loop also into special vtnodes.
Suggestion:
// we must wrap the inputs that are outside the loop into special vtnodes, too.
src/hotspot/share/opto/vtransform.hpp line 377:
> 375: };
> 376:
> 377: // Transform introduces a shift-count node, that truncates the shift count for a vector shift.
Suggestion:
// Transform introduces a shift-count node that truncates the shift count for a vector shift.
src/hotspot/share/opto/vtransform.hpp line 421:
> 419: }
> 420:
> 421: const GrowableArray<Node*> nodes() const { return _nodes; }
Shouldn't this return a const reference?
-------------
PR Review: https://git.openjdk.org/jdk/pull/19719#pullrequestreview-2156605981
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665249564
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665399616
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665400589
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665402864
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665338136
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665321530
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665419275
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665420256
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665421545
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665424645
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1664284232
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665255316
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665263052
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665263650
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665326827
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665331427
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665359824
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665340856
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665361650
PR Review Comment: https://git.openjdk.org/jdk/pull/19719#discussion_r1665376985
More information about the hotspot-compiler-dev
mailing list