RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization

Emanuel Peter epeter at openjdk.org
Mon Jun 26 14:59:25 UTC 2023


On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li <pli at openjdk.org> wrote:

> ## TL;DR
> 
> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request.

A few more comments, did not have much time today. More tomorrow ;)

src/hotspot/share/opto/vmaskloop.cpp line 403:

> 401:       int opc = node->Opcode();
> 402:       BasicType bt = elem_bt(node);
> 403:       int vlen = Matcher::max_vector_size(bt);

Theoretically, different `bt` can have different `Matcher::vector_width_in_bytes`. So `vlen` would not always correspond to `MaxVectorSize / element_size`. It just means that here you would end up checking for a shorter length than maybe expected? But maybe that is intended, it depends on how you generate the nodes later.

src/hotspot/share/opto/vmaskloop.cpp line 424:

> 422:         int vopc = 0;
> 423:         if (node->is_Mem()) {
> 424:           vopc = node->is_Store() ? Op_StoreVectorMasked : Op_LoadVectorMasked;

Mabye just for good measure: add an assert that it can only be a Load or a Store.

src/hotspot/share/opto/vmaskloop.cpp line 429:

> 427:         }
> 428:         if (vopc == 0 ||
> 429:             !Matcher::match_rule_supported_vector_masked(vopc, vlen, bt)) {

Do all nodes need to be maskable? Or is it enough if only load/store are maskable?

src/hotspot/share/opto/vmaskloop.cpp line 442:

> 440: // nodes to bail out for complex loops
> 441: bool VectorMaskedLoop::analyze_loop_body_nodes() {
> 442:   VectorSet tracked(_arena);

This is probably a good case where you could use `ResourceMark rm;` and just put the `VectorSet` on the default resource arena.

src/hotspot/share/opto/vmaskloop.cpp line 465:

> 463:   for (int idx = 0; idx < n_nodes; idx++) {
> 464:     Node* node = _body_nodes.at(idx);
> 465:     if ((node->is_Mem() && node->as_Mem()->is_Store())) {

Suggestion:

    if ((node->is_Mem() && node->is_Store())) {

src/hotspot/share/opto/vmaskloop.cpp line 474:

> 472:         if (!in_body(out)) {
> 473:           trace_msg(node, "Node has out-of-loop user found");
> 474:           return false;

Can this be handled in the future with a extract node? I guess you would have to extract it from a variable element, as the last iteration is not always the same.

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14581#pullrequestreview-1497718513
PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1241683613
PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1241665519
PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1241668030
PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1242324850
PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1242330208
PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1242333503


More information about the hotspot-dev mailing list