RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v3]

Mon Apr 12 07:06:34 UTC 2021

On Fri, 9 Apr 2021 21:52:45 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Revert changes to VectorLoadMask and add a VectorMaskCast for the same element size mask casting
>
> src/hotspot/share/opto/vectornode.cpp line 1236:
> 
>> 1234:         if (is_vector_mask) {
>> 1235:           if (in_vt->length_in_bytes() == out_vt->length_in_bytes() &&
>> 1236:               Matcher::match_rule_supported(Op_VectorMaskCast)) {
> 
> There's `Matcher::match_rule_supported_vector()` specifically for vector nodes.

OK, I will use this instead. Thanks!

> src/hotspot/share/opto/vectornode.cpp line 1237:
> 
>> 1235:           if (in_vt->length_in_bytes() == out_vt->length_in_bytes() &&
>> 1236:               Matcher::match_rule_supported(Op_VectorMaskCast)) {
>> 1237:             // VectorUnbox (VectorBox vmask) ==> VectorMaskCast (vmask)
> 
> It's better to implement it as a 2-step transformation and place it in `VectorLoadMaskNode::Ideal()`:
> VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask) => VectorMaskCast (vmask)

Thanks for your comments. Yes, theoretically it's better to place it in `VectorLoadMaskNode::Ideal()`. Unfortunately, we met an issue that is related to optimization for `VectorStoreMask`. Considering the following case:
   LoadVector                                              LoadVector
           |                                                    |
   VectorLoadMask  (double)                          VectorLoadMask (double)
               |                                               |
           VectorUnbox   (long)          ==>           VectorStoreMask  (double)
                                                               |
                                                      VectorLoadMask  (long)
This case loads the masking values for a double type, and does a bitwise `and` operation. Since the type is mismatched, the compiler will try to do `VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask)`. However, since there is the transformation `VectorStoreMask (VectorLoadMask value)  ==> value`, the above `VectorStoreMaskNode` will be optimized out. The final graph looks like:

   LoadVector                                                  LoadVector
          |                                                      /      \
        VectorLoadMask  (double)                               /         \
              |                      ==>     VectorLoadMask (double)      \
         VectorStoreMask (double)                                   VectorLoadMask (long)
                   |
            VectorLoadMask (long)
Since the two `VectorLoadMaskNode` have different element type, the GVN cannot optimize out one. So finally there will be two similar  `VectorLoadMaskNode`s. That's also why I override the `cmp/hash` for `VectorLoadMaskNode` in the first version.

So I prefer to add `VectorUnbox (VectorBox vmask) ==> VectorMaskCast (vmask)` directly.

> src/hotspot/share/opto/vectornode.hpp line 1247:
> 
>> 1245:     assert(type2aelembytes(in_vt->element_basic_type()) == type2aelembytes(vt->element_basic_type()), "element size must match");
>> 1246:   }
>> 1247: 
> 
> FTR there's a way to avoid platform-specific changes (in AD files) if we don't envision any non-trivial conversions to be supported: just disable matching of the node by overriding `Node::ideal_reg()`:
> virtual uint ideal_reg() const { return 0; } // not matched in the AD file

Good idea, I will try this way. Thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238