RFR: 8346664: C2: Optimize mask check with constant offset [v17]

Wed Feb 5 08:19:19 UTC 2025

On Sun, 2 Feb 2025 21:36:03 GMT, Matthias Ernst <duke at openjdk.org> wrote:

>> Fixes [JDK-8346664](https://bugs.openjdk.org/browse/JDK-8346664): extends the optimization of masked sums introduced in #6697 to cover constant values, which currently break the optimization.
>> 
>> Such constant values arise in an expression of the following form, for example from `MemorySegmentImpl#isAlignedForElement`:
>> 
>> 
>> (base + (index + 1) << 8) & 255
>> => MulNode
>> (base + (index << 8 + 256)) & 255
>> => AddNode
>> ((base + index << 8) + 256) & 255
>> 
>> 
>> Currently, `256` is not being recognized as a shifted value. This PR enables further reduction:
>> 
>> 
>> ((base + index << 8) + 256) & 255
>> => MulNode (this PR)
>> (base + index << 8) & 255
>> => MulNode (PR #6697)
>> base & 255 (loop invariant)
>> 
>> 
>> Implementation notes:
>> * I verified that the originating issue "scaled varhandle indexed with i+1"  (https://mail.openjdk.org/pipermail/panama-dev/2024-December/020835.html) is resolved with this PR.
>> * ~in order to stay with the flow of the current implementation, I refrained from solving general (const & mask)==0 cases, but only those where const == _ << shift.~
>> * ~I modified existing test cases adding/subtracting from the index var (which would fail with current C2). Let me know if would like to see separate cases for these.~
>
> Matthias Ernst has updated the pull request incrementally with one additional commit since the last revision:
> 
>   jlong, not long

I have a few more coming, but need to hop off the train real quick ;)

src/hotspot/share/opto/mulnode.cpp line 2059:

> 2057: 
> 2058: // Returns a lower bound on the number of trailing zeros in expr, or -1 if the number
> 2059: // cannot be determined.

Why not just return `0` if we cannot determine it? That would still be a correct lower bound, right?

src/hotspot/share/opto/mulnode.cpp line 2081:

> 2079:     const TypeInt* rhs_t = phase->type(expr->in(2))->isa_int();
> 2080:     if (rhs_t == nullptr || !rhs_t->is_con()) {
> 2081:       return -1;

Suggestion:

  // Pattern: expr = (x << shift)
  if (expr->Opcode() == Op_LShift(bt)) {
    const TypeInt* shift_t = phase->type(expr->in(2))->isa_int();
    if (shift_t == nullptr || !shift_t->is_con()) {
      return -1;

src/hotspot/share/opto/mulnode.cpp line 2083:

> 2081:       return -1;
> 2082:     }
> 2083:     return rhs_t->get_con() & (type2aelembytes(bt) * BitsPerByte - 1);

Suggestion:

    // We need to truncate the shift, as it may not have been canonicalized yet.
    // T_INT:  0..31 -> shift_mask = 4 * 8 - 1 = 31
    // T_LONG: 0..63 -> shift_mask = 8 * 8 - 1 = 63
    jint shift_mask = type2aelembytes(bt) * BitsPerByte - 1;
    return shift_t->get_con() & shift_mask;

src/hotspot/share/opto/mulnode.cpp line 2102:

> 2100: //   (AndL (ConL (_ << #N)) #M)
> 2101: // The M and N values must satisfy ((-1 << N) & M) == 0.
> 2102: static bool AndIL_is_zero_element_under_mask(const PhaseGVN* phase, const Node* expr, const Node* mask, BasicType bt) {

Suggestion:

// Checks whether expr is neutral element (zero) under mask. We have:
//   (AndX expr mask)
// The X in AndX must be I or L, depending on bt.
//
// We split the bits of expr into MSB and LSB, where LSB represents
// all trailing zeros of expr:
//   MSB    LSB
//   xxxxxx 0000000000
//
// We check if the mask has no one bits in the corresponding higher
// bits, i.e. if the number of trailing zeros is larger or equal to the
// bit width of the expr, i.e. if the number of leading zeros for mask
// is greater or equal to the number of bits in MSB:
//   000000 00000yyyyy -> (AndX expr mask) = 0                 -> return true
//   0000yy yyyyyyyyyy -> (AndX expr mask) = 0000zz 0000000000 -> return false
//
static bool AndIL_is_zero_element_under_mask(const PhaseGVN* phase, const Node* expr, const Node* mask, BasicType bt) {

src/hotspot/share/opto/mulnode.cpp line 2104:

> 2102: static bool AndIL_is_zero_element_under_mask(const PhaseGVN* phase, const Node* expr, const Node* mask, BasicType bt) {
> 2103:   jint expr_trailing_zeros = AndIL_min_trailing_zeros(phase, expr, bt);
> 2104:   if (expr_trailing_zeros < 0) {

It feels a little strange that the number of trailing zeros could be negative...
That's why I would return 0 if we can prove nothing. It is still clear that we can do nothing here if it is zero, so we can just compare `<= 0`.

Or what was the reason for returning `-1`?

src/hotspot/share/opto/mulnode.cpp line 2111:

> 2109:   if (mask_t == nullptr || mask_t->lo_as_long() < 0) {
> 2110:     return false;
> 2111:   }

Suggestion:

  // When the mask is negative, it has the most significant bit set.
  const TypeInteger* mask_t = phase->type(mask)->isa_integer(bt);
  if (mask_t == nullptr || mask_t->lo_as_long() < 0) {
    return false;
  }

src/hotspot/share/opto/mulnode.cpp line 2113:

> 2111:   }
> 2112: 
> 2113:   jint mask_bit_width = mask_t->hi_as_long() == 0 ? 0 : (BitsPerLong - count_leading_zeros(mask_t->hi_as_long()));

I would just split this into 2 separate things.
Suggestion:

  // Is the mask always zero?
  if (mask_t->hi_as_long() == 0) {
    assert(mask_t->lo_as_long() == 0, "checked earlier");
    return true;
  }
  jint mask_bit_width =  BitsPerLong - count_leading_zeros(mask_t->hi_as_long());

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/22856#pullrequestreview-2594812563
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1942353461
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1942358554
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1942370863
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1942402331
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1942405802
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1942407646
PR Review Comment: https://git.openjdk.org/jdk/pull/22856#discussion_r1942411441