[lworld] RFR: 8267665: [lworld] Scalarization of nullable inline types

Thu Sep 2 07:54:55 UTC 2021

This patch implements support for scalarizing nullable inline types (`.ref`) in C2 compiled code. This enables full scalarization whenever C2 is able to prove that a local, argument, return, field or array element is an inline type, even when mixing with constant `null` or object/interface types. As a result, buffer allocations are avoided and field loads are folded. For example, in the following method, no allocation or store remains:

int test(boolean b, int x, Object arg) {
  Object val = new Point(x, x);
  if (b) {
    val = (Point.ref)arg;
  }
  method(); // Not inlined
  if (val == null) {
    return 42;
  }
  return ((Point.ref)val).x;
}

C2 compiles this to (pseudocode):

int test(boolean b, int x, Object arg) {
  bool isInit = true;
  if (b) {
    isInit = (arg != null);
    x = isInit ? ((Point.ref)arg).x : 0;
  }
  method(); // Keep x and isInit in the safepoint debug info for re-allocation on deopt
  return isInit ? x : 42;
}

For more examples, see the new tests added to `TestNullableInlineTypes.java` that would fail without this patch. The `failOn = {ALLOC ...` IR verification rules ensure that scalarization works and no allocation remains in the generated code. 

To achieve this,  a new `is_init` input is propagated with the inline type nodes in addition to the optional buffer oop input and the field values. Before accessing the field values, a null check is required (for example, see `Parse::do_field_access`). Such null checks are then rewired to check the `is_init` input instead (see `GraphKit::null_check_common`, `GraphKit::cast_not_null` and `CmpPNode::Ideal`). The `is_init` input is also propagated through safepoint debug info to use it  on deoptimization to decide if re-allocation is required or null should be passed to the interpreter.

Here is a high level overview of the changes:
- All places where we would previously just load an inline type ptr now apply scalarization.
- Support for `null` in the safepoint debug info and in deoptimization code. The `is_init` information is propagated through the safepoint debug info and checked before re-allocation during deoptimization.
- Fixed checkcast to properly replace a scalarized inline type in the parsing map.
- Greatly improved the `PhiNode::Ideal` transformation that pushed inline type nodes through phis. The code can now handle complex shapes like other phi inputs and data loops. See `PhiNode::push_inline_types_through`. This is required to reliably optimize complex loops.
- Removed the `ScalarizeInlineTypes` debug flag because all the logic just became too complex and hard to maintain. For the same reason, I decided to not introduce a new flag to disable scalarization of nullable inline types.
- Fixed and improved unsafe access and getClass intrinsics.
- Strengthened array store checks and replaced the corresponding runtime checks by asserts.
- Fixed several unrelated bugs that showed up during extensive testing and removed dead code.
- Applied test-driven-development by adding a new (IR verification) test for each new code path and optimization.
- Strengthened IR verification rules. `LOAD` and `STORE` rules are now matching all accesses. Added lots of additional rules to existing tests.

There are still some rough edges though, that I'll address with follow-up changes:
- For simplicity, the `is_init` input is currently an oop that needs to be null checked.
- We should get rid of the weird `InlineTypeNode`/`InlineTypePtrNode` duality. With the current code, they are only needed to differentiate between always-buffered and maybe-buffered. There is lots of code that can be removed/simplified with that.
- When an inline type is always buffered, it's better to use the oop for safepoint debug info instead of scalarizing to avoid keeping field loads alive. The corresponding logic in `InlineTypeBaseNode::make_scalar_in_safepoints` needs to be improved.
- Scalarization in the calling convention is not (yet) supported.

Tested with tier1-5 and valhalla specific compiler stress testing.

Thanks,
Tobias

-------------

Commit messages:
 - Final refactoring
 - More refactoring
 - More bug fixes after running stress testing. Tier1-5 and valhalla stress testing now pass
 - Merge branch 'lworld' into JDK-8267665
 - Removed ScalarizeInlineTypes flag, refactoring
 - Adjusted comments
 - Fixed unsafe accesses and lots of other bugs. Refactoring, cleanups. Finally, all tests (even mach5 tier1-3) pass
 - GraphKit cleanups - all tests pass
 - Match rules fix - all tests pass
 - Fixed unsafe accesses, properly replace casts in map
 - ... and 16 more: https://git.openjdk.java.net/valhalla/compare/994f6cfe...45258fbc

Changes: https://git.openjdk.java.net/valhalla/pull/543/files
 Webrev: https://webrevs.openjdk.java.net/?repo=valhalla&pr=543&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8267665
  Stats: 2995 lines in 44 files changed: 2438 ins; 199 del; 358 mod
  Patch: https://git.openjdk.java.net/valhalla/pull/543.diff
  Fetch: git fetch https://git.openjdk.java.net/valhalla pull/543/head:pull/543

PR: https://git.openjdk.java.net/valhalla/pull/543