RFR: 8228622: [lworld] Ineffective codegeneration for flattened arrays checks causes large performance regression on List iteration

Wed Sep 11 14:38:39 UTC 2019

http://cr.openjdk.java.net/~roland/8228622/webrev.00/

In:

for (Integer i : list) {
}

i is loaded from an Object array so there's a flatten array check. It is
then casted to Integer. If the Integer cast doesn't fail, then the
flatten array check should not succeed. This change transforms:

if (array flattened) {
  res = new buffer;
  copyto(res);
} else {
}
if (!res instanceof Integer) {
  unc at checkcast
}

to

if (array flattened) {
  unc at aaload
}
if (!res instanceof Integer) {
  unc at checkcast
}

This is performed in several steps.

The type system carries a new attribute for TypeInstPtr:
flatten_array. It is set for the newly allocated buffer in the flat
array branch.

CmpPNode::Value now returns a constant for a type check where the
comparison is between a TypeInstPtr with flatten_array set to true and a
constant klass for which flatten_array can't be true.

That makes the CmpPNode for the Integer type check a candidate for split
if. Because the type check is (CmpP (LoadKlass (Phi ...))) special logic
is required to push the LoadKlass through Phi first.

After split if, we have:

if (array flattened) {
  res = new buffer;
  copyto(res);
  unc at checkast
}
if (!res instanceof Integer) {
  unc at checkcast
}

Now the buffer allocation and initialization is useless. So
CallStaticJavaNode::Ideal has logic to move the unc up to the allocation
and remove the allocation and initialization.

Finally this patch also implements flattened array checks using the
storage properties.

All together, performance with EnableValhalla on or off is identical for
2 of the 3 tests. In the 3rd one, the flattened array check can't be
hoisted and so performance is still ~10% behind with +EnableValhalla
which is still much better than what we currently get.

Roland.