[lworld] RFR: 8255046: [lworld] JIT should make use of array layout encoding in markWord
skuksenko at openjdk.java.net
Fri Nov 13 02:58:10 UTC 2020
On Thu, 12 Nov 2020 14:47:56 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> This patch re-implements the flat array check in C1 and C2 by using the mark word bits instead of the layout helper from the array Klass (see [JDK-8247299](https://bugs.openjdk.java.net/browse/JDK-8247299)). Unfortunately, this turned out to be far from trivial to implement in C2.
>> I've introduced a new FlatArrayCheck macro node to wrap the logic of the new check and to make it easier for the loop unswitching optimization to detect and hoist the check. One major problem is that we can't use immutable memory anymore because we are loading the mark word which is mutable (`AliasType::_is_rewritable` is `true`). Although the bits we are interested in are in fact immutable (we check for `markWord::unlocked_value`), we need to use raw memory to not break anti dependency analysis. As a result, flat array checks are not hoisted out of loops anymore and loop unswitching fails. `PhaseIdealLoop::move_flat_array_check_out_of_loop` will attempt to still move flat array checks out of loops by walking up the memory edge to before the loop and re-wiring the check accordingly.
>> This patch also fixes an existing issue in Escape Analysis that was only triggered now that we are able to fold the flat array check in more cases due to the `::Ideal`/`::Value` transformations.
>> @kuksenko, could you please evaluate the performance of this change? Disabling the `UseArrayMarkWordCheck` flag switches back to the old check using the layout helper.
> Looks good to me.
Performance behavior is as expected.
The aaload operation sped up by up to 15% in some scenarios.
In general, the cost of an aaload operation is now 103%-106% of the legacy (non-Valhalla) cost,
which looks like the minimum possible overhead for aaload.
Unfortunately, if the array is locked, aaload gets a 20% slowdown (also expected). Locking on an array object is quite rare, though.
As for the aastore operation, performance is unchanged. aastore has had the same cost in Valhalla and in the legacy world since long before this change. The reason is that either HotSpot has precise type information (and doesn't generate checks) or an ArrayStoreException check is generated, in which case that check dominates aastore performance.
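To illustrate the ArrayStoreException check mentioned above, here is a minimal Java sketch. The class and method names (`AastoreCheck`, `tryStore`) are hypothetical and not part of the patch or the benchmarks. When the static type of the array is only `Object[]`, the store has to be checked against the array's actual component type at run time.

```java
class AastoreCheck {
    // Hypothetical helper: the static type Object[] says nothing precise about
    // the runtime component type, so the aastore below needs a runtime check.
    static boolean tryStore(Object[] target, Object value) {
        try {
            target[0] = value; // aastore: throws if value is not assignable to the component type
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryStore(new String[1], "text"));             // compatible store
        System.out.println(tryStore(new String[1], Integer.valueOf(1))); // incompatible store
    }
}
```

When the JIT can prove the exact array type, it can omit this check, which is the "precise type information" case.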
Note: I considered benchmarks where the ArrayFlattened check can't be hoisted out of the loop (the reference to the array is loaded on every loop iteration).
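A rough sketch of that loop shape, assuming a hypothetical `countNonNull` method (this is not the actual benchmark code): each iteration loads a different array reference, so any per-array check guarding the aaload depends on a value that changes inside the loop and cannot be moved in front of it.

```java
class AaloadPerIteration {
    // Hypothetical example: rows[i] yields a new array reference on every
    // iteration, so a check on that array has to be evaluated inside the loop.
    static int countNonNull(Object[][] rows) {
        int count = 0;
        for (int i = 0; i < rows.length; i++) {
            Object[] row = rows[i];  // array reference loaded on every iteration
            Object element = row[0]; // aaload on the freshly loaded array
            if (element != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Object[][] rows = { { "a" }, { null }, { Integer.valueOf(1) } };
        System.out.println(countNonNull(rows));
    }
}
```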