Update on PEA in C2 (Episode 4)
Liu, Xin
xxinliu at amazon.com
Thu Jul 13 19:42:31 UTC 2023
Hi,
We would like to share an update on what we have done in C2 PEA over the last couple of months.
We root-caused some runtime errors. There were 2 causes:
1) We need to replace the old object with the materialized object in SafePointNode. Otherwise, we end up with wrong objects after deoptimisation.
2) We need to replace the old object with the materialized object at Parse::do_exits. This requires tracking the allocation state inter-procedurally when the method is inlined.
GraphKit::backfill_materialized() scans the inputs of a SafePointNode and does the replacement. With these runtime errors fixed, C2 PEA now runs non-trivial Java programs.
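To make the first fix concrete, here is a minimal Java model of the replacement step. It assumes a safepoint is just a list of JVM-state inputs and that PEA keeps a map from each original allocation to its materialized clone. `Node`, `SafePoint`, `Backfill` and `backfillMaterialized` are hypothetical names for illustration; this is a sketch, not the code of GraphKit::backfill_materialized().

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified IR node (NOT a HotSpot class).
final class Node {
    final String name;
    Node(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

// Hypothetical safepoint: the JVM-state inputs that deoptimization reads back.
final class SafePoint {
    final List<Node> inputs = new ArrayList<>();
}

final class Backfill {
    // Replace every input that still refers to an original (virtual) allocation
    // with its materialized clone, so deoptimization rebuilds the object that
    // the program actually published. Returns the number of replaced inputs.
    static int backfillMaterialized(SafePoint sfpt, Map<Node, Node> materialized) {
        int replaced = 0;
        for (int i = 0; i < sfpt.inputs.size(); i++) {
            Node mat = materialized.get(sfpt.inputs.get(i));
            if (mat != null) {
                sfpt.inputs.set(i, mat);
                replaced++;
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        Node oldObj = new Node("old");
        Node matObj = new Node("materialized");
        SafePoint sfpt = new SafePoint();
        sfpt.inputs.add(oldObj);
        sfpt.inputs.add(new Node("int_local"));
        sfpt.inputs.add(oldObj);
        Map<Node, Node> map = new IdentityHashMap<>();  // keyed by node identity
        map.put(oldObj, matObj);
        int n = backfillMaterialized(sfpt, map);
        System.out.println(n + " " + sfpt.inputs); // 2 [materialized, int_local, materialized]
    }
}
```

An identity map fits here because IR nodes are compared by identity, not by value.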
We looked into 2 examples from the Graal website:
https://www.graalvm.org/22.1/examples/java-performance-examples/
Blender.java is the kernel of Sunflow, a ray tracer written in Java. C2 PEA makes it 38.58% faster thanks to allocation reduction. With C2 PEA, Blender.java still has a 14% performance gap compared with Graal CE. Graal PEA features memory Read/Write replacement and can simplify a double modulo to an integer modulo. We filed a JBS issue (JDK-8309636) but don't want to be sidetracked by it.
In dacapo/sunflow, we measured the same execution time. The geomean of the allocation rate drops from 6716.596 Mb/s to 5755.249 Mb/s, or 14.31%. The average allocation rate drops from 7141.490 Mb/s to 6080.981 Mb/s, or 14.85%.
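As a check on the numbers, both reductions are computed relative to the rates without PEA:

```latex
\frac{6716.596 - 5755.249}{6716.596} = \frac{961.347}{6716.596} \approx 14.31\%
\qquad
\frac{7141.490 - 6080.981}{7141.490} = \frac{1060.509}{7141.490} \approx 14.85\%
```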
CountUppercase.java is a typical Java program using the stream API. We found that C2 PEA allocates 30% more than default C2. The problem comes from object composition, which I explain below.
For the hotspot tier1 tests, we still have 12 known failures. 3 of them are also due to object composition, and 7 lock up in AbstractQueuedSynchronizer.
==============================
   TEST                              TOTAL  PASS  FAIL  ERROR
>> jtreg:test/hotspot/jtreg:tier1     2227  2210     4      8 <<
==============================
Remaining problem: object composition
An object may contain fields that refer to other objects, so objects form a directed graph that may contain cycles. One revelation is that it's impossible to materialize an object individually. We believe the minimal unit of materialization is a strongly connected component of the object graph.
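The SCC claim can be illustrated with Tarjan's algorithm over a field graph. In this sketch (a hypothetical model, not C2 data structures), virtual objects are numbered 0..n-1 and `fields.get(u)` lists the objects stored in u's fields (u -F> v). Each component it returns is a group that, under the reasoning above, would be materialized as one unit.

```java
import java.util.*;

// Hypothetical sketch: Tarjan's SCC algorithm over an object field graph.
final class ObjectGraphScc {
    private final List<List<Integer>> fields;   // fields.get(u): objects u points to
    private final int[] index, low;
    private final boolean[] onStack;
    private final Deque<Integer> stack = new ArrayDeque<>();
    private final List<List<Integer>> components = new ArrayList<>();
    private int counter = 0;

    ObjectGraphScc(List<List<Integer>> fields) {
        this.fields = fields;
        int n = fields.size();
        index = new int[n];
        low = new int[n];
        onStack = new boolean[n];
        Arrays.fill(index, -1);                  // -1 means "not visited yet"
    }

    List<List<Integer>> run() {
        for (int v = 0; v < fields.size(); v++) {
            if (index[v] == -1) strongConnect(v);
        }
        return components;
    }

    private void strongConnect(int v) {
        index[v] = low[v] = counter++;
        stack.push(v);
        onStack[v] = true;
        for (int w : fields.get(v)) {
            if (index[w] == -1) {
                strongConnect(w);
                low[v] = Math.min(low[v], low[w]);
            } else if (onStack[w]) {
                low[v] = Math.min(low[v], index[w]);
            }
        }
        if (low[v] == index[v]) {                // v is the root of a component
            List<Integer> scc = new ArrayList<>();
            int w;
            do {
                w = stack.pop();
                onStack[w] = false;
                scc.add(w);
            } while (w != v);
            Collections.sort(scc);
            components.add(scc);
        }
    }

    public static void main(String[] args) {
        // 0 -F> 1 and 1 -F> 0 form a cycle; 1 -F> 2 points to a leaf object.
        List<List<Integer>> g = List.of(List.of(1), List.of(0, 2), List.<Integer>of());
        System.out.println(new ObjectGraphScc(g).run()); // [[2], [0, 1]]
    }
}
```

Objects 0 and 1 land in one component, so materializing either one alone would leave the other behind.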
Besides correctness, object composition is also a problem for EA/SR. If we can't clone the entire strongly connected component, the original object retains its connection with those materialized objects. We materialize those objects because they escape, and the escape propagates to the original object over the field edge (-F>). As a result, the original object can't be eliminated or scalar replaced. We have added an option 'PEAParanoid' to detect this issue.
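The propagation described above can be sketched as a worklist over field edges. This assumes the usual escape-analysis rule that when an object escapes, the objects stored in its fields escape as well. The names are hypothetical: B' stands for a materialized clone of B that still references the original A.

```java
import java.util.*;

// Hypothetical sketch of escape propagation along field edges (u -F> v).
final class EscapePropagation {
    // Returns every object that escapes, given the initially escaping ones.
    static Set<String> propagate(Map<String, List<String>> fields, Set<String> escaped) {
        Set<String> result = new HashSet<>(escaped);
        Deque<String> worklist = new ArrayDeque<>(escaped);
        while (!worklist.isEmpty()) {
            String u = worklist.pop();
            for (String v : fields.getOrDefault(u, List.of())) {
                if (result.add(v)) {             // v newly escapes: propagate from it
                    worklist.push(v);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A and B form a cycle. Only B was cloned into B' (materialized),
        // and B' still points to the original A through its field.
        Map<String, List<String>> fields = Map.of(
            "A", List.of("B"),
            "B", List.of("A"),
            "B'", List.of("A"));
        Set<String> escaped = propagate(fields, Set.of("B'"));
        System.out.println(new TreeSet<>(escaped)); // [A, B, B']
    }
}
```

Because B' escapes and keeps a field edge to A, A (and through A, the original B) is marked escaping and can no longer be scalar replaced.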
Graal PEA has a node called CommitAllocationNode, which groups all relevant VirtualObject nodes and processes them in 2 passes.
https://github.com/oracle/graal/blob/2f3a8d5ab0cd538bd323fa29812509873e6f7807/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/replacements/DefaultJavaLoweringProvider.java#L900
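The two-pass idea can be sketched in plain Java: allocate every object in the group first, then initialize the fields, so references between members, including cyclic ones, resolve to real allocations. `VRef`, `HeapObject` and `commit` are hypothetical stand-ins; this is neither Graal's CommitAllocationNode lowering nor C2 code.

```java
import java.util.List;

// Hypothetical sketch of a two-pass commit for a group of virtual objects.
final class TwoPassCommit {
    // A field value that refers to another virtual object of the same group.
    static final class VRef {
        final int target;
        VRef(int target) { this.target = target; }
    }

    // Stand-in for a real heap allocation with an array of field slots.
    static final class HeapObject {
        final Object[] fields;
        HeapObject(int fieldCount) { fields = new Object[fieldCount]; }
    }

    // group.get(i) holds the field values of virtual object i.
    static HeapObject[] commit(List<Object[]> group) {
        HeapObject[] allocated = new HeapObject[group.size()];
        // Pass 1: allocate every object, so each member has an identity.
        for (int i = 0; i < allocated.length; i++) {
            allocated[i] = new HeapObject(group.get(i).length);
        }
        // Pass 2: write the fields. References to group members, even cyclic
        // ones, now resolve to the allocations made in pass 1.
        for (int i = 0; i < allocated.length; i++) {
            Object[] values = group.get(i);
            for (int f = 0; f < values.length; f++) {
                Object v = values[f];
                allocated[i].fields[f] = (v instanceof VRef) ? allocated[((VRef) v).target] : v;
            }
        }
        return allocated;
    }

    public static void main(String[] args) {
        // Object 0 = {42, ->1} and object 1 = {->0}: a two-object cycle.
        HeapObject[] objs = commit(List.of(
            new Object[] {42, new VRef(1)},
            new Object[] {new VRef(0)}));
        System.out.println(objs[0].fields[1] == objs[1] && objs[1].fields[0] == objs[0]); // true
    }
}
```

A single pass could not handle the cycle: whichever object is allocated first would need a reference to one that does not exist yet.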
We plan to materialize an object using DFS, which traverses all other virtual objects reachable through its fields. We expect this feature to fix the performance issue of CountUppercase.java and some regression failures.
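A minimal sketch of such a traversal, assuming the object graph is a map from each object to the objects stored in its fields and that `virtual` holds the objects not yet materialized (all names hypothetical). It returns every still-virtual object reachable from the escaping one. Already-materialized objects are not traversed, and the visited set keeps it from looping on cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: collect the virtual objects to materialize together.
final class MaterializeDfs {
    static List<String> collect(String root, Map<String, List<String>> fields, Set<String> virtual) {
        List<String> group = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();  // explicit stack: iterative DFS
        stack.push(root);
        while (!stack.isEmpty()) {
            String obj = stack.pop();
            if (!virtual.contains(obj) || !visited.add(obj)) {
                continue;                            // already materialized, or seen (cycle)
            }
            group.add(obj);
            for (String field : fields.getOrDefault(obj, List.of())) {
                stack.push(field);
            }
        }
        return group;
    }

    public static void main(String[] args) {
        // A and B reference each other; B also references C, which is already
        // materialized; D is virtual but unreachable from A.
        Map<String, List<String>> fields = Map.of(
            "A", List.of("B"),
            "B", List.of("A", "C"));
        Set<String> virtual = Set.of("A", "B", "D");
        System.out.println(collect("A", fields, virtual)); // [A, B]
    }
}
```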
We also refactored the implementation. The goal is to align the key data structure 'aliases' with Graal PEA. 'aliases' maps a node to a virtual object, so the DFS can recognize which nodes are aliases of virtual objects. By moving almost all merging logic into MergeProcessor, the change is now less intrusive in merge_common. Here is the PR:
https://github.com/navyxliu/jdk/pull/55
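The role of 'aliases' can be shown with a toy table that maps IR nodes to virtual objects. Node names such as "CheckCastPP#12" are plain string labels, and `AliasTable` and `VirtualObject` are hypothetical; this sketch is not the data structure from the PR.

```java
import java.util.*;

// Hypothetical sketch of an 'aliases' table: several IR nodes (the allocation
// itself, a cast of it, ...) can name the same virtual object.
final class AliasTable {
    static final class VirtualObject {
        final int id;
        VirtualObject(int id) { this.id = id; }
    }

    private final Map<String, VirtualObject> aliases = new HashMap<>();

    void addAlias(String node, VirtualObject vo) { aliases.put(node, vo); }

    // The virtual object 'node' stands for, or null if 'node' is not a known
    // alias of any virtual object (e.g. an ordinary heap reference).
    VirtualObject lookup(String node) { return aliases.get(node); }

    // All nodes currently recorded as aliases of 'vo', sorted for stable output.
    List<String> aliasesOf(VirtualObject vo) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, VirtualObject> e : aliases.entrySet()) {
            if (e.getValue() == vo) result.add(e.getKey());
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        AliasTable table = new AliasTable();
        VirtualObject vo = new VirtualObject(7);
        table.addAlias("Allocate#10", vo);
        table.addAlias("CheckCastPP#12", vo);       // a cast of the same allocation
        System.out.println(table.lookup("CheckCastPP#12").id); // 7
        System.out.println(table.aliasesOf(vo));               // [Allocate#10, CheckCastPP#12]
        System.out.println(table.lookup("Parm#5"));            // null
    }
}
```

During the DFS, a field value that resolves through `lookup` to a virtual object is treated as that object, whichever alias node it arrived through.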
thanks,
--lx
More information about the hotspot-compiler-dev mailing list