Update on PEA in C2 (Episode 4)

Liu, Xin xxinliu at amazon.com
Thu Jul 13 19:42:31 UTC 2023


  Hi,

We would like to share an update on what we have done on C2 PEA over the last couple 
of months.

We root-caused some runtime errors. There were 2 causes.
1) We need to replace the old object with the materialized object in 
SafePointNode, or we will end up with
wrong objects after deoptimisation.

2) We need to replace the old object with the materialized object at 
Parse::do_exits. We have to track
allocation state inter-procedurally when the method is inlined.

GraphKit::backfill_materialized() scans the inputs of a SafePointNode 
and does the replacement. With these
runtime errors fixed, C2 PEA now runs non-trivial Java programs.
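
To illustrate the replacement step, here is a toy model in Java. It is only a sketch: the
Node and SafePoint classes and the backfillMaterialized helper are hypothetical stand-ins
written for this example, not HotSpot's C++ classes or the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical stand-ins, not HotSpot's C++ node classes.
final class BackfillSketch {
    /** A minimal IR node identified by name. */
    static final class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    /** A safepoint records the values that are needed after deoptimization. */
    static final class SafePoint {
        final List<Node> inputs;
        SafePoint(Node... inputs) { this.inputs = new ArrayList<>(Arrays.asList(inputs)); }
    }

    /** Replaces every use of oldObj with materialized; returns the number of replaced inputs. */
    static int backfillMaterialized(SafePoint sfpt, Node oldObj, Node materialized) {
        int replaced = 0;
        for (int i = 0; i < sfpt.inputs.size(); i++) {
            if (sfpt.inputs.get(i) == oldObj) {   // identity comparison, as for IR nodes
                sfpt.inputs.set(i, materialized);
                replaced++;
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        Node virt = new Node("old object");
        Node mat = new Node("materialized object");
        Node other = new Node("unrelated local");
        SafePoint sfpt = new SafePoint(other, virt, virt);
        System.out.println("replaced inputs: " + backfillMaterialized(sfpt, virt, mat));
    }
}
```

If a stale input were left behind, the debug information at that safepoint would still
describe the old object, which is the failure mode described in 1) above.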

We looked into 2 examples from the Graal website: 
https://www.graalvm.org/22.1/examples/java-performance-examples/

Blender.java is the kernel of sunflow. Sunflow is a ray tracer in Java. 
C2 PEA makes it 38.58% faster due to
allocation reduction. Blender.java with C2 PEA still has a 14% performance 
gap compared with Graal CE. Graal
PEA features memory Read/Write replacement and can simplify a double 
modulo to an integer modulo. We filed a
JBS issue (JDK-8309636) but don't want to be sidetracked by it.

In dacapo/sunflow, we measured the same execution time with and without C2 PEA. The geomean of 
the allocation rate drops from
6716.596 Mb/s to 5755.249 Mb/s, or 14.31%. The average allocation rate 
drops from 7141.490 Mb/s to 6080.981
Mb/s, or 14.85%.
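
The percentages follow from (before - after) / before. A small sketch that recomputes them
from the rates quoted above (the class and method names are made up for this example):

```java
import java.util.Locale;

final class AllocationRateReduction {
    /** Relative reduction from before to after, in percent. */
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Rates are the dacapo/sunflow numbers quoted in this mail.
        System.out.println(String.format(Locale.ROOT, "geomean: %.2f%%",
                reductionPercent(6716.596, 5755.249)));
        System.out.println(String.format(Locale.ROOT, "average: %.2f%%",
                reductionPercent(7141.490, 6080.981)));
    }
}
```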

CountUppercase.java is a typical Java program that uses the stream API. We found 
that C2 PEA allocates 30% more than the
default. The problem comes from object composition. I will explain it later.
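
For readers who have not seen that example, the hot loop is a stream pipeline over the
characters of a string. The following is a minimal sketch of that kind of kernel, assuming
it looks roughly like this; it is not a copy of the Graal example.

```java
final class CountUppercaseSketch {
    /** Counts upper-case characters with a stream pipeline. */
    static long countUppercase(String sentence) {
        // chars() builds an IntStream pipeline; the stream stages are short-lived
        // objects that reference each other, which is where object composition
        // comes into play for escape analysis.
        return sentence.chars().filter(Character::isUpperCase).count();
    }

    public static void main(String[] args) {
        String sentence = String.join(" ", args);
        long total = 0;
        for (int i = 0; i < 1_000; i++) {
            total += countUppercase(sentence);
        }
        System.out.println("total upper-case characters: " + total);
    }
}
```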

For the hotspot:tier-1 tests, we still have 12 known failures. 3 of them are 
due to object composition as well. 7
lock up because of AbstractQueuedSynchronizer.

==============================
   TEST                                              TOTAL  PASS  FAIL ERROR
>> jtreg:test/hotspot/jtreg:tier1                     2227  2210     4     8 <<
==============================

Remaining problem: object composition

An object may have fields that refer to other objects. Together these objects form a 
directed graph that may contain cycles. One revelation is
that it's impossible to materialize an object individually. We 
believe the minimal unit of
materialization is a strongly connected component of the object graph.
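
As an illustration of what such a unit looks like, the sketch below computes the strongly
connected component that contains a given object in a small object graph. The graph
encoding (a map from object name to the names its fields refer to) and the helper names are
assumptions made for this example. It uses plain reachability, which is simple but
quadratic; a real implementation would more likely use Tarjan's algorithm.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SccSketch {
    /** All objects reachable from start through fields, including start itself. */
    static Set<String> reachable(Map<String, List<String>> fields, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String cur = stack.pop();
            if (seen.add(cur)) {
                for (String next : fields.getOrDefault(cur, List.of())) {
                    stack.push(next);
                }
            }
        }
        return seen;
    }

    /** Objects that obj can reach and that can reach obj back: the SCC of obj. */
    static Set<String> componentOf(Map<String, List<String>> fields, String obj) {
        Set<String> scc = new TreeSet<>();
        for (String candidate : reachable(fields, obj)) {
            if (reachable(fields, candidate).contains(obj)) {
                scc.add(candidate);
            }
        }
        return scc;
    }

    public static void main(String[] args) {
        // node <-> next form a cycle; payload is referenced but does not point back.
        Map<String, List<String>> fields = Map.of(
                "node", List.of("next"),
                "next", List.of("node", "payload"),
                "payload", List.of());
        System.out.println(componentOf(fields, "node"));
    }
}
```

In this example, node and next must be materialized together, while payload is outside
the component.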

Besides correctness, this is also a problem for EA/SR. If we can't clone 
the entire strongly connected
component, the original object retains its connections to those 
materialized objects. We materialize those
objects because they escape. The escape state then propagates to the 
original object over Field (-F>) edges. As a result,
the original object can't be eliminated or scalar replaced. We have 
added an option 'PEAParanoid' to detect
this issue.

Graal PEA has a node called CommitAllocationNode which groups all 
relevant VirtualObject nodes and processes
them in 2 passes.
https://github.com/oracle/graal/blob/2f3a8d5ab0cd538bd323fa29812509873e6f7807/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/replacements/DefaultJavaLoweringProvider.java#L900

We plan to materialize an object using DFS. The DFS traverses all other 
virtual objects reachable through fields. We
expect this feature to fix the performance issue of CountUppercase.java and some 
regression failures.
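
A minimal sketch of the idea, using a toy object model in Java. The Obj class and the
materialize helper are hypothetical and exist only for this example; the real work happens
on C2 IR nodes. Each materialized copy is registered before its fields are visited so that
cycles terminate.

```java
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: Obj and materialize are hypothetical, not part of the C2 PEA patch.
final class DfsMaterializeSketch {
    static final class Obj {
        final String name;
        final boolean virtual;
        final Map<String, Obj> fields = new LinkedHashMap<>();
        Obj(String name, boolean virtual) { this.name = name; this.virtual = virtual; }
    }

    /** Materializes obj and, depth-first, every virtual object reachable through its fields. */
    static Obj materialize(Obj obj, Map<Obj, Obj> done) {
        if (!obj.virtual) {
            return obj;                      // already a real object, nothing to do
        }
        Obj copy = done.get(obj);
        if (copy != null) {
            return copy;                     // visited before: reuse, which also ends cycles
        }
        copy = new Obj(obj.name + "'", false);
        done.put(obj, copy);                 // register before recursing into fields
        for (Map.Entry<String, Obj> field : obj.fields.entrySet()) {
            copy.fields.put(field.getKey(), materialize(field.getValue(), done));
        }
        return copy;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a", true);
        Obj b = new Obj("b", true);
        a.fields.put("next", b);
        b.fields.put("prev", a);             // a and b form a cycle

        Map<Obj, Obj> done = new IdentityHashMap<>();
        Obj ma = materialize(a, done);
        System.out.println("materialized objects: " + done.size());
        System.out.println("cycle preserved: " + (ma.fields.get("next").fields.get("prev") == ma));
    }
}
```

The materialized copies refer only to each other, so no copy keeps a field edge back into
the virtual objects it was created from.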

We also refactored the implementation. The goal is to align the key data 
structure 'aliases' with Graal
PEA. 'aliases' maps a node to a virtual object, so during DFS we can recognize 
that some nodes are aliases of virtual
objects. Almost all merging logic has moved to MergeProcessor, so the change 
is now less intrusive in
merge_common. Here is the PR:
https://github.com/navyxliu/jdk/pull/55

thanks,

--lx



More information about the hotspot-compiler-dev mailing list