[Exp] First prototype of new acmp bytecode
tobias.hartmann at oracle.com
Thu Mar 8 16:39:07 UTC 2018
I've added type speculation on non-nullness to avoid the null check when emitting the new acmp (it
only helps for non-jsr292 methods if -XX:TypeProfileLevel is set to > 111). We might want to add
more speculation in the future but I think that should be enough for now. I've also fixed the code
in macro.cpp and converted the test to the jtreg format:
Here is a simple JMH benchmark that tests some common cases:
Benchmark                               Mode   Cnt    Score   Error  Units
NewAcmpBenchmark.newCmp                 thrpt  200  108.911 ± 0.086  ops/us
NewAcmpBenchmark.newCmpDoubleNull       thrpt  200   88.206 ± 4.792  ops/us
NewAcmpBenchmark.newCmpDoubleNullFalse  thrpt  200   72.742 ± 7.563  ops/us
NewAcmpBenchmark.newCmpField            thrpt  200  107.090 ± 0.083  ops/us
NewAcmpBenchmark.oldCmp                 thrpt  200  114.466 ± 0.077  ops/us
-XX:-TieredCompilation -XX:+UseNewAcmp -XX:ValueBasedClasses=compiler/valhalla/valuetypes/MyValue
Benchmark                               Mode   Cnt    Score   Error  Units
NewAcmpBenchmark.newCmp                 thrpt  200  101.480 ± 0.260  ops/us
NewAcmpBenchmark.newCmpDoubleNull       thrpt  200   90.429 ± 4.741  ops/us
NewAcmpBenchmark.newCmpDoubleNullFalse  thrpt  200   81.230 ± 4.115  ops/us
NewAcmpBenchmark.newCmpField            thrpt  200  102.224 ± 0.019  ops/us
NewAcmpBenchmark.oldCmp                 thrpt  200  114.336 ± 0.239  ops/us
In the worst case, when we have to emit the new acmp and the first operand is not null, throughput
drops by 6.8% (see newCmp).
However, in many cases we can use static type information to optimize. For example, if we know that
one operand is a value type, we can emit a "double null check". This causes the performance impact
to disappear into the noise (see newCmpDoubleNull). If we know in addition that one operand is
always non-null, we can emit a static false. This improves performance by ~11% (high error) compared
to old acmp.
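
To make the two static optimizations above concrete, here is a small Java model (not HotSpot code). It assumes the new acmp returns false whenever an operand is a value type and otherwise behaves like the old reference comparison; that is my reading of the description, not something taken from the patch. ValueLike, MyValue and the method names are hypothetical stand-ins.

```java
// Simplified model of the comparison semantics; names are illustrative only.
final class NewAcmpModel {
    /** Hypothetical marker standing in for "this class is a value type". */
    interface ValueLike {}

    static final class MyValue implements ValueLike {}

    // General path: null check on the first operand, then a value-type check.
    static boolean newAcmp(Object a, Object b) {
        if (a != null && a instanceof ValueLike) {
            return false;      // assumed: a value type never compares equal
        }
        return a == b;         // old acmp: plain reference comparison
    }

    // Shortcut when one operand is statically known to be a value type:
    // the result can only be true if both operands are null.
    static boolean doubleNullCheck(Object a, Object b) {
        return a == null && b == null;
    }

    public static void main(String[] args) {
        MyValue v = new MyValue();
        Object[] valueTyped = { null, v };                       // static type MyValue
        Object[] any = { null, v, new Object(), new MyValue() }; // arbitrary operand
        for (Object a : valueTyped) {
            for (Object b : any) {
                // The shortcut must agree with the general path in both operand orders.
                if (newAcmp(a, b) != doubleNullCheck(a, b)) throw new AssertionError();
                if (newAcmp(b, a) != doubleNullCheck(b, a)) throw new AssertionError();
            }
        }
        System.out.println("double null check agrees with the general path");
    }
}
```

If one operand is also known to be non-null, doubleNullCheck can never return true, which is the "static false" case.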
There is one pitfall. If we compare two object fields, C2 optimizes old acmp to directly compare the
narrow oops (no need to decode). With the new acmp, we need to decode the oop because we use derived
oops for perturbation. Surprisingly, the newCmpField benchmark shows that the regression is even
lower than in the newCmp case (4.5%). That's probably because the comparison is always false and
therefore the CPU's branch prediction works better, mitigating the performance impact of the
The last benchmark (oldCmp) verifies that if C2 is able to determine that one operand is not a value
type, we can use the old acmp and performance is equal to the baseline.
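
For reference, the percentages quoted above can be reproduced from the two result tables with the short Java snippet below. It assumes the first table is the baseline run and the second is the run with -XX:+UseNewAcmp; the class and method names are illustrative only.

```java
import java.util.Locale;

// Recomputes the relative throughput change per benchmark from the scores above.
final class RelativeChange {
    // Positive result: candidate is faster than baseline; negative: slower.
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {
            "newCmp", "newCmpDoubleNull", "newCmpDoubleNullFalse", "newCmpField", "oldCmp"
        };
        // Scores in ops/us, copied from the tables (assumed baseline first).
        double[] baseline = { 108.911, 88.206, 72.742, 107.090, 114.466 };
        double[] newAcmp  = { 101.480, 90.429, 81.230, 102.224, 114.336 };
        for (int i = 0; i < names.length; i++) {
            System.out.printf(Locale.ROOT, "%-24s %+6.2f%%%n",
                    names[i], percentChange(baseline[i], newAcmp[i]));
        }
    }
}
```

Note that the change for newCmpDoubleNull is smaller than the reported error of either run, which is why it counts as noise.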
I will re-run the tests with type speculation enabled to see how much of a difference that makes.
I think this is stable enough to be pushed to the Exp branch. Any objections?
On 23.02.2018 14:22, Tobias Hartmann wrote:
> Hi John,
> On 21.02.2018 22:04, John Rose wrote:
>> You might even be able to get rid of the special node type (arity=3),
>> if the cases where CmpP sees a derived oop can be recognized
>> as perturbations. I don't think we do CmpP on derived oops in
>> any other circumstance (no C-style pointer/limit loops).
> Yes, we only use CmpP with AddP inputs for raw pointer comparisons (for example, in
> PhaseMacroExpand::expand_allocate_common) and we can easily filter these out.
> Here's the new webrev:
> Changes include:
> - Using derived oops for perturbation
> - Got rid of all CastX2P usages
> - Removed additional input edge from CmpP
> - Factored common code into separate methods
> - Swap operand optimization to avoid null checks in new acmp
> - Added code to ensure OrX is folded to null check or constant false if possible
> - Interface supertype support
> I've executed performance runs with -XX:-TieredCompilation -XX:ValueBasedClasses= -XX:+UseNewAcmp
> and there is no significant performance difference with SPECjvm2008 and SPECjbb2015 (and some of our
> internal benchmarks).
> - JMH benchmarks
> - Type speculation on (non-)nullness
> - Fix changes in macro.cpp
> - Convert test to jtreg format
> Best regards,