[performance] Inline heuristics, scalar replacement and cold prolog of hot loop.
Sergey Kuksenko
sergey.kuksenko at oracle.com
Fri Feb 1 19:26:25 UTC 2019
Hi, All.
I'd like to raise some discussion - if it's possible to make C2 inline
heuristics better.
Code (just measure cost of iteration):
Map<Integer, Integer>map;
...
for (Iterator<Integer> iterator =map.keySet().iterator(); iterator.hasNext(); ) {
int v = iterator.next();
s += v;
}
For examples, I found that the iterator is scalarized if collection size
is quite small (10 or less) and it isn't scalarized is collection size
is large enough (10000, 1000000 elements). Direct cost of iterator
allocation and GC pressure is negligible small is our collection is
large enough. But, if iterator's state contains object references the
cost of GC write barriers may be large.
The reason why the iterator is scalarized for small collections and is
not scalarized for large collections is MinInliningThreshold option. So
we have a hot loop, but relatively cold prolog for the loop and
MinInliningThreshold prevents inlining methods called from the prolog.
At the same moment inlining such prolog methods may have significant
impact to loop performance, particularly fro such widely used pattern as
iteration.
You can find simple benchmarks for that here:
http://cr.openjdk.java.net/~skuksenko/hotspot/inline/
Here my results.
default:
Benchmark (size) Mode Cnt Score Error Units
Hash.sumIter 10000 thrpt 15 2929.112 ± 42.927 ops/s
Hash.sumIterHidden 10000 thrpt 15 3379.195 ± 30.589 ops/s
Tree.sumIter 10000 thrpt 15 3618.466 ± 126.785 ops/s
Tree.sumIterHidden 10000 thrpt 15 3576.429 ± 129.217 ops/s
-XX:MinInliningThreshold=0
Benchmark (size) Mode Cnt Score Error Units
Hash.sumIter 10000 thrpt 15 5392.184 ± 115.039 ops/s
Hash.sumIterHidden 10000 thrpt 15 3357.754 ± 60.636 ops/s
Tree.sumIter 10000 thrpt 15 4128.407 ± 56.748 ops/s
Tree.sumIterHidden 10000 thrpt 15 3601.293 ± 123.135 ops/s
So G1 write barriers costs 60% performance of HashMap iterator and 14%
performance of TreeMap iterator. Other GC also have cost, but G1 has
highest.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20190201/c86c1f91/attachment.html>
More information about the hotspot-compiler-dev
mailing list