RFR: 8276662: Scalability bottleneck in SymbolTable::lookup_common() [v2]
Ioi Lam
iklam at openjdk.java.net
Fri Nov 19 04:15:42 UTC 2021
On Thu, 18 Nov 2021 23:25:19 GMT, Derek White <drwhite at openjdk.org> wrote:
>> Symbol table lookup had an optimization added when the symbol table was split into a shared table (for CDS?) and a local table. The optimization tries to track which table successfully found a symbol, so it can try that table first the next time.
>>
>> Symbol table lookup is used in many JVM operations, including classloading, serialization, and reflection.
>>
>> At startup time, more symbols will be from the shared table, but over time lookups will be for a mix of local and shared symbols (e.g., user classes still have java.lang.String fields or subclass from java.lang.Object), resulting in multiple threads fighting over the value of this global variable.
>>
>> With enough threads and cores, this can result in "true sharing" cache line contention.
>>
>> This fix solves the scalability issue by checking the shared table first "early on", and when enough local symbols have been added, then check the local table first.
>>
>> Other options would also solve the scaling problem, but may change the behavior that we're trying to optimize, or add more overhead or complexity than warranted, such as:
>> - Statically preferring the shared or local table
>> - Using a thread-local variable to track which table to search first
>> - Using a NUMA-aware set of N variables distributed over M threads.
>
> Derek White has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix scalability with THREAD_LOCAL
Marked as reviewed by iklam (Reviewer).
I did a quick check of the latest patch. Small start-up runs like this are susceptible to effects of dynamic frequency scaling due to CPU heat, so I interleaved the runs just to be safe:
for i in 1 2 3 4 5; do
    for v in old new; do
        echo $v.$i.txt
        perf stat -o $v.$i.txt -r 500 ./jdk_$v/bin/java \
            -Xmx128m -Xshare:on -version 2> /dev/null
    done
done
old.1.txt: 0.044077 +- 0.000425 seconds time elapsed ( +- 0.96% )
old.2.txt: 0.043140 +- 0.000385 seconds time elapsed ( +- 0.89% )
old.3.txt: 0.044393 +- 0.000415 seconds time elapsed ( +- 0.93% )
old.4.txt: 0.043779 +- 0.000407 seconds time elapsed ( +- 0.93% )
old.5.txt: 0.043221 +- 0.000390 seconds time elapsed ( +- 0.90% )
geomean = 0.043719
new.1.txt: 0.043791 +- 0.000405 seconds time elapsed ( +- 0.93% )
new.2.txt: 0.042716 +- 0.000364 seconds time elapsed ( +- 0.85% )
new.3.txt: 0.043203 +- 0.000382 seconds time elapsed ( +- 0.88% )
new.4.txt: 0.043437 +- 0.000398 seconds time elapsed ( +- 0.92% )
new.5.txt: 0.043012 +- 0.000382 seconds time elapsed ( +- 0.89% )
geomean = 0.043230
So at least on my machine (a 10-year-old dual-socket Xeon/Sandy Bridge), the patch doesn't seem to introduce any start-up regression.
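For anyone who wants to reproduce the geomean figures above, here is a minimal sketch of one way to compute them. The `geomean` helper is a hypothetical name of my own, not part of perf or the JDK; it takes one number per line on stdin and prints exp(mean(ln x)) to six decimals.

```shell
# Hypothetical helper: geometric mean of the numbers on stdin, one per line.
# geomean = exp(mean(ln x)); LC_ALL=C keeps "." as the decimal separator.
geomean() {
    LC_ALL=C awk '
        $1 > 0 { sum += log($1); n++ }                     # skip blank/non-positive lines
        END    { if (n > 0) printf "%.6f\n", exp(sum / n) }
    '
}

# Elapsed times (seconds) of the five "old" runs listed above.
printf '%s\n' 0.044077 0.043140 0.044393 0.043779 0.043221 | geomean
```

Assuming the files written by `perf stat -o` contain a summary line with the elapsed time as its first field (which is what the listing above suggests), something like `grep -h 'seconds time elapsed' old.*.txt | awk '{print $1}' | geomean` should feed the helper directly.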
-------------
PR: https://git.openjdk.java.net/jdk/pull/6400