RFR: 8276662: Scalability bottleneck in SymbolTable::lookup_common()
Ioi Lam
iklam at openjdk.java.net
Fri Nov 19 02:33:40 UTC 2021
On Tue, 16 Nov 2021 19:14:41 GMT, Derek White <drwhite at openjdk.org> wrote:
>> Symbol table lookup had an optimization added when the symbol table was split into a shared table (for CDS?) and a local table. The optimization tries to track which table successfully found a symbol, so it can try that table first the next time.
>>
>> Symbol table lookup is used in many JVM operations, including classloading, serialization, and reflection.
>>
>> At startup time, more symbols will be from the shared table, but over time lookups will be from a mix of local and shared symbols (e.g. user classes still have java.lang.String fields or subclass from java.lang.Object), resulting in multiple threads fighting over the value of this global variable.
>>
>> With enough threads and cores, this can result in "true sharing" cache line contention.
>>
>> This fix solves the scalability issue by checking the shared table first "early on", and then checking the local table first once enough local symbols have been added.
>>
>> Other options would also solve the scaling problem, but may change the behavior that we're trying to optimize, or add more overhead or complexity than warranted, such as:
>> - Statically preferring the shared or local table
>> - Using a thread-local variable to track which table to search first
>> - Using a NUMA-aware set of N variables distributed over M threads.
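
For readers following along, the count-based approach described above might look roughly like the sketch below. This is only an illustration and not the code in the PR: the names `kLocalFirstThreshold`, `note_local_symbol_added` and `should_lookup_shared_first` are hypothetical, and the threshold value is made up. The idea is that the lookup path only reads a counter that is written on insert, so lookups themselves no longer write to a shared cache line.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical threshold; the thread does not say what value the fix uses.
constexpr std::size_t kLocalFirstThreshold = 1000;

// Number of symbols added to the local table so far (illustrative only).
// It is written on insert and only read on lookup.
static std::atomic<std::size_t> local_symbol_count{0};

// Would be called whenever a new symbol is added to the local table.
void note_local_symbol_added() {
    local_symbol_count.fetch_add(1, std::memory_order_relaxed);
}

// Search order hint: shared table first until enough local symbols exist.
bool should_lookup_shared_first() {
    return local_symbol_count.load(std::memory_order_relaxed) < kLocalFirstThreshold;
}
```

Unlike the original hint, nothing here is stored on a lookup hit or miss, so concurrent lookups do not invalidate each other's cache lines; only inserts do.
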
>
> Yes, the thread-local should fix the scalability issue (I only tested on a small machine). And logically this performance hint makes more sense per-thread, not globally.
>
> On the other hand, HotSpot doesn't throw around C++ thread locals everywhere. "_thread" is pervasive enough to make it worthwhile. If there are concerns about using too many thread locals, I wasn't sure if it would be worth burning one for this purpose.
>
> If we want to use THREAD_LOCAL, we may need this:
>
>
> #ifdef USE_LIBRARY_BASED_TLS_ONLY
> static volatile bool _lookup_shared_first = false;
> #else
> static THREAD_LOCAL bool _lookup_shared_first = false;
> #endif
>
>
> (Not that I know what USE_LIBRARY_BASED_TLS_ONLY is really for :-)
@dwhite-intel do you have any performance results that you can share? Also, the PR description needs to be updated to reflect the final version.
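
To make the thread-local variant discussed above concrete, here is a minimal standalone sketch. It is not HotSpot code: `ToyTable` and `toy_lookup` are hypothetical stand-ins for the shared and local symbol tables and for the lookup routine, and it uses the standard C++ `thread_local` keyword rather than HotSpot's `THREAD_LOCAL` macro. It assumes the hint flips only when the other table actually finds the symbol.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a symbol table (shared or local).
struct ToyTable {
    std::unordered_set<std::string> names;
    int probes = 0;  // number of times this table was searched

    bool contains(const std::string& s) {
        ++probes;
        return names.count(s) != 0;
    }
};

// Per-thread hint: each thread remembers which table it hit last,
// so threads never write to a shared variable on the lookup path.
static thread_local bool lookup_shared_first = false;

// Search the preferred table first, then fall back to the other one.
// The hint is flipped only when the fallback table finds the name.
bool toy_lookup(ToyTable& shared, ToyTable& local, const std::string& name) {
    if (lookup_shared_first) {
        if (shared.contains(name)) return true;
        if (local.contains(name)) {
            lookup_shared_first = false;  // prefer local next time
            return true;
        }
    } else {
        if (local.contains(name)) return true;
        if (shared.contains(name)) {
            lookup_shared_first = true;   // prefer shared next time
            return true;
        }
    }
    return false;  // not found in either table; hint unchanged
}
```

The `probes` counter exists only to make the search order observable; a repeated lookup of the same shared symbol should touch the shared table alone on the second call.
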
-------------
PR: https://git.openjdk.java.net/jdk/pull/6400
More information about the hotspot-runtime-dev mailing list