RFR: 8330969: scalability issue with loaded JVMTI agent

Serguei Spitsyn sspitsyn at openjdk.org
Wed Apr 24 16:11:50 UTC 2024


This is a fix for the following JVMTI scalability issue. A closed benchmark with millions of virtual threads shows a 4X overhead when a JVMTI agent has been loaded. For instance, this is observable when an app is executed under the control of the Oracle Studio `collect` utility.

The benchmark takes slightly more than 3 seconds without any JVMTI agent:

Total:               in 3045 ms


The benchmark takes ~3.2X as long when executed under the `collect` utility:

Creating experiment database test.1.er (Process ID: 25262) ...
Picked up JAVA_TOOL_OPTIONS: -agentlib:collector
Total:               in 9864 ms


With the fix in place, the overhead of a JVMTI agent is around 1.2X:

Creating experiment database test.1.er (Process ID: 26442) ...
Picked up JAVA_TOOL_OPTIONS: -agentlib:collector
Total:               in 3765 ms


Most of the overhead is spent in two functions:
 - `JvmtiVTMSTransitionDisabler::start_VTMS_transition()`
 - `JvmtiVTMSTransitionDisabler::finish_VTMS_transition()`
 
The Oracle Studio Performance Analyzer `er_print` utility shows the following performance data for these functions:
```
% er_print -viewmode expert -metrics ie.%totalcpu -csingle JvmtiVTMSTransitionDisabler::start_VTMS_transition test.1.er

Attr. Total     Name
CPU
  sec.      %
=============== Callers
42.930  50.06   SharedRuntime::notify_jvmti_vthread_mount(oopDesc*, unsigned char, JavaThread*)
21.505  25.08   JvmtiVTMSTransitionDisabler::VTMS_vthread_end(_jobject*)
21.315  24.86   JvmtiVTMSTransitionDisabler::VTMS_vthread_unmount(_jobject*, bool)

=============== Stack Fragment
81.407  94.94   JvmtiVTMSTransitionDisabler::start_VTMS_transition(_jobject*, bool)

=============== Callees
 4.083   4.76   java_lang_Thread::set_is_in_VTMS_transition(oopDesc*, bool)
 0.140   0.16   __tls_get_addr
 0.120   0.14   JNIHandles::resolve_external_guard(_jobject*)



% er_print -viewmode expert -metrics ie.%totalcpu -csingle JvmtiVTMSTransitionDisabler::finish_VTMS_transition test.1.er

Attr. Total     Name
CPU
  sec.      %
=============== Callers
47.363  52.59   SharedRuntime::notify_jvmti_vthread_unmount(oopDesc*, unsigned char, JavaThread*)
21.355  23.71   JvmtiVTMSTransitionDisabler::VTMS_vthread_mount(_jobject*, bool)
21.345  23.70   JvmtiVTMSTransitionDisabler::VTMS_vthread_start(_jobject*)

=============== Stack Fragment
64.145  71.22   JvmtiVTMSTransitionDisabler::finish_VTMS_transition(_jobject*, bool)

=============== Callees
25.288  28.08   java_lang_Thread::set_is_in_VTMS_transition(oopDesc*, bool)
 0.240   0.27   __tls_get_addr
 0.200   0.22   JavaThread::set_is_in_VTMS_transition(bool)
 0.190   0.21   JNIHandles::resolve_external_guard(_jobject*)
```


The main source of this overhead (~90% of it) is the atomic increment and decrement of the global counter `_VTMS_transition_count`:
 - `Atomic::inc(&_VTMS_transition_count);`
 - `Atomic::dec(&_VTMS_transition_count);`
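
For context, here is a standalone sketch (plain C++, not the HotSpot sources) of the pattern these two lines create on the hot path: every transition start/finish performs an atomic read-modify-write on a single shared counter, so all carrier threads contend on the same cache line.

```
// Standalone sketch of the contended pattern; g_transition_count stands in
// for the global _VTMS_transition_count that every transition increments
// and decrements.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

static std::atomic<long> g_transition_count{0};

static void start_transition()  { g_transition_count.fetch_add(1); }  // ~Atomic::inc
static void finish_transition() { g_transition_count.fetch_sub(1); }  // ~Atomic::dec

int main() {
  std::vector<std::thread> workers;
  for (int t = 0; t < 8; t++) {
    workers.emplace_back([] {
      // Each iteration models one mount/unmount transition of a virtual thread.
      for (int i = 0; i < 1000000; i++) {
        start_transition();
        finish_transition();
      }
    });
  }
  for (auto &w : workers) w.join();
  std::printf("final count: %ld\n", g_transition_count.load());
  return 0;
}
```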
 

The fix is to replace this global counter with per-thread `_VTMS_transition_mark` bits distributed over all `JavaThread`s; a simplified sketch of the approach follows the profile data below.
If these two lines are commented out or replaced with the distributed thread-local marks, the main performance overhead is gone:

```
% er_print -viewmode expert -metrics ie.%totalcpu -csingle JvmtiVTMSTransitionDisabler::start_VTMS_transition test.2.er

Attr. Total    Name
CPU
 sec.      %
============== Callers
1.801  64.29   SharedRuntime::notify_jvmti_vthread_mount(oopDesc*, unsigned char, JavaThread*)
0.580  20.71   JvmtiVTMSTransitionDisabler::VTMS_vthread_unmount(_jobject*, bool)
0.420  15.00   JvmtiVTMSTransitionDisabler::VTMS_vthread_end(_jobject*)

============== Stack Fragment
0.630  22.50   JvmtiVTMSTransitionDisabler::start_VTMS_transition(_jobject*, bool)

============== Callees
1.931  68.93   java_lang_Thread::set_is_in_VTMS_transition(oopDesc*, bool)
0.220   7.86   __tls_get_addr
0.020   0.71   JNIHandles::resolve_external_guard(_jobject*)



% er_print -viewmode expert -metrics ie.%totalcpu -csingle JvmtiVTMSTransitionDisabler::finish_VTMS_transition test.2.er

Attr. Total    Name
CPU
 sec.      %
============== Callers
1.661  39.15   JvmtiVTMSTransitionDisabler::VTMS_vthread_mount(_jobject*, bool)
1.351  31.84   JvmtiVTMSTransitionDisabler::VTMS_vthread_start(_jobject*)
1.231  29.01   SharedRuntime::notify_jvmti_vthread_unmount(oopDesc*, unsigned char, JavaThread*)

============== Stack Fragment
0.500  11.79   JvmtiVTMSTransitionDisabler::finish_VTMS_transition(_jobject*, bool)

============== Callees
2.972  70.05   java_lang_Thread::set_is_in_VTMS_transition(oopDesc*, bool)
0.350   8.25   JavaThread::set_is_in_VTMS_transition(bool)
0.340   8.02   __tls_get_addr
 0.080   1.89   JNIHandles::resolve_external_guard(_jobject*)
```
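
Below is a minimal, self-contained sketch of the distributed-mark idea; it is an illustration rather than the actual HotSpot change, so `ThreadState`, the `g_threads` list, and the helper names are simplified stand-ins for `JavaThread`, the VM's thread list, and the `JvmtiVTMSTransitionDisabler` code. The point is that the hot transition path now writes only a thread-local flag, while the comparatively rare disabler path pays for a scan over all threads.

```
// Simplified sketch: per-thread transition marks instead of one global counter.
#include <atomic>
#include <mutex>
#include <vector>

struct ThreadState {                                  // stand-in for JavaThread
  std::atomic<bool> VTMS_transition_mark{false};      // per-thread mark bit
};

static std::mutex g_threads_lock;
static std::vector<ThreadState*> g_threads;           // stand-in for the threads list

// Hot path: the transitioning thread touches only its own state.
static void start_transition(ThreadState* self)  { self->VTMS_transition_mark.store(true,  std::memory_order_release); }
static void finish_transition(ThreadState* self) { self->VTMS_transition_mark.store(false, std::memory_order_release); }

// Cold path: where the disabler used to read the global counter, it now
// scans all threads to see whether any transition is still in progress.
static bool any_transition_in_progress() {
  std::lock_guard<std::mutex> lg(g_threads_lock);
  for (ThreadState* t : g_threads) {
    if (t->VTMS_transition_mark.load(std::memory_order_acquire)) return true;
  }
  return false;
}

int main() {
  ThreadState t0;
  { std::lock_guard<std::mutex> lg(g_threads_lock); g_threads.push_back(&t0); }
  start_transition(&t0);
  bool busy = any_transition_in_progress();           // true while t0 is in transition
  finish_transition(&t0);
  return busy ? 0 : 1;
}
```

This trades an O(1) read on the disabler side for an iteration over all threads, which should be a reasonable trade here because disabling transitions is rare compared to mount/unmount transitions.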


The rest of the overhead (~10% of the total) is taken by calls to `java_lang_Thread::set_is_in_VTMS_transition()`. The plan is to address this in a separate fix, but it is expected to be a bit more tricky.


Testing:
 - Tested with mach5 tiers 1-6

-------------

Commit messages:
 - 8330969: scalability issue with loaded JVMTI agent

Changes: https://git.openjdk.org/jdk/pull/18937/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18937&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8330969
  Stats: 38 lines in 5 files changed: 14 ins; 11 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/18937.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18937/head:pull/18937

PR: https://git.openjdk.org/jdk/pull/18937

