[jdk18] RFR: 8273107: RunThese24H times out with "java.lang.management.ThreadInfo.getLockName()" is null
Daniel D. Daugherty
dcubed at openjdk.java.net
Wed Dec 15 01:15:25 UTC 2021
On Tue, 14 Dec 2021 21:16:02 GMT, Daniel D. Daugherty <dcubed at openjdk.org> wrote:
> RunThese24H sometimes times out with a couple of error msgs:
> - "java.lang.management.ThreadInfo.getLockName()" is null
> - ForkJoin common pool thread stuck
>
> The '"java.lang.management.ThreadInfo.getLockName()" is null' error msg was
> due to RunThese's use of an older JCK test suite which has since been fixed.
>
> The 'ForkJoin common pool thread stuck' failure mode is likely due to a thread
> spending a lot of time in ObjectSynchronizer::monitors_iterate() due to a
> VM_ThreadDump::doit() call. I say "likely" because I've never been able to
> reproduce this failure mode in testing outside of Mach5. With the Mach5
> sightings, all we have are thread dumps and core files and not a live process.
>
> The VM_ThreadDump::doit() call is trying to gather owned monitor information
> for all threads in the system. I've seen sightings of this failure mode with > 2000
> threads. I've also seen passing runs with > 1.7 million monitors on the in-use list.
> Imagine searching a larger in-use list for > 2000 threads. It just doesn't scale.
I have logging in place for all of the RunThese configs to gather a little
bit of 'monitorinflation' info. So far a new sighting has not happened,
but I'm hopeful that the stars will align sometime soon.
I have learned through my own test jobs that a RunThese24H job doesn't
always include calls to VM_ThreadDump::doit(), so it seems that there's
yet another factor in these RunThese24H failures.
The proposed solution in this patch is to call deflate_idle_monitors() from
VM_ThreadDump::doit() with a new and optional ObjectMonitorsHashtable
parameter. As deflate_idle_monitors() and deflate_monitor_list() do their
work, owned ObjectMonitors are added to the ObjectMonitorsHashtable
indexed by the owning JavaThread. After the deflate_idle_monitors() call
is finished, VM_ThreadDump::doit() can find and return information about
the owned ObjectMonitors for each JavaThread without doing a linear walk
of the in-use monitor list for each JavaThread. A JavaThread* is used to
look up a LinkedList in the ObjectMonitorsHashtable and, if one exists, then
the JavaThread* owns all the ObjectMonitors on that LinkedList.
Also, the usual work done by deflate_idle_monitors() and deflate_monitor_list()
will serve to reduce the length of the in-use monitor list, which makes
walks of the in-use monitor list cheaper for other parts of the system.
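To make the lookup idea concrete, here is a minimal standalone sketch. It is
not the code in the patch: JavaThread and ObjectMonitor below are hypothetical
stand-in structs, OwnedMonitorTable, build_owned_table(), owned_count() and
owned_count_linear() are names made up for illustration, and std::unordered_map
and std::list stand in for the ObjectMonitorsHashtable and its LinkedList
values. It only shows the shape of the change: one pass over the in-use list
to build the table, then a per-thread lookup instead of a per-thread walk.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the HotSpot types; not the real declarations.
struct JavaThread { int id; };
struct ObjectMonitor { const JavaThread* owner; };  // nullptr when unowned

// Illustrative table: owning thread -> list of the monitors it owns.
using OwnedMonitorTable =
    std::unordered_map<const JavaThread*, std::list<const ObjectMonitor*>>;

// One pass over the in-use list; each owned monitor is filed under its owner.
OwnedMonitorTable build_owned_table(const std::vector<ObjectMonitor>& in_use) {
  OwnedMonitorTable table;
  for (const ObjectMonitor& m : in_use) {
    if (m.owner != nullptr) {
      table[m.owner].push_back(&m);
    }
  }
  return table;
}

// Per-thread query against the table; the in-use list is not walked again.
std::size_t owned_count(const OwnedMonitorTable& table, const JavaThread* t) {
  auto it = table.find(t);
  return it == table.end() ? 0 : it->second.size();
}

// The pattern being avoided: a full linear walk of the in-use list per thread.
std::size_t owned_count_linear(const std::vector<ObjectMonitor>& in_use,
                               const JavaThread* t) {
  std::size_t n = 0;
  for (const ObjectMonitor& m : in_use) {
    if (m.owner == t) {
      ++n;
    }
  }
  return n;
}
```

With T threads and M in-use monitors, calling owned_count_linear() for every
thread visits roughly T * M monitors, while build_owned_table() visits M
monitors once and each owned_count() call is an expected constant-time lookup.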
This fix has been tested with Mach5 Tier[1-8]. I've also done some targeted
memory leak testing since ObjectMonitorsHashtable is new (see the bug
report for details). I've rebased the fix to the latest JDK18 repo as of 2021.12.14
mid-day and I'm doing some sanity checking with Mach5 Tier[1-3].
It would be good to hear from @dholmes-ora, @coleenp, @fisk and @robehn
since they were my reviewers on:
JDK-8253064 monitor list simplifications and getting rid of TSM
https://bugs.openjdk.java.net/browse/JDK-8253064
-------------
PR: https://git.openjdk.java.net/jdk18/pull/25