[jdk18] RFR: 8273107: RunThese24H times out with "java.lang.management.ThreadInfo.getLockName()" is null [v3]

Coleen Phillimore coleenp at openjdk.java.net
Fri Dec 17 14:38:30 UTC 2021


On Fri, 17 Dec 2021 01:25:00 GMT, Daniel D. Daugherty <dcubed at openjdk.org> wrote:

>> RunThese24H sometimes times out with a couple of error msgs:
>> - "java.lang.management.ThreadInfo.getLockName()" is null
>> - ForkJoin common pool thread stuck
>> 
>> The '"java.lang.management.ThreadInfo.getLockName()" is null' error msg was
>> due to RunThese's use of an older JCK test suite which has since been fixed.
>> 
>> The 'ForkJoin common pool thread stuck' failure mode is likely due to a thread
>> spending a lot of time in ObjectSynchronizer::monitors_iterate() due to a
>> VM_ThreadDump::doit() call. I say "likely" because I've never been able to
>> reproduce this failure mode in testing outside of Mach5. With the Mach5
>> sightings, all we have are thread dumps and core files and not a live process.
>> 
>> The VM_ThreadDump::doit() call is trying to gather owned monitor information
>> for all threads in the system. I've seen sightings of this failure mode with > 2000
>> threads. I've also seen passing runs with > 1.7 million monitors on the in-use list.
>> Imagine searching a larger in-use list for > 2000 threads. It just doesn't scale.
>
> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision:
> 
>   coleenp CR - cleanup type safety

Approve again. This makes me a lot happier.

Re: both Robbin's and Erik's comments: I think Robbin's should be an RFE, and Erik's should be filed as a separate bug report and fixed separately. This change fixes the timeout that's been observed multiple times and preserves existing behavior (including existing bugs). I think some redesign is in order to make this correct all the time.
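
For readers skimming the thread, here is a rough sketch of the scaling concern described in the quoted text: gathering owned monitors by walking the whole in-use list once per thread costs roughly (threads x monitors) comparisons, while a single pass that groups monitors by owner costs roughly (threads + monitors). This is only an illustration in plain Java, not the HotSpot code under review. The class `OwnedMonitorsSketch`, the `Monitor` record, and both method names are hypothetical, and the sizes in `main` are arbitrary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OwnedMonitorsSketch {
    // Hypothetical stand-in for an in-use monitor: an id plus the id of the owning thread.
    record Monitor(int id, long ownerThreadId) {}

    // Per-thread scan: walks the entire in-use list once for every thread,
    // so the work grows with threads * monitors.
    static Map<Long, List<Monitor>> perThreadScan(List<Long> threadIds, List<Monitor> inUse) {
        Map<Long, List<Monitor>> owned = new HashMap<>();
        for (long tid : threadIds) {
            List<Monitor> mine = new ArrayList<>();
            for (Monitor m : inUse) {
                if (m.ownerThreadId() == tid) {
                    mine.add(m);
                }
            }
            owned.put(tid, mine);
        }
        return owned;
    }

    // Single pass: walks the in-use list once and groups monitors by owner,
    // so the work grows with threads + monitors.
    static Map<Long, List<Monitor>> singlePass(List<Long> threadIds, List<Monitor> inUse) {
        Map<Long, List<Monitor>> owned = new HashMap<>();
        for (long tid : threadIds) {
            owned.put(tid, new ArrayList<>());
        }
        for (Monitor m : inUse) {
            List<Monitor> mine = owned.get(m.ownerThreadId());
            if (mine != null) { // skip monitors owned by threads we were not asked about
                mine.add(m);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        // Arbitrary sizes, much smaller than the sightings mentioned in the thread.
        int threadCount = 200;
        int monitorCount = 50_000;

        List<Long> threadIds = new ArrayList<>();
        for (long t = 0; t < threadCount; t++) {
            threadIds.add(t);
        }
        List<Monitor> inUse = new ArrayList<>();
        for (int i = 0; i < monitorCount; i++) {
            inUse.add(new Monitor(i, i % threadCount));
        }

        long t0 = System.nanoTime();
        Map<Long, List<Monitor>> slow = perThreadScan(threadIds, inUse);
        long t1 = System.nanoTime();
        Map<Long, List<Monitor>> fast = singlePass(threadIds, inUse);
        long t2 = System.nanoTime();

        System.out.println("same result: " + slow.equals(fast));
        System.out.println("per-thread scan ms: " + (t1 - t0) / 1_000_000);
        System.out.println("single pass ms:     " + (t2 - t1) / 1_000_000);
    }
}
```

Both methods return the same map; only the amount of list walking differs. Timings from `main` depend on the machine and JIT warm-up, so treat them as a rough indication at most.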

-------------

Marked as reviewed by coleenp (Reviewer).

PR: https://git.openjdk.java.net/jdk18/pull/25


More information about the hotspot-runtime-dev mailing list