Are there limits to SA's ability to produce a stack trace for a thread?
David Holmes
david.holmes at oracle.com
Wed Apr 15 05:56:29 UTC 2020
Hi Chris,
On 15/04/2020 1:37 pm, Chris Plummer wrote:
> Hello,
>
> [Sorry this email got kind of long. To cut to the chase, I want to know
> if there are times where it is acceptable for SA to not be able to
> produce a stack trace for a thread. Details below if you are interested.]
How does the SA currently attempt to get a stacktrace for a thread? If
the current mechanism has limitations then perhaps that will be
addressed now that we have per-thread handshakes? With handshakes the
target thread will always be brought to a state where the stack is
walkable. That said, I thought all existing mechanisms used a safepoint
VM op to get a stacktrace for a different thread.
Cheers,
David
-----
> We have a number of SA tests that request a thread dump, look for a
> specific symbol in the thread dump, and fail if the symbol is not found.
> Normally what they are looking for is LingeredApp.main() which should be
> in the stack trace of the main thread. ClhsdbJstack.java is one such
> test. It expects the main thread to look like:
>
> "main" #1 prio=5 tid=0x000001d6301de800 nid=0x3258 waiting on condition
> [0x0000007fc1dff000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> JavaThread state: _thread_blocked
> - java.lang.Thread.sleep(long) @bci=0, pc=0x000001d640d0f417,
> Method*=0x00000008000e8898 (Interpreted frame)
> - jdk.test.lib.apps.LingeredApp.main(java.lang.String[]) @bci=54,
> line=499, pc=0x000001d640d0a1b3, Method*=0x000001d658673ba0 (Interpreted
> frame)
>
> But sometimes all it gets is:
>
> "main" #1 prio=5 tid=0x00007fab2e802000 nid=0x2303 runnable
> [0x0000000000000000]
> java.lang.Thread.State: RUNNABLE
> JavaThread state: _thread_in_java
>
> This results in the test failing because it does not find
> LingeredApp.main in the output. The state for the passing case is always
> _thread_blocked and for the failing case _thread_in_java. This has been
> reported by the following CR:
>
> [1] JDK-8242411 - serviceability/sa/ClhsdbCDSJstackPrintAll.java fails
> with Test ERROR java.lang.RuntimeException: 'LingeredApp.main' missing
> from stdout/stderr
>
> After starting, LingeredApp.main sits in a loop:
>
> while (Files.exists(path)) {
>     // Touch the lock to indicate our readiness
>     setLastModified(theLockFileName, epoch());
>     Thread.sleep(spinDelay);
> }
>
> So it's basically waiting for the lock file to be deleted. By default
> spinDelay is 1 second. I suspected the issue I was seeing was due to
> asking for the thread dump when not blocked on the sleep(), so I changed
> spinDelay to 1ms. That made this missing stack trace issue much easier
> to reproduce, along with several other bugs that have been filed but
> normally reproduce only rarely:
>
> [2] JDK-8231634 - SA stack walking fails with "illegal bci"
> [3] JDK-8240781 - serviceability/sa/ClhsdbJdis.java fails with
> "java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for
> length 1"
> [4] JDK-8211923 - [Testbug] serviceability/sa/ClhsdbFindPC.java
> ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
> [5] JDK-8204994 - SA might fail to attach to process with "Windbg Error:
> WaitForEvent failed"
>
> The "illegal bci" failure I haven't looked into much, but is likely an
> SA bug due to SA having issues with (and probably making assumptions
> about) the state of the stack.
>
> The two ArrayIndexOutOfBoundsException bugs are dups. They fail because
> the stack trace of the main thread is missing, and some String splitting
> logic in the test therefore fails and produces the
> ArrayIndexOutOfBoundsException.
>
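To illustrate the failure mode described above, here is a minimal sketch. It assumes the test splits the jstack output on the marker and indexes the second element; the actual splitting logic in the tests is not shown in this thread, so the class name FrameParseSketch and both helper methods are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Pattern;

public class FrameParseSketch {
    // Naive parse (hypothetical): assumes the marker is always present.
    // When it is absent, split() returns a one-element array and [1] throws
    // ArrayIndexOutOfBoundsException.
    static String naiveTail(String dump, String marker) {
        return dump.split(Pattern.quote(marker))[1];
    }

    // Guarded parse (hypothetical): returns the pc value of the frame that
    // contains the marker, or empty if the frame is missing from the dump.
    static Optional<String> guardedPc(String dump, String marker) {
        int at = dump.indexOf(marker);
        if (at < 0) {
            return Optional.empty();   // no such frame, e.g. empty stack trace
        }
        int pcStart = dump.indexOf("pc=", at);
        if (pcStart < 0) {
            return Optional.empty();
        }
        int pcEnd = dump.indexOf(',', pcStart);
        if (pcEnd < 0) {
            return Optional.empty();
        }
        return Optional.of(dump.substring(pcStart + "pc=".length(), pcEnd));
    }

    public static void main(String[] args) {
        String missing = "\"main\" #1 prio=5 runnable\n"
                + "   java.lang.Thread.State: RUNNABLE\n"
                + "   JavaThread state: _thread_in_java\n";
        System.out.println(guardedPc(missing, "LingeredApp.main").isPresent());
        try {
            naiveTail(missing, "LingeredApp.main");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("naive parse failed: " + e.getMessage());
        }
    }
}
```

A guarded parse like this would let a test report "frame missing" directly rather than surfacing the problem as an unrelated-looking exception.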
> I'm not sure about the "WaitForEvent failed". It could be unrelated.
>
> I can probably make these all go away by having LingeredApp.main() spawn a
> helper thread to do the above loop in. That would keep the main thread
> stable (blocked on a Thread.join). However, it also would hide some
> issues (like the "illegal bci" failure).
>
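The helper-thread idea above could look roughly like the following. This is a sketch only, not the LingeredApp code: the class name LingeredHelperSketch, the method names, and the thread name "lock-watcher" are made up for illustration, and the lock file is touched with Files.setLastModifiedTime rather than LingeredApp's own helper.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class LingeredHelperSketch {
    // Touch the lock file until it disappears (stand-in for the loop above).
    static void lingerWhileLockExists(Path lock, long spinDelayMillis)
            throws InterruptedException {
        while (Files.exists(lock)) {
            try {
                Files.setLastModifiedTime(
                        lock, FileTime.fromMillis(System.currentTimeMillis()));
            } catch (IOException e) {
                // The file can be deleted between exists() and the touch.
                break;
            }
            Thread.sleep(spinDelayMillis);
        }
    }

    // Run the loop on a helper thread so the caller can block in join().
    static Thread startHelper(Path lock, long spinDelayMillis) {
        Thread helper = new Thread(() -> {
            try {
                lingerWhileLockExists(lock, spinDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "lock-watcher");
        helper.start();
        return helper;
    }

    public static void main(String[] args) throws Exception {
        Path lock = args.length > 0
                ? Path.of(args[0])
                : Files.createTempFile("lingered", ".lck");
        System.out.println("Waiting for deletion of " + lock);
        Thread helper = startHelper(lock, 1000);
        // The main thread now stays blocked here until the lock file is deleted.
        helper.join();
    }
}
```

With this shape the main thread sits in Thread.join() while the helper thread alternates between running and sleeping, which is why it would also hide the problems seen when the dumped thread is caught running Java code.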
> The main reason for the email is to ask what are the expectations of
> SA's ability to dump a thread's stack trace. Is it expected that
> sometimes the thread will be in a state that prevents dumping the stack?
> I know for example that the reason we sometimes don't see a stack is
> because thread.getLastJavaVFrameDbg() is returning null. Basically SA
> throws up its hands and says "I can't do it". Is that acceptable in some
> cases?
>
> thanks,
>
> Chris
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8242411
> [2] https://bugs.openjdk.java.net/browse/JDK-8231634
> [3] https://bugs.openjdk.java.net/browse/JDK-8240781
> [4] https://bugs.openjdk.java.net/browse/JDK-8211923
> [5] https://bugs.openjdk.java.net/browse/JDK-8204994