Are there limits to SA's ability to produce a stack trace for a thread?

David Holmes david.holmes at oracle.com
Wed Apr 15 05:56:29 UTC 2020


Hi Chris,

On 15/04/2020 1:37 pm, Chris Plummer wrote:
> Hello,
> 
> [Sorry this email got kind of long. To cut to the chase, I want to know 
> if there are times where it is acceptable for SA to not be able to 
> produce a stack trace for a thread. Details below if you are interested.]

How does the SA currently attempt to get a stacktrace for a thread? If 
the current mechanism has limitations then perhaps that will be 
addressed now that we have per-thread handshakes? With handshakes the 
target thread will always be brought to a state where the stack is 
walkable. That said I thought all existing mechanisms used a safepoint 
VM op to get a stacktrace for a different thread.

Cheers,
David
-----


> We have a number of SA tests that request a thread dump, look for a 
> specific symbol in the thread dump, and fail if the symbol is not found. 
> Normally what they are looking for is LingeredApp.main() which should be 
> in the stack trace of the main thread. ClhsdbJstack.java is one such 
> test. It expects the main thread to look like:
> 
> "main" #1 prio=5 tid=0x000001d6301de800 nid=0x3258 waiting on condition [0x0000007fc1dff000]
>     java.lang.Thread.State: TIMED_WAITING (sleeping)
>     JavaThread state: _thread_blocked
>   - java.lang.Thread.sleep(long) @bci=0, pc=0x000001d640d0f417, Method*=0x00000008000e8898 (Interpreted frame)
>   - jdk.test.lib.apps.LingeredApp.main(java.lang.String[]) @bci=54, line=499, pc=0x000001d640d0a1b3, Method*=0x000001d658673ba0 (Interpreted frame)
> 
> But sometimes all it gets is:
> 
> "main" #1 prio=5 tid=0x00007fab2e802000 nid=0x2303 runnable [0x0000000000000000]
>     java.lang.Thread.State: RUNNABLE
>     JavaThread state: _thread_in_java
> 
> This results in the test failing because it does not find 
> LingeredApp.main in the output. The state for the passing case is always 
> _thread_blocked and for the failing case _thread_in_java. This has been 
> reported by the following CR:
> 
> [1] JDK-8242411 - serviceability/sa/ClhsdbCDSJstackPrintAll.java fails 
> with Test ERROR java.lang.RuntimeException: 'LingeredApp.main' missing 
> from stdout/stderr
> 
> After starting, LingeredApp.main sits in a loop:
> 
>              while (Files.exists(path)) {
>                  // Touch the lock to indicate our readiness
>                  setLastModified(theLockFileName, epoch());
>                  Thread.sleep(spinDelay);
>              }
> 
> So it's basically waiting for the lock file to be deleted. By default 
> spinDelay is 1 second. I suspected the issue I was seeing was due to 
> asking for the thread dump when not blocked on the sleep(), so I changed 
> spinDelay to 1ms. That made this missing stack trace issue much easier 
> to reproduce, plus several other bugs that are filed but normally 
> rarely reproduce:
> 
> [2] JDK-8231634 - SA stack walking fails with "illegal bci"
> [3] JDK-8240781 - serviceability/sa/ClhsdbJdis.java fails with 
> "java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for 
> length 1"
> [4] JDK-8211923 - [Testbug] serviceability/sa/ClhsdbFindPC.java 
> ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
> [5] JDK-8204994 - SA might fail to attach to process with "Windbg Error: 
> WaitForEvent failed"
> 
> The "illegal bci" failure I haven't looked into much, but is likely an 
> SA bug due to SA having issues with (and probably making assumptions 
> about) the state of the stack.
> 
> The two ArrayIndexOutOfBoundsException bugs are dups. They fail because 
> the stack trace of the main thread is missing, and some String splitting 
> logic in the test therefore fails and produces the 
> ArrayIndexOutOfBoundsException.
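
The failure mode described above (indexing into the result of a split when the expected symbol never appeared) can be avoided by checking for the symbol first. The following is only a minimal sketch under stated assumptions: the class StackOutputSketch and the method textAfterSymbol are hypothetical names and are not taken from the actual SA tests, whose parsing logic is not shown in this thread.

```java
import java.util.Optional;

// Hypothetical helper, not part of the real SA tests.
public class StackOutputSketch {

    // Returns the text that follows the first occurrence of 'symbol',
    // or an empty Optional when the symbol is absent (for example,
    // when the main thread's stack trace was not produced).
    static Optional<String> textAfterSymbol(String output, String symbol) {
        int idx = output.indexOf(symbol);
        if (idx < 0) {
            return Optional.empty();
        }
        return Optional.of(output.substring(idx + symbol.length()));
    }

    public static void main(String[] args) {
        String withFrame = "  - jdk.test.lib.apps.LingeredApp.main(java.lang.String[]) @bci=54";
        String withoutFrame = "    JavaThread state: _thread_in_java";

        // The caller decides what a missing symbol means (retry, skip, or
        // fail with a clear message) instead of hitting an index error.
        System.out.println(textAfterSymbol(withFrame, "LingeredApp.main").isPresent());
        System.out.println(textAfterSymbol(withoutFrame, "LingeredApp.main").isPresent());
    }
}
```

A check like this would only turn the exception into a clearer test failure; it would not make the missing stack trace itself go away.
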
> 
> I'm not sure about the "WaitForEvent failed". It could be unrelated.
> 
> I can probably make these all go away by having LingeredApp.main() spawn a 
> helper thread to do the above loop in. That would keep the main thread 
> stable (blocked on a Thread.join). However, it also would hide some 
> issues (like the "illegal bci" failure).
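
The helper-thread idea above could look roughly like the sketch below. It is an illustration only, not the actual LingeredApp code: the class HelperThreadSketch, the methods touchUntilDeleted and runWithHelper, and the thread name "lock-toucher" are all hypothetical, and the lock-file touch is approximated with Files.setLastModifiedTime.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical sketch, not the real LingeredApp.
public class HelperThreadSketch {

    // Stand-in for the loop quoted earlier: touch the lock file
    // until somebody deletes it.
    static void touchUntilDeleted(Path lock, long spinDelayMillis)
            throws InterruptedException {
        while (Files.exists(lock)) {
            try {
                Files.setLastModifiedTime(
                        lock, FileTime.fromMillis(System.currentTimeMillis()));
            } catch (IOException e) {
                // The file may have been deleted between the existence check
                // and the touch; the loop condition re-checks it.
            }
            Thread.sleep(spinDelayMillis);
        }
    }

    // Runs the loop in a helper thread. The calling thread stays blocked
    // in Thread.join() for the whole time.
    static void runWithHelper(Path lock, long spinDelayMillis)
            throws InterruptedException {
        Thread helper = new Thread(() -> {
            try {
                touchUntilDeleted(lock, spinDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "lock-toucher");
        helper.start();
        helper.join();
    }

    public static void main(String[] args) throws Exception {
        Path lock = Files.createTempFile("lingered", ".lck");
        // Simulate the test harness deleting the lock file after a short delay.
        Thread deleter = new Thread(() -> {
            try {
                Thread.sleep(50);
                Files.deleteIfExists(lock);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        deleter.start();
        runWithHelper(lock, 1);
        deleter.join();
        System.out.println("lock file exists: " + Files.exists(lock));
    }
}
```

With this shape the main thread is parked in join() whenever a dump is requested, which is the stability the paragraph above is after, at the cost of no longer exercising the transient states that expose the other bugs.
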
> 
> The main reason for the email is to ask what the expectations are for 
> SA's ability to dump a thread's stack trace. Is it expected that 
> sometimes the thread will be in a state that prevents dumping the stack? 
> I know for example that the reason we sometimes don't see a stack is 
> because thread.getLastJavaVFrameDbg() is returning null. Basically SA 
> throws up its hands and says "I can't do it". Is that acceptable in some 
> cases?
> 
> thanks,
> 
> Chris
> 
> [1] https://bugs.openjdk.java.net/browse/JDK-8242411
> [2] https://bugs.openjdk.java.net/browse/JDK-8231634
> [3] https://bugs.openjdk.java.net/browse/JDK-8240781
> [4] https://bugs.openjdk.java.net/browse/JDK-8211923
> [5] https://bugs.openjdk.java.net/browse/JDK-8204994

