Are there limits to SA's ability to produce a stack trace for a thread?

Chris Plummer chris.plummer at oracle.com
Wed Apr 15 06:13:35 UTC 2020


On 4/14/20 10:56 PM, David Holmes wrote:
> Hi Chris,
>
> On 15/04/2020 1:37 pm, Chris Plummer wrote:
>> Hello,
>>
>> [Sorry this email got kind of long. To cut to the chase, I want to 
>> know if there are times where it is acceptable for SA to not be able 
>> to produce a stack trace for a thread. Details below if you are 
>> interested.]
>
> How does the SA currently attempt to get a stacktrace for a thread? If 
> the current mechanism has limitations, then perhaps those will be 
> addressed now that we have per-thread handshakes? With handshakes the 
> target thread will always be brought to a state where the stack is 
> walkable. That said, I thought all existing mechanisms used a safepoint 
> VM op to get a stacktrace for a different thread.
Hi David,

I don't know the details, but I think there is no safepointing involved. 
Consider that you can also get thread stack traces from a JVM core file, 
which could be produced at any given moment in the JVM's execution. Note 
that SA has lots of safeguards around this. It has ways of verifying 
whether something points to what it is really supposed to point to. For 
example, if SA thinks something is a pointer to a heap object, it verifies 
that the object header contains a pointer to something that is known to 
be a Klass, which it can determine by looking at the vtable (and then 
cross-referencing with symbolic info it pulls from the executable). I'm 
just not too sure to what extent this applies to thread stack walking. As 
I mentioned below, thread.getLastJavaVFrameDbg() sometimes returns null, 
and then there is no stack trace.
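To make the Klass check above concrete, here is a toy model of it. This is not SA's code: the target's memory is simulated as a map from address to 64-bit word, and the layout (mark word at offset 0, uncompressed Klass* at offset 8, C++ vtable pointer as the first word of every Klass) is an assumption chosen for illustration. The class name and the set of "known vtables" are hypothetical stand-ins for what SA derives from the executable's symbols.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Toy model of the "does this really point to a heap object?" check.
 * Assumed layout (illustrative only): 64-bit VM, mark word at offset 0,
 * uncompressed Klass* at offset 8, vtable pointer as a Klass's first word.
 */
final class OopSanityCheck {
    static final int KLASS_OFFSET = 8; // assumption: no compressed class pointers

    private final Map<Long, Long> memory;      // simulated target address space
    private final Set<Long> knownKlassVtables; // stand-in for symbol-derived vtables

    OopSanityCheck(Map<Long, Long> memory, Set<Long> knownKlassVtables) {
        this.memory = memory;
        this.knownKlassVtables = knownKlassVtables;
    }

    /** True only if the header at addr points to something with a known Klass vtable. */
    boolean looksLikeOop(long addr) {
        Long klass = memory.get(addr + KLASS_OFFSET);
        if (klass == null || klass == 0L) {
            return false; // unreadable or null class pointer
        }
        Long vtable = memory.get(klass); // first word of the would-be Klass
        return vtable != null && knownKlassVtables.contains(vtable);
    }

    public static void main(String[] args) {
        Map<Long, Long> mem = new HashMap<>();
        long instanceKlassVtable = 0x7f00_0000_1000L;
        long klass = 0x8_0000_0000L;
        mem.put(klass, instanceKlassVtable);      // the Klass: first word is its vtable
        mem.put(0x1000L, 0x1L);                   // object mark word
        mem.put(0x1000L + KLASS_OFFSET, klass);   // object header -> Klass
        mem.put(0x2000L + KLASS_OFFSET, 0x1234L); // garbage "header"
        OopSanityCheck check = new OopSanityCheck(mem, Set.of(instanceKlassVtable));
        System.out.println(check.looksLikeOop(0x1000L)); // true
        System.out.println(check.looksLikeOop(0x2000L)); // false
    }
}
```

The point of the double indirection is that a random word is very unlikely to lead, two loads later, to one of a small set of known vtable addresses.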

   /** This should only be used by a debugger. Uses the current frame
       guess to attempt to get the topmost JavaVFrame.
       (getLastJavaVFrame, as a port of the VM's routine, assumes the
       VM is at a safepoint.) */
   public JavaVFrame getLastJavaVFrameDbg() {

So it seems that since we are not at a safepoint, there's a lot of 
"guessing" in the implementation. I gather from this that the thread won't 
always be in a state where we can determine the last vframe, and 
therefore not always in a state where SA can produce a stack trace.

thanks,

Chris
>
> Cheers,
> David
> -----
>
>
>> We have a number of SA tests that request a thread dump, look for a 
>> specific symbol in the thread dump, and fail if the symbol is not 
>> found. Normally what they are looking for is LingeredApp.main() which 
>> should be in the stack trace of the main thread. ClhsdbJstack.java is 
>> one such test. It expects the main thread to look like:
>>
>> "main" #1 prio=5 tid=0x000001d6301de800 nid=0x3258 waiting on condition [0x0000007fc1dff000]
>>     java.lang.Thread.State: TIMED_WAITING (sleeping)
>>     JavaThread state: _thread_blocked
>>   - java.lang.Thread.sleep(long) @bci=0, pc=0x000001d640d0f417, Method*=0x00000008000e8898 (Interpreted frame)
>>   - jdk.test.lib.apps.LingeredApp.main(java.lang.String[]) @bci=54, line=499, pc=0x000001d640d0a1b3, Method*=0x000001d658673ba0 (Interpreted frame)
>>
>> But sometimes all it gets is:
>>
>> "main" #1 prio=5 tid=0x00007fab2e802000 nid=0x2303 runnable [0x0000000000000000]
>>     java.lang.Thread.State: RUNNABLE
>>     JavaThread state: _thread_in_java
>>
>> This results in the test failing because it does not find 
>> LingeredApp.main in the output. The state for the passing case is 
>> always _thread_blocked and for the failing case _thread_in_java. This 
>> has been reported by the following CR:
>>
>> [1] JDK-8242411 - serviceability/sa/ClhsdbCDSJstackPrintAll.java 
>> fails with Test ERROR java.lang.RuntimeException: 'LingeredApp.main' 
>> missing from stdout/stderr
>>
>> After starting, LingeredApp.main sits in a loop:
>>
>>              while (Files.exists(path)) {
>>                  // Touch the lock to indicate our readiness
>>                  setLastModified(theLockFileName, epoch());
>>                  Thread.sleep(spinDelay);
>>              }
>>
>> So it's basically waiting for the lock file to be deleted. By default 
>> spinDelay is 1 second. I suspected the issue I was seeing was due to 
>> requesting the thread dump while the main thread was not blocked in 
>> sleep(), so I changed spinDelay to 1ms. That made this missing stack 
>> trace issue much easier to reproduce, along with several other filed 
>> bugs that normally reproduce only rarely:
>>
>> [2] JDK-8231634 - SA stack walking fails with "illegal bci"
>> [3] JDK-8240781 - serviceability/sa/ClhsdbJdis.java fails with 
>> "java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for 
>> length 1"
>> [4] JDK-8211923 - [Testbug] serviceability/sa/ClhsdbFindPC.java 
>> ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>> [5] JDK-8204994 - SA might fail to attach to process with "Windbg 
>> Error: WaitForEvent failed"
>>
>> I haven't looked into the "illegal bci" failure much, but it is likely 
>> an SA bug caused by SA having issues with (and probably making 
>> assumptions about) the state of the stack.
>>
>> The two ArrayIndexOutOfBoundsException bugs are dups. They fail 
>> because the stack trace of the main thread is missing, and some 
>> String splitting logic in the test therefore fails and produces the 
>> ArrayIndexOutOfBoundsException.
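The failure mode is easy to reproduce in isolation. The sketch below is an illustration, not the actual test code: the "pc=" delimiter and both helper names are hypothetical. It shows how indexing [1] after a split() that found no delimiter produces exactly "Index 1 out of bounds for length 1", and how a length check turns that into a clearer "frame missing" failure.

```java
/**
 * Illustration (not the actual test code) of how splitting jstack output fails
 * with "Index 1 out of bounds for length 1" when the expected frame is absent.
 * The "pc=" delimiter and both helper names are hypothetical.
 */
final class SplitDemo {
    // Fragile: assumes the delimiter is always present.
    static String pcAfterFragile(String jstackOutput) {
        return jstackOutput.split("pc=")[1].split(",")[0];
    }

    // Defensive: report the missing frame instead of throwing AIOOBE.
    static String pcAfterChecked(String jstackOutput) {
        String[] parts = jstackOutput.split("pc=", 2);
        if (parts.length < 2) {
            throw new IllegalStateException("no frame with pc= in output:\n" + jstackOutput);
        }
        return parts[1].split(",")[0];
    }

    public static void main(String[] args) {
        String withFrame = "- LingeredApp.main(java.lang.String[]) @bci=54, line=499, "
                + "pc=0x000001d640d0a1b3, Method*=0x000001d658673ba0";
        String withoutFrame = "JavaThread state: _thread_in_java";
        System.out.println(pcAfterFragile(withFrame)); // 0x000001d640d0a1b3
        try {
            pcAfterFragile(withoutFrame);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println(e.getMessage()); // Index 1 out of bounds for length 1
        }
    }
}
```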
>>
>> I'm not sure about the "WaitForEvent failed". It could be unrelated.
>>
>> I can probably make these all go away by having LingeredApp.main() 
>> spawn a helper thread to run the above loop. That would keep the 
>> main thread stable (blocked in Thread.join()). However, it would also 
>> hide some issues (like the "illegal bci" failure).
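A minimal sketch of the helper-thread idea, assuming the loop only needs to poll for the lock file (the setLastModified() touch from the real loop is left out). The class, method, and thread names are hypothetical, not LingeredApp's actual code. The main thread spends the whole run in join(), so a dump would always find it blocked; the flip side, as noted above, is that SA would no longer be exercised against a thread caught running Java code.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch: the lock-file polling loop runs in a helper thread, so the caller
 * stays blocked in Thread.join() with a stable stack. Names are hypothetical.
 */
final class HelperThreadSketch {
    static void waitForLockFileRemoval(Path lock, long spinDelayMillis) throws InterruptedException {
        Thread poller = new Thread(() -> {
            try {
                while (Files.exists(lock)) {
                    Thread.sleep(spinDelayMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and exit
            }
        }, "LingeredApp-poller");
        poller.start();
        poller.join(); // the calling (main) thread stays blocked here
    }

    public static void main(String[] args) throws Exception {
        Path lock = Files.createTempFile("lingered", ".lck");
        Thread deleter = new Thread(() -> {
            try {
                Thread.sleep(50);
                Files.deleteIfExists(lock); // simulates the test removing the lock file
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        deleter.start();
        waitForLockFileRemoval(lock, 1);
        System.out.println("lock file removed, exiting");
    }
}
```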
>>
>> The main reason for this email is to ask what the expectations are for 
>> SA's ability to dump a thread's stack trace. Is it expected that 
>> sometimes the thread will be in a state that prevents dumping the 
>> stack? I know, for example, that the reason we sometimes don't see a 
>> stack is that thread.getLastJavaVFrameDbg() returns null. 
>> Basically, SA throws up its hands and says "I can't do it". Is that 
>> acceptable in some cases?
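If the answer is "yes, sometimes", a test could at least distinguish that case from other failures. Below is a hypothetical helper (not part of the existing test library) that parses one thread's entry from SA jstack output, reporting the "JavaThread state:" value and whether any "  - ..." frame lines follow. It relies only on the output format shown in the two dumps above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: extracts the JavaThread state and whether any frame
 * lines ("  - ...") are present in one thread's SA jstack entry.
 */
final class ThreadEntryInfo {
    private static final Pattern STATE = Pattern.compile("JavaThread state: (\\w+)");

    final String state;
    final boolean hasFrames;

    private ThreadEntryInfo(String state, boolean hasFrames) {
        this.state = state;
        this.hasFrames = hasFrames;
    }

    static ThreadEntryInfo parse(String entry) {
        Matcher m = STATE.matcher(entry);
        String state = m.find() ? m.group(1) : "unknown";
        // Frame lines start with "- " once leading indentation is stripped.
        boolean hasFrames = entry.lines().anyMatch(l -> l.trim().startsWith("- "));
        return new ThreadEntryInfo(state, hasFrames);
    }

    public static void main(String[] args) {
        String passing = String.join("\n",
                "\"main\" #1 prio=5 tid=0x000001d6301de800 nid=0x3258 waiting on condition [0x0000007fc1dff000]",
                "    java.lang.Thread.State: TIMED_WAITING (sleeping)",
                "    JavaThread state: _thread_blocked",
                "  - java.lang.Thread.sleep(long) @bci=0, pc=0x000001d640d0f417, Method*=0x00000008000e8898 (Interpreted frame)");
        String failing = String.join("\n",
                "\"main\" #1 prio=5 tid=0x00007fab2e802000 nid=0x2303 runnable [0x0000000000000000]",
                "    java.lang.Thread.State: RUNNABLE",
                "    JavaThread state: _thread_in_java");
        for (String entry : new String[] {passing, failing}) {
            ThreadEntryInfo info = parse(entry);
            System.out.println(info.state + " frames=" + info.hasFrames);
        }
    }
}
```

Run on the two dumps above, this prints "_thread_blocked frames=true" and then "_thread_in_java frames=false", matching the pass/fail correlation described earlier.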
>>
>> thanks,
>>
>> Chris
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8242411
>> [2] https://bugs.openjdk.java.net/browse/JDK-8231634
>> [3] https://bugs.openjdk.java.net/browse/JDK-8240781
>> [4] https://bugs.openjdk.java.net/browse/JDK-8211923
>> [5] https://bugs.openjdk.java.net/browse/JDK-8204994




More information about the serviceability-dev mailing list