RFR(S): 8247533: SA stack walking sometimes fails with sun.jvm.hotspot.debugger.DebuggerException: get_thread_regs failed for a lwp

Thu Jun 25 04:08:29 UTC 2020

On 6/24/20 6:53 PM, Yasumasa Suenaga wrote:
> On 2020/06/25 10:00, Chris Plummer wrote:
>> On 6/24/20 5:17 PM, Yasumasa Suenaga wrote:
>>> On 2020/06/25 3:22, Chris Plummer wrote:
>>>> On 6/24/20 12:01 AM, Yasumasa Suenaga wrote:
>>>>> On 2020/06/24 15:32, Chris Plummer wrote:
>>>>>> Hi Yasumasa ,
>>>>>>
>>>>>> I think LinuxAMD64CFrame is used for pstack and what I've been 
>>>>>> looking at has been jstack, and in particular 
>>>>>> AMD64CurrentFrameGuess, which does use "last java frame". I'm not 
>>>>>> sure why LinuxAMD64CFrame does not look at "last java frame". 
>>>>>> Maybe it should.
>>>>>
>>>>> I thought both pattern (jstack, mixed stack) for this change.
>>>>> As you know, mixed jstack (jstack --mixed) attempt to find top of 
>>>>> native stack via LinuxAMD64CFrame, register values are needed for 
>>>>> it (so it depends on ptrace() call). So I guess mixed mode jstack 
>>>>> (jhsdb jstack --mixed) would not show any stacks (cannot find 
>>>>> "last java frame").
>>>> Hi Yasumasa,
>>>>
>>>> I should have been more clear on what I meant by jstack and pstack. 
>>>> For jstack I meant using StackTrace.java, which is what you get by 
>>>> default with "jhsdb jstack" and also the clhsdb jstack command. For 
>>>> pstack I meant PStack.java, which is what you get with "jhsdb 
>>>> jstack --mixed" or the clhsdb pstack command.
>>>>
>>>> So this CR impacts both types of stack traces in that they will get 
>>>> null registers when the the lower level API fails to get the 
>>>> register set. For StackTrace.java it will then defer to "last java 
>>>> frame" if available. For PStack.java it will not, and will always 
>>>> result in no stack trace. The code of interest is here:
>>>>
>>>>         AMD64ThreadContext context = (AMD64ThreadContext) 
>>>> thread.getContext();
>>>>         Address pc  = 
>>>> context.getRegisterAsAddress(AMD64ThreadContext.RIP);
>>>>         if (pc == null) return null;
>>>>         return LinuxAMD64CFrame.getTopFrame(dbg, pc, context);
>>>>
>>>> So the question is should "last java frame" be used if pc == null. 
>>>> If so, then getTopFrame() would also need to be modified to use 
>>>> "last java frame" when fetching RBP.
>>>
>>> I don't think so because CFrame is defined as "Models a "C" 
>>> programming language frame on the stack" in the javadoc, so it 
>>> should have *valid* register values IMHO.
>>> In addition, RIP is needed for Linux AMD64 at least because it would 
>>> use DWARF since JDK-8234624.
>>>
>> Hi Yasumasa,
>>
>> I don't quite understand the "C" frame nomenclature since CFrame is 
>> used for non C frames also. The PStack code roughly does the following:
>>
>> CFrame f = cdbg.topFrameForThread();
>> ClosestSymbol sym = f.closestSymbolToPC();
>> Address pc = f.pc();
>> if (sym != null) {
>>     ... native symbol
>> } else if (interp.contains(pc)) {
>>     ... print interpreter frame
>>
>> So if the CFrame was filled in with "last java frame" values, it 
>> should allow PStack to print the stack starting with the "last java 
>> frame". Any native frame below that point would be missed.
>
> To use "last java frame" in this case looks good because stack 
> unwinding is a best effort behavior.
> However PStack::run is PC-driven. I want to regard it - in other 
> words, it should not perform if we cannot get register values even if 
> "last java frame" is available.
Ok, that sounds reasonable.

thanks,

Chris
>
>
> Thanks,
>
> Yasumasa
>
>
>> Chris
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>>> thanks,
>>>>
>>>> Chris
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> On 6/23/20 11:04 PM, Yasumasa Suenaga wrote:
>>>>>>> Hi Chris,
>>>>>>>
>>>>>>> Thanks you for explanation.
>>>>>>> Your change looks good (but "last java frame" would not be found 
>>>>>>> in Linux AMD64 because RSP is NULL - cf. LinuxAMD64CFrame.java)
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2020/06/24 12:09, Chris Plummer wrote:
>>>>>>>> On 6/23/20 6:05 PM, Yasumasa Suenaga wrote:
>>>>>>>>> Hi Chris,
>>>>>>>>>
>>>>>>>>> Skillful troubleshooters who use jhsdb will aware this 
>>>>>>>>> warnings, and they will take other appropriate methods.
>>>>>>>>>
>>>>>>>>> However, I'm not sure it is worth to continue to perform even 
>>>>>>>>> if SA cannot get register values.
>>>>>>>>>
>>>>>>>>> For example, Linux AMD64 depends on RIP and RSP values to find 
>>>>>>>>> top frame.
>>>>>>>>> According to your change, The caller of 
>>>>>>>>> getThreadIntegerRegisterSet() has responsible for dealing with 
>>>>>>>>> the set of null registers. However X86ThreadContext::data (it 
>>>>>>>>> includes raw register values) would still be zero when it 
>>>>>>>>> happens.
>>>>>>>> This is  what I intended to have happen. Just end up with a 
>>>>>>>> register set of all nulls. Then when stack walking code gets a 
>>>>>>>> null, it will revert to "last java frame" if available, 
>>>>>>>> otherwise no stack dump is done.
>>>>>>>>>
>>>>>>>>> So I think register holder (e.g. X86ThreadContext) should have 
>>>>>>>>> tri-state (have registers, fail to get registers, not yet 
>>>>>>>>> attempt to get registers).
>>>>>>>>> OTOH it might be over-engineering. What do you think?
>>>>>>>> Before implementing this I looked at the what would be the 
>>>>>>>> easier approach to get the desired effect of stack walking code 
>>>>>>>> simply failing over to using "last java frame", and decided the 
>>>>>>>> null set of registers was easiest. Other approaches involved 
>>>>>>>> more changes and impacted more files.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2020/06/24 3:16, Chris Plummer wrote:
>>>>>>>>>> On 6/20/20 12:53 AM, Yasumasa Suenaga wrote:
>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>
>>>>>>>>>>> On 2020/06/20 15:20, Chris Plummer wrote:
>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>
>>>>>>>>>>>> ptrace is not used for core files, so the EFAULT for a bad 
>>>>>>>>>>>> core file is not a possibility. However, get_lwp_regs() 
>>>>>>>>>>>> does redirect to core_get_lwp_regs() for core files. It can 
>>>>>>>>>>>> fail, but the only reason it ever does is if the LWP can't 
>>>>>>>>>>>> be found in the core (which is never suppose to happen). I 
>>>>>>>>>>>> would think if this happened due to the core being 
>>>>>>>>>>>> truncated, SA would be blowing up all over the place with 
>>>>>>>>>>>> exceptions, probably before we ever get to this code, but 
>>>>>>>>>>>> in any cast what we do here wouldn't really make a difference.
>>>>>>>>>>>
>>>>>>>>>>> You are right, sorry.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> I'm not sure why you prefer an exception for errors other 
>>>>>>>>>>>> than ESRCH. Why should they be treated differently? 
>>>>>>>>>>>> getThreadIntegerRegisterSet0() is used for finding the 
>>>>>>>>>>>> current frame for stack tracing. With my changes any 
>>>>>>>>>>>> failure will result in deferring to "last java frame" if 
>>>>>>>>>>>> set, and otherwise just not produce a stack trace (and the 
>>>>>>>>>>>> WARNING will be present in the output). This seems 
>>>>>>>>>>>> preferable to completely abandoning any further thread 
>>>>>>>>>>>> stack tracking.
>>>>>>>>>>>
>>>>>>>>>>> I'm not sure we can trust call stack when ptrace() returns 
>>>>>>>>>>> any errors other than ESRCH even if "last java frame" is 
>>>>>>>>>>> available. For example, don't ptrace() return EFAULT or EIO 
>>>>>>>>>>> when something wrong? (e.g. stack corruption) If so, it may 
>>>>>>>>>>> lead to a wrong analysis for troubleshooter.
>>>>>>>>>>> I think it should be abort dumping call stack for its thread 
>>>>>>>>>>> at least.
>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>
>>>>>>>>>> In general stack walking makes a best effort and can be 
>>>>>>>>>> wrong, even when not getting errors like this. For any 
>>>>>>>>>> actively executing thread SA needs to determine where the 
>>>>>>>>>> stack starts, with register contents being the starting point 
>>>>>>>>>> (SP, FP, and PC). These registers could contain anything, and 
>>>>>>>>>> SA makes a best effort to determine a current frame from 
>>>>>>>>>> them. However, the verification steps it takes are not 100% 
>>>>>>>>>> guaranteed, and can lead to an incorrect assumption of the 
>>>>>>>>>> current frame, which in turn can result in an exception later 
>>>>>>>>>> on when walking the stack. See JDK-8247641.
>>>>>>>>>>
>>>>>>>>>> Keep in mind that the WARNING message will always be there. 
>>>>>>>>>> This should be enough to put the troubleshooter on alert that 
>>>>>>>>>> the stack trace may not be accurate. I think it's better to 
>>>>>>>>>> make an attempt at a stack trace then to just abandon it and 
>>>>>>>>>> not attempt to do something that may be useful.
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>>
>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Chris
>>>>>>>>>>>>
>>>>>>>>>>>> On 6/19/20 6:33 PM, Yasumasa Suenaga wrote:
>>>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I checked Linux kernel code at a glance, ESRCH seems to be 
>>>>>>>>>>>>> set to errno by default.
>>>>>>>>>>>>> So I guess it is similar to "generic" error code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://github.com/torvalds/linux/blob/master/kernel/ptrace.c
>>>>>>>>>>>>>
>>>>>>>>>>>>> According to manpage of ptrace(2), it might return errno 
>>>>>>>>>>>>> other than ESRCH.
>>>>>>>>>>>>> For example, if we analyze broken core (e.g. the core was 
>>>>>>>>>>>>> dumped with disk full), we might get EFAULT.
>>>>>>>>>>>>> Thus I prefer to handle ESRCH only in your patch, and also 
>>>>>>>>>>>>> I think SA should throw DebuggerException if other error 
>>>>>>>>>>>>> is occurred.
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://www.man7.org/linux/man-pages/man2/ptrace.2.html
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2020/06/20 5:51, Chris Plummer wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've  updated with webrev based on the new finding that a 
>>>>>>>>>>>>>> JavaThread cannot be on the ThreadList after its OS 
>>>>>>>>>>>>>> thread has been destroyed since the JavaThread removes 
>>>>>>>>>>>>>> itself from the ThreadList, and therefore must be running 
>>>>>>>>>>>>>> on its OS thread. The logic of the fix is unchanged from 
>>>>>>>>>>>>>> the first webrev, but I updated the comments to better 
>>>>>>>>>>>>>> reflect what is going on. I also updated the CR:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.01/index.html 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 6/19/20 12:24 AM, David Holmes wrote:
>>>>>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 19/06/2020 8:55 am, Chris Plummer wrote:
>>>>>>>>>>>>>>>> On 6/18/20 1:43 AM, David Holmes wrote:
>>>>>>>>>>>>>>>>> On 18/06/2020 4:49 pm, Chris Plummer wrote:
>>>>>>>>>>>>>>>>>> On 6/17/20 10:29 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>> On 18/06/2020 3:13 pm, Chris Plummer wrote:
>>>>>>>>>>>>>>>>>>>> On 6/17/20 10:09 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>> On 18/06/2020 2:33 pm, Chris Plummer wrote:
>>>>>>>>>>>>>>>>>>>>>> On 6/17/20 7:43 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 18/06/2020 6:34 am, Chris Plummer wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Please help review the following:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247533
>>>>>>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8247533/webrev.00/index.html 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The CR contains all the needed details. Here's 
>>>>>>>>>>>>>>>>>>>>>>>> a summary of changes in each file:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The problem sounds to me like a variation of the 
>>>>>>>>>>>>>>>>>>>>>>> more general problem of not ensuring a thread is 
>>>>>>>>>>>>>>>>>>>>>>> kept alive whilst acting upon it. I don't know 
>>>>>>>>>>>>>>>>>>>>>>> how the SA finds these references to the threads 
>>>>>>>>>>>>>>>>>>>>>>> it is going to stackwalk, but is it possible to 
>>>>>>>>>>>>>>>>>>>>>>> fix this via appropriate uses of 
>>>>>>>>>>>>>>>>>>>>>>> ThreadsListHandle/Iterator?
>>>>>>>>>>>>>>>>>>>>>> It fetches ThreadsSMRSupport::_java_thread_list.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Keep in mind that once SA attaches, nothing in 
>>>>>>>>>>>>>>>>>>>>>> the VM changes. For example, SA can't create a 
>>>>>>>>>>>>>>>>>>>>>> wrapper to a JavaThread, only to have the 
>>>>>>>>>>>>>>>>>>>>>> JavaThread be freed later on. It's just not 
>>>>>>>>>>>>>>>>>>>>>> possible.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Then how does it obtain a reference to a 
>>>>>>>>>>>>>>>>>>>>> JavaThread for which the native OS thread id is 
>>>>>>>>>>>>>>>>>>>>> invalid? Any thread found in _java_thread_list is 
>>>>>>>>>>>>>>>>>>>>> either live or still to be started. In the latter 
>>>>>>>>>>>>>>>>>>>>> case the JavaThread->osThread does not have its 
>>>>>>>>>>>>>>>>>>>>> thread_id set yet.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> My assumption was that the JavaThread is in the 
>>>>>>>>>>>>>>>>>>>> process of being destroyed, and it has freed its OS 
>>>>>>>>>>>>>>>>>>>> thread but is itself still in the thread list. I 
>>>>>>>>>>>>>>>>>>>> did notice that the OS thread id being used looked 
>>>>>>>>>>>>>>>>>>>> to be in the range of thread id #'s you would 
>>>>>>>>>>>>>>>>>>>> expect for the running app, so that to me indicated 
>>>>>>>>>>>>>>>>>>>> it was once valid, but is no more.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Keep in mind that although hotspot may have 
>>>>>>>>>>>>>>>>>>>> synchronization code that prevents you from pulling 
>>>>>>>>>>>>>>>>>>>> a JavaThread off the thread list when it is in the 
>>>>>>>>>>>>>>>>>>>> process of being destroyed (I'm guessing it does), 
>>>>>>>>>>>>>>>>>>>> SA has no such protections.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> But you stated that once the SA has attached, the 
>>>>>>>>>>>>>>>>>>> target VM can't change. If the SA gets its set of 
>>>>>>>>>>>>>>>>>>> thread from one attach then tries to make queries 
>>>>>>>>>>>>>>>>>>> about those threads in a separate attach, then 
>>>>>>>>>>>>>>>>>>> obviously it could be providing garbage thread 
>>>>>>>>>>>>>>>>>>> information. So you would need to re-validate the 
>>>>>>>>>>>>>>>>>>> JavaThread in the target VM before trying to do 
>>>>>>>>>>>>>>>>>>> anything with it.
>>>>>>>>>>>>>>>>>> That's not what is going on here. It's attaching and 
>>>>>>>>>>>>>>>>>> doing a stack trace, which involves getting the 
>>>>>>>>>>>>>>>>>> thread list and iterating through all threads without 
>>>>>>>>>>>>>>>>>> detaching. 
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Okay so I restate my original comment - all the 
>>>>>>>>>>>>>>>>> JavaThreads must be alive or not yet started, so how 
>>>>>>>>>>>>>>>>> are you encountering an invalid thread id? Any thread 
>>>>>>>>>>>>>>>>> you find via the ThreadsList can't have destroyed its 
>>>>>>>>>>>>>>>>> osThread. In any case the logic should be checking 
>>>>>>>>>>>>>>>>> thread->osThread() for NULL, and then 
>>>>>>>>>>>>>>>>> osThread()->get_state() to ensure it is >= INITIALIZED 
>>>>>>>>>>>>>>>>> before using the thread_id().
>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I chatted with Dan about this, and he said since the 
>>>>>>>>>>>>>>>> JavaThread is responsible for removing itself from the 
>>>>>>>>>>>>>>>> ThreadList, it is impossible to have a JavaThread still 
>>>>>>>>>>>>>>>> on the ThreadList, but without and underlying OS 
>>>>>>>>>>>>>>>> Thread. So I'm a bit perplexed as to how I can find a 
>>>>>>>>>>>>>>>> JavaThread on the ThreadList, but that results in ESRCH 
>>>>>>>>>>>>>>>> when trying to access the thread with ptrace. My only 
>>>>>>>>>>>>>>>> conclusion is that this failure is somehow spurious, 
>>>>>>>>>>>>>>>> and maybe the issue it just that the thread is in some 
>>>>>>>>>>>>>>>> temporary state that prevents its access. If so, I 
>>>>>>>>>>>>>>>> still think the approach I'm taking is the correct one, 
>>>>>>>>>>>>>>>> but the comments should be updated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ESRCH can have other meanings but I don't know enough 
>>>>>>>>>>>>>>> about the broader context to know whether they are 
>>>>>>>>>>>>>>> applicable in this case.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     ESRCH  The  specified  process does not exist, or is 
>>>>>>>>>>>>>>> not currently being traced by the caller, or is not stopped
>>>>>>>>>>>>>>>               (for requests that require a stopped tracee).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I won't comment further on the fix/workaround as I don't 
>>>>>>>>>>>>>>> know the code. I'll leave that to other folk.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I had one other finding. When this issue first turned 
>>>>>>>>>>>>>>>> up, it prevented the thread from getting a stack trace 
>>>>>>>>>>>>>>>> due to the exception being thrown. What I hadn't 
>>>>>>>>>>>>>>>> realize is that after fixing it to not throw an 
>>>>>>>>>>>>>>>> exception, which resulted in the stack walking code 
>>>>>>>>>>>>>>>> getting all nulls for register values, I actually 
>>>>>>>>>>>>>>>> started to see a stack trace printed:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "JLine terminal non blocking reader thread" #26 daemon 
>>>>>>>>>>>>>>>> prio=5 tid=0x00007f12f0cd6420 nid=0x1f99 runnable 
>>>>>>>>>>>>>>>> [0x00007f125f0f4000]
>>>>>>>>>>>>>>>>     java.lang.Thread.State: RUNNABLE
>>>>>>>>>>>>>>>>     JavaThread state: _thread_in_native
>>>>>>>>>>>>>>>> WARNING: getThreadIntegerRegisterSet0: get_lwp_regs 
>>>>>>>>>>>>>>>> failed for lwp (8089)
>>>>>>>>>>>>>>>> CurrentFrameGuess: choosing last Java frame: sp = 
>>>>>>>>>>>>>>>> 0x00007f125f0f4770, fp = 0x00007f125f0f47c0
>>>>>>>>>>>>>>>>   - java.io.FileInputStream.read0() @bci=0 (Interpreted 
>>>>>>>>>>>>>>>> frame)
>>>>>>>>>>>>>>>>   - java.io.FileInputStream.read() @bci=1, line=223 
>>>>>>>>>>>>>>>> (Interpreted frame)
>>>>>>>>>>>>>>>>   - 
>>>>>>>>>>>>>>>> jdk.internal.org.jline.utils.NonBlockingInputStreamImpl.run() 
>>>>>>>>>>>>>>>> @bci=108, line=216 (Interpreted frame)
>>>>>>>>>>>>>>>>   - 
>>>>>>>>>>>>>>>> jdk.internal.org.jline.utils.NonBlockingInputStreamImpl$$Lambda$536+0x0000000800daeca0.run() 
>>>>>>>>>>>>>>>> @bci=4 (Interpreted frame)
>>>>>>>>>>>>>>>>   - java.lang.Thread.run() @bci=11, line=832 
>>>>>>>>>>>>>>>> (Interpreted frame)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The "CurrentFrameGuess" output is some debug tracing I 
>>>>>>>>>>>>>>>> had enabled, and it indicates that the stack walking 
>>>>>>>>>>>>>>>> code is using the "last java frame" setting, which it 
>>>>>>>>>>>>>>>> will do if current registers values don't indicate a 
>>>>>>>>>>>>>>>> valid frame (as would be the case if sp was null). I 
>>>>>>>>>>>>>>>> had previously assumed that without an underling valid 
>>>>>>>>>>>>>>>> LWP, there would be no stack trace. Given that there is 
>>>>>>>>>>>>>>>> one, there must be a valid LWP. Otherwise I don't see 
>>>>>>>>>>>>>>>> how the stack could have been walked. That's another 
>>>>>>>>>>>>>>>> indication that the ptrace failure is spurious in nature.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Also, even if you are using something like clhsdb to 
>>>>>>>>>>>>>>>>>> issue commands on addresses, if the address is no 
>>>>>>>>>>>>>>>>>> longer valid for the command you are executing, then 
>>>>>>>>>>>>>>>>>> you would get the appropriate error when there is an 
>>>>>>>>>>>>>>>>>> attempt to create a wrapper for it. I don't know of 
>>>>>>>>>>>>>>>>>> any command that operates directly on a JavaThread, 
>>>>>>>>>>>>>>>>>> but I think there are for InstanceKlass. So if you 
>>>>>>>>>>>>>>>>>> remembered the address of an InstanceKlass, and then 
>>>>>>>>>>>>>>>>>> reattached and tried a command that takes an 
>>>>>>>>>>>>>>>>>> InstanceKlass address, you would get an exception 
>>>>>>>>>>>>>>>>>> when SA tries to create the wrapper for the 
>>>>>>>>>>>>>>>>>> InsanceKlass if it were no longer a valid address for 
>>>>>>>>>>>>>>>>>> one.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/LinuxDebuggerLocal.cpp 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/windows/native/libsaproc/sawindbg.cpp 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> -Instead of throwing an exception when the OS 
>>>>>>>>>>>>>>>>>>>>>>>> ThreadID is invalid, print a warning.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> -Improve a print_debug message
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdThread.java 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxThread.java 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/windbg/amd64/WindbgAMD64Thread.java 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> -Deal with the array of registers read in being 
>>>>>>>>>>>>>>>>>>>>>>>> null due to the OS ThreadID not being valid.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/BsdDebuggerLocal.java 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/LinuxDebuggerLocal.java 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> -Fix issue with 
>>>>>>>>>>>>>>>>>>>>>>>> "sun.jvm.hotspot.debugger.DebuggerException" 
>>>>>>>>>>>>>>>>>>>>>>>> appearing twice when printing the exception.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>