Stack traces for a stuck test in mach5?

Leonid Mesnik leonid.mesnik at oracle.com
Thu Oct 29 20:32:36 UTC 2020


I think it is

https://java.se.oracle.com/infrabugs/browse/MACH5-510

I believe it was closed as "Not reproduced" only because no one replied.
I've re-opened it.

Leonid

On 10/29/20 1:25 PM, mikhailo.seledtsov at oracle.com wrote:
>
> OK, thanks for clarifying. In that case, yes, file a bug in infrabugs,
> set the subcomponent to 'host', and add me as a watcher. If you already
> did, please let me know the bug id.
>
> Thanks,
>
> Misha
>
> On 10/29/20 1:21 PM, Igor Ignatyev wrote:
>> right, the timeout handler uses tools from PATH, but as far as I can
>> tell the problem here isn't w/ which lldb is used, but w/ host
>> security policies: if DevToolsSecurity isn't enabled, macOS asks you
>> for a login/password every time you try to attach w/ lldb (or any
>> other tool).
>>
>> -- Igor
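Igor's point is that the PATH lookup is a red herring and that the host's developer-mode setting is what blocks attaching. Below is a minimal sketch of a host check. It assumes a macOS host with `DevToolsSecurity` on PATH. The helper name `check_devtools_security` and its `--fix` option are hypothetical. The status test is copied from the failure-handler command line quoted later in this thread. `sudo DevToolsSecurity -enable` is the commonly documented way to turn developer mode on, and because it needs admin rights it is a job for whoever owns the hosts, not for a test run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report macOS developer mode and optionally enable it.
# Assumes DevToolsSecurity is on PATH (macOS only).
check_devtools_security() {
  # Same gate as the failure-handler command line: look for "enabled"
  # in the status text ("Developer mode is currently disabled." fails it).
  if DevToolsSecurity --status 2>/dev/null | grep -q enabled; then
    echo "developer mode enabled: lldb can attach without a password prompt"
    return 0
  fi
  echo "developer mode disabled: attaching with lldb will prompt for credentials" >&2
  if [ "${1:-}" = "--fix" ]; then
    # Enabling developer mode needs administrator rights.
    sudo DevToolsSecurity -enable
    return $?
  fi
  return 1
}
```

On a host configured like the one in this thread, `check_devtools_security` returns 1 and prints the warning. An admin would run `check_devtools_security --fix` once.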
>>
>>> On Oct 29, 2020, at 1:13 PM, mikhailo.seledtsov at oracle.com wrote:
>>>
>>> Adding Leonid, since he worked on similar issue recently.
>>>
>>> It was something about the timeout handler possibly referencing the
>>> wrong lldb (the platform's natively installed one vs. the one
>>> installed by JIB), IIRC.
>>>
>>>
>>> Misha
>>>
>>> On 10/27/20 1:07 PM, Igor Ignatyev wrote:
>>>> Hi Evgeny,
>>>>
>>>> if you look at the `DevToolsSecurity` results, you will see that
>>>> "Developer mode is currently disabled.", meaning this host isn't
>>>> properly configured and `lldb` can't attach to the process. you need
>>>> to open a bug in infra JIRA --
>>>> https://java.se.oracle.com/infrabugs/ (you can always find a link to
>>>> it from the 'services' menu on the infra landing page
>>>> https://java.se.oracle.com). I *assume* the appropriate project is
>>>> MACH5 w/ 'host' being the component, Misha (cc'ed) might know better.
>>>>
>>>> HTH,
>>>> -- Igor
>>>>
>>>> PS even if the host were properly configured, we wouldn't get any
>>>> meaningful data in this particular case, as by the time jtreg
>>>> invoked failure-handler, the test process had already finished and
>>>> exited. that's why there is no "common" section (which has
>>>> `jstack`, `jcmd` and other Java-specific tools) in
>>>> `processes.html`, and why `pgrep` (in "test_processes") exited w/ 1
>>>> and `kill` ("core" subsection) said "No such process". this is by
>>>> no means to say that we shouldn't fix the hosts. I just don't want
>>>> to get your hopes too high: failure-handler is useful and all
>>>> (mostly b/c it's me who implemented it ;) ) but b/c it runs
>>>> concurrently with a test process, there will always be cases w/
>>>> missed data, esp. when a test has almost enough time to finish.
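Igor's PS describes a race. The failure-handler runs concurrently with the test, so the target PID may already be gone by the time a collector runs. Below is a minimal sketch, assuming bash, of guarding a collector with `kill -0`, which sends no signal and only checks that the PID exists and can be signalled. The helper names `pid_alive` and `collect_if_alive` are hypothetical and are not part of jtreg's failure-handler. The guard cannot close the race, because the process can still exit between the check and the collector. It does turn a scattered "No such process" error into one explicit skip message.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of jtreg's failure-handler.

# kill -0 sends no signal: it only reports whether the PID exists and
# may be signalled. A zombie (exited but not yet reaped) still passes.
pid_alive() {
  kill -0 "$1" 2>/dev/null
}

# Run a collector command only while the target PID is still alive.
collect_if_alive() {
  local pid="$1"
  shift
  if pid_alive "$pid"; then
    "$@"
  else
    echo "process $pid already exited; skipping: $*" >&2
    return 1
  fi
}
```

For example, `collect_if_alive 23355 lldb -o 'attach 23355' -o 'thread backtrace all' -o 'detach' -o 'quit'` would run lldb only if PID 23355 still existed when the check ran.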
>>>>
>>>>
>>>>> On Oct 27, 2020, at 12:51 PM, Evgeny Nikitin 
>>>>> <evgeny.nikitin at oracle.com> wrote:
>>>>>
>>>>> Hi Igor,
>>>>>
>>>>> May I ask for your advice on a stuck test failure in mach5?
>>>>>
>>>>> Here's the job; it contains only one test failure:
>>>>>
>>>>> https://mach5.us.oracle.com/mdash/jobs/mach5-one-jdk-16+22-1219-tier3-20201022-1017-15212551/tasks/mach5-one-jdk-16+22-1219-tier3-20201022-1017-15212551-tier3-comp-open_test_hotspot_jtreg_hotspot_slow_compiler-macosx-x64-debug-40/results?search=status%3Afailed%20AND%20-state%3Ainvalid
>>>>>
>>>>> The test hung; jtreg tried to gather stack traces, but without
>>>>> success.
>>>>> Output for stack traces:
>>>>>
>>>>> ----------------------------------------
>>>>> [2020-10-22 10:42:02] [/bin/bash, -c, DevToolsSecurity --status | 
>>>>> grep -q enabled && lldb -o 'attach 23355' -o 'thread backtrace 
>>>>> all' -o 'detach' -o 'quit'] timeout=20000
>>>>> ----------------------------------------
>>>>> ----------------------------------------
>>>>> [2020-10-22 10:42:02] exit code: 1 time: 29 ms
>>>>> ----------------------------------------
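The `exit code: 1` after 29 ms fits the shape of the command. Under `bash -c`, `grep -q enabled && lldb ...` is an AND list, so when `grep` finds no match, `lldb` is never started and the list exits with grep's status, 1. Below is a small replay, assuming bash. The function name `replay_gate` is hypothetical, `printf` stands in for `DevToolsSecurity --status`, and the status text is the one Igor quotes for this host.

```bash
#!/usr/bin/env bash
# Hypothetical replay of the failure-handler gate. printf stands in for
# `DevToolsSecurity --status`; 23355 is the PID from the log above.
replay_gate() {
  printf '%s\n' "$1" | grep -q enabled &&
    lldb -o 'attach 23355' -o 'thread backtrace all' -o 'detach' -o 'quit'
}

status=0
# "disabled" does not contain "enabled", so grep -q fails and && skips lldb.
replay_gate 'Developer mode is currently disabled.' || status=$?
echo "exit code: $status"   # prints "exit code: 1"
```

So the exit code here comes from the gate and not from lldb. lldb never runs on this host.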
>>>>>
>>>>> Output for spindump:
>>>>> ----------------------------------------
>>>>> [2020-10-22 10:42:02] [/usr/sbin/spindump, 23355, -stdout] 
>>>>> timeout=20000
>>>>> ----------------------------------------
>>>>> spindump must be run as root
>>>>> ----------------------------------------
>>>>> [2020-10-22 10:42:02] exit code: 77 time: 98 ms
>>>>> ----------------------------------------
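The spindump entry fails before doing any work: it refuses to run without root and exits with 77, which is the value of `EX_NOPERM` in `sysexits.h`. Below is a minimal sketch, assuming bash, of a wrapper that only calls spindump when the call can succeed, either as root or through passwordless `sudo -n`. The function name `run_spindump` and the `SPINDUMP` override variable are hypothetical. Whether the mach5 hosts should grant such sudo rights is a question for the infra owners, and this thread does not settle it.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: only run spindump when it can actually succeed.
# SPINDUMP lets the tool path be overridden; the default is the path from the log.
run_spindump() {
  local pid="$1"
  local tool="${SPINDUMP:-/usr/sbin/spindump}"
  if [ "$(id -u)" -eq 0 ]; then
    "$tool" "$pid" -stdout
  elif sudo -n true 2>/dev/null; then
    # Passwordless sudo is available; -n never prompts for a password.
    sudo -n "$tool" "$pid" -stdout
  else
    echo "skipping spindump for $pid: not root and no passwordless sudo" >&2
    return 77   # same code spindump returned in the log (EX_NOPERM)
  fi
}
```

The design choice is to fail fast with the same exit code spindump itself uses. That way, logs from a wrapped run and an unwrapped run stay comparable.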
>>>>>
>>>>> There's obviously some problem with gathering stack traces. Is
>>>>> that expected? If not, how and where can I open a bug about it?
>>>>> I'm guessing it's for the Infra team, and not the OpenJDK JIRA,
>>>>> right?
>>>>>
>>>>> Regards,
>>>>> // Evgeny.
>>>>>
>>>>>
>>>>
>>


More information about the hotspot-runtime-dev mailing list