RFR: 8369994: Mixed mode jhsdb jstack cannot resolve symbol with cold attribute [v2]
Kevin Walls
kevinw at openjdk.org
Wed Oct 29 21:23:19 UTC 2025
On Mon, 20 Oct 2025 01:23:45 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:
>> `jhsdb jstack --mixed` with a coredump cannot resolve a function symbol that has the `.cold` attribute.
>>
>>
>> ----------------- 120485 -----------------
>> "Thread-0" #24 prio=5 tid=0x00007f50dc1aa7c0 nid=120485 waiting on condition [0x00007f50c0d1a000]
>> java.lang.Thread.State: TIMED_WAITING (sleeping)
>> JavaThread state: _thread_blocked
>> 0x00007f50e4710735 __GI_abort + 0x8b
>> 0x00007f50e1e01f33 ????????
>>
>>
>> 0x7f50e1e01f33 was `os::abort(bool, void const*, void const*) [clone .cold]`, and I could see it in GDB. The `.cold` suffix means the code has been relocated as a ["cold" function](https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-cold-function-attribute). In GDB, we can see that this code sits in a different area from the function body, as follows:
>>
>>
>> (gdb) disas 0x7f50e1e01f2e, 0x7f50e1e01f34
>> Dump of assembler code from 0x7f50e1e01f2e to 0x7f50e1e01f34:
>> 0x00007f50e1e01f2e <_ZN2os5abortEbPKvS1_.cold+0>: call 0x7f50e1e01010 <abort@plt>
>> => 0x00007f50e1e01f33: nop
>> End of assembler dump.
>>
>>
>> To resolve a symbol, libsaproc.so checks whether the address lies between `start` and `start + size - 1`. As the assembler dump shows, the code in the `.cold` section is a `call` instruction, so the IP points at the following `nop`. We should therefore allow the address range between `start` and `start + size`.
>>
>> After this PR, you can see the right symbol, as follows:
>>
>>
>> ----------------- 120485 -----------------
>> "Thread-0" #24 prio=5 tid=0x00007f50dc1aa7c0 nid=120485 waiting on condition [0x00007f50c0d1a000]
>> java.lang.Thread.State: TIMED_WAITING (sleeping)
>> JavaThread state: _thread_blocked
>> 0x00007f50e4710735 __GI_abort + 0x8b
>> 0x00007f50e1e01f33 os::abort(bool, void const*, void const*) [clone .cold] + 0x5
>
> Yasumasa Suenaga has updated the pull request incrementally with two additional commits since the last revision:
>
> - Add fallback code to process DWARF with RIP-1 in Linux AMD64
> - Revert "8369994: Mixed mode jhsdb jstack cannot resolve symbol with cold attribute"
>
> This reverts commit 570b65c6b56ba3378d4f532fa0874ff08ff18451.
Hi - I didn't reproduce this behaviour. I'm sure it can happen as you describe -- but is there a particular kind of system this reproduces on, or a particular way to crash things to get to abort via the "cold" route? 8-)
If the DWARF lookup works at RIP-1, we make the closestSymbol call always use RIP-1.
Is it possible to not find DWARF and still get to resolving a symbol?
If closestSymbol retried at pc-1 after it fails, then we wouldn't need to know that DWARF and the symbol info always have the same range (we might expect they do, but the DWARF range could perhaps be longer, as there is often padding at the end...)
(then we wouldn't need the use1ByteBeforeToLookup param)
The comment could be:
// Try again with RIP-1 in case RIP is just outside function bounds,
// due to function ending with a `call` instruction.
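To make the retry idea concrete, here is a minimal standalone sketch. It is not the SA code: `Sym`, `lookup` and this `closestSymbol` are hypothetical stand-ins, and the 5-byte symbol size is an assumption read off the disassembly quoted above (the `call` is at +0 and the `nop` follows at +5).

```java
import java.util.List;
import java.util.Optional;

public class RetryLookupSketch {
    // Hypothetical symbol table entry; the body covers [start, start + size).
    record Sym(String name, long start, long size) {
        boolean contains(long pc) {
            return pc >= start && pc < start + size;
        }
    }

    // Strict lookup: succeeds only if pc is inside a symbol's range.
    static Optional<Sym> lookup(List<Sym> table, long pc) {
        return table.stream().filter(s -> s.contains(pc)).findFirst();
    }

    // Hypothetical stand-in for closestSymbol with the proposed fallback.
    static String closestSymbol(List<Sym> table, long pc) {
        Optional<Sym> hit = lookup(table, pc);
        if (hit.isEmpty()) {
            // Try again with pc-1 in case pc is just outside function bounds,
            // due to function ending with a `call` instruction.
            hit = lookup(table, pc - 1);
        }
        // The offset is still reported relative to the original pc.
        return hit.map(s -> s.name() + " + 0x" + Long.toHexString(pc - s.start()))
                  .orElse("????????");
    }

    public static void main(String[] args) {
        // Assumed size of 5 bytes: just the `call` seen in the GDB output.
        List<Sym> table = List.of(new Sym(
            "os::abort(bool, void const*, void const*) [clone .cold]",
            0x7f50e1e01f2eL, 5));
        System.out.println("0x00007f50e1e01f33 "
            + closestSymbol(table, 0x7f50e1e01f33L));
    }
}
```

With the fallback inside the lookup itself, the caller does not need to pass anything like use1ByteBeforeToLookup, and an address two or more bytes past the end still comes back unresolved.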
-------------
PR Comment: https://git.openjdk.org/jdk/pull/27846#issuecomment-3464106789
More information about the serviceability-dev
mailing list