RFR(s-ish): 8191101: Show register content in hs-err file on assert
thomas.stuefe at gmail.com
Wed Nov 22 18:01:06 UTC 2017
May I please have reviews for the following enhancement:
(The patch looks big, but a lot of it is os_cpu fluff.)
Prior email discussion at:
Basically, this adds the ability to show register values in the error
report when an assert fires. This can be useful in certain corner
cases, e.g. if you want to know the value of some local variables or how
deep your stack currently runs.
This works by triggering a fault right when an assert happens and
squirreling the context away for later error reporting.
When an assert happens, we touch a poison page. To preserve the context as
best as possible, we want to avoid running too much code after the assert
condition has been evaluated, so this is done in a very simple way:
directly in the assert macro, right after the condition is evaluated, we
dereference the content of a global poison page pointer. In my tests, even
with slowdebug, this only spoils one register, rax on x64.
In the signal handler, we recognize the assertion poison fault by the
faulting address. We disable the poison page and store the ucontext away.
Then we just return from signal handling. Poison page is now disarmed, the
load from it is retried and now goes through. Normal assertion handling is
then resumed - so, things like -XX:SuppressAt are unaffected and work fine.
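The sequence described above can be sketched as a small standalone program fragment. This is only an illustration under stated assumptions, not the code from the patch: it assumes Linux (or a similar POSIX system), and every name in it (g_poison_page, poison_fault_handler, arm_poison_page, SKETCH_ASSERT) is hypothetical. Instead of continuing into real error reporting, the macro just records the failed condition.

```cpp
#include <cassert>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

// All identifiers here are hypothetical; none are taken from the patch.
static char*  g_poison_page = nullptr;             // armed page, mapped PROT_NONE
static size_t g_page_size   = 0;
static ucontext_t g_saved_context;                 // context kept for later reporting
static volatile sig_atomic_t g_context_saved = 0;
static const char* g_failed_condition = nullptr;   // stands in for error reporting

static void poison_fault_handler(int sig, siginfo_t* info, void* ucontext) {
  char* addr = static_cast<char*>(info->si_addr);
  // Recognize the assertion poison fault by its faulting address.
  if (g_poison_page != nullptr &&
      addr >= g_poison_page && addr < g_poison_page + g_page_size) {
    mprotect(g_poison_page, g_page_size, PROT_READ);             // disarm the page
    memcpy(&g_saved_context, ucontext, sizeof(g_saved_context)); // flat copy (assumption)
    g_context_saved = 1;
    return;  // the faulting load is retried and now succeeds
  }
  signal(sig, SIG_DFL);  // not our fault: let the retried access crash normally
}

bool arm_poison_page() {
  g_page_size = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  void* p = mmap(nullptr, g_page_size, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return false;
  g_poison_page = static_cast<char*>(p);

  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = poison_fault_handler;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGSEGV, &sa, nullptr) == 0 &&
         sigaction(SIGBUS, &sa, nullptr) == 0;
}

// The poison page is touched directly in the macro, right after the
// condition has been evaluated, so that little code runs in between.
#define SKETCH_ASSERT(cond)                                     \
  do {                                                          \
    if (!(cond)) {                                              \
      (void)*static_cast<volatile char*>(g_poison_page);        \
      g_failed_condition = #cond;                               \
    }                                                           \
  } while (0)
```

After a call to arm_poison_page(), the first failing SKETCH_ASSERT faults once, the handler stores the context and disarms the page, and execution continues in the macro. Later failing asserts no longer fault, which mirrors the one-shot behaviour.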
Then, when an error report is generated due to this assert, we now also use
the stored context. So now we get registers and instructions at the assert
Right now, this is implemented on (non-zero) Linux, though other POSIX
platforms should be no problem. I have not yet thought deeply about Windows.
It is tested and - in debug - switched on by default on linux x86, ppc,
If implemented, it can be switched on and off with
-XX:+ShowRegistersOnAssert. This is a failsafe, in case the mechanism does
not work and we want to have clean asserts.
To test this, run java -XX:ErrorHandlerTest=1 -XX:+ShowRegistersOnAssert
with a non-product VM. On Linux x64, ppc and s390 we should now see the
register output in the hs-err file:
4 # Internal Error
5 # assert(str == NULL) failed: expected null
60 RAX=0x00007f736c8f7000, RBX=0x0000000000000000, RCX=0x0000000000000000,
61 RSP=0x00007f736c8d8ce0, RBP=0x00007f736c8d8d30, RSI=0x0000000000000001,
62 R8 =0x0000000000000040, R9 =0x0000000000000001, R10=0x0000000000efb028,
63 R12=0x0000000000000000, R13=0x00007fff3de46dbf, R14=0x00007f736c8d99c0,
64 RIP=0x00007f736aec5529, EFLAGS=0x0000000000010202,
73 Instructions: (pc=0x00007f736aec5529)
74 0x00007f736aec5509: 8d 05 31 21 4a 00 48 01 d0 ff e0 48 83 7d c8 00
75 0x00007f736aec5519: 0f 84 7f 03 00 00 48 8d 05 82 fc b1 00 48 8b 00
76 0x00007f736aec5529: c6 00 58 e8 cb 17 62 ff 84 c0 74 11 48 8d 3d 2f
77 0x00007f736aec5539: 1f 4a 00 b8 00 00 00 00 e8 ed 17 62 ff 48 8d 0d
- When handling the poison fault, we need to copy the context away from the
signal handler stack. For POSIX, this means copying the ucontext_t. This is
undefined territory. On most platforms, this simply means copying the
ucontext_t as a flat structure. On some platforms more is needed, e.g. on
Linux ppc, we need to patch up the context after copying (the context is
not position independent), and on MacOS, the context is not self-contained
but contains pointers to substructures which need to be copied too and
whose size is unknown at compile time. Because of these platform
dependencies, I factored out the copying of ucontext_t to
os::Posix::copy_ucontext, and its implementations are os_cpu specific.
- As an added precaution, when copying the context, we use a safe version
of memcpy (os::safe_memcpy) which I added to copy from potentially invalid
memory regions. The reason is that we have seen on some Unices - e.g. hpux
- that the size of the ucontext_t structure at runtime may be different
from the build machine, so we tread carefully. os::safe_memcpy() uses
SafeFetch to copy a range of memory.
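To illustrate the general idea of copying from a possibly invalid memory range, here is a minimal sketch. It does not use SafeFetch and is not how os::safe_memcpy is implemented. As a swapped-in technique it uses sigsetjmp/siglongjmp to bail out of a byte-wise copy when a fault occurs. The names safe_copy_sketch and probe_handler are hypothetical. The sketch is single-threaded and is meant to be called from ordinary code, not from inside a signal handler.

```cpp
#include <cassert>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical names; this only illustrates a fault-tolerant copy.
static sigjmp_buf g_probe_jmp;

static void probe_handler(int) {
  // Only installed while a copy is in progress, so any fault is ours.
  siglongjmp(g_probe_jmp, 1);
}

// Copies up to len bytes from src to dst and returns the number of bytes
// copied before the first unreadable byte (len if all were readable).
size_t safe_copy_sketch(void* dst, const void* src, size_t len) {
  struct sigaction probe, old_segv, old_bus;
  memset(&probe, 0, sizeof(probe));
  probe.sa_handler = probe_handler;
  sigemptyset(&probe.sa_mask);
  sigaction(SIGSEGV, &probe, &old_segv);
  sigaction(SIGBUS, &probe, &old_bus);

  unsigned char* d = static_cast<unsigned char*>(dst);
  const volatile unsigned char* s =
      static_cast<const volatile unsigned char*>(src);
  // volatile: the value must survive the siglongjmp out of the handler.
  volatile size_t copied = 0;

  // Passing 1 saves the signal mask, so the signal blocked during the
  // handler is unblocked again after the jump.
  if (sigsetjmp(g_probe_jmp, 1) == 0) {
    while (copied < len) {
      d[copied] = s[copied];   // may fault on an unreadable source byte
      copied = copied + 1;
    }
  }

  // Restore whatever handlers were installed before.
  sigaction(SIGSEGV, &old_segv, nullptr);
  sigaction(SIGBUS, &old_bus, nullptr);
  return copied;
}
```

A caller can compare the return value with the requested length to see whether the source structure was fully readable, which is the kind of check that matters when the runtime size of ucontext_t may differ from the size assumed at build time.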
If this does not work, asserts will become segfaults, which can be
confusing. But the feature can be disabled with -XX:-ShowRegistersOnAssert
and for now is disabled by default on most platforms.
- As it is now implemented, this is a one-shot mechanism and only works for
the first assert.
- -XX:SuppressAt=... is not affected and works fine. However, if the first
assert is suppressed, follow-up asserts will not show register values.
- When multiple threads run into an assert, we may or may not see register
values, depending on which thread is the first to finish the poison page
handling.
I do not think these limitations are severe. They can be solved, but at the
cost of added complexity, which I preferred not to add.