RFR(s-ish): 8191101: Show register content in hs-err file on assert

Thomas Stüfe thomas.stuefe at gmail.com
Wed Nov 22 22:42:10 UTC 2017


Small correction, see below...

On Wed 22. Nov 2017 at 19:01, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:

> Hi all,
>
> may I please have reviews for the following enhancement:
>
> issue: https://bugs.openjdk.java.net/browse/JDK-8191101
> webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8191101-show-registers-on-assert/webrev.00/webrev/
>
>
> (Patch looks big but lot of it is os_cpu fluff.)
>
> Prior email discussion at:
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/025018.html
>
> ----
>
> Basically, this adds the ability to show register values in assert
> situations into the error report. This can be useful in certain corner
> cases, e.g. if you want to know the value of some local variables or how
> deep your stack currently runs.
>
> This works by triggering a fault right when an assert happens and
> squirreling the context away for later error reporting.
>
> When an assert happens, we touch a poison page. To preserve the context as
> best as possible, we want to avoid running too much code after the assert
> condition has been evaluated, so this is done in a very simple way:
> directly in the assert macro, right after the condition is evaluated, we
> dereference the content of a global poison page pointer. In my tests, even
> with slowdebug, this only spoils one register, rax on x64.
>
> In the signal handler, we recognize the assertion poison fault by the
> faulting address. We disable the poison page and store the ucontext away.
> Then we just return from signal handling. Poison page is now disarmed, the
> load from it is retried and now goes through. Normal assertion handling is
> then resumed - so, things like -XX:SuppressAt are unaffected and work fine.
>
> Then, when an error report is generated due to this assert, we now also
> use the stored context. So now we get registers and instructions at the
> assert point.
>
> Right now, this is implemented on (non-zero) Linux, though other posix
> platforms should be no problem. Have not yet thought deeply about windows.
> It is tested and - in debug - switched on by default on linux x86, ppc,
> s390.
>
> If implemented, it can be switched on and off with
> -XX:+ShowRegistersOnAssert. This is a failsafe, in case the mechanism does
> not work and we want to have clean asserts.
>
> To test this, do a java -XX:ErrorHandlerTest=1 -XX:+ShowRegistersOnAssert
> with a not-product VM. On Linux x64, ppc and s390 we should now see the
> register output in the hs-err file:
>
>   4 #  Internal Error
> (/shared/projects/openjdk/jdk-hs/source/src/hotspot/share/utilities/vmError.cpp:1660),
> pid=4022, tid=4023
>   5 #  assert(str == NULL) failed: expected null
>   6 #
>   7 #
> ...
>  59 Registers:
>  60 RAX=0x00007f736c8f7000, RBX=0x0000000000000000,
> RCX=0x0000000000000000, RDX=0xffffffffffb5ded4
>  61 RSP=0x00007f736c8d8ce0, RBP=0x00007f736c8d8d30,
> RSI=0x0000000000000001, RDI=0x0000000000000001
>  62 R8 =0x0000000000000040, R9 =0x0000000000000001,
> R10=0x0000000000efb028, R11=0x00040788a244f42a
>  63 R12=0x0000000000000000, R13=0x00007fff3de46dbf,
> R14=0x00007f736c8d99c0, R15=0x0000000000000000
>  64 RIP=0x00007f736aec5529, EFLAGS=0x0000000000010202,
> CSGSFS=0x002b000000000033, ERR=0x0000000000000006
>  65   TRAPNO=0x000000000000000e
> ...
>  72
>  73 Instructions: (pc=0x00007f736aec5529)
>  74 0x00007f736aec5509:   8d 05 31 21 4a 00 48 01 d0 ff e0 48 83 7d c8 00
>  75 0x00007f736aec5519:   0f 84 7f 03 00 00 48 8d 05 82 fc b1 00 48 8b 00
>  76 0x00007f736aec5529:   c6 00 58 e8 cb 17 62 ff 84 c0 74 11 48 8d 3d 2f
>  77 0x00007f736aec5539:   1f 4a 00 b8 00 00 00 00 e8 ed 17 62 ff 48 8d 0d
>  78
>
> ---
>
> Implementation notes:
>
> - when handling the poison fault, we need to copy the context away from
> the signal handler stack. For posix, this means copying the ucontext_t.
> This is undefined territory. On most platforms, this simply means copying
> the ucontex_t as a flat structure. On some platforms more is needed, e.g.
> on linux ppc, we need to patch up the context after copying (the context is
> not position independent), and on MacOS, the context is not self-contained
> but contains pointers to sub structures which need to be copied too and
> whose size is unknown at compile time. Because of these platform
> dependencies, I factored out the copying of ucontext_t to
> os::Posix::copy_ucontext and its implementations are os_cpu specific.
>
> - As an added precaution, when copying the context, we use a safe version
> of memcpy (os::safe_memcpy) which I added to copy from potentially invalid
> memory regions. The reason is that we have seen on some Unices - e.g. hpux
> - that the size of the ucontext_t structure at runtime may be different
> from the build machine, so we tread carefully. os::safe_memcpy() uses
> SafeFetch to copy a range of memory.
>
> Risk:
>
> If this does not work, asserts will become segfaults, which can be
> confusing. But the feature can be disabled with -XX:+ShowRegistersOnAssert
> and for now on most platforms is disabled by default.
>

Feature is disabled with -XX:-ShowRegistersOnAssert, of course. Sorry for
the typo.


> ---
>
> Limitations:
> - as it is now implemented, this is a one-shot mechanism and only works
> for the first assert.
> - -XX:SuppressAt=... is not affected and works fine. However, if the first
> assert is suppressed, follow-up asserts will not show register values.
> - When multiple threads run into an assert, we may or may not see register
> values depending on which thread is the first of finishing the poison page
> handler.
>
> I do not think these limitations are severe. They can be solved, but at
> the cost of added complexity, which I preferred not to add.
>
> Thank you,
>
> Thomas
>
>
>
>


More information about the hotspot-runtime-dev mailing list