RFR(s-ish): 8191101: Show register content in hs-err file on assert
Thomas Stüfe
thomas.stuefe at gmail.com
Tue Nov 28 08:08:41 UTC 2017
Updated webrev with Andrews request worked in:
http://cr.openjdk.java.net/~stuefe/webrevs/8191101-show-registers-on-assert/webrev.01/webrev/
The only difference to before is that I restored the ShouldNotReachHere()
at the end of VMError::controlled_crash(). Before that assert, I added a
hopefully clearer message than before to enable my test to scan for this
message. I also changed the jtreg test to scan for this new message.
Note: in the jtreg test, I test that assert supression works with
-XX:SupressErrorAt. But I just give it the file
(-XX:SuppressErrorAt=/vmError.cpp)
and omit the line number, because I do not want the test to stop working
when someone adds lines in front of VMError::controlled_crash(). But that
means that the ShouldNotReachHere() at the end of controlled_crash() does
not fire either, because now all asserts in vmError.cpp are disabled. So, I
need this new message.
Thanks, Thomas
On Mon, Nov 27, 2017 at 11:33 AM, Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:
> Ping... Could I please have reviews? Thank you!
> ..Thomas
>
> On Wed, Nov 22, 2017 at 7:01 PM, Thomas Stüfe <thomas.stuefe at gmail.com>
> wrote:
>
>> Hi all,
>>
>> may I please have reviews for the following enhancement:
>>
>> issue: https://bugs.openjdk.java.net/browse/JDK-8191101
>> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8191101-
>> show-registers-on-assert/webrev.00/webrev/
>>
>> (Patch looks big but lot of it is os_cpu fluff.)
>>
>> Prior email discussion at: http://mail.openjdk.java.net/p
>> ipermail/hotspot-runtime-dev/2017-October/025018.html
>>
>> ----
>>
>> Basically, this adds the ability to show register values in assert
>> situations into the error report. This can be useful in certain corner
>> cases, e.g. if you want to know the value of some local variables or how
>> deep your stack currently runs.
>>
>> This works by triggering a fault right when an assert happens and
>> squirreling the context away for later error reporting.
>>
>> When an assert happens, we touch a poison page. To preserve the context
>> as best as possible, we want to avoid running too much code after the
>> assert condition has been evaluated, so this is done in a very simple way:
>> directly in the assert macro, right after the condition is evaluated, we
>> dereference the content of a global poison page pointer. In my tests, even
>> with slowdebug, this only spoils one register, rax on x64.
>>
>> In the signal handler, we recognize the assertion poison fault by the
>> faulting address. We disable the poison page and store the ucontext away.
>> Then we just return from signal handling. Poison page is now disarmed, the
>> load from it is retried and now goes through. Normal assertion handling is
>> then resumed - so, things like -XX:SuppressAt are unaffected and work fine.
>>
>> Then, when an error report is generated due to this assert, we now also
>> use the stored context. So now we get registers and instructions at the
>> assert point.
>>
>> Right now, this is implemented on (non-zero) Linux, though other posix
>> platforms should be no problem. Have not yet thought deeply about windows.
>> It is tested and - in debug - switched on by default on linux x86, ppc,
>> s390.
>>
>> If implemented, it can be switched on and off with
>> -XX:+ShowRegistersOnAssert. This is a failsafe, in case the mechanism does
>> not work and we want to have clean asserts.
>>
>> To test this, do a java -XX:ErrorHandlerTest=1 -XX:+ShowRegistersOnAssert
>> with a not-product VM. On Linux x64, ppc and s390 we should now see the
>> register output in the hs-err file:
>>
>> 4 # Internal Error (/shared/projects/openjdk/jdk-
>> hs/source/src/hotspot/share/utilities/vmError.cpp:1660), pid=4022,
>> tid=4023
>> 5 # assert(str == NULL) failed: expected null
>> 6 #
>> 7 #
>> ...
>> 59 Registers:
>> 60 RAX=0x00007f736c8f7000, RBX=0x0000000000000000,
>> RCX=0x0000000000000000, RDX=0xffffffffffb5ded4
>> 61 RSP=0x00007f736c8d8ce0, RBP=0x00007f736c8d8d30,
>> RSI=0x0000000000000001, RDI=0x0000000000000001
>> 62 R8 =0x0000000000000040, R9 =0x0000000000000001,
>> R10=0x0000000000efb028, R11=0x00040788a244f42a
>> 63 R12=0x0000000000000000, R13=0x00007fff3de46dbf,
>> R14=0x00007f736c8d99c0, R15=0x0000000000000000
>> 64 RIP=0x00007f736aec5529, EFLAGS=0x0000000000010202,
>> CSGSFS=0x002b000000000033, ERR=0x0000000000000006
>> 65 TRAPNO=0x000000000000000e
>> ...
>> 72
>> 73 Instructions: (pc=0x00007f736aec5529)
>> 74 0x00007f736aec5509: 8d 05 31 21 4a 00 48 01 d0 ff e0 48 83 7d c8 00
>> 75 0x00007f736aec5519: 0f 84 7f 03 00 00 48 8d 05 82 fc b1 00 48 8b 00
>> 76 0x00007f736aec5529: c6 00 58 e8 cb 17 62 ff 84 c0 74 11 48 8d 3d 2f
>> 77 0x00007f736aec5539: 1f 4a 00 b8 00 00 00 00 e8 ed 17 62 ff 48 8d 0d
>> 78
>>
>> ---
>>
>> Implementation notes:
>>
>> - when handling the poison fault, we need to copy the context away from
>> the signal handler stack. For posix, this means copying the ucontext_t.
>> This is undefined territory. On most platforms, this simply means copying
>> the ucontex_t as a flat structure. On some platforms more is needed, e.g.
>> on linux ppc, we need to patch up the context after copying (the context is
>> not position independent), and on MacOS, the context is not self-contained
>> but contains pointers to sub structures which need to be copied too and
>> whose size is unknown at compile time. Because of these platform
>> dependencies, I factored out the copying of ucontext_t to
>> os::Posix::copy_ucontext and its implementations are os_cpu specific.
>>
>> - As an added precaution, when copying the context, we use a safe version
>> of memcpy (os::safe_memcpy) which I added to copy from potentially invalid
>> memory regions. The reason is that we have seen on some Unices - e.g. hpux
>> - that the size of the ucontext_t structure at runtime may be different
>> from the build machine, so we tread carefully. os::safe_memcpy() uses
>> SafeFetch to copy a range of memory.
>>
>> Risk:
>>
>> If this does not work, asserts will become segfaults, which can be
>> confusing. But the feature can be disabled with -XX:+ShowRegistersOnAssert
>> and for now on most platforms is disabled by default.
>>
>> ---
>>
>> Limitations:
>> - as it is now implemented, this is a one-shot mechanism and only works
>> for the first assert.
>> - -XX:SuppressAt=... is not affected and works fine. However, if the
>> first assert is suppressed, follow-up asserts will not show register values.
>> - When multiple threads run into an assert, we may or may not see
>> register values depending on which thread is the first of finishing the
>> poison page handler.
>>
>> I do not think these limitations are severe. They can be solved, but at
>> the cost of added complexity, which I preferred not to add.
>>
>> Thank you,
>>
>> Thomas
>>
>>
>>
>>
>
More information about the hotspot-runtime-dev
mailing list