RFR(s-ish): 8191101: Show register content in hs-err file on assert

David Holmes david.holmes at oracle.com
Wed Nov 29 04:46:49 UTC 2017


Hi Thomas,

I took a look at this and have a few concerns ...

But first, this is currently targeted to JDK 11 - is that what you are 
aiming for? The JDK 10 gate closes on Friday for hotspot changes.

On 28/11/2017 6:08 PM, Thomas Stüfe wrote:
> Updated webrev with Andrews request worked in:
> 
> http://cr.openjdk.java.net/~stuefe/webrevs/8191101-show-registers-on-assert/webrev.01/webrev/

I don't understand why a number of platforms define 
os::Posix::copy_context as a no-op instead of using 
os::Posix::default_copy_ucontext - this is either common posix code or 
it is not. ?? Is it just a case of you not being able to test these 
other "posix" platforms? But if the actual ucontext_t layout is OS 
specific and not Posix specific then this should not be defined in the 
os::Posix class. (os::Posix is supposed to be a place for standard Posix 
conforming functionality, not a convenient place to share stuff on 
non-windows OS.)

Any why the combination of build-time and run-time checking? This seems 
overly complicated. Can't you just rely on runtime checks?

Your bug synopsis and preamble only mentions assert so I was going to 
comment that:

!   diagnostic_pd(bool, ShowRegistersOnAssert,

should be a develop flag not diagnostic but then I saw you do this for 
guarantees and all the other abort points too! In which case use of 
"assert" in the flag name and CAN_SHOW_REGISTERS_ON_ASSERT, is 
misleading - how about s/assert/abort/ ?

---

I can't really comment on the safe* functionality - I'm not sure what 
that is all about. ??

---

src/hotspot/os/posix/os_posix.cpp

Style nit:  if (copy) -> if (copy != NULL)

+   ::free(uc); // [sic] see os::Posix::copy_context(): the ucontext 
should have been copied with raw ::malloc,

I don't understand the comment - should it be referring to 
os::Posix::default_copy_ucontext? Shouldn't the issue with os::malloc 
use be documented in os::Posix::default_copy_ucontext? And the comment 
here just needs to say:

::free(uc);  // Use raw free to match the raw malloc

---

src/hotspot/os/posix/os_posix.hpp

+   // Copy an ucontext.

an -> a

+   // (which is most of them).Do

Space before Do

---

src/hotspot/os_cpu/linux_ppc/os_linux_ppc.cpp

Style nit: if (uc) -> if (uc != NULL)

---
src/hotspot/share/utilities/debug.hpp

   38 #endif
   39
   40 #ifdef CAN_SHOW_REGISTERS_ON_ASSERT

You can delete these three lines as you know you can show registers on 
assert.

  42   #define TOUCH_ASSERT_POISON (*(char*)g_assert_poison = 'X');

Why the casting? Why not declare g_assert_poison as intptr_t* and write 
an int into it?

Nit: usually macros don't include the trailing ; so that in the code 
they look like statements. (Though I think the use of macro and 
conditional compilation is overkill here.)

---

  test/hotspot/jtreg/runtime/ErrorHandling/ShowRegistersOnAssertTest.java

I think we need to be able to test a guarantee failure in a product 
build, else how do we know this works okay in that scenario? Do we have 
something in place to allow that?

---

Thanks,
David
-----


> The only difference to before is that I restored the ShouldNotReachHere()
> at the end of VMError::controlled_crash(). Before that assert, I added a
> hopefully clearer message than before to enable my test to scan for this
> message. I also changed the jtreg test to scan for this new message.
> 
> Note: in the jtreg test, I test that assert supression works with
> -XX:SupressErrorAt. But I just give it the file
> (-XX:SuppressErrorAt=/vmError.cpp)
> and omit the line number, because I do not want the test to stop working
> when someone adds lines in front of VMError::controlled_crash(). But that
> means that the ShouldNotReachHere() at the end of controlled_crash() does
> not fire either, because now all asserts in vmError.cpp are disabled. So, I
> need this new message.
> 
> Thanks, Thomas
> 
> On Mon, Nov 27, 2017 at 11:33 AM, Thomas Stüfe <thomas.stuefe at gmail.com>
> wrote:
> 
>> Ping... Could I please have reviews? Thank you!
>> ..Thomas
>>
>> On Wed, Nov 22, 2017 at 7:01 PM, Thomas Stüfe <thomas.stuefe at gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> may I please have reviews for the following enhancement:
>>>
>>> issue: https://bugs.openjdk.java.net/browse/JDK-8191101
>>> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8191101-
>>> show-registers-on-assert/webrev.00/webrev/
>>>
>>> (Patch looks big but lot of it is os_cpu fluff.)
>>>
>>> Prior email discussion at: http://mail.openjdk.java.net/p
>>> ipermail/hotspot-runtime-dev/2017-October/025018.html
>>>
>>> ----
>>>
>>> Basically, this adds the ability to show register values in assert
>>> situations into the error report. This can be useful in certain corner
>>> cases, e.g. if you want to know the value of some local variables or how
>>> deep your stack currently runs.
>>>
>>> This works by triggering a fault right when an assert happens and
>>> squirreling the context away for later error reporting.
>>>
>>> When an assert happens, we touch a poison page. To preserve the context
>>> as best as possible, we want to avoid running too much code after the
>>> assert condition has been evaluated, so this is done in a very simple way:
>>> directly in the assert macro, right after the condition is evaluated, we
>>> dereference the content of a global poison page pointer. In my tests, even
>>> with slowdebug, this only spoils one register, rax on x64.
>>>
>>> In the signal handler, we recognize the assertion poison fault by the
>>> faulting address. We disable the poison page and store the ucontext away.
>>> Then we just return from signal handling. Poison page is now disarmed, the
>>> load from it is retried and now goes through. Normal assertion handling is
>>> then resumed - so, things like -XX:SuppressAt are unaffected and work fine.
>>>
>>> Then, when an error report is generated due to this assert, we now also
>>> use the stored context. So now we get registers and instructions at the
>>> assert point.
>>>
>>> Right now, this is implemented on (non-zero) Linux, though other posix
>>> platforms should be no problem. Have not yet thought deeply about windows.
>>> It is tested and - in debug - switched on by default on linux x86, ppc,
>>> s390.
>>>
>>> If implemented, it can be switched on and off with
>>> -XX:+ShowRegistersOnAssert. This is a failsafe, in case the mechanism does
>>> not work and we want to have clean asserts.
>>>
>>> To test this, do a java -XX:ErrorHandlerTest=1 -XX:+ShowRegistersOnAssert
>>> with a not-product VM. On Linux x64, ppc and s390 we should now see the
>>> register output in the hs-err file:
>>>
>>>    4 #  Internal Error (/shared/projects/openjdk/jdk-
>>> hs/source/src/hotspot/share/utilities/vmError.cpp:1660), pid=4022,
>>> tid=4023
>>>    5 #  assert(str == NULL) failed: expected null
>>>    6 #
>>>    7 #
>>> ...
>>>   59 Registers:
>>>   60 RAX=0x00007f736c8f7000, RBX=0x0000000000000000,
>>> RCX=0x0000000000000000, RDX=0xffffffffffb5ded4
>>>   61 RSP=0x00007f736c8d8ce0, RBP=0x00007f736c8d8d30,
>>> RSI=0x0000000000000001, RDI=0x0000000000000001
>>>   62 R8 =0x0000000000000040, R9 =0x0000000000000001,
>>> R10=0x0000000000efb028, R11=0x00040788a244f42a
>>>   63 R12=0x0000000000000000, R13=0x00007fff3de46dbf,
>>> R14=0x00007f736c8d99c0, R15=0x0000000000000000
>>>   64 RIP=0x00007f736aec5529, EFLAGS=0x0000000000010202,
>>> CSGSFS=0x002b000000000033, ERR=0x0000000000000006
>>>   65   TRAPNO=0x000000000000000e
>>> ...
>>>   72
>>>   73 Instructions: (pc=0x00007f736aec5529)
>>>   74 0x00007f736aec5509:   8d 05 31 21 4a 00 48 01 d0 ff e0 48 83 7d c8 00
>>>   75 0x00007f736aec5519:   0f 84 7f 03 00 00 48 8d 05 82 fc b1 00 48 8b 00
>>>   76 0x00007f736aec5529:   c6 00 58 e8 cb 17 62 ff 84 c0 74 11 48 8d 3d 2f
>>>   77 0x00007f736aec5539:   1f 4a 00 b8 00 00 00 00 e8 ed 17 62 ff 48 8d 0d
>>>   78
>>>
>>> ---
>>>
>>> Implementation notes:
>>>
>>> - when handling the poison fault, we need to copy the context away from
>>> the signal handler stack. For posix, this means copying the ucontext_t.
>>> This is undefined territory. On most platforms, this simply means copying
>>> the ucontex_t as a flat structure. On some platforms more is needed, e.g.
>>> on linux ppc, we need to patch up the context after copying (the context is
>>> not position independent), and on MacOS, the context is not self-contained
>>> but contains pointers to sub structures which need to be copied too and
>>> whose size is unknown at compile time. Because of these platform
>>> dependencies, I factored out the copying of ucontext_t to
>>> os::Posix::copy_ucontext and its implementations are os_cpu specific.
>>>
>>> - As an added precaution, when copying the context, we use a safe version
>>> of memcpy (os::safe_memcpy) which I added to copy from potentially invalid
>>> memory regions. The reason is that we have seen on some Unices - e.g. hpux
>>> - that the size of the ucontext_t structure at runtime may be different
>>> from the build machine, so we tread carefully. os::safe_memcpy() uses
>>> SafeFetch to copy a range of memory.
>>>
>>> Risk:
>>>
>>> If this does not work, asserts will become segfaults, which can be
>>> confusing. But the feature can be disabled with -XX:+ShowRegistersOnAssert
>>> and for now on most platforms is disabled by default.
>>>
>>> ---
>>>
>>> Limitations:
>>> - as it is now implemented, this is a one-shot mechanism and only works
>>> for the first assert.
>>> - -XX:SuppressAt=... is not affected and works fine. However, if the
>>> first assert is suppressed, follow-up asserts will not show register values.
>>> - When multiple threads run into an assert, we may or may not see
>>> register values depending on which thread is the first of finishing the
>>> poison page handler.
>>>
>>> I do not think these limitations are severe. They can be solved, but at
>>> the cost of added complexity, which I preferred not to add.
>>>
>>> Thank you,
>>>
>>> Thomas
>>>
>>>
>>>
>>>
>>


More information about the hotspot-runtime-dev mailing list