HSX-25: code review (round 0) for VMError::report_and_die () dies on bad PC (8015884)

Daniel D. Daugherty daniel.daugherty at oracle.com
Mon Jul 1 16:07:10 PDT 2013


Sorry for the delay in getting back to you on this thread.
I've been focused on trying to figure out why the test was
broken on Win*...

I have a fix in mind so there will be a new webrev...


On 6/27/13 11:21 AM, Dmitry Samersoff wrote:
> Dan,
>
>> but I try not to change existing style unless I have to
>> make other logic changes to the line.
> os_linux.cpp:
>
> 1712 doesn't have redundant else, but 1795 does have.

The change from the old code:

1701   if (dladdr((void*)addr, &dlinfo) && dlinfo.dli_sname != NULL) {
<snip>
1709   } else if (dlinfo.dli_fname != NULL && dlinfo.dli_fbase != 0) {

to the new code:

1705   if (dladdr((void*)addr, &dlinfo) != 0) {
1706     // see if we have a matching symbol
1707     if (dlinfo.dli_saddr != NULL && dlinfo.dli_sname != NULL) {

<snip>

1713     }
1714     // no matching symbol so try for just file info
1715     if (dlinfo.dli_fname != NULL && dlinfo.dli_fbase != NULL) {

was not simply a cosmetic change to get rid of the redundant "else"
that was on line 1709.

In fact, this is one of the key fixes:

- the dladdr() call on old line 1701 could fail and
- the dlinfo fields on old line 1709 could still be used

In the new code:

- old line 1701 is split into new lines 1705 and 1707
- old line 1709 is now conditional inside the block
   started by new line 1705

>
> I would prefer to have the same stile in all similar
> code fragments if possible.

Yes, I understand that. In the case of old line 1709, I changed
the core logic of line (by moving it inside the now split-if)
so it made sense to also get rid of the redundant "else".

For line 1795, I just changed from an implied boolean to '!= 0'
so that doesn't really meet my criteria.

Dan


>
> -Dmitry
>
>
> On 2013-06-27 20:18, Daniel D. Daugherty wrote:
>> Dmitry,
>>
>> Thanks for the review!
>>
>> Sorry for the delay in responding. I was out of the office
>> for a couple of days.
>>
>> Replies embedded below.
>>
>>
>> On 6/22/13 11:51 AM, Dmitry Samersoff wrote:
>>> Dan,
>>>
>>> os_linux.cpp:
>>>
>>> 1685:
>>>     It might be better to just turn assert to guarantee. Something bad is
>>> happening if we can't obtain JVM base address.
>> So that would be:
>>
>> In os_bsd.cpp:
>>
>> 1232 bool os::address_is_in_vm(address addr) {
>> 1240     assert(libjvm_base_addr !=NULL, "Cannot obtain base address for
>> libjvm");
>>
>> In os_linux.cpp:
>>
>> 1680 bool os::address_is_in_vm(address addr) {
>> 1688     assert(libjvm_base_addr !=NULL, "Cannot obtain base address for
>> libjvm");
>>
>> In os_solaris.cpp:
>>
>> 1922 bool os::address_is_in_vm(address addr) {
>> 1930     assert(libjvm_base_addr !=NULL, "Cannot obtain base address for
>> libjvm");
>>
>> In os_windows.cpp:
>>
>> 1477 bool os::address_is_in_vm(address addr) {
>> 1481       assert(false, "Can't find jvm module.");
>>
>> I concur that not knowing the JVM base address would be bad.
>> I will change the above "assert" calls into "guarantee" calls
>> if other reviewers do not object.
>>
>>
>>
>>> 1795,1803:
>>> else is redundant
>> Agreed. My personal preference would be something like:
>>
>>    if (cond1) {
>>      <some work>
>>      return true;
>>    }
>>
>>    if (cond2) {
>>      <some other work>
>>      return true;
>>    }
>>
>>    <work to prepare for false return>
>>    return false;
>>
>> but I try not to change existing style unless I have to
>> make other logic changes to the line. In this case, I'm
>> just changing from an implied boolean to '!= 0' so that
>> doesn't really meet my criteria.
>>
>> The indenting is also wrong in this function and I'm
>> trying really hard to not fix... Since this will likely
>> get back ported to HSX-24, I shouldn't... must resist...
>>
>> Again, thanks for the review!
>>
>> Dan
>>
>>
>>> -Dmitry
>>>
>>>
>>> On 2013-06-22 02:44, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I have have a proposed fix for the following bug:
>>>>
>>>>       8015884 runThese crashed with SIGSEGV, hs_err has an error instead
>>>>               of stacktrace
>>>>       http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8015884
>>>>       https://jbs.oracle.com/bugs/browse/JDK-8015884
>>>>
>>>> Here are the HSX-25 webrev URLs:
>>>>
>>>> OpenJDK: http://cr.openjdk.java.net/~dcubed/8015884-webrev/0-hsx25/
>>>> Internal: http://javaweb.us.oracle.com/~ddaugher/8015884-webrev/0-hsx25/
>>>>
>>>> Testing:
>>>> -  test/runtime/6888954/vmerrors.sh has been updated to include a
>>>>      test for a bad function pointer
>>>> - vmerrors.sh has been used on MacOS X and Solaris X86 to verify
>>>>     that bad function pointers get better trouble shooting info
>>>> - (running, @ 24 hours and not yet done) JPRT build and test job
>>>> - (TBD) JPRT build and new test only job
>>>> - (TBD) Aurora Adhoc vm.quick batch for all platforms in the following
>>>>     configs: {Client VM, Server VM} x {fastdebug} x {-Xmixed}
>>>>
>>>> Just to be clear: This fix solves the problem of not getting a
>>>> stacktrace when we get a bad PC; it does not solve the underlying
>>>> reason why libpthread sent the VM into the weeds.
>>>>
>>>> Gory details are below. As always, comments, questions and
>>>> suggestions are welome.
>>>>
>>>> Dan
>>>>
>>>>
>>>> Gory Details:
>>>>
>>>> The primary bug fixes are:
>>>>
>>>> - the Dl_info struct should only be used if dladdr() has
>>>>     returned non-zero (no errors) and always check the dladdr()
>>>>     return value
>>>> - the Dl_info.dli_sname and Dl_info.dli_saddr fields should
>>>>     only be used if non-NULL
>>>> - the first two fixes are based on the Solaris dladdr() man page
>>>> - make logic consistent between MacOS X, Linux and Solaris:
>>>>     - offset is only set when a matching symbol is found (and
>>>>       the pointer is non-NULL)
>>>>
>>>> Cleanup bug fixes are:
>>>>
>>>> - add assert()'s to verify os::dll_address_to_function_name() and
>>>>     os::dll_address_to_library_name() contract for 'buf' parameter
>>>>
>>>> - HotSpot style: explicitly check int function return values
>>>>     rather that assume conversion to boolean, e.g.
>>>>
>>>>       if (dladdr(...)) {  =>  if (dladdr(...) != 0) {
>>>>
>>>> - consistently compare Dl_info fields against NULL instead of a mix of
>>>>     '0', NULL and implied boolean
>>>>
>>>> - the Dl_info.dli_fname and Dl_info.dli_fbase fields are only used if
>>>>     non-NULL; this was done for consistency (and paranoia); the Solaris
>>>>     man page does not state whether or not NULL is a permitted value for
>>>>     these fields
>>>>
>>>> Test changes:
>>>> - add case "13" for bad function pointers
>>>> - change VM-side of the test code so that case "12", the
>>>>     os::signal_raise(SIGSEGV) case, does not have to be last;
>>>>     this puts a ShouldNotReachHere() call after the test_num
>>>>     case statement so this test function will always crash
>>>>     the VM in non-product bits
>>>> - change VM-side of the test code to issue an ERROR message
>>>>     when an unexpected test_num value is passed
>>>>
>



More information about the hotspot-runtime-dev mailing list