HSX-24 Backport: code review (round 0) for VMError::report_and_die	() dies on bad PC (8015884)
    David Holmes 
    david.holmes at oracle.com
       
    Wed Jul 10 23:15:46 PDT 2013
    
    
  
On 10/07/2013 6:13 AM, Daniel D. Daugherty wrote:
> On 7/9/13 9:21 AM, Gerard Ziemski wrote:
>> hi Dan,
>>
>> Are the comment around the case 12 and 13 in test_error_handler right?
>
> Yes and no. :-)
>
>
>> Is it really possible that dereferencing NULL will not cause the
>> hardware to raise a signal?
>
> For all of the platforms that we test, dereferencing NULL causes a
> crash (as expected). However, I can't say what other platforms do...
As I recall some OS (Solaris) has a setting that allows access to the 
full address range such that accessing 0 does not cause a fault. Can't 
recall or locate the details but it broke our implicit null reference 
checks.
David
-----
> Dan
>
>
>>
>>
>> cheers
>>
>> On 7/8/2013 5:01 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I have an HSX-24 backport of the proposed fix for the following bug:
>>>
>>>     8015884 runThese crashed with SIGSEGV, hs_err has an error instead
>>>             of stacktrace
>>>     http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8015884
>>>     https://jbs.oracle.com/bugs/browse/JDK-8015884
>>>
>>> Here are the (round 0) webrev URLs for the HSX-24 backport:
>>>
>>> OpenJDK: http://cr.openjdk.java.net/~dcubed/8015884-webrev/0-hsx24/
>>> Internal: http://javaweb.us.oracle.com/~ddaugher/8015884-webrev/0-hsx24/
>>>
>>> Here are the (round 1) webrev URLs for the HSX-25 version for reference:
>>>
>>> OpenJDK: http://cr.openjdk.java.net/~dcubed/8015884-webrev/1-hsx25/
>>> Internal: http://javaweb.us.oracle.com/~ddaugher/8015884-webrev/1-hsx25/
>>>
>>> The easiest way to review the backport is to download and compare
>>> the two patch files with your favorite file comparison tool:
>>>
>>> OpenJDK patch links:
>>> http://cr.openjdk.java.net/~dcubed/8015884-webrev/0-hsx24/8015884_exp_for_hsx24.patch
>>>
>>> http://cr.openjdk.java.net/~dcubed/8015884-webrev/1-hsx25/8015884_exp_for_hsx25.patch
>>>
>>>
>>> Internal patch links:
>>> http://javaweb.us.oracle.com/~ddaugher/8015884-webrev/0-hsx24/8015884_exp_for_hsx24.patch
>>>
>>> http://javaweb.us.oracle.com/~ddaugher/8015884-webrev/1-hsx25/8015884_exp_for_hsx25.patch
>>>
>>>
>>> For 'diff' oriented people (like me), here's my quick sanity of the two
>>> patch files:
>>>     - diff the patches
>>>     - strip down to line additions and deletions
>>>     - strip out the copyright changes
>>>
>>> $ diff 8015884-webrev/1-hsx25/8015884_exp_for_hsx25.patch \
>>>     8015884-webrev/0-hsx24/8015884_exp_for_hsx24.patch \
>>>     | grep '^[<>] [+-] ' | grep -v Copyright
>>> > -   char fname[32];
>>> > -   pid_t pid = os::Bsd::gettid();
>>> > +  char fname[32];
>>> > +  pid_t pid = os::Bsd::gettid();
>>> > -   jio_snprintf(fname, sizeof(fname), "/proc/%d/maps", pid);
>>> > +  jio_snprintf(fname, sizeof(fname), "/proc/%d/maps", pid);
>>> > -   if (!_print_ascii_file(fname, st)) {
>>> > -     st->print("Can not get library information for pid = %d\n",
>>> pid);
>>> > -   }
>>> > +  if (!_print_ascii_file(fname, st)) {
>>> > +    st->print("Can not get library information for pid = %d\n", pid);
>>> > +  }
>>> < +  #ifndef CALL_TEST_FUNC_WITH_WRAPPER_IF_NEEDED
>>> > +  #ifndef TARGET_OS_FAMILY_windows
>>> < -    NOT_PRODUCT(if (ReplayCompiles) ciReplay::replay(thread);)
>>> < +    if (ReplayCompiles) ciReplay::replay(thread);
>>>
>>> The majority of the diffs are for white-space changes in the HSX-24
>>> version where the code no longer exists in the HSX-25 version. The
>>> TARGET_OS_FAMILY_windows diff is for the change made after HSX-25 code
>>> review round 1 which was reviewed via e-mail only. The ReplayCompiles
>>> diff is for an option that doesn't exist in HSX-24.
>>>
>>> The remainder is an update to the HSX-25 round 0 code review invite.
>>>
>>> Testing:
>>> - test/runtime/6888954/vmerrors.sh has been updated to test for a bad
>>>   data pointer instead of os::signal_raise() and includes a new test
>>>   for a bad function pointer
>>> - (running) JPRT build and test job
>>> - (running) JPRT build and new test only job
>>> - (TBD) Aurora Adhoc vm.quick batch for all platforms in the following
>>>   configs: {Client VM, Server VM} x {fastdebug} x {-Xmixed}
>>>
>>> Just to be clear: This fix solves the problem of not getting a
>>> stacktrace when we get a bad PC; it does not solve the underlying
>>> reason why libpthread sent the VM into the weeds.
>>>
>>> Gory details are below. As always, comments, questions and
>>> suggestions are welome.
>>>
>>> Dan
>>>
>>>
>>> Gory Details:
>>>
>>> The primary bug fixes are:
>>>
>>> - the Dl_info struct should only be used if dladdr() has
>>>   returned non-zero (no errors) and always check the dladdr()
>>>   return value
>>> - the Dl_info.dli_sname and Dl_info.dli_saddr fields should
>>>   only be used if non-NULL
>>> - the first two fixes are based on the Solaris dladdr() man page
>>> - make logic consistent between MacOS X, Linux and Solaris:
>>>   - offset is only set when a matching symbol is found (and
>>>     the pointer is non-NULL)
>>> - additional changes to get proper hs_err_pid file generation in
>>> non-product
>>>   Win* bits when the following tests are executed:
>>>     -XX:ErrorHandlerTest=N option aka test_error_handler() function
>>>     -XX:+ExecuteInternalVMTests option aka
>>> execute_internal_vm_tests() function
>>>
>>> Cleanup bug fixes are:
>>>
>>> - add assert()'s to verify os::dll_address_to_function_name() and
>>>   os::dll_address_to_library_name() contract for 'buf' parameter
>>>
>>> - HotSpot style: explicitly check int function return values
>>>   rather that assume conversion to boolean, e.g.
>>>
>>>     if (dladdr(...)) {  =>  if (dladdr(...) != 0) {
>>>
>>> - consistently compare Dl_info fields against NULL instead of a mix of
>>>   '0', NULL and implied boolean
>>>
>>> - the Dl_info.dli_fname and Dl_info.dli_fbase fields are only used if
>>>   non-NULL; this was done for consistency (and paranoia); the Solaris
>>>   man page does not state whether or not NULL is a permitted value for
>>>   these fields
>>>
>>> - get rid of redundant "else" statements in modified code - thanks
>>> Dmitry!
>>> - cleanup whitespace/indenting in modified code
>>> - switch to PTR_FORMAT in some print calls - thanks Harold!
>>>
>>> Test changes:
>>> - update case "12" to test for bad data pointer instead of calling
>>>   os::signal_raise(SIGSEGV)
>>> - add case "13" for bad function pointers
>>> - change VM-side of the test code to issue an ERROR message
>>>   when an unexpected test_num value is passed
>>> - change VM-side of the test code to call ShouldNotReachHere() if
>>>   the code makes it out of the case statement
>>>
>>>
>>
>
    
    
More information about the hotspot-runtime-dev
mailing list