HSX-24 Backport: code review (round 0) for VMError::report_and_die () dies on bad PC (8015884)

Gerard Ziemski gerard.ziemski at oracle.com
Tue Jul 9 08:21:00 PDT 2013


hi Dan,

Are the comment around the case 12 and 13 in test_error_handler right?

Is it really possible that dereferencing NULL will not cause the 
hardware to raise a signal?


cheers

On 7/8/2013 5:01 PM, Daniel D. Daugherty wrote:
> Greetings,
>
> I have an HSX-24 backport of the proposed fix for the following bug:
>
>     8015884 runThese crashed with SIGSEGV, hs_err has an error instead
>             of stacktrace
>     http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8015884
>     https://jbs.oracle.com/bugs/browse/JDK-8015884
>
> Here are the (round 0) webrev URLs for the HSX-24 backport:
>
> OpenJDK: http://cr.openjdk.java.net/~dcubed/8015884-webrev/0-hsx24/
> Internal: http://javaweb.us.oracle.com/~ddaugher/8015884-webrev/0-hsx24/
>
> Here are the (round 1) webrev URLs for the HSX-25 version for reference:
>
> OpenJDK: http://cr.openjdk.java.net/~dcubed/8015884-webrev/1-hsx25/
> Internal: http://javaweb.us.oracle.com/~ddaugher/8015884-webrev/1-hsx25/
>
> The easiest way to review the backport is to download and compare
> the two patch files with your favorite file comparison tool:
>
> OpenJDK patch links:
> http://cr.openjdk.java.net/~dcubed/8015884-webrev/0-hsx24/8015884_exp_for_hsx24.patch 
>
> http://cr.openjdk.java.net/~dcubed/8015884-webrev/1-hsx25/8015884_exp_for_hsx25.patch 
>
>
> Internal patch links:
> http://javaweb.us.oracle.com/~ddaugher/8015884-webrev/0-hsx24/8015884_exp_for_hsx24.patch 
>
> http://javaweb.us.oracle.com/~ddaugher/8015884-webrev/1-hsx25/8015884_exp_for_hsx25.patch 
>
>
> For 'diff' oriented people (like me), here's my quick sanity of the two
> patch files:
>     - diff the patches
>     - strip down to line additions and deletions
>     - strip out the copyright changes
>
> $ diff 8015884-webrev/1-hsx25/8015884_exp_for_hsx25.patch \
>     8015884-webrev/0-hsx24/8015884_exp_for_hsx24.patch \
>     | grep '^[<>] [+-] ' | grep -v Copyright
> > -   char fname[32];
> > -   pid_t pid = os::Bsd::gettid();
> > +  char fname[32];
> > +  pid_t pid = os::Bsd::gettid();
> > -   jio_snprintf(fname, sizeof(fname), "/proc/%d/maps", pid);
> > +  jio_snprintf(fname, sizeof(fname), "/proc/%d/maps", pid);
> > -   if (!_print_ascii_file(fname, st)) {
> > -     st->print("Can not get library information for pid = %d\n", pid);
> > -   }
> > +  if (!_print_ascii_file(fname, st)) {
> > +    st->print("Can not get library information for pid = %d\n", pid);
> > +  }
> < +  #ifndef CALL_TEST_FUNC_WITH_WRAPPER_IF_NEEDED
> > +  #ifndef TARGET_OS_FAMILY_windows
> < -    NOT_PRODUCT(if (ReplayCompiles) ciReplay::replay(thread);)
> < +    if (ReplayCompiles) ciReplay::replay(thread);
>
> The majority of the diffs are for white-space changes in the HSX-24
> version where the code no longer exists in the HSX-25 version. The
> TARGET_OS_FAMILY_windows diff is for the change made after HSX-25 code
> review round 1 which was reviewed via e-mail only. The ReplayCompiles
> diff is for an option that doesn't exist in HSX-24.
>
> The remainder is an update to the HSX-25 round 0 code review invite.
>
> Testing:
> - test/runtime/6888954/vmerrors.sh has been updated to test for a bad
>   data pointer instead of os::signal_raise() and includes a new test
>   for a bad function pointer
> - (running) JPRT build and test job
> - (running) JPRT build and new test only job
> - (TBD) Aurora Adhoc vm.quick batch for all platforms in the following
>   configs: {Client VM, Server VM} x {fastdebug} x {-Xmixed}
>
> Just to be clear: This fix solves the problem of not getting a
> stacktrace when we get a bad PC; it does not solve the underlying
> reason why libpthread sent the VM into the weeds.
>
> Gory details are below. As always, comments, questions and
> suggestions are welome.
>
> Dan
>
>
> Gory Details:
>
> The primary bug fixes are:
>
> - the Dl_info struct should only be used if dladdr() has
>   returned non-zero (no errors) and always check the dladdr()
>   return value
> - the Dl_info.dli_sname and Dl_info.dli_saddr fields should
>   only be used if non-NULL
> - the first two fixes are based on the Solaris dladdr() man page
> - make logic consistent between MacOS X, Linux and Solaris:
>   - offset is only set when a matching symbol is found (and
>     the pointer is non-NULL)
> - additional changes to get proper hs_err_pid file generation in 
> non-product
>   Win* bits when the following tests are executed:
>     -XX:ErrorHandlerTest=N option aka test_error_handler() function
>     -XX:+ExecuteInternalVMTests option aka execute_internal_vm_tests() 
> function
>
> Cleanup bug fixes are:
>
> - add assert()'s to verify os::dll_address_to_function_name() and
>   os::dll_address_to_library_name() contract for 'buf' parameter
>
> - HotSpot style: explicitly check int function return values
>   rather that assume conversion to boolean, e.g.
>
>     if (dladdr(...)) {  =>  if (dladdr(...) != 0) {
>
> - consistently compare Dl_info fields against NULL instead of a mix of
>   '0', NULL and implied boolean
>
> - the Dl_info.dli_fname and Dl_info.dli_fbase fields are only used if
>   non-NULL; this was done for consistency (and paranoia); the Solaris
>   man page does not state whether or not NULL is a permitted value for
>   these fields
>
> - get rid of redundant "else" statements in modified code - thanks 
> Dmitry!
> - cleanup whitespace/indenting in modified code
> - switch to PTR_FORMAT in some print calls - thanks Harold!
>
> Test changes:
> - update case "12" to test for bad data pointer instead of calling
>   os::signal_raise(SIGSEGV)
> - add case "13" for bad function pointers
> - change VM-side of the test code to issue an ERROR message
>   when an unexpected test_num value is passed
> - change VM-side of the test code to call ShouldNotReachHere() if
>   the code makes it out of the case statement
>
>



More information about the hotspot-runtime-dev mailing list