HSX-25: code review (round 0) for VMError::report_and_die () dies on bad PC (8015884)
Daniel D. Daugherty
daniel.daugherty at oracle.com
Mon Jul 1 16:07:10 PDT 2013
Sorry for the delay in getting back to you on this thread.
I've been focused on trying to figure out why the test was
broken on Win*...
I have a fix in mind so there will be a new webrev...
On 6/27/13 11:21 AM, Dmitry Samersoff wrote:
> Dan,
>
>> but I try not to change existing style unless I have to
>> make other logic changes to the line.
> os_linux.cpp:
>
> 1712 doesn't have redundant else, but 1795 does have.
The change from the old code:
1701 if (dladdr((void*)addr, &dlinfo) && dlinfo.dli_sname != NULL) {
<snip>
1709 } else if (dlinfo.dli_fname != NULL && dlinfo.dli_fbase != 0) {
to the new code:
1705 if (dladdr((void*)addr, &dlinfo) != 0) {
1706 // see if we have a matching symbol
1707 if (dlinfo.dli_saddr != NULL && dlinfo.dli_sname != NULL) {
<snip>
1713 }
1714 // no matching symbol so try for just file info
1715 if (dlinfo.dli_fname != NULL && dlinfo.dli_fbase != NULL) {
was not simply a cosmetic change to get rid of the redundant "else"
that was on line 1709.
In fact, this is one of the key fixes:
- the dladdr() call on old line 1701 could fail and
- the dlinfo fields on old line 1709 could still be used
In the new code:
- old line 1701 is split into new lines 1705 and 1707
- old line 1709 is now conditional inside the block
started by new line 1705
>
> I would prefer to have the same stile in all similar
> code fragments if possible.
Yes, I understand that. In the case of old line 1709, I changed
the core logic of line (by moving it inside the now split-if)
so it made sense to also get rid of the redundant "else".
For line 1795, I just changed from an implied boolean to '!= 0'
so that doesn't really meet my criteria.
Dan
>
> -Dmitry
>
>
> On 2013-06-27 20:18, Daniel D. Daugherty wrote:
>> Dmitry,
>>
>> Thanks for the review!
>>
>> Sorry for the delay in responding. I was out of the office
>> for a couple of days.
>>
>> Replies embedded below.
>>
>>
>> On 6/22/13 11:51 AM, Dmitry Samersoff wrote:
>>> Dan,
>>>
>>> os_linux.cpp:
>>>
>>> 1685:
>>> It might be better to just turn assert to guarantee. Something bad is
>>> happening if we can't obtain JVM base address.
>> So that would be:
>>
>> In os_bsd.cpp:
>>
>> 1232 bool os::address_is_in_vm(address addr) {
>> 1240 assert(libjvm_base_addr !=NULL, "Cannot obtain base address for
>> libjvm");
>>
>> In os_linux.cpp:
>>
>> 1680 bool os::address_is_in_vm(address addr) {
>> 1688 assert(libjvm_base_addr !=NULL, "Cannot obtain base address for
>> libjvm");
>>
>> In os_solaris.cpp:
>>
>> 1922 bool os::address_is_in_vm(address addr) {
>> 1930 assert(libjvm_base_addr !=NULL, "Cannot obtain base address for
>> libjvm");
>>
>> In os_windows.cpp:
>>
>> 1477 bool os::address_is_in_vm(address addr) {
>> 1481 assert(false, "Can't find jvm module.");
>>
>> I concur that not knowing the JVM base address would be bad.
>> I will change the above "assert" calls into "guarantee" calls
>> if other reviewers do not object.
>>
>>
>>
>>> 1795,1803:
>>> else is redundant
>> Agreed. My personal preference would be something like:
>>
>> if (cond1) {
>> <some work>
>> return true;
>> }
>>
>> if (cond2) {
>> <some other work>
>> return true;
>> }
>>
>> <work to prepare for false return>
>> return false;
>>
>> but I try not to change existing style unless I have to
>> make other logic changes to the line. In this case, I'm
>> just changing from an implied boolean to '!= 0' so that
>> doesn't really meet my criteria.
>>
>> The indenting is also wrong in this function and I'm
>> trying really hard to not fix... Since this will likely
>> get back ported to HSX-24, I shouldn't... must resist...
>>
>> Again, thanks for the review!
>>
>> Dan
>>
>>
>>> -Dmitry
>>>
>>>
>>> On 2013-06-22 02:44, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I have have a proposed fix for the following bug:
>>>>
>>>> 8015884 runThese crashed with SIGSEGV, hs_err has an error instead
>>>> of stacktrace
>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8015884
>>>> https://jbs.oracle.com/bugs/browse/JDK-8015884
>>>>
>>>> Here are the HSX-25 webrev URLs:
>>>>
>>>> OpenJDK: http://cr.openjdk.java.net/~dcubed/8015884-webrev/0-hsx25/
>>>> Internal: http://javaweb.us.oracle.com/~ddaugher/8015884-webrev/0-hsx25/
>>>>
>>>> Testing:
>>>> - test/runtime/6888954/vmerrors.sh has been updated to include a
>>>> test for a bad function pointer
>>>> - vmerrors.sh has been used on MacOS X and Solaris X86 to verify
>>>> that bad function pointers get better trouble shooting info
>>>> - (running, @ 24 hours and not yet done) JPRT build and test job
>>>> - (TBD) JPRT build and new test only job
>>>> - (TBD) Aurora Adhoc vm.quick batch for all platforms in the following
>>>> configs: {Client VM, Server VM} x {fastdebug} x {-Xmixed}
>>>>
>>>> Just to be clear: This fix solves the problem of not getting a
>>>> stacktrace when we get a bad PC; it does not solve the underlying
>>>> reason why libpthread sent the VM into the weeds.
>>>>
>>>> Gory details are below. As always, comments, questions and
>>>> suggestions are welome.
>>>>
>>>> Dan
>>>>
>>>>
>>>> Gory Details:
>>>>
>>>> The primary bug fixes are:
>>>>
>>>> - the Dl_info struct should only be used if dladdr() has
>>>> returned non-zero (no errors) and always check the dladdr()
>>>> return value
>>>> - the Dl_info.dli_sname and Dl_info.dli_saddr fields should
>>>> only be used if non-NULL
>>>> - the first two fixes are based on the Solaris dladdr() man page
>>>> - make logic consistent between MacOS X, Linux and Solaris:
>>>> - offset is only set when a matching symbol is found (and
>>>> the pointer is non-NULL)
>>>>
>>>> Cleanup bug fixes are:
>>>>
>>>> - add assert()'s to verify os::dll_address_to_function_name() and
>>>> os::dll_address_to_library_name() contract for 'buf' parameter
>>>>
>>>> - HotSpot style: explicitly check int function return values
>>>> rather that assume conversion to boolean, e.g.
>>>>
>>>> if (dladdr(...)) { => if (dladdr(...) != 0) {
>>>>
>>>> - consistently compare Dl_info fields against NULL instead of a mix of
>>>> '0', NULL and implied boolean
>>>>
>>>> - the Dl_info.dli_fname and Dl_info.dli_fbase fields are only used if
>>>> non-NULL; this was done for consistency (and paranoia); the Solaris
>>>> man page does not state whether or not NULL is a permitted value for
>>>> these fields
>>>>
>>>> Test changes:
>>>> - add case "13" for bad function pointers
>>>> - change VM-side of the test code so that case "12", the
>>>> os::signal_raise(SIGSEGV) case, does not have to be last;
>>>> this puts a ShouldNotReachHere() call after the test_num
>>>> case statement so this test function will always crash
>>>> the VM in non-product bits
>>>> - change VM-side of the test code to issue an ERROR message
>>>> when an unexpected test_num value is passed
>>>>
>
More information about the hotspot-runtime-dev
mailing list