RFR (S): 8007779: os::die() on solaris should generate core file

Fri Feb 8 10:21:56 PST 2013

The 0x80 bit should be part of the value returned by wait(2), but which 
bits are captured by $? may be shell-specific.
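
A minimal sketch of how the status can be inspected directly from wait(2) 
rather than through $?, assuming a POSIX system with fork() and waitpid(). 
The names ChildResult, run_child, die_with_exit, die_with_abort and 
shell_style_code are hypothetical helpers for illustration, not HotSpot 
code. WCOREDUMP is not part of POSIX, so it is guarded; whether a core file 
is actually written also depends on the core size limit and system settings. 
The 128 + signal value is the convention used by shells such as bash and is 
computed by hand here.

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Decoded view of a wait(2) status word (hypothetical helper type).
struct ChildResult {
  bool signaled;     // true if the child was terminated by a signal
  int  signal;       // terminating signal number (valid if signaled)
  bool core_dumped;  // core flag from the status word (valid if signaled)
  int  exit_code;    // low 8 bits of the exit status (valid if !signaled)
};

// Mirrors the old Solaris os::die(): exit immediately, no core file.
void die_with_exit() { _exit(-1); }

// Mirrors the proposed os::die(): raise SIGABRT, which may dump core.
void die_with_abort() { std::abort(); }

// Fork a child that runs `body`, wait for it, and decode the status.
ChildResult run_child(void (*body)()) {
  ChildResult r = {false, 0, false, -1};
  pid_t pid = fork();
  if (pid < 0) return r;  // fork failed; r keeps its defaults
  if (pid == 0) {
    body();
    _exit(0);             // only reached if body() returns
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return r;
  if (WIFSIGNALED(status)) {
    r.signaled = true;
    r.signal = WTERMSIG(status);
#ifdef WCOREDUMP
    // Non-POSIX extension; traditionally the 0x80 bit of the status.
    r.core_dumped = WCOREDUMP(status) != 0;
#endif
  } else if (WIFEXITED(status)) {
    r.exit_code = WEXITSTATUS(status);
  }
  return r;
}

// What a bash-like shell would typically report in $?.
int shell_style_code(const ChildResult& r) {
  return r.signaled ? 128 + r.signal : r.exit_code;
}
```

Calling run_child(die_with_abort) should report termination by SIGABRT 
(134 in the shell convention when SIGABRT is 6), while 
run_child(die_with_exit) should report a normal exit with status 255, 
since only the low 8 bits of -1 are kept.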

dl

On 2/8/2013 8:04 AM, Mikael Vidstedt wrote:
> On 2013-02-08 07:12, Mikael Gerdin wrote:
>> On 2013-02-08 15:05, Staffan Larsen wrote:
>>> The return code from the process seems to be 134 (after an 
>>> experiment). This would be the same as after a successful printing 
>>> of hs_err when we do manage to create a core dump.
>>
>> When a POSIX process is terminated by an uncaught fatal signal, the 
>> exit code is usually 128 + SIGNAL.
>> Since SIGABRT == 6, you got 134.
>
> I believe the 128+n may be for bash specifically, not for general 
> POSIX processes, but the same conclusion holds.
>
> /Another Mikael
>
>>
>> /Mikael
>>
>>>
>>> /Staffan
>>>
>>> On 8 feb 2013, at 14:54, David Holmes <david.holmes at oracle.com> wrote:
>>>
>>>> My other email hasn't turned up yet but I was confusing this with 
>>>> the change that added the dump_core flag to os::abort.
>>>>
>>>> It's only by "accident" that we use ::abort on Linux - _exit didn't 
>>>> work back in the old days of LinuxThreads :)
>>>>
>>>> This seems like a simple and potentially useful change, but I have 
>>>> a feeling it may have some unexpected consequences somewhere. :)
>>>>
>>>> Actually, one possible consequence - what return code will the 
>>>> process issue if it now hits this? Could this impact testing and 
>>>> failure matching?
>>>>
>>>> David
>>>>
>>>> On 8/02/2013 10:24 PM, Staffan Larsen wrote:
>>>>> This is a request for review of a small change to the crash 
>>>>> reporting on Solaris.
>>>>>
>>>>> When HotSpot crashes during the writing of the hs_err file, we 
>>>>> call os::die(). On Linux and BSD this causes a core file to be 
>>>>> written (by calling ::abort()). This is good since we then have 
>>>>> some record of what went wrong. On Solaris, we call _exit() and no 
>>>>> core file is created.
>>>>>
>>>>> There are two cases during the hs_err writing where we call 
>>>>> os::die(). First, if the writing hangs, the WatcherThread will 
>>>>> call os::die(). Second, if we get too many errors during the 
>>>>> writing we will call os::die(). In both these cases it would be 
>>>>> very helpful to have a core file. Otherwise all you have to go on 
>>>>> is something like this:
>>>>>
>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>> #
>>>>> # SIGSEGV (0xb) at pc=0xffffffff653848c0, pid=11823, tid=240
>>>>> #
>>>>> # JRE version: Java(TM) SE Runtime Environment (7.0_12-b11)
>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.0-b24 mixed mode solaris-sparc compressed oops)
>>>>> # Problematic frame:
>>>>> # C [libc.so.1+0x848c0]# [ timer expired, abort... ]
>>>>>
>>>>> Below is the change I would like to make.
>>>>>
>>>>> Thanks,
>>>>> /Staffan
>>>>>
>>>>>
>>>>> diff --git a/src/os/solaris/vm/os_solaris.cpp b/src/os/solaris/vm/os_solaris.cpp
>>>>> --- a/src/os/solaris/vm/os_solaris.cpp
>>>>> +++ b/src/os/solaris/vm/os_solaris.cpp
>>>>> @@ -1865,7 +1865,7 @@
>>>>>
>>>>>   // Die immediately, no exit hook, no abort hook, no cleanup.
>>>>>   void os::die() {
>>>>> -  _exit(-1);
>>>>> +  ::abort(); // dump core (for debugging)
>>>>>   }
>>>>>
>>>>>
>>>
>