RFR(s): 8144219: [posix] Remove redundant code around os::print_siginfo()

Thomas Stüfe thomas.stuefe at gmail.com
Fri Dec 11 08:34:53 UTC 2015


Hi David,

New webrev.03:
http://cr.openjdk.java.net/~stuefe/webrevs/8144219-posix-os-print_siginfo/webrev.03/webrev/
And the delta to webrev.02:
http://cr.openjdk.java.net/~stuefe/webrevs/8144219-posix-os-print_siginfo/webrev.03-base-02/webrev/index.html

Just worked in your suggestions and nothing else.

--

On Solaris I found a curious behaviour, btw: there, when sending
synchronous error signals from the outside via kill (e.g. kill -fpe <pid>),
the vm immediately crashes with "Segmentation fault" as if no signal
handler were installed. On Linux (and afaik other Unices), there is no
difference between an internally generated and an externally sent
synchronous signal - in both cases our signal handler jumps in an produces
an hs-err pid. The only difference is that the si_code is different.

I tested this with and without my change, both sides behave the same, so I
think this is a Solaris speciality. Curious, nonetheless.

Thanks, Thomas


On Fri, Dec 11, 2015 at 2:49 AM, David Holmes <david.holmes at oracle.com>
wrote:

> Hi Thomas,
>
> This all looks fine to me - just some minor nits below :). I'm running
> this through JPRT to see if everything works okay. Still need a second
> reviewer of course.
>
> Nit: I find the notation:
>
> const /* real-type */ void* siginfo
>
> somewhat distracting - especially when we see siginfo being cast to the
> correct type within the next two lines.
>
> ---
>
> src/os/posix/vm/os_posix.cpp
>
> Nit: os->print(" (me)"); me -> current process ?
>
> ---
>
> src/share/vm/utilities/vmError.cpp
>
> Nit: CDS. -> CDS archive.
>
> ---
>
> src/share/vm/utilities/vmError.hpp
>
> Nit: CDS store -> CDS archive
>
>
> Thanks,
> David
>
> On 10/12/2015 12:22 AM, Thomas Stüfe wrote:
>
>> Hi David,
>>
>> thanks for the review!
>>
>> my new webrev:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8144219-posix-os-print_siginfo/webrev.02/webrev/
>>
>> On Tue, Dec 8, 2015 at 7:57 AM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>>     Hi Thomas,
>>
>>     Sorry for the delay.
>>
>>     On 30/11/2015 8:03 PM, Thomas Stüfe wrote:
>>
>>         Hi all,
>>
>>         please review this small change:
>>
>>
>>     Maybe not so small :)
>>
>>
>> Well. I did name it (s), not (xxs) :)
>>
>>
>>
>>         Webrev:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8144219-posix-os-print_siginfo/webrev.00/webrev/
>>         Bug Report: https://bugs.openjdk.java.net/browse/JDK-8144219
>>
>>         This is mainly a cleanup change for posix platforms. It gets rid
>> of
>>         multiple copies of os::print_siginfo() in favour of the central
>>         os::Posix::print_siginfo() in os_posix.cpp.
>>
>>
>>     Something seems not right. Currently vmError calls
>>     os::print_siginfo, and the per-posix-platform os::print_siginfo then
>>     call os::Posix::print_siginfo_brief. And now you have
>>     os::Posix::print_siginfo in os_posix.cpp, but that should be
>>     os::print_siginfo - no Posix namespace ??
>>
>>
>> There is os::print_siginfo(outputStream*, void*), for which exist two
>> implementations, one for Windows, one for Posix. In my first webrev I
>> also kept os::Posix::print_siginfo(outputStream*, siginfo_t*), because I
>> wanted a version which takes a real siginfo_t*.
>> os::print_siginfo(outputStream*, void*) for posix just called
>> os::Posix::print_siginfo(outputStream*, siginfo_t*).
>>
>> To simplify things, I removed the os::Posix::print_siginfo(). Now we
>> just have os::print_siginfo(outputStream*, const void*) in two variants,
>> one for windows, one for posix.
>>
>> (I also made the pointer const for const correctness)
>>
>>         It also extends os::Posix::print_siginfo() to write more details
>>         about the
>>         received signal which stems from our experiences with various
>>         Unices over
>>         the time. This will improve hs-err file printout.
>>
>>
>>     The old code was deliberately intended as a simple one-liner for the
>>     error report. But okay ....
>>
>>     987   os->print_cr("siginfo @ " PTR_FORMAT ":", p2i(si));
>>
>>     What is the value in printing the pointer to the siginfo_t ??
>>
>>     I personally prefer to see si_signo, si_code and si_errno to be
>>     clear that what is printed are the fields of the siginfo_t.
>>
>>     1030 #ifdef _AIX
>>     1031   } else if (si->si_code == SI_EMPTY) {
>>     1032     os->print_cr("  Signal may have been sent by another
>> thread.");
>>     1033 #endif
>>     1034 #ifdef __linux
>>     1035   } else if (si->si_code == SI_TKILL) {
>>     1036     os->print_cr("  Signal has been sent by another thread.");
>>     1037 #endif
>>     1038 #ifdef __sun
>>     1039   } else if (si->si_code == SI_LWP) {
>>     1040     os->print_cr("  Signal has been sent by another thread.");
>>     1041 #endif
>>     1042   } else if (sig == SIGSEGV || sig == SIGBUS) {
>>     1043     os->print_cr("  Address of faulting memory reference: "
>>     PTR_FORMAT, p2i(si->si_addr));
>>     1044   } else if (sig == SIGILL || sig == SIGFPE) {
>>     1045     os->print_cr("  Address of faulting instruction: "
>>     PTR_FORMAT, p2i(si->si_addr));
>>     1046   } else if (sig == SIGTRAP) {
>>     1047     os->print_cr("  Address of trap instruction: " PTR_FORMAT,
>>     p2i(si->si_addr));
>>     1048 #ifndef __APPLE__
>>     1049   } else if (sig == SIGPOLL) {
>>     1050     os->print_cr("  Band event: " PTR64_FORMAT,
>>     (uint64_t)si->si_band);
>>     1051 #endif
>>
>>     These should be conditional on the specific signal or value being
>>     defined, not platform specific. So the __APPLE__  goes back to
>>     SIGPOLL (even if it is always defined you'll never receive it on
>>     platforms that don't use it). The others should check SI_EMPTY etc.
>>
>>     Note that there are a number of other possibilities just on Solaris:
>>
>>     https://docs.oracle.com/cd/E23823_01/html/816-5173/siginfo-3head.html
>>
>>     That said we don't need to give this much detail in my opinion. If
>>     you get a si_code of SI_LWP for example then you can go an look at
>>     the manpage to find out exactly what it means - we don't want to be
>>     maintaining that information here. So I'm not sure we really
>>     need/want this expanded level of detail.
>>
>>
>> Okay. I think we (SAP) do not care so much about the size of the hs-err
>> file and trade a bit more verbosity in return for giving more detailled
>> information to our support people. But I can see your point, and it is a
>> matter of preferences.
>>
>> In the new version, I kept the output of os::print_siginfo very compact
>> and on one line. We print si_signo, si_code unconditionally, si_errno if
>> set, and then decide depending on the code whether to print si_addr or
>> si_pid/si_uid or si_band.
>>
>>
>>         Note that there was on all platforms code which examines the
>>         crash address
>>         for a possible CDS fault and prints a specialized hint. In my
>>         opinion that
>>         makes no sense in a general os::print_siginfo() function and I
>>         moved it to
>>         a separate error reporting step into vmError.cpp. This may be a
>>         point of
>>         contention because the way to extract the fault address from a
>>         siginfo/EXCEPTION_RECORD structure is platform dependend, but
>>         imho this is
>>         still the least ugly solution.
>>
>>
>>     Sorry I don't like seeing the _WIN32 condition in the vmError.cpp
>>     code. By keeping this as part of the print_siginfo the platform
>>     specific aspect is handled transparently. As something that is only
>>     used as part of error reporting, and which is directly related to a
>>     signal being received the current placement doesn't seem
>>     unreasonable. If you really want to factor this out then I would add
>>     os::check_for_failing_cds_access() or something like that.
>>
>>
>> I really did not like this in os::print_siginfo(). One does not expect
>> os::print_siginfo() to know about cds specialities, and to know that it
>> only gets called in the context of error handling (who says this is
>> always true?)
>>
>> But I agree the platform dependend code in vmError.cpp is ugly. I moved
>> it to vmError_posix.cpp and vmError_windows.cpp respectively, and
>> renamed it per your suggestion. I did not want to add this to the os::
>> namespace because it is really a highly specialized function which only
>> makes sense in error handling.
>>
>> Thanks & Regards, Thomas
>>
>>     Thanks,
>>     David
>>     -----
>>
>>         Kind regards, Thomas
>>
>>
>>


More information about the hotspot-runtime-dev mailing list