RFR(s): 8144219: [posix] Remove redundant code around os::print_siginfo()

Fri Dec 11 11:37:11 UTC 2015

Hi David,

...

>
>> On Solaris I found a curious behaviour, btw: there, when sending
>> synchronous error signals from the outside via kill (e.g. kill -fpe
>> <pid>), the vm immediately crashes with "Segmentation fault" as if no
>> signal handler were installed. On Linux (and afaik other Unices), there
>> is no difference between an internally generated and an externally sent
>> synchronous signal - in both cases our signal handler jumps in an
>> produces an hs-err pid. The only difference is that the si_code is
>> different.
>>
>> I tested this with and without my change, both sides behave the same, so
>> I think this is a Solaris speciality. Curious, nonetheless.
>>
>
> Does it depend on which thread actually has the signal delivered to it?
>
>
Don't know. I do not know which thread gets the signal delivered.

But you are right, that could be the reason. The signal could be masked out
for the thread, and the signal is a synchronous error signal. Which would
make the OS to immediately abort the process.

..Thomas

> David
>
> Thanks, Thomas
>>
>>
>> On Fri, Dec 11, 2015 at 2:49 AM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>>     Hi Thomas,
>>
>>     This all looks fine to me - just some minor nits below :). I'm
>>     running this through JPRT to see if everything works okay. Still
>>     need a second reviewer of course.
>>
>>     Nit: I find the notation:
>>
>>     const /* real-type */ void* siginfo
>>
>>     somewhat distracting - especially when we see siginfo being cast to
>>     the correct type within the next two lines.
>>
>>     ---
>>
>>     src/os/posix/vm/os_posix.cpp
>>
>>     Nit: os->print(" (me)"); me -> current process ?
>>
>>     ---
>>
>>     src/share/vm/utilities/vmError.cpp
>>
>>     Nit: CDS. -> CDS archive.
>>
>>     ---
>>
>>     src/share/vm/utilities/vmError.hpp
>>
>>     Nit: CDS store -> CDS archive
>>
>>
>>     Thanks,
>>     David
>>
>>     On 10/12/2015 12:22 AM, Thomas Stüfe wrote:
>>
>>         Hi David,
>>
>>         thanks for the review!
>>
>>         my new webrev:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8144219-posix-os-print_siginfo/webrev.02/webrev/
>>
>>         On Tue, Dec 8, 2015 at 7:57 AM, David Holmes
>>         <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>>         <mailto:david.holmes at oracle.com
>>
>>         <mailto:david.holmes at oracle.com>>> wrote:
>>
>>              Hi Thomas,
>>
>>              Sorry for the delay.
>>
>>              On 30/11/2015 8:03 PM, Thomas Stüfe wrote:
>>
>>                  Hi all,
>>
>>                  please review this small change:
>>
>>
>>              Maybe not so small :)
>>
>>
>>         Well. I did name it (s), not (xxs) :)
>>
>>
>>
>>                  Webrev:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8144219-posix-os-print_siginfo/webrev.00/webrev/
>>                  Bug Report:
>>         https://bugs.openjdk.java.net/browse/JDK-8144219
>>
>>                  This is mainly a cleanup change for posix platforms. It
>>         gets rid of
>>                  multiple copies of os::print_siginfo() in favour of the
>>         central
>>                  os::Posix::print_siginfo() in os_posix.cpp.
>>
>>
>>              Something seems not right. Currently vmError calls
>>              os::print_siginfo, and the per-posix-platform
>>         os::print_siginfo then
>>              call os::Posix::print_siginfo_brief. And now you have
>>              os::Posix::print_siginfo in os_posix.cpp, but that should be
>>              os::print_siginfo - no Posix namespace ??
>>
>>
>>         There is os::print_siginfo(outputStream*, void*), for which
>>         exist two
>>         implementations, one for Windows, one for Posix. In my first
>>         webrev I
>>         also kept os::Posix::print_siginfo(outputStream*, siginfo_t*),
>>         because I
>>         wanted a version which takes a real siginfo_t*.
>>         os::print_siginfo(outputStream*, void*) for posix just called
>>         os::Posix::print_siginfo(outputStream*, siginfo_t*).
>>
>>         To simplify things, I removed the os::Posix::print_siginfo(). Now
>> we
>>         just have os::print_siginfo(outputStream*, const void*) in two
>>         variants,
>>         one for windows, one for posix.
>>
>>         (I also made the pointer const for const correctness)
>>
>>                  It also extends os::Posix::print_siginfo() to write
>>         more details
>>                  about the
>>                  received signal which stems from our experiences with
>>         various
>>                  Unices over
>>                  the time. This will improve hs-err file printout.
>>
>>
>>              The old code was deliberately intended as a simple
>>         one-liner for the
>>              error report. But okay ....
>>
>>              987   os->print_cr("siginfo @ " PTR_FORMAT ":", p2i(si));
>>
>>              What is the value in printing the pointer to the siginfo_t ??
>>
>>              I personally prefer to see si_signo, si_code and si_errno to
>> be
>>              clear that what is printed are the fields of the siginfo_t.
>>
>>              1030 #ifdef _AIX
>>              1031   } else if (si->si_code == SI_EMPTY) {
>>              1032     os->print_cr("  Signal may have been sent by
>>         another thread.");
>>              1033 #endif
>>              1034 #ifdef __linux
>>              1035   } else if (si->si_code == SI_TKILL) {
>>              1036     os->print_cr("  Signal has been sent by another
>>         thread.");
>>              1037 #endif
>>              1038 #ifdef __sun
>>              1039   } else if (si->si_code == SI_LWP) {
>>              1040     os->print_cr("  Signal has been sent by another
>>         thread.");
>>              1041 #endif
>>              1042   } else if (sig == SIGSEGV || sig == SIGBUS) {
>>              1043     os->print_cr("  Address of faulting memory
>>         reference: "
>>              PTR_FORMAT, p2i(si->si_addr));
>>              1044   } else if (sig == SIGILL || sig == SIGFPE) {
>>              1045     os->print_cr("  Address of faulting instruction: "
>>              PTR_FORMAT, p2i(si->si_addr));
>>              1046   } else if (sig == SIGTRAP) {
>>              1047     os->print_cr("  Address of trap instruction: "
>>         PTR_FORMAT,
>>              p2i(si->si_addr));
>>              1048 #ifndef __APPLE__
>>              1049   } else if (sig == SIGPOLL) {
>>              1050     os->print_cr("  Band event: " PTR64_FORMAT,
>>              (uint64_t)si->si_band);
>>              1051 #endif
>>
>>              These should be conditional on the specific signal or value
>>         being
>>              defined, not platform specific. So the __APPLE__  goes back
>> to
>>              SIGPOLL (even if it is always defined you'll never receive
>>         it on
>>              platforms that don't use it). The others should check
>>         SI_EMPTY etc.
>>
>>              Note that there are a number of other possibilities just on
>>         Solaris:
>>
>>
>> https://docs.oracle.com/cd/E23823_01/html/816-5173/siginfo-3head.html
>>
>>              That said we don't need to give this much detail in my
>>         opinion. If
>>              you get a si_code of SI_LWP for example then you can go an
>>         look at
>>              the manpage to find out exactly what it means - we don't
>>         want to be
>>              maintaining that information here. So I'm not sure we really
>>              need/want this expanded level of detail.
>>
>>
>>         Okay. I think we (SAP) do not care so much about the size of the
>>         hs-err
>>         file and trade a bit more verbosity in return for giving more
>>         detailled
>>         information to our support people. But I can see your point, and
>>         it is a
>>         matter of preferences.
>>
>>         In the new version, I kept the output of os::print_siginfo very
>>         compact
>>         and on one line. We print si_signo, si_code unconditionally,
>>         si_errno if
>>         set, and then decide depending on the code whether to print
>>         si_addr or
>>         si_pid/si_uid or si_band.
>>
>>
>>                  Note that there was on all platforms code which
>>         examines the
>>                  crash address
>>                  for a possible CDS fault and prints a specialized hint.
>>         In my
>>                  opinion that
>>                  makes no sense in a general os::print_siginfo()
>>         function and I
>>                  moved it to
>>                  a separate error reporting step into vmError.cpp. This
>>         may be a
>>                  point of
>>                  contention because the way to extract the fault address
>>         from a
>>                  siginfo/EXCEPTION_RECORD structure is platform
>>         dependend, but
>>                  imho this is
>>                  still the least ugly solution.
>>
>>
>>              Sorry I don't like seeing the _WIN32 condition in the
>>         vmError.cpp
>>              code. By keeping this as part of the print_siginfo the
>> platform
>>              specific aspect is handled transparently. As something that
>>         is only
>>              used as part of error reporting, and which is directly
>>         related to a
>>              signal being received the current placement doesn't seem
>>              unreasonable. If you really want to factor this out then I
>>         would add
>>              os::check_for_failing_cds_access() or something like that.
>>
>>
>>         I really did not like this in os::print_siginfo(). One does not
>>         expect
>>         os::print_siginfo() to know about cds specialities, and to know
>>         that it
>>         only gets called in the context of error handling (who says this
>> is
>>         always true?)
>>
>>         But I agree the platform dependend code in vmError.cpp is ugly.
>>         I moved
>>         it to vmError_posix.cpp and vmError_windows.cpp respectively, and
>>         renamed it per your suggestion. I did not want to add this to
>>         the os::
>>         namespace because it is really a highly specialized function
>>         which only
>>         makes sense in error handling.
>>
>>         Thanks & Regards, Thomas
>>
>>              Thanks,
>>              David
>>              -----
>>
>>                  Kind regards, Thomas
>>
>>
>>
>>