RFR(s): 8144219: [posix] Remove redundant code around os::print_siginfo()
Thomas Stüfe
thomas.stuefe at gmail.com
Fri Dec 11 11:37:11 UTC 2015
Hi David,
...
>
>> On Solaris I found a curious behaviour, btw: there, when sending
>> synchronous error signals from the outside via kill (e.g. kill -fpe
>> <pid>), the vm immediately crashes with "Segmentation fault" as if no
>> signal handler were installed. On Linux (and afaik other Unices), there
>> is no difference between an internally generated and an externally sent
>> synchronous signal - in both cases our signal handler jumps in an
>> produces an hs-err pid. The only difference is that the si_code is
>> different.
>>
>> I tested this with and without my change, both sides behave the same, so
>> I think this is a Solaris speciality. Curious, nonetheless.
>>
>
> Does it depend on which thread actually has the signal delivered to it?
>
>
Don't know. I do not know which thread gets the signal delivered.
But you are right, that could be the reason. The signal could be masked out
for the thread, and the signal is a synchronous error signal. Which would
make the OS to immediately abort the process.
..Thomas
> David
>
> Thanks, Thomas
>>
>>
>> On Fri, Dec 11, 2015 at 2:49 AM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>> Hi Thomas,
>>
>> This all looks fine to me - just some minor nits below :). I'm
>> running this through JPRT to see if everything works okay. Still
>> need a second reviewer of course.
>>
>> Nit: I find the notation:
>>
>> const /* real-type */ void* siginfo
>>
>> somewhat distracting - especially when we see siginfo being cast to
>> the correct type within the next two lines.
>>
>> ---
>>
>> src/os/posix/vm/os_posix.cpp
>>
>> Nit: os->print(" (me)"); me -> current process ?
>>
>> ---
>>
>> src/share/vm/utilities/vmError.cpp
>>
>> Nit: CDS. -> CDS archive.
>>
>> ---
>>
>> src/share/vm/utilities/vmError.hpp
>>
>> Nit: CDS store -> CDS archive
>>
>>
>> Thanks,
>> David
>>
>> On 10/12/2015 12:22 AM, Thomas Stüfe wrote:
>>
>> Hi David,
>>
>> thanks for the review!
>>
>> my new webrev:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8144219-posix-os-print_siginfo/webrev.02/webrev/
>>
>> On Tue, Dec 8, 2015 at 7:57 AM, David Holmes
>> <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>> <mailto:david.holmes at oracle.com
>>
>> <mailto:david.holmes at oracle.com>>> wrote:
>>
>> Hi Thomas,
>>
>> Sorry for the delay.
>>
>> On 30/11/2015 8:03 PM, Thomas Stüfe wrote:
>>
>> Hi all,
>>
>> please review this small change:
>>
>>
>> Maybe not so small :)
>>
>>
>> Well. I did name it (s), not (xxs) :)
>>
>>
>>
>> Webrev:
>>
>> http://cr.openjdk.java.net/~stuefe/webrevs/8144219-posix-os-print_siginfo/webrev.00/webrev/
>> Bug Report:
>> https://bugs.openjdk.java.net/browse/JDK-8144219
>>
>> This is mainly a cleanup change for posix platforms. It
>> gets rid of
>> multiple copies of os::print_siginfo() in favour of the
>> central
>> os::Posix::print_siginfo() in os_posix.cpp.
>>
>>
>> Something seems not right. Currently vmError calls
>> os::print_siginfo, and the per-posix-platform
>> os::print_siginfo then
>> call os::Posix::print_siginfo_brief. And now you have
>> os::Posix::print_siginfo in os_posix.cpp, but that should be
>> os::print_siginfo - no Posix namespace ??
>>
>>
>> There is os::print_siginfo(outputStream*, void*), for which
>> exist two
>> implementations, one for Windows, one for Posix. In my first
>> webrev I
>> also kept os::Posix::print_siginfo(outputStream*, siginfo_t*),
>> because I
>> wanted a version which takes a real siginfo_t*.
>> os::print_siginfo(outputStream*, void*) for posix just called
>> os::Posix::print_siginfo(outputStream*, siginfo_t*).
>>
>> To simplify things, I removed the os::Posix::print_siginfo(). Now
>> we
>> just have os::print_siginfo(outputStream*, const void*) in two
>> variants,
>> one for windows, one for posix.
>>
>> (I also made the pointer const for const correctness)
>>
>> It also extends os::Posix::print_siginfo() to write
>> more details
>> about the
>> received signal which stems from our experiences with
>> various
>> Unices over
>> the time. This will improve hs-err file printout.
>>
>>
>> The old code was deliberately intended as a simple
>> one-liner for the
>> error report. But okay ....
>>
>> 987 os->print_cr("siginfo @ " PTR_FORMAT ":", p2i(si));
>>
>> What is the value in printing the pointer to the siginfo_t ??
>>
>> I personally prefer to see si_signo, si_code and si_errno to
>> be
>> clear that what is printed are the fields of the siginfo_t.
>>
>> 1030 #ifdef _AIX
>> 1031 } else if (si->si_code == SI_EMPTY) {
>> 1032 os->print_cr(" Signal may have been sent by
>> another thread.");
>> 1033 #endif
>> 1034 #ifdef __linux
>> 1035 } else if (si->si_code == SI_TKILL) {
>> 1036 os->print_cr(" Signal has been sent by another
>> thread.");
>> 1037 #endif
>> 1038 #ifdef __sun
>> 1039 } else if (si->si_code == SI_LWP) {
>> 1040 os->print_cr(" Signal has been sent by another
>> thread.");
>> 1041 #endif
>> 1042 } else if (sig == SIGSEGV || sig == SIGBUS) {
>> 1043 os->print_cr(" Address of faulting memory
>> reference: "
>> PTR_FORMAT, p2i(si->si_addr));
>> 1044 } else if (sig == SIGILL || sig == SIGFPE) {
>> 1045 os->print_cr(" Address of faulting instruction: "
>> PTR_FORMAT, p2i(si->si_addr));
>> 1046 } else if (sig == SIGTRAP) {
>> 1047 os->print_cr(" Address of trap instruction: "
>> PTR_FORMAT,
>> p2i(si->si_addr));
>> 1048 #ifndef __APPLE__
>> 1049 } else if (sig == SIGPOLL) {
>> 1050 os->print_cr(" Band event: " PTR64_FORMAT,
>> (uint64_t)si->si_band);
>> 1051 #endif
>>
>> These should be conditional on the specific signal or value
>> being
>> defined, not platform specific. So the __APPLE__ goes back
>> to
>> SIGPOLL (even if it is always defined you'll never receive
>> it on
>> platforms that don't use it). The others should check
>> SI_EMPTY etc.
>>
>> Note that there are a number of other possibilities just on
>> Solaris:
>>
>>
>> https://docs.oracle.com/cd/E23823_01/html/816-5173/siginfo-3head.html
>>
>> That said we don't need to give this much detail in my
>> opinion. If
>> you get a si_code of SI_LWP for example then you can go an
>> look at
>> the manpage to find out exactly what it means - we don't
>> want to be
>> maintaining that information here. So I'm not sure we really
>> need/want this expanded level of detail.
>>
>>
>> Okay. I think we (SAP) do not care so much about the size of the
>> hs-err
>> file and trade a bit more verbosity in return for giving more
>> detailled
>> information to our support people. But I can see your point, and
>> it is a
>> matter of preferences.
>>
>> In the new version, I kept the output of os::print_siginfo very
>> compact
>> and on one line. We print si_signo, si_code unconditionally,
>> si_errno if
>> set, and then decide depending on the code whether to print
>> si_addr or
>> si_pid/si_uid or si_band.
>>
>>
>> Note that there was on all platforms code which
>> examines the
>> crash address
>> for a possible CDS fault and prints a specialized hint.
>> In my
>> opinion that
>> makes no sense in a general os::print_siginfo()
>> function and I
>> moved it to
>> a separate error reporting step into vmError.cpp. This
>> may be a
>> point of
>> contention because the way to extract the fault address
>> from a
>> siginfo/EXCEPTION_RECORD structure is platform
>> dependend, but
>> imho this is
>> still the least ugly solution.
>>
>>
>> Sorry I don't like seeing the _WIN32 condition in the
>> vmError.cpp
>> code. By keeping this as part of the print_siginfo the
>> platform
>> specific aspect is handled transparently. As something that
>> is only
>> used as part of error reporting, and which is directly
>> related to a
>> signal being received the current placement doesn't seem
>> unreasonable. If you really want to factor this out then I
>> would add
>> os::check_for_failing_cds_access() or something like that.
>>
>>
>> I really did not like this in os::print_siginfo(). One does not
>> expect
>> os::print_siginfo() to know about cds specialities, and to know
>> that it
>> only gets called in the context of error handling (who says this
>> is
>> always true?)
>>
>> But I agree the platform dependend code in vmError.cpp is ugly.
>> I moved
>> it to vmError_posix.cpp and vmError_windows.cpp respectively, and
>> renamed it per your suggestion. I did not want to add this to
>> the os::
>> namespace because it is really a highly specialized function
>> which only
>> makes sense in error handling.
>>
>> Thanks & Regards, Thomas
>>
>> Thanks,
>> David
>> -----
>>
>> Kind regards, Thomas
>>
>>
>>
>>
More information about the hotspot-runtime-dev
mailing list