RFR: 8294003: Don't handle si_addr == 0 && si_code == SI_KERNEL SIGSEGVs [v2]
Thomas Stuefe
stuefe at openjdk.org
Thu Sep 22 13:06:30 UTC 2022
On Thu, 22 Sep 2022 06:51:05 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:
>> We have this code code in our signal handler:
>>
>>
>> #ifndef AMD64
>> // Halt if SI_KERNEL before more crashes get misdiagnosed as Java bugs
>> // This can happen in any running code (currently more frequently in
>> // interpreter code but has been seen in compiled code)
>> if (sig == SIGSEGV && info->si_addr == 0 && info->si_code == SI_KERNEL) {
>> fatal("An irrecoverable SI_KERNEL SIGSEGV has occurred due "
>> "to unstable signal handling in this distribution.");
>> }
>> #endif // AMD64
>>
>>
>> This bug added that change:
>> https://bugs.openjdk.java.net/browse/JDK-8004124
>>
>> In the Generational ZGC we hit the exact same condition whenever we try to (incorrectly) dereference one of our colored pointers. From the bug above:
>>
>> "A segmentation violation that occurs as a result of userspace process accessing virtual memory above the TASK_SIZE limit will cause a segmentation violation with an si_code of SI_KERNEL"
>>
>> That is, if we have set high-order bits (past the TASK_SIZE limit), we get these kind of SIGSEGVs.
>>
>> As the signal handle code is written today, we don't "stop" this signal, and instead try to handle it as an implicit null check. This causes hard-to-debug error messages and crashes in code that incorrectly try to deoptimize the faulty code.
>>
>> I propose that we short-cut the signal handling code, and let this problematic SIGSEGV get passed to VMError::report_and_die.
>>
>> We've been running with this patch in the Generational ZGC repository for over a year, without any problems.
>
> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove 32-bit error handling
A small nit remains: why do we even need this section at all?
We get a SIGSEGV with si_addr=0. VM assumes this to be an implicit null check if signal==SIGSEGV and the PC makes sense (interpreter or code blob or vtable stub).
Would it not be cleaner to add a check at that point for info->si_code != SI_KERNEL? E.g. before calling SharedRuntime::continuation_for_implicit_exception?
-------------
PR: https://git.openjdk.org/jdk/pull/10340
More information about the hotspot-dev
mailing list