RFR 8004124: Handle and/or warn about SI_KERNEL

Mikael Gerdin mikael.gerdin at oracle.com
Thu Jun 20 13:59:57 PDT 2013


On 06/20/2013 10:34 PM, Coleen Phillimore wrote:
>
> All of the crashes I have seen with this symptom, especially addr=0, have
> been unconditionally the kernel's fault and we've tried to prove
> otherwise.  I did not see a case that wasn't the kernel's fault.   Do
> you have a case where the VM trashes a random value from memory and
> doesn't get a regular segv?

Not off the top of my head.
But I did write a C program that tried to dereference an address in
non-canonical form and got the same combination of
SIGSEGV, SI_KERNEL, si_addr=0
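A minimal sketch of the kind of C program described above, assuming Linux and glibc. probe_read is a hypothetical helper, not code from the thread: it installs an SA_SIGINFO handler for SIGSEGV, reads the given address, and uses sigsetjmp/siglongjmp to recover and report si_code and si_addr. The address 0x8000000000000000 is only an example of a non-canonical x86-64 address; the thread does not say which address was used.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* expose sigaction, siginfo_t and SI_KERNEL in glibc */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

#ifndef SI_KERNEL
#define SI_KERNEL 0x80 /* Linux value of SI_KERNEL, in case the header hides it */
#endif

static sigjmp_buf probe_env;
static volatile sig_atomic_t probe_code;
static void *volatile probe_addr;

/* Record what the kernel reported, then jump back out of the faulting read. */
static void probe_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    probe_code = info->si_code;
    probe_addr = info->si_addr;
    siglongjmp(probe_env, 1);
}

/* Hypothetical helper: returns 1 if reading an int at addr raised SIGSEGV
   (storing si_code and si_addr through code and fault_addr), 0 otherwise. */
int probe_read(uintptr_t addr, int *code, void **fault_addr) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = probe_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    int faulted;
    /* savemask = 1 so SIGSEGV is unblocked again after siglongjmp */
    if (sigsetjmp(probe_env, 1) == 0) {
        (void)*(volatile int *)addr; /* the read that may fault */
        faulted = 0;
    } else {
        faulted = 1;
        *code = probe_code;
        *fault_addr = probe_addr;
    }

    sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
    return faulted;
}
```

On x86-64 Linux, probe_read(0x8000000000000000, &code, &addr) is expected to return 1 with code == SI_KERNEL and addr == NULL, which is the combination described above; other architectures typically report an ordinary SIGSEGV code such as SEGV_MAPERR instead.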

/Mikael

>
> Coleen
>
> On 06/20/2013 04:27 PM, Mikael Gerdin wrote:
>> Coleen,
>>
>> On 06/20/2013 04:48 PM, Coleen Phillimore wrote:
>>> Summary: Detect this crash in the signal handler and give a fatal error
>>> message instead of making us chase down bugs that don't reproduce
>>>
>>> This change also has more information for crash site from bug
>>> https://jbs.oracle.com/bugs/browse/JDK-8007019
>>>
>>> guarantee(cb->is_adapter_blob() || cb->is_method_handles_adapter_blob())
>>> failed: exception happened outside interpreter, nmethods and vtable
>>> stubs (1) <https://jbs.oracle.com/bugs/browse/JDK-8007019>
>>>
>>> There used to be two places that had the same message so they were
>>> qualified by (1) and (2).   The second one is gone.   Now this prints
>>> the blob and pc.
>>>
>>> Tested with full vm.quick.testlist and the sets of jdi tests that failed
>>> with -client -Xcomp and specjvm98 that used to fail with this signal
>>> code.   I got one failure two days ago, before this change, but now it
>>> doesn't fail at all, with my new message or otherwise.
>>
>> The error message you added for SI_KERNEL puts the blame
>> unconditionally on the kernel.
>> As I mentioned in the bug, it's possible to cause this signal
>> combination by trying to access memory with an invalid memory address
>> in non-canonical form:
>> https://en.wikipedia.org/wiki/X86-64#Canonical_form_addresses
>>
>> (sorry for the Wikipedia link, I don't have the Intel x86-64 manual
>> page reference at hand)
>>
>> Basically, if we trash an object somewhere or the compiler does
>> something strange, we may try to use a random value from memory as an
>> address, and if that address is in non-canonical form we'll say that
>> the OS is broken when in fact it is probably our fault.
>>
>> /Mikael
>>
>>>
>>> open webrev at http://cr.openjdk.java.net/~coleenp/8004124/
>>> bug link at http://bugs.sun.com/view_bug.do?bug_id=8004124
>>> local bug link https://jbs.oracle.com/bugs/browse/JDK-8004124
>>>
>>> Thanks,
>>> Coleen
>>
>
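Mikael's objection suggests that an SI_KERNEL handler should not blame the kernel unconditionally. Below is a hedged sketch of a more cautious classification, assuming x86-64 with 48-bit virtual addresses (4-level paging). is_canonical_48 checks that bits 63..47 are all copies of bit 47. classify_segv and its suspect_addr input are hypothetical, not HotSpot's actual signal handler code: because si_addr is 0 for these faults, the handler would have to recover a candidate address some other way (for example from a register) before it can point at a non-canonical access.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* expose SI_KERNEL in glibc's <signal.h> */
#endif
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef SI_KERNEL
#define SI_KERNEL 0x80 /* Linux value of SI_KERNEL */
#endif

/* Canonical for 48-bit x86-64 addresses: bits 63..47 are all 0 or all 1. */
bool is_canonical_48(uint64_t addr) {
    uint64_t top = addr >> 47; /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFF;
}

/* Hypothetical hint for an error report. */
enum fault_hint {
    FAULT_HINT_ORDINARY,       /* not SI_KERNEL: si_addr is meaningful */
    FAULT_HINT_NONCANONICAL,   /* SI_KERNEL and the suspect address is non-canonical */
    FAULT_HINT_KERNEL_UNKNOWN  /* SI_KERNEL with no evidence of a bad address */
};

/* suspect_addr is an assumed input: an address the handler recovered by
   other means, since si_addr is 0 for SI_KERNEL faults. */
enum fault_hint classify_segv(int si_code, bool have_suspect, uint64_t suspect_addr) {
    if (si_code != SI_KERNEL)
        return FAULT_HINT_ORDINARY;
    if (have_suspect && !is_canonical_48(suspect_addr))
        return FAULT_HINT_NONCANONICAL; /* likely a VM bug, not the kernel */
    return FAULT_HINT_KERNEL_UNKNOWN;   /* kernel problem is possible, not proven */
}
```

With 5-level paging (LA57) the sign-extension boundary moves to bit 56, so a real check would need to know which paging mode the machine uses.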



More information about the hotspot-runtime-dev mailing list