RFR: JDK-8283326: Implement SafeFetch statically [v3]

Martin Doerr mdoerr at openjdk.java.net
Tue Apr 12 14:19:37 UTC 2022


On Tue, 12 Apr 2022 13:24:09 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Hi,
>> 
>> This is a reimplementation of SafeFetch to get rid of dynamic code generation. This is done, depending on platform, via static assembly, Posix setjmp/longjmp or SEH.
>> 
>> This is based on a proposal by @theRealAph [1] when discussing [JDK-8282475](https://bugs.openjdk.java.net/browse/JDK-8282475). I extended the proposal to do this for all platforms, not just MacOS, see discussion below.
>> 
>> ---
>> 
>> Why we do this:
>> 
>> SafeFetch is special: it is a basic utility function that may be called before the VM is properly initialized, from a "foreign" thread not attached to the VM, in a broken environment... mainly this is due to its use during stack walking, error reporting, hex dumping, etc. SafeFetch has to be safe to call under any circumstances - or as safe as we can make it.
>> 
>> Today, SafeFetch is implemented via stub routines. That causes problems:
>> 
>> - MacOs aarch64 has mechanisms to prevent execution of arbitrary non-text memory. On that platform, invoking generated code requires some awkward gymnastics [2]. These workarounds require access to Thread::current, which renders SafeFetch non-functional if Thread::current is NULL (e.g. crash reporting in "foreign" native threads or in a corrupted VM). They also cost a bit of performance on every SafeFetch invocation. Note that more OSes may implement similar execution prevention mechanics in the future.
>> 
>> - There is a time window during VM initialization where the stubs are not generated yet and thus SafeFetch is not available.
>> 
>> Both points cause hs-err files to be compromised (e.g. missing call stacks) for crashes in native threads, when the VM is corrupted, or early during VM initialization. Since SafeFetch is such a basic tool, it causes more issues than that (e.g., on AIX it is used for verifying mprotect() itself). But hs-err files being compromised by secondary errors is the most annoying effect.
>> 
>> A minor point is that generating stubs takes a tiny bit of startup time. Not a big issue, but anything we can generate at build time instead of at runtime is good.
>> 
>> ---
>> 
>> The patch:
>> 
>> - Removes the SafeFetch stub routine generation for all platforms
>> - For Linux and MacOS, it implements SafeFetch as cpu-specific static assembler routines. These routines live in `os_cpu/<os>_<cpu>/safefetch_<os>_<cpu>.S`.
>> - On Windows, SafeFetch is implemented via SEH. This is beautifully simple and does not need handwritten assembly at all. See `os/windows/safefetch_windows.hpp`.
>> - On Zero, SafeFetch uses a Posix compatible solution using `sigsetjmp(2)`. This is portable and robust, but a bit slow compared to the other methods. Note that this solution existed before, and is well tested. I just moved the code around and massaged it a bit.
>> - AIX uses now the same method as Zero. AIX "downgraded" in that sense from the stub routine solution, but since SAP gave up maintenance of AIX, I don't have easy access to AIX hardware. If SafeFetch speed becomes an issue, I am sure a solution based on static assembly can be implemented on AIX too.
>> - The SafeFetch gtests were greatly expanded.
>> 
>> Tests:
>> 
>> - The patch was tested manually by building and running gtests on:
>>   - MacOS (x64, aarch64)
>>   - Linux (aarch64, arm, ppcle, s390, x64, x86, riscv)
>>   - Windows (x64 and x86, but not arm, see below)
>>   - AIX
>>   - Zero (on Linux x64).
>> - GHAs
>> - The patch ran through SAP nightlies for several nights in a row
>> 
>> Note:
>> - I have no access to Windows arm, so I could not test it. I contacted @mo-beck  to verify that the patch works there.
>> - I have no access to BSD. I am not sure who maintains BSD, so I have no idea who to contact. However, I believe the MacOS solution should work for BSD too. If not, BSD can simply fall back to the Posix sigjmp solution like AIX and Zero.
>> 
>> It would be nice to have @RealLucy, @TheRealMDoerr and @theRealAph to okay s390, ppcle and aarch64 (and possibly arm32). Each assembly file is tiny, so it should not take long. @RealFYang and @backwaterred already tested the patch on RiscV and AIX (thanks).
>> 
>> Thanks, Thomas
>> 
>> [1] https://github.com/openjdk/jdk/pull/7727#issuecomment-1064980598
>> [2] https://github.com/openjdk/jdk/blob/4df67426ed02f18af0757897acb28b636a317a91/src/hotspot/share/runtime/safefetch.inline.hpp#L37-L41
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove SAFEFETCH_DEFAULT

It's very unfortunate that such a simple task requires so much platform code. Would be nice if the Windows implementation would work everywhere :-)
I'm not aware of a simple implementation which works reliably on all platforms, so LGTM.
Feel free to fix indentation.
Would it make sense to add a test case which uses a negative int as default value for SafeFetch32? Just to make sure the signed int gets passed correctly through the assembler stubs and reaches C++ code without loss of sign extension.

src/hotspot/os_cpu/bsd_aarch64/safefetch_bsd_aarch64.S line 47:

> 45:     #  w1 : defaultval
> 46: 
> 47: 	# needed to align function start to 4 byte

indentation

src/hotspot/os_cpu/bsd_x86/safefetch_bsd_x86_64.S line 47:

> 45:     #  %rdi : address
> 46:     #  %esi : defaultval
> 47: 	ELF_TYPE(SafeFetch32, at function)

indentation

src/hotspot/os_cpu/linux_x86/safefetch_linux_x86_32.S line 34:

> 32:     #  8(%esp) : default value
> 33:     #  4(%esp) : crash address
> 34: 	#  0(%esp) : return pc

indentation

-------------

Marked as reviewed by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7865


More information about the hotspot-dev mailing list