RFR: JDK-8283326: Implement SafeFetch statically

Thomas Stuefe stuefe at openjdk.java.net
Mon Apr 11 09:07:36 UTC 2022


On Mon, 11 Apr 2022 05:52:15 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Hi,
>> 
>> This is a reimplementation of SafeFetch to get rid of dynamic code generation. This is done, depending on platform, via static assembly, Posix setjmp/longjmp or SEH.
>> 
>> This is based on a proposal by @theRealAph [1] when discussing [JDK-8282475](https://bugs.openjdk.java.net/browse/JDK-8282475). I extended the proposal to do this for all platforms, not just MacOS, see discussion below.
>> 
>> ---
>> 
>> Why we do this:
>> 
>> SafeFetch is special: it is a basic utility function that may be called before the VM is properly initialized, from a "foreign" thread not attached to the VM, in a broken environment... mainly this is due to its use during stack walking, error reporting, hex dumping, etc. SafeFetch has to be safe to call under any circumstances - or as safe as we can make it.
>> 
>> Today, SafeFetch is implemented via stub routines. That causes problems:
>> 
>> - MacOs aarch64 has mechanisms to prevent execution of arbitrary non-text memory. On that platform, invoking generated code requires some awkward gymnastics [2]. These workarounds require access to Thread::current, which renders SafeFetch non-functional if Thread::current is NULL (e.g. crash reporting in "foreign" native threads or in a corrupted VM). They also cost a bit of performance on every SafeFetch invocation. Note that more OSes may implement similar execution prevention mechanics in the future.
>> 
>> - There is a time window during VM initialization where the stubs are not generated yet and thus SafeFetch is not available.
>> 
>> Both points cause hs-err files to be compromised (e.g. missing call stacks) for crashes in native threads, when the VM is corrupted, or early during VM initialization. Since SafeFetch is such a basic tool, it causes more issues than that (e.g., on AIX it is used for verifying mprotect() itself). But hs-err files being compromised by secondary errors is the most annoying effect.
>> 
>> A minor point is that generating stubs takes a tiny bit of startup time. Not a big issue, but anything we can generate at build time instead of at runtime is good.
>> 
>> ---
>> 
>> The patch:
>> 
>> - Removes the SafeFetch stub routine generation for all platforms
>> - For Linux and MacOS, it implements SafeFetch as cpu-specific static assembler routines. These routines live in `os_cpu/<os>_<cpu>/safefetch_<os>_<cpu>.S`.
>> - On Windows, SafeFetch is implemented via SEH. This is beautifully simple and does not need handwritten assembly at all. See `os/windows/safefetch_windows.hpp`.
>> - On Zero, SafeFetch uses a Posix compatible solution using `sigsetjmp(2)`. This is portable and robust, but a bit slow compared to the other methods. Note that this solution existed before, and is well tested. I just moved the code around and massaged it a bit.
>> - AIX uses now the same method as Zero. AIX "downgraded" in that sense from the stub routine solution, but since SAP gave up maintenance of AIX, I don't have easy access to AIX hardware. If SafeFetch speed becomes an issue, I am sure a solution based on static assembly can be implemented on AIX too.
>> - The SafeFetch gtests were greatly expanded.
>> 
>> Tests:
>> 
>> - The patch was tested manually by building and running gtests on:
>>   - MacOS (x64, aarch64)
>>   - Linux (aarch64, arm, ppcle, s390, x64, x86, riscv)
>>   - Windows (x64 and x86, but not arm, see below)
>>   - AIX
>>   - Zero (on Linux x64).
>> - GHAs
>> - The patch ran through SAP nightlies for several nights in a row
>> 
>> Note:
>> - I have no access to Windows arm, so I could not test it. I contacted @mo-beck  to verify that the patch works there.
>> - I have no access to BSD. I am not sure who maintains BSD, so I have no idea who to contact. However, I believe the MacOS solution should work for BSD too. If not, BSD can simply fall back to the Posix sigjmp solution like AIX and Zero.
>> 
>> It would be nice to have @RealLucy, @TheRealMDoerr and @theRealAph to okay s390, ppcle and aarch64 (and possibly arm32). Each assembly file is tiny, so it should not take long. @RealFYang and @backwaterred already tested the patch on RiscV and AIX (thanks).
>> 
>> Thanks, Thomas
>> 
>> [1] https://github.com/openjdk/jdk/pull/7727#issuecomment-1064980598
>> [2] https://github.com/openjdk/jdk/blob/4df67426ed02f18af0757897acb28b636a317a91/src/hotspot/share/runtime/safefetch.inline.hpp#L37-L41
>
> src/hotspot/os/posix/safefetch_sigjmp.cpp line 35:
> 
>> 33: 
>> 34: // For SafeFetch we need POSIX TLS and sigsetjmp/longjmp.
>> 35: // (Note: We would prefer compiler level TLS but for some reason __thread does not
> 
> __thread cannot be safely used from a signal handler.

Ah, that explains the weird errors I got. I'll change the comment.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7865


More information about the hotspot-dev mailing list