RFR: JDK-8282405: Make thread resource areas signal safe [v5]

Fri Mar 4 22:10:01 UTC 2022

On Thu, 3 Mar 2022 07:38:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> In the context of signal handlers, we may allocate resource area (RA) memory. That is not ideal, but it can happen. One example is error reporting: even if we are careful, some code down the stack may use RA. Another example is code running in the context of AsyncGetCallTrace. There are likely more.
>> 
>> The problem is that the signal may (rarely) leave the current thread's RA in an inconsistent state, especially if it got interrupted in the middle of a chunk turnover. Subsequent allocations from it inside the signal handler then would malfunction.
>> 
>> A simple solution would be double buffering: give each thread a second resource area, used only during signal handling. At the entry of the hotspot signal handler (which all signal handling goes through, even in chained scenarios such as AsyncGetCallTrace) we would switch over to the secondary resource area, and switch back when leaving the hotspot signal handler.
>> 
>> Note that I proposed this on hs-runtime-dev [1], but I am not sure the mailing lists are working, since I did not see that mail delivered to subscribers. So I went ahead and implemented a simple prototype.
>> 
>> The prototype keeps matters simple:
>> - we use just two resource areas: the normal one and an alternate one for signal handling. So we don't handle recursive calls to signal handlers; see the comment in signals_posix.cpp.
>> - we preallocate both resource areas at thread creation time. For the pros and cons of preallocating them vs. creating them on demand, and possible further improvements, please see [1].
>> 
>> Tests:
>> - SAP nightlies
>> - GHAs
>> - I tested this manually by corrupting the resource area of a thread, then faulting, and inside the signal handler, I was able to use the secondary resource area as expected.
>> - Automated tests are somewhat more difficult, along the lines of the existing SafeFetchInErrorHandlerTest. I am not sure it's worth the complexity.
>> 
>> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2022-February/054126.html
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   ACGT: guard the correct (current) thread
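
A rough C++ model of the double buffering described in the quoted proposal, under stated assumptions: `Arena`, `ThreadState` and `SignalArenaSwitch` are hypothetical names, not the prototype's code; a fixed 1K bump-pointer buffer stands in for a real resource area; and rewinding the alternate area when the handler exits, in the manner of a ResourceMark, is an assumption rather than something the proposal states. The guard redirects allocations to the alternate area for its lifetime and restores the previously active area on exit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a resource area: a fixed buffer with a bump pointer.
struct Arena {
  char buf[1024];
  std::size_t top = 0;

  void* alloc(std::size_t n) {
    if (n > sizeof(buf) - top) return nullptr;  // out of space
    void* p = buf + top;
    top += n;
    return p;
  }
};

// Hypothetical per-thread state: both areas exist from thread creation on.
struct ThreadState {
  Arena primary;              // used by normal code
  Arena alternate;            // reserved for signal handling
  Arena* current = &primary;  // area that allocations currently go to
};

// Switches the thread to the alternate area on entry to the signal handler
// and back on exit. Assumption: the alternate area is also rewound on exit,
// similar to a ResourceMark. With only two areas, a nested handler would
// share the alternate area, so recursion is not handled.
class SignalArenaSwitch {
 public:
  explicit SignalArenaSwitch(ThreadState* t)
      : t_(t), saved_current_(t->current), saved_top_(t->alternate.top) {
    t_->current = &t_->alternate;
  }
  ~SignalArenaSwitch() {
    t_->alternate.top = saved_top_;  // release the handler's allocations
    t_->current = saved_current_;    // restore whatever was active before
  }
  SignalArenaSwitch(const SignalArenaSwitch&) = delete;
  SignalArenaSwitch& operator=(const SignalArenaSwitch&) = delete;

 private:
  ThreadState* t_;
  Arena* saved_current_;
  std::size_t saved_top_;
};
```

In this model the hotspot signal handler would construct a `SignalArenaSwitch` on entry, so any code it calls that allocates through the thread's current area draws from the alternate one and leaves the possibly inconsistent primary area alone.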

This seems to narrow the problem, but not fix it.

What if we're in the Chunk allocator when a signal is received, and the signal
handler allocates enough that it needs another chunk? The Chunk allocator uses
ThreadCritical to protect its critical regions, which does not make it
re-entrant either.
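
To illustrate why re-entering the chunk allocator from a handler is unsafe, consider a minimal free list whose `allocate()` reads the head and unlinks it in two separate steps. A lock already held by the interrupted thread either blocks the handler (deadlock) or, if it lets the same thread back in, offers no protection at all; this sketch models the second case by simply omitting the lock. `Chunk`, `FreeList`, the `interrupt` hook and `interrupted_allocation` are hypothetical simplifications, not HotSpot's ChunkPool.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical singly linked free list of chunks; not HotSpot's ChunkPool.
struct Chunk {
  Chunk* next = nullptr;
  int id = 0;
};

struct FreeList {
  Chunk* first = nullptr;
  // Test hook: runs once in the middle of allocate() to simulate a signal
  // handler interrupting the thread inside the critical region.
  std::function<void()> interrupt;

  void release(Chunk* c) {
    c->next = first;
    first = c;
  }

  Chunk* allocate() {
    Chunk* c = first;            // step 1: read the head
    if (c == nullptr) return nullptr;
    if (interrupt) {             // the "signal" arrives between the steps
      std::function<void()> handler = std::move(interrupt);
      interrupt = nullptr;       // fire only once
      handler();
    }
    first = c->next;             // step 2: unlink the head
    return c;
  }
};

// Returns the chunks handed to the interrupted caller and to the handler.
std::pair<Chunk*, Chunk*> interrupted_allocation(FreeList& list) {
  Chunk* in_handler = nullptr;
  list.interrupt = [&] { in_handler = list.allocate(); };
  Chunk* outer = list.allocate();
  return {outer, in_handler};
}
```

With a list `a -> b`, both the interrupted caller and the handler receive chunk `a`, so two owners would end up writing into the same memory.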

And if the Chunk allocator finds the associated ChunkPool free-list to be
empty, then we need to go to malloc, which has the same problem of not being
async-signal safe.

I'm also not thrilled by the idea of devoting 1K (or more, if we discover 1K
isn't enough for all the allocations we might need) for each thread for this
purpose.  That seems like quite a bit of memory being (mostly) wasted.
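
For a sense of scale, the reserved memory grows linearly with the thread count. This back-of-the-envelope sketch uses an assumed count of 10,000 threads together with the 1K figure above, and ignores allocator overhead and any growth beyond 1K; `reserved_bytes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Extra memory reserved if every thread preallocates its own alternate area.
constexpr std::size_t reserved_bytes(std::size_t threads,
                                     std::size_t per_thread_bytes) {
  return threads * per_thread_bytes;
}

// Assumed example: 10,000 threads with 1 KiB each is about 9.8 MiB in total.
static_assert(reserved_bytes(10000, 1024) == 10240000, "10,000 x 1 KiB");
```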

So basically I'm not convinced the proposed design is right yet.

One thing I was wondering (because I don't know the answer) is whether there
is any bound on the number of in-flight signals that might need to be handled
at the same time by different threads.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7624