RFR: JDK-8282405: Make thread resource areas signal safe [v5]
David Holmes
dholmes at openjdk.java.net
Fri Mar 4 06:38:05 UTC 2022
On Thu, 3 Mar 2022 07:38:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> In the context of signal handlers, we may allocate RA memory. That is not ideal but may happen. One example is error reporting - even if we are careful, some code down the stack may use RA. Another example is code running in the context of AsyncGetCallTrace. I'm sure there may be more examples.
>>
>> The problem is that the signal may (rarely) leave the current thread's RA in an inconsistent state, especially if it got interrupted in the middle of a chunk turnover. Subsequent allocations from it inside the signal handler would then malfunction.
>>
>> A simple solution would be double buffering. Let each thread have a second resource area, to be used only in signal handling. At the entrance of the hotspot signal handler (which everyone goes through, even in chain scenarios like with AsyncGetCallTrace) we would switch over to the secondary resource area, and switch back when leaving the hotspot signal handler.
>>
>> Note that I proposed this on hs-runtime-dev [1] but I am actually not sure if the mailing lists work, since I did not see that mail delivered to subscribers. Therefore I went ahead and implemented a simple prototype.
>>
>> The prototype keeps matters simple:
>> - we just use two resource areas: the normal one and an alternate one for signal handling. So we don't handle recursive calls to signal handlers, see comment in signals_posix.cpp.
>> - we preallocate both resource areas at thread creation time. For the pros and cons of pre-allocating them vs creating them on demand, and possible further improvements, pls see [1].
>>
>> Tests:
>> - SAP nightlies
>> - GHAs
>> - I tested this manually by corrupting the resource area of a thread, then faulting, and inside the signal handler, I was able to use the secondary resource area as expected.
>> - Automated tests are somewhat more difficult, akin to the existing SafeFetchInErrorHandlerTest. I am not sure if it's worth the complexity.
>>
>> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2022-February/054126.html
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
> ACGT: guard the correct (current) thread
The signal handler doesn't call foo(), it just happens to call some code that also uses the data structure X. If X expands while on the alt-RA and doesn't contract again, then X will have a chunk that still refers to the alt-RA when we have switched back to the primary-RA.
Ideally such things can never happen and the alt-RA is always unused by the time the signal handler returns, but without looking at the kind of RA usage that might occur from the signal handler, I can't say whether it will be clean this way or not.
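To make the scenario concrete, here is a minimal standalone sketch. It is not HotSpot code: Area, Thread, AltAreaMark and GrowableX are hypothetical stand-ins for a resource area, a thread holding two of them, the switch at signal handler entry/exit, and "data structure X". It only assumes that X allocates its backing storage from whichever area is current at the time it grows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a resource area: a fixed buffer with a bump pointer.
struct Area {
  alignas(std::max_align_t) char buf[1024];
  std::size_t top = 0;

  void* alloc(std::size_t n) {
    if (top + n > sizeof(buf)) return nullptr;
    void* p = buf + top;
    top += n;
    return p;
  }
  bool contains(const void* p) const {
    auto a = reinterpret_cast<std::uintptr_t>(p);
    auto lo = reinterpret_cast<std::uintptr_t>(buf);
    return a >= lo && a < lo + sizeof(buf);
  }
};

// Hypothetical thread with a primary and an alternate (signal handling) area.
struct Thread {
  Area primary;
  Area alternate;
  Area* current = &primary;
};

// Hypothetical RAII switch: alternate area on handler entry, back on exit.
class AltAreaMark {
  Thread& t_;
  Area* saved_;
public:
  explicit AltAreaMark(Thread& t) : t_(t), saved_(t.current) { t.current = &t.alternate; }
  ~AltAreaMark() { t_.current = saved_; }
};

// "Data structure X": grows by allocating from whatever area is current.
struct GrowableX {
  Thread& t;
  std::size_t len = 0;
  std::size_t cap;
  int* data;

  GrowableX(Thread& thr, std::size_t c)
      : t(thr), cap(c), data(static_cast<int*>(thr.current->alloc(c * sizeof(int)))) {
    assert(data != nullptr);
  }
  void push(int v) {
    if (len == cap) {
      std::size_t ncap = cap * 2;
      int* n = static_cast<int*>(t.current->alloc(ncap * sizeof(int)));
      assert(n != nullptr);
      std::memcpy(n, data, len * sizeof(int));
      data = n;
      cap = ncap;
    }
    data[len++] = v;
  }
};

// Returns true if X still refers to the alternate area after switching back.
bool x_left_pointing_into_alt_area() {
  Thread t;
  GrowableX x(t, 2);      // backing storage comes from the primary area
  x.push(1);
  x.push(2);
  {
    AltAreaMark mark(t);  // "signal handler" entry: now on the alternate area
    x.push(3);            // X expands: new storage comes from the alternate area
  }                       // "signal handler" exit: back on the primary area
  return t.current == &t.primary && t.alternate.contains(x.data);
}
```

Calling x_left_pointing_into_alt_area() returns true: the thread is back on its primary area, yet X's storage lives in the alternate one. Whether any real RA user reachable from the signal handler behaves like GrowableX is exactly the open question.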
JDK-8265150 is a great example as it shows that allocating from the RA in a signal handler via AGCT is fundamentally broken. Even with the alt-RA, do we not still have the ThreadCritical problem?
I don't want the perfect to be the enemy of the "good enough" here, but I do want to be sure the cost:benefit ratio is sufficiently small to make this worthwhile.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7624