RFR: JDK-8282405: Make thread resource areas signal safe [v5]

Fri Mar 11 07:03:43 UTC 2022

On Thu, 3 Mar 2022 07:38:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> In the context of signal handlers, we may allocate RA memory. That is not ideal but may happen. One example is error reporting - even if we are careful, some code down the stack may use RA. Another example is code running in the context of AsyncGetCallTrace. I'm sure there may be more examples.
>> 
>> The problem is that the signal may (rarely) leave the current thread's RA in an inconsistent state, especially if it got interrupted in the middle of a chunk turnover. Subsequent allocations from it inside the signal handler then would malfunction.
>> 
>> A simple solution would be double buffering. Let each thread have a second resource area, to be used only in signal handling. At the entrance of the hotspot signal handler (which everyone goes through, even in chain scenarios like with AsyncGetCallTrace) we would switch over to the secondary resource area, and switch back when leaving the hotspot signal handler.
>> 
>> Note that I proposed this on hs-runtime-dev [1] but I am actually not sure if the mailing lists work, since I did not see that mail delivered to subscribers. Therefore I went ahead and implemented a simple prototype. 
>> 
>> The prototype keeps matters simple:
>> - we just use two resource areas: the normal one and an alternate one for signal handling. So we don't handle recursive calls to signal handlers, see comment in signals_posix.cpp.
>> - we preallocate both resource area at thread creation time. For the pros and cons of pre-allocating them vs creating them on demand, and possible further improvements, pls see [1].
>> 
>> Tests:
>> - SAP nightlies
>> - GHAs
>> - I tested this manually by corrupting the resource area of a thread, then faulting, and inside the signal handler, I was able to use the secondary resource area as expected.
>> - Automated tests are somewhat more difficult, akin to the existing SafeFetchInErrorHandlerTest. I am not sure if its worth the complexity.
>> 
>> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2022-February/054126.html
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   ACGT: guard the correct (current) thread

> 

> Hi Thomas,
> 
> On 9/03/2022 7:15 pm, Thomas Stuefe wrote:
> 
> > I put this back into draft for further refinement.
> > @kimbarrett , @dholmes-ora :
> > Atm I tend toward a solution as laid out in my comment https://github.com/[/pull/7624](https://github.com/openjdk/jdk/pull/7624)#issuecomment-1059713107.
> > Condensed:
> > The secondary "safe" ResourceArea would be one that has a single chunk and cannot grow more chunks. That chunk would be either (a) allocated with raw ::malloc(), on demand only on first RA use, or (b) just use an array placed on the stack in the frame that entered ACGT.
> 
> Sorry but how can you constrain the new RA to only a single chunk? What if it needs to grow? IIUC we don't know exactly what RA allocation may be attempted in the signal context via AGCT or other code.
> 

The idea is that even though we don't know exactly who or what uses RA, we can assume that it will not need a ton of memory. That holds def. true for error handling, and probably for ACGT too. The culprits are usually small utility functions, tightly guarded by ResourceMarks. So we should be able to safely get by with a single 64k or 128k chunk. And if we still see problems, those can be solved by enlarging the chunk or placing tighter resource marks.

It's a pragmatic way to defuse the problem. We'd go from "forbid RA under any circumstances" to covering ~95% of use cases.

> And apologies but I'm about to disappear for a week on vacation.
> 

No problem, have a good time!

> Cheers, David -----

-------------

PR: https://git.openjdk.java.net/jdk/pull/7624