RFR: JDK-8282405: Make thread resource areas signal safe [v5]

Fri Mar 4 06:05:06 UTC 2022

On Thu, 3 Mar 2022 07:38:48 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> In the context of signal handlers, we may allocate RA memory. That is not ideal but may happen. One example is error reporting - even if we are careful, some code down the stack may use RA. Another example is code running in the context of AsyncGetCallTrace. I'm sure there may be more examples.
>> 
>> The problem is that the signal may (rarely) leave the current thread's RA in an inconsistent state, especially if it got interrupted in the middle of a chunk turnover. Subsequent allocations from it inside the signal handler then would malfunction.
>> 
>> A simple solution would be double buffering. Let each thread have a second resource area, to be used only in signal handling. At the entrance of the hotspot signal handler (which everyone goes through, even in chain scenarios like with AsyncGetCallTrace) we would switch over to the secondary resource area, and switch back when leaving the hotspot signal handler.
>> 
>> Note that I proposed this on hs-runtime-dev [1] but I am actually not sure if the mailing lists work, since I did not see that mail delivered to subscribers. Therefore I went ahead and implemented a simple prototype. 
>> 
>> The prototype keeps matters simple:
>> - we just use two resource areas: the normal one and an alternate one for signal handling. So we don't handle recursive calls to signal handlers, see comment in signals_posix.cpp.
>> - we preallocate both resource area at thread creation time. For the pros and cons of pre-allocating them vs creating them on demand, and possible further improvements, pls see [1].
>> 
>> Tests:
>> - SAP nightlies
>> - GHAs
>> - I tested this manually by corrupting the resource area of a thread, then faulting, and inside the signal handler, I was able to use the secondary resource area as expected.
>> - Automated tests are somewhat more difficult, akin to the existing SafeFetchInErrorHandlerTest. I am not sure if its worth the complexity.
>> 
>> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2022-February/054126.html
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   ACGT: guard the correct (current) thread

Hi David,

> Hi Thomas,
> 
> I have a remaining concern. It seemed to me that after we switch the RA we should always have a ResourceMark to ensure that the secondary RA is cleared by the time we switch back. Or else we should be able to check it is "empty" before switching back. But that lead to a second concern - which really relates to how RA allocation works in general. Consider this hypothetical scenario:
> 
> ```
> void foo() {
>   ResourceMark rm;
>   bar();   // allocates in RA for some data structure X
>   baz();  //  Also allocates for X causing X to expand in RA
> }
> ```
> 
> now suppose a signal hits before `baz()`, and so we switch the the secondary RA. The signal handler invokes code that also leads to use of X and causes X expand, but now in the secondary RA. When the signal handler returns X has "dangling pointers" into the secondary RA.
> 
> In short you can't know that RA allocation in the secondary area is really safe without knowing what code is going to do the allocation and how that interacts with existing ResourceMarks on the stack. If you add a ResourceMark when switching to the secondary RA then you may cause a "dangling pointer", and if you don't then you may cause a leak in the secondary RA.
>

I don't understand. The signal handler, if it calls foo(), executes foo(), from start to finish, then returns. With RA1 as primary, RA2 as secondary area I see it like this:

                         RA1          RM1              RA1           RM2

                         n            -                0             -

--Enter foo()--
ResourceMark()           n            ->RA1[n]         0             - 
bar() alloc X            n+X          ->RA1[n]         0             -
--Signal!--                       
--Enter foo()--
ResourceMark()           n+X          ->RA1[n]         0             ->RA2[0]
bar() alloc X            n+X          ->RA1[n]         0+X           ->RA2[0]
baZ() alloc X            n+X          ->RA1[n]         0+X+X         ->RA2[0]
~ResourceMark()          n+X          ->RA1[n]         0             -
--Signal handler returns--                       
baZ() alloc X            n+X+X        ->RA1[n]         0             -
~ResourceMark()          n            -                0             -

In other words, entering the handler causes us to switch stacks, very similar to what one could do with `sigaltstack(3)`. We set aside the primary RA and all its marks and switch to the secondary RA. We don't switch back to the primary RA without completely unrolling the secondary RA and all its marks.

I do agree though that when switching to the secondary RA we need a Mark. Otherwise subsequent invocations of a signal handler would just accumulate memory in the secondary RA.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7624