RFR: 8304725: AsyncGetCallTrace can cause SIGBUS on M1 [v3]

Thomas Stuefe stuefe at openjdk.org
Thu Mar 23 11:49:48 UTC 2023


On Thu, 23 Mar 2023 08:36:59 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>> The best alternative to me is to take the perf hit and disable the caching when we're in forte (possibly only on Mac).
> 
> Sounds like a plan.

> > Reading @fisk's excellent catch regarding the async safety of stacking the WX RAII mechanics, I retract my approval. This looks like a recipe for hard-to-find bugs :-/
> 
> Yes, this is my current thought too.
> 
> > Incidentally, do we see in the hs_err file whether async profiler is attached? We should maybe make that prominently visible.
> 
> We see ASGCT in the stack trace, but what exactly do you mean?
> 

For issues like this, you wouldn't necessarily have AGCT on the stack. Consider:
- The compiler gets invoked, disables write protection (switches the thread to write mode), compiles, then tries to restore write protection. It gets interrupted by AGCT after calling pthread_jit_write_protect_np but before updating Thread::_wx_state.
- Now AGCT runs. It disables write protection, does its thing, then reinstates the state *it thinks preceded it*. But that is the wrong state: in this case, the write-enabled state used during compilation.
- We return from the signal handler. The compiler now sets Thread::_wx_state and assumes write protection is restored, but it isn't.
- Later that day, the thread tries to call into compiled code. It cannot execute it, since the thread is still in write mode.
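The interleaving above can be modelled in a few lines of C. This is a minimal, single-threaded sketch under stated assumptions, not HotSpot code: `hw_state` stands in for the per-thread state toggled by pthread_jit_write_protect_np, `tracked_state` stands in for Thread::_wx_state, and every function name is hypothetical. SIGUSR1 plays the profiler's signal, and raise() forces it to land exactly in the window between the two halves of the compiler's restore.

```c
#define _POSIX_C_SOURCE 200809L  /* sigaction, sigemptyset */
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical model of the per-thread W^X mode; not HotSpot's API. */
typedef enum { WX_EXEC = 0, WX_WRITE = 1 } wx_mode;

/* Models the hardware state set by pthread_jit_write_protect_np(). */
static volatile sig_atomic_t hw_state = WX_EXEC;
/* Models the software bookkeeping kept in Thread::_wx_state. */
static volatile sig_atomic_t tracked_state = WX_EXEC;

static void set_hw(wx_mode m)      { hw_state = m; }
static void set_tracked(wx_mode m) { tracked_state = m; }

/* Models the profiler's signal handler: it trusts the bookkeeping,
   switches to write mode, does its work, then restores what it read. */
static void profiler_handler(int sig) {
  (void)sig;
  wx_mode saved = (wx_mode)tracked_state;  /* stale: still WX_WRITE */
  set_hw(WX_WRITE);
  set_tracked(WX_WRITE);
  /* ... walk the interrupted thread's stack here ... */
  set_hw(saved);        /* "restores" write mode, not execute mode */
  set_tracked(saved);
}

/* Runs the interleaving. Returns 1 if the modelled hardware state and
   bookkeeping disagree afterwards, 0 if they agree, -1 on setup error. */
int simulate_interrupted_restore(void) {
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = profiler_handler;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;

  set_hw(WX_EXEC);
  set_tracked(WX_EXEC);

  /* Compiler: enable writing and emit code. */
  set_hw(WX_WRITE);
  set_tracked(WX_WRITE);
  assert(hw_state == tracked_state);

  /* Compiler: restore execute mode, step 1 of 2 (hardware). */
  set_hw(WX_EXEC);
  /* The profiler signal arrives between the two steps. */
  raise(SIGUSR1);
  /* Step 2 of 2 (bookkeeping) now claims execute mode. */
  set_tracked(WX_EXEC);

  return hw_state != tracked_state;
}
```

After the call the bookkeeping claims execute mode while the modelled hardware state is still write mode, and nothing on the stack points back at the handler that caused it.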

There are variants of this play, but my point is that the resulting crash may happen long after AGCT was invoked, when AGCT is no longer on the stack.

So all we would spy with our little eyes is a segfault (SEGV_ACCERR, I guess?) and maybe the profiler's shared library, the one calling AGCT, in the list of loaded libraries.
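Which signal and si_code a protection fault actually produces differs between platforms, and a small probe can show it. This is a hedged sketch of an analogous fault, not the ASGCT scenario itself: it writes to an anonymous read-only page instead of executing a MAP_JIT page while the thread is in write mode, and `probe_protection_fault` is a hypothetical helper. On Linux such a write typically arrives as SIGSEGV with si_code SEGV_ACCERR (page mapped, access not permitted), as opposed to SEGV_MAPERR for an unmapped address; macOS may deliver comparable faults as SIGBUS, so the probe listens for both.

```c
#define _DEFAULT_SOURCE 1   /* glibc: POSIX APIs plus MAP_ANONYMOUS */
#define _DARWIN_C_SOURCE 1  /* macOS: same purpose */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover;
static volatile sig_atomic_t seen_signo;
static volatile sig_atomic_t seen_code;

/* Records the signal number and si_code, then jumps back to the probe. */
static void on_fault(int signo, siginfo_t *info, void *ctx) {
  (void)ctx;
  seen_signo = signo;
  seen_code = info->si_code;
  siglongjmp(recover, 1);
}

/* Writes to a read-only page and reports which signal and si_code
   arrived. Returns 0 on success, -1 if setup failed. */
int probe_protection_fault(int *signo, int *code) {
  long page = sysconf(_SC_PAGESIZE);
  if (page <= 0) return -1;
  char *p = mmap(NULL, (size_t)page, PROT_READ,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return -1;

  struct sigaction sa, old_segv, old_bus;
  memset(&sa, 0, sizeof sa);
  sa.sa_sigaction = on_fault;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGSEGV, &sa, &old_segv) != 0 ||
      sigaction(SIGBUS, &sa, &old_bus) != 0) {
    munmap(p, (size_t)page);
    return -1;
  }

  seen_signo = 0;
  /* savemask=1 so the faulting signal is unblocked again after the jump. */
  if (sigsetjmp(recover, 1) == 0) {
    *(volatile char *)p = 1;  /* page is mapped but not writable */
  }
  assert(seen_signo != 0);    /* a write to a read-only page must fault */

  sigaction(SIGSEGV, &old_segv, NULL);
  sigaction(SIGBUS, &old_bus, NULL);
  munmap(p, (size_t)page);
  *signo = seen_signo;
  *code = seen_code;
  return 0;
}
```

A crash reporter sees the same two fields in the siginfo_t it receives, which is why the signature alone says little about what flipped the protection.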

In particular, we do not know whether AGCT recently interrupted the crashing thread. Or do we? That would be valuable information.
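One way to get that information would be for the profiling entry point to leave per-thread breadcrumbs that the error reporter prints. The sketch below is purely hypothetical: the struct, its fields and the functions are illustrative, not existing HotSpot code. It assumes `atomic_long` and `atomic_llong` are lock-free, so updating them from a signal handler is safe, and relies on clock_gettime being on POSIX's list of async-signal-safe functions.

```c
#define _POSIX_C_SOURCE 200809L  /* clock_gettime, CLOCK_MONOTONIC */
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical per-thread record; not an existing HotSpot field. */
typedef struct {
  atomic_long  asgct_entries;  /* times a profiler signal entered ASGCT */
  atomic_llong last_entry_ns;  /* monotonic time of the most recent entry */
} asgct_breadcrumbs;

static long long now_ns(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Would be called first thing in the ASGCT entry point, in signal context. */
void note_asgct_entry(asgct_breadcrumbs *b) {
  atomic_store_explicit(&b->last_entry_ns, now_ns(), memory_order_relaxed);
  atomic_fetch_add_explicit(&b->asgct_entries, 1, memory_order_relaxed);
}

/* Milliseconds between the last ASGCT entry and `now`, or -1 if never. */
long long ms_since_last_asgct(asgct_breadcrumbs *b, long long now) {
  if (atomic_load_explicit(&b->asgct_entries, memory_order_relaxed) == 0)
    return -1;
  long long last = atomic_load_explicit(&b->last_entry_ns, memory_order_relaxed);
  assert(now >= last);  /* both timestamps come from the same monotonic clock */
  return (now - last) / 1000000;
}

/* Would be called by the crash reporter for the crashing thread. */
void print_asgct_breadcrumbs(asgct_breadcrumbs *b, FILE *out) {
  long long ago = ms_since_last_asgct(b, now_ns());
  if (ago < 0) {
    fprintf(out, "AsyncGetCallTrace: never entered on this thread\n");
  } else {
    fprintf(out, "AsyncGetCallTrace: entered %ld times, last %lld ms before crash\n",
            atomic_load_explicit(&b->asgct_entries, memory_order_relaxed), ago);
  }
}
```

Two relaxed atomics per profiler tick are cheap, and a line such as "last 3 ms before crash" in the error log would point straight at an interrupted WX transition.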

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13144#issuecomment-1481044491


More information about the serviceability-dev mailing list