RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die()
Christian Hagedorn
chagedorn at openjdk.org
Tue Mar 14 18:18:37 UTC 2023
On Tue, 14 Mar 2023 09:04:51 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:
>> @dholmes-ora My proposal would be to be pragmatic and continue the discussion here. This has been going on too long. Switching channels again would be more confusing.
>>
>> I would prefer JBS or ML discussion as well, but not everyone (eg. @jsolomon8080) has write access to JBS.
>>
>> ----
>>
>> I thought this over some more.
>>
>> A big part of this problem is a plain bug in VMError::check_timeout(): it should never block *step timeouts*. We added step timeouts back in 2016 to deal with these kinds of problems: if the JVM hangs in error reporting, give it a kick to get it going again. Do so repeatedly. This is related to the global timeout, but works also independently from it (and therefore from the question of when to honor the global timeout).
>>
>> I opened https://bugs.openjdk.org/browse/JDK-8303861 to deal with this, see PR: https://github.com/openjdk/jdk/pull/12936
>>
>> This may already be a big help for cases like these. In fact, it may already be enough, and we could maybe close this issue.
>>
>> However, if we have a recursive malloc situation, we may hang repeatedly. JVM will kick itself alive every time (with my patch) but this is still annoying. The root problem here is that we should not use malloc during error handling. Cannot always be avoided, but at least we should minimize malloc use.
>>
>> Decoder, in particular, should not use malloc. Therefore I also opened https://bugs.openjdk.org/browse/JDK-8303862 to track that. I won't have time to work on that. I have the hope that maybe @chhagedorn can :-) ? Otherwise Azul may also chip in some bug fixing.
>>
>> ----
>>
>> All these are unrelated to the question of whether OnError should be blocked or not. I realize now that if we decide to (continue to) protect OnError from timeouts, we must never act on the global timeout until all OnError steps ran. Since these run right before VM exit, the original implementation that just blocked the global timeout altogether was actually right.
>>
>> So, there's that. Maybe JDK-8303861 is already enough for cases like this. At least JDK-8303861 does not introduce any backward compatibility issues.
>
>> Decoder, in particular, should not use malloc. Therefore I also opened https://bugs.openjdk.org/browse/JDK-8303862 to track that. I won't have time to work on that. I have the hope that maybe @chhagedorn can :-) ?
>
> Does any of our current allocation strategies allow the usage of a custom scratch buffer (i.e. the error reporting scratch buffer) as memory location? If not, it could get more complicated. I could still try to tackle that but I'm not sure if I have the necessary knowledge in that area.
> > > Decoder, in particular, should not use malloc. Therefore I also opened https://bugs.openjdk.org/browse/JDK-8303862 to track that. I won't have time to work on that. I have the hope that maybe @chhagedorn can :-) ?
> >
> >
> > Does any of our current allocation strategies allow the usage of a custom scratch buffer (i.e. the error reporting scratch buffer) as memory location? If not, it could get more complicated. I could still try to tackle that but I'm not sure if I have the necessary knowledge in that area.
>
> What we usually do for these kind of problems is to pass the scratch buffer via argument to the processing functions. And make that use either it or, if NULL was passes, allocate its own thing.
>
> Another way to do this would be to add a scratch buffer pointer to Thread.
>
> A third way to do this (I had been experimenting with it) would be to pre-allocate a scratch buffer, and once error handling began, to use that inside os::malloc. That is the most involved solution, though, and I'm not particularly fond of it. I know many reviewers would hate it, too :-)
Thanks for the summary. Could we also use "placement new" to use the scratch buffer to allocate and create objects to? For example, when creating a decoder, to use something like
decoder = new(scratch_buffer) ElfDecoder();
here:
https://github.com/openjdk/jdk/blob/4e631fa43fd821846c12ae2177360c44cf770766/src/hotspot/share/utilities/decoder.cpp#L62-L70
Then we can still utilize polymorphism. But then `AbstractDecoder` needs to be something else than `CHeapObj`. And of course, this approach also needs careful checking that a new object actually fits into the provided scratch buffer. On top of that, it gets more complicated when we want to keep multiple objects alive in the same scratch buffer. So, I'm not sure if this approach is feasible, though.
-------------
PR: https://git.openjdk.org/jdk/pull/12925
More information about the hotspot-dev
mailing list