RFR: 8353439: Shell grouping of -XX:OnError= commands is surprising [v3]

Thomas Stuefe stuefe at openjdk.org
Tue Apr 8 13:45:22 UTC 2025


On Tue, 8 Apr 2025 13:24:32 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> > > Hmm.
> > > May there not be customers that specify a verbatim "spelled out" shell script as input for OnError? This is a behavior change.
> > > And if not (if the commands issued with OnError, separated by ;, are in turn real commands, programs or scripts): Those, before, were forked off as separate grandchildren of the same parent (the direct child), right? Whereas now each grandchild process has its own parent. But here, especially if OnError is invoked in reaction to an OOM condition in a gigantic JVM, reusing that in-between shell may be preferable. Forking off a large process can be expensive.
> > > (Obviously, it's all undocumented, which is bad in itself).
> > > All of these are questions - I may not know the full story.
> > 
> > 
> > Hi Thomas! Not sure I understand the first line about behaviour change. The ; separator was causing new, separate shells to be used sequentially, but distinct OnError= arguments were not (yes, IF somebody has discovered that this works). So with the change, everything gets a new shell fork/exec'd. This should be more consistent and less surprising.
> 
> > I can't pretend the previous behaviour was to save memory! The posix_spawn usage hopefully means such big processes are more efficient.
> 
> Yes, posix_spawn helps, but for one, I am not sure how solid the implementation is on non-Linux Unixes, e.g. AIX. I would not be surprised if it still copied the whole working set upfront. But even if not, there is still some overhead for spawning with posix_spawn, even with COW. E.g. you need to duplicate the page table set.
> 
> When we write an error log, the working set of the JVM may be ridiculously large. You don't want to spend much time here, since - at this point - we have no cancellation logic (we are outside of VMError::report()), and you want to stop the JVM as fast as possible. Only then would outside processes notice the JVM is a goner, and e.g. start a fresh JVM to serve user requests again.

But I don't want to hold up this PR if you think this is the way forward. There is a very simple workaround for the problem I described above, which is to group multiple scripts into a single umbrella script and call that with a single OnError argument.
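
A minimal sketch of such an umbrella script, assuming a Python interpreter is available on the host (a plain shell script would do just as well). The file name, the `STEPS` list and the `{pid}` placeholder are hypothetical and only illustrate the idea; it assumes the JVM pid is passed as the first argument, e.g. via something like `-XX:OnError="python3 /path/to/on_error.py %p"`.

```python
#!/usr/bin/env python3
"""Hypothetical umbrella script: runs several error-handling steps in order."""
import subprocess
import sys

# Hypothetical steps; substitute the real commands or scripts.
# "{pid}" is a placeholder invented for this sketch, not a JVM feature.
STEPS = [
    ["echo", "collecting diagnostics for JVM", "{pid}"],
    ["echo", "notifying operator about JVM", "{pid}"],
]


def run_steps(steps, pid):
    """Run each step sequentially and return the list of exit codes."""
    codes = []
    for step in steps:
        argv = [arg.replace("{pid}", pid) for arg in step]
        try:
            codes.append(subprocess.run(argv, check=False).returncode)
        except OSError:
            # A missing or non-executable step must not block the later ones.
            codes.append(127)
    return codes


def main(argv):
    # The pid is expected as the first argument; fall back to a marker string.
    pid = argv[1] if len(argv) > 1 else "unknown"
    return run_steps(STEPS, pid)


if __name__ == "__main__":
    main(sys.argv)
```

With this, the JVM spawns only one child for the whole OnError handling, and the individual steps are started from the small umbrella process rather than from the JVM.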

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24354#issuecomment-2786483773


More information about the hotspot-dev mailing list