RFR: 8308751: Create new switch to print error reporting output to both hs_err_pid file and stdout/stderr [v3]

Thu Jun 22 06:22:05 UTC 2023

On Tue, 6 Jun 2023 17:57:08 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Masanori Yano has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   8308751: Create new switch to print error reporting output to both hs_err_pid file and stdout/stderr
>
>> /csr needed
>> 
>> I have my doubts about the usefulness of this.
> 
> I agree. This is nothing a simple `tee` couldn't solve. 
> 
> We've long departed from the Unix line of "do one thing and do it well", but there is a limit to how much functionality we should heap onto the JVM.

> @tstuefe As you say, there are situations where we can solve with the tee command, but only if java is invoked as a shell command. In the case of an application server, it is difficult to redirect to the tee command because java is started as a subprocess of the administration service.
> 
> Another reason is that the tee command does not exist in every environment where java runs. In a windows environment, we cannot use features like the tee command without adding features like wsl or cygwin.
> 
> I'm not proposing complex features outside the Unix philosophy, I'm just suggesting a way to enable options that are currently mutually exclusive.

First off, your implementation is not valid, at least not for more complex cases: 

You call report() twice. `report(verbose=true)` should only be called once since it is a bit of a "one-shot" operation. Many of the error-reporting steps may be destructive. They may crash or hang. They even may change runtime context. Even if they don't, they will run at different times (since hs_er reporting can take a while), which could cause the second printout to be different and probably less truthful since it is further removed from the crash point.

So, calling report(verbose=true) twice is absolutely not an option and we should add asserts for this. The only time when calling it again is valid is in the context of secondary crash handling, and even then we never repeat an error reporting step but skip over the ones that already ran.

To implement what you want, there are two options:

A) either buffer the content of the first printout. 
B) add a new type of outputStream that forks and prints twice

(A) is not reasonable. hs_err files can be massive, in the multi-gb range, for very large heaps. This is not something you want to cache.

(B) adds more even complexity.

So it comes down to complexity vs. benefit. The costs are:

- obviously, review cost for CSR and PR. These costs can be a lot higher than the initial effort that the author spent on his PR.
- more complex user-facing interface: option must be documented and explained. Extending this interface causes us to be less flexible in the future, translating to cost-of-opportunity. Also, customer support: every new feature *will* be used by some enterprising customer, if things go wrong someone needs to help them.
- more technical debt, more complex code: the more complex this coding gets, the more expensive future maintenance gets.
- more regression tests for both the new option as well as for the new forking outputStream (B) (these are machine costs as well as costs to fix regressions)

All of this is normal and if the new feature adds enough benefit, we do this of course. But please remember, these costs have to be carried by us, the maintainers, almost exclusively. And carried in all perpetuity since it is an official option that is meant to stay. 

IMHO, in this case the benefits - while they exist - are too slim to carry the costs.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14114#issuecomment-1602070855