RFR: 8351491: Add info from release file to hserr file [v3]

Fri Apr 4 07:32:55 UTC 2025

On Fri, 4 Apr 2025 06:00:52 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Coming to this discussion late.
>> 
>> IMHO this is overengineered for just a printout to the hs-err file during error dumping. We already read from proc fs. Proc can be worse (depending on what you read, a lot) than reading sequentially from a flat file.
>> 
>> Remember that we already run the JVM binaries from the same file system. We read debug information from those binaries during error dumping, and that causes a ton of IO; a sequential read of a tiny file is a drop in the bucket.
>> 
>> Also remember that we have safety fuses: Step timeouts and Step signal handling - so if this read ever turns out to be a problem, e.g by hanging, the Step would be cancelled and error reporting would continue with the next step.
>> 
>> I would, however, attempt to avoid malloc. Not super important, but if its easy to do I would do it. Best by using a small fixed-sized stack-allocated buffer, and just printing the file line by line.
>> 
>> Just my 5 cent.
>
>> @tstuefe perhap this is over engineered but there are a number of issues to consider:
>> 
>>     * reading from the physical filesystem is not the same as reading from /proc and IMO is far more likely to be problematic if done in a signal handling context during error reporting - hence we need to read the file ahead-of-time
> 
> But we already do exactly that. We read the Elf- and Dwarf-files to print out the symbol and stack information. From the same filesystem. And these are way larger than the release file, and we don't read sequentially, but seek around. This generates a ton of IO. Reading a small file sequentially is fine in comparison.
> 
> I also don't follow the arguments: the argument for this is that the file system could be slow or IO could be broken in some form. 
> 
> If that is true, it is likely true for the the whole file system. In that case, VM startup would take either a very long time or the VM would not even come up, and I argue that is a configuration error. 
> 
> If only reading the release file is slow while reading binaries is fine (and why would that ever happen?), then doing this for every startup of the JVM is a bad idea since it blocks the task queue, concurrent execution or not. I think if we expect problems reading that file, we should rather read it when we need it, not unconditionally at every VM startup. Plus, in error handling we have safeguards against hanging reads - we cancel hanging error reporting steps. We don't have those safeguards in the normal JVM.

> > > @tstuefe perhap this is over engineered but there are a number of issues to consider:
> > > ```
> > > * reading from the physical filesystem is not the same as reading from /proc and IMO is far more likely to be problematic if done in a signal handling context during error reporting - hence we need to read the file ahead-of-time
> > > ```
> > 
> > 
> > But we already do exactly that. We read the Elf- and Dwarf-files to print out the symbol and stack information. From the same filesystem. And these are way larger than the release file, and we don't read sequentially, but seek around. This generates a ton of IO. Reading a small file sequentially is fine in comparison.
> 
> Yes we do read those, or attempt to, and doing so is risky and may not work. So do we just keep adding more and risky things to error reporting and keeping hoping it will all "just work"? If we can avoid such a risk, without undue cost/effort shouldn't we do so?
> 
> > I also don't follow the arguments: the argument for this is that the file system could be slow or IO could be broken in some form.
> 
> Two different arguments. Reading the file from a signal handling context may not work. Reading the file from disk during startup adds to the startup overhead.
> 
> Again, why not avoid these issues when there is a simple way to do so?

@dholmes-ora We won't probably convince each other. I don't want to hold up the PR, so I reviewed it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24244#issuecomment-2777795925