RFR: 8256008: UL does not report anything if disk writing fails [v2]
Yasumasa Suenaga
ysuenaga at openjdk.java.net
Thu Nov 19 05:44:02 UTC 2020
On Wed, 11 Nov 2020 08:16:59 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>>> > > > * If you handle errors from writing, make sure you handle intermittend write errors correctly. You probably want to handle EINTR by re-trying to write.
>>> > >
>>> > >
>>> > > `LogFileStreamOutput::write()` uses `jio_fprintf()` (wrapper of `vfprintf()` ) and `fflush()`. In case of `fflush()`, we can retry, however `jio_fprintf()` can be done in the same way? `vfprintf()` might not set `errno` if error happens. (I'm not sure)
>>> >
>>> >
>>> > Hmm. Thinking about this some more:
>>> > Without reading your patch closely, what is the strategy you propose if a write error happens?
>>> >
>>> > 1. Will you write a message, then refuse further log attempts?
>>> > 2. Will you write a message, but continue logging? Writing a message each time an error happens?
>>> > 3. Will you write a message only for the first write error and then not refuse further log attempts?
>>> >
>>> > Depending on this we should handle intermittent errors (EINTR and maybe EAGAIN) or can continute to ignore them.
>>> > With (1) we should handle (ignore) intermittent errors to prevent one spurious write error from stoppng the log.
>>> > With (2) we don't have to handle them since we do essentially what we do today, but a bad IO device may now cause a flood of messages.
>>> > With (3) we don't have to either but will only see the first UL writing problem.
>>> > Not sure what I like best. What did you propose?
>>>
>>> I've proposed (3) on this patch. I want to notice log writing is failed due to some error when log file is strange.
>>>
>>> Error message would be shown only the first time an error occurs, however logger would not be disabled because someone might make free disk space without rebooting JVM.
>>> I thought (1) at first, but I didn't do so because it requires operator to resume logging (e.g. `jcmd VM.log`) when the problem is fixed.
>>
>> This makes sense. Its what I would have done too.
>>
>>>
>>> > > > * I see you make an RAII object to handle flock. Could you leave this out please to make this change easier to review? I am not against it, but would prefer this in a separate cleanup change.
>>> > >
>>> > >
>>> > > If `FileLocker` is removed from this change, I want to add it on another RFE before this because this change makes to return from several locations in `LogFileStreamOutput::write()`.
>>> > > Do you agree create subtask (RFE) to add `FileLocker`?
>>> >
>>> >
>>> > Yes. But on second thought, I don't want to cause you unnecessary work. So if that is all the cleanup you'll do in this patch, I am fine with it too.
>>>
>>> No problem, I will add it as RFE to JBS.
>>
>> Thanks. I'll review the patch in detail in the next days.
>>
>> Cheers, Thomas
>
>> Thanks @tstuefe ! I've sent PR for RAII object in #1156 . I will update this PR after #1156 is pushed.
>
> Okay I'll wait
Hi @tstuefe , could you review the new change?
> > * I would like it if we could attempt to write some final words into the log stream. Because that may still work, even if the larger output does not (I really hope we do not print characters individually, don't we?). Something like `("write failed (%d)", errno)`. That may or may not work but if it does its helpful when looking at the logfile.
>
> Agree, I will add log message.
I added error message to stderr when `jio_fprintf()` or `fflush()` is failed, but I can't have confidence whether we can write error message without UL format (without decorators) in here. It might break log parser for UL.
I could not use `log_error()` because LogOutput happens dead lock, and also the error might happen on LogOutput which is not set `logging` tag.
-------------
PR: https://git.openjdk.java.net/jdk/pull/1106
More information about the hotspot-runtime-dev
mailing list