RFR: 8256008: UL does not report anything if disk writing fails

Wed Nov 11 07:17:55 UTC 2020

On Wed, 11 Nov 2020 07:12:26 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>>> > > * If you handle errors from writing, make sure you handle intermittend write errors correctly. You probably want to handle EINTR by re-trying to write.
>>> > 
>>> > 
>>> > `LogFileStreamOutput::write()` uses `jio_fprintf()` (wrapper of `vfprintf()` ) and `fflush()`. In case of `fflush()`, we can retry, however `jio_fprintf()` can be done in the same way? `vfprintf()` might not set `errno` if error happens. (I'm not sure)
>>> 
>>> Hmm. Thinking about this some more:
>>> 
>>> Without reading your patch closely, what is the strategy you propose if a write error happens?
>>> 
>>> 1. Will you write a message, then refuse further log attempts?
>>> 2. Will you write a message, but continue logging? Writing a message each time an error happens?
>>> 3. Will you write a message only for the first write error and then not refuse further log attempts?
>>> 
>>> Depending on this we should handle intermittent errors (EINTR and maybe EAGAIN) or can continute to ignore them.
>>> With (1) we should handle (ignore) intermittent errors to prevent one spurious write error from stoppng the log.
>>> With (2) we don't have to handle them since we do essentially what we do today, but a bad IO device may now cause a flood of messages.
>>> With (3) we don't have to either but will only see the first UL writing problem.
>>> 
>>> Not sure what I like best. What did you propose?
>> 
>> I've proposed (3) on this patch. I want to notice log writing is failed due to some error when log file is strange.
>> 
>> Error message would be shown only the first time an error occurs, however logger would not be disabled because someone might make free disk space without rebooting JVM.
>> I thought (1) at first, but I didn't do so because it requires operator to resume logging (e.g. `jcmd VM.log`) when the problem is fixed.
>> 
>>> > > * I see you make an RAII object to handle flock. Could you leave this out please to make this change easier to review? I am not against it, but would prefer this in a separate cleanup change.
>>> > 
>>> > 
>>> > If `FileLocker` is removed from this change, I want to add it on another RFE before this because this change makes to return from several locations in `LogFileStreamOutput::write()`.
>>> > Do you agree create subtask (RFE) to add `FileLocker`?
>>> 
>>> Yes. But on second thought, I don't want to cause you unnecessary work. So if that is all the cleanup you'll do in this patch, I am fine with it too.
>> 
>> No problem, I will add it as RFE to JBS.
>
>> > > > * If you handle errors from writing, make sure you handle intermittend write errors correctly. You probably want to handle EINTR by re-trying to write.
>> > > 
>> > > 
>> > > `LogFileStreamOutput::write()` uses `jio_fprintf()` (wrapper of `vfprintf()` ) and `fflush()`. In case of `fflush()`, we can retry, however `jio_fprintf()` can be done in the same way? `vfprintf()` might not set `errno` if error happens. (I'm not sure)
>> > 
>> > 
>> > Hmm. Thinking about this some more:
>> > Without reading your patch closely, what is the strategy you propose if a write error happens?
>> > 
>> > 1. Will you write a message, then refuse further log attempts?
>> > 2. Will you write a message, but continue logging? Writing a message each time an error happens?
>> > 3. Will you write a message only for the first write error and then not refuse further log attempts?
>> > 
>> > Depending on this we should handle intermittent errors (EINTR and maybe EAGAIN) or can continute to ignore them.
>> > With (1) we should handle (ignore) intermittent errors to prevent one spurious write error from stoppng the log.
>> > With (2) we don't have to handle them since we do essentially what we do today, but a bad IO device may now cause a flood of messages.
>> > With (3) we don't have to either but will only see the first UL writing problem.
>> > Not sure what I like best. What did you propose?
>> 
>> I've proposed (3) on this patch. I want to notice log writing is failed due to some error when log file is strange.
>> 
>> Error message would be shown only the first time an error occurs, however logger would not be disabled because someone might make free disk space without rebooting JVM.
>> I thought (1) at first, but I didn't do so because it requires operator to resume logging (e.g. `jcmd VM.log`) when the problem is fixed.
> 
> This makes sense. Its what I would have done too.
> 
>> 
>> > > > * I see you make an RAII object to handle flock. Could you leave this out please to make this change easier to review? I am not against it, but would prefer this in a separate cleanup change.
>> > > 
>> > > 
>> > > If `FileLocker` is removed from this change, I want to add it on another RFE before this because this change makes to return from several locations in `LogFileStreamOutput::write()`.
>> > > Do you agree create subtask (RFE) to add `FileLocker`?
>> > 
>> > 
>> > Yes. But on second thought, I don't want to cause you unnecessary work. So if that is all the cleanup you'll do in this patch, I am fine with it too.
>> 
>> No problem, I will add it as RFE to JBS.
> 
> Thanks. I'll review the patch in detail in the next days.
> 
> Cheers, Thomas

Thanks @tstuefe ! I've sent PR for RAII object in #1156 . I will update this PR after #1156 is pushed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1106