RFR: JDK-8319307: DCmds should not assert on truncation and should report truncation [v2]
Thomas Stuefe
stuefe at openjdk.org
Fri Nov 10 07:46:57 UTC 2023
On Wed, 8 Nov 2023 06:48:03 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> I don't think I agree with this.
>>
>> If DCmds are regularly asserting, or being truncated, then we should be using a larger buffer shouldn't we? That is what the assert was intended to be used for.
>>
>> Having clients of the stream actively check for truncation also seems wrong to me. The basic model of our `bufferedStream` is that it is presumed large enough for all usecases (hence the assert when not the case) and users don't have to care. Otherwise the normal API approach here would be for `write()` to return a value indicating truncation so that the caller could consider resizing the buffer.
>
>> If DCmds are regularly asserting, or being truncated, then we should be using a larger buffer shouldn't we? That is what the assert was intended to be used for.
>
> The original bufferedStream did not have a limit. We added a limit with JDK-8220394 because we had cases where VMs were OOM-killed due to an error in printing code. But even beyond errors in printing code, a limit seems reasonable. E.g. when running with ZGC, I had VM.info reports that were larger than 2 GB due to the fragmented memory mapping. Do we want to cache that all?
>
> If you have a limit, all the rest follows:
> - it makes sense to signal the truncation to the caller
> - it makes sense (though it is optional) to stop printing once the bufferedStream cannot receive the output anymore.
>
> If we don't want a limit, we can either restore bufferedStream to its original state pre-JDK8220394 and live with the danger. Or maybe we can think about caching large output differently. Possibly writing temporary files to /tmp. Ideally, the jcmd communication protocol would allow streaming the output back to jcmd, but that is harder to do, especially not if one wants to remain compatible across JDK versions.
>
>> Having clients of the stream actively check for truncation also seems wrong to me. The basic model of our bufferedStream is that it is presumed large enough for all usecases (hence the assert when not the case) and users don't have to care. Otherwise the normal API approach here would be for write() to return a value indicating truncation so that the caller could consider resizing the buffer.
>
> The truncation check is optional, but reasonable IMHO, see ratio above.
>
>> I don't think clients should be aware of, or worrying about truncation, this way. It seems very intrusive.
>
> Hmm, I'd have thought you agree with this change since when discussing JDK-8220394 you proposed a truncation message:
>
> https://mail.openjdk.org/pipermail/hotspot-runtime-dev/2019-May/034431.html
> @tstuefe thanks for the history lesson. So it was assumed writes can't fail and messages can't be truncated, which was true until we hit actual system limits. So we imposed a cap that we didn't expect to reach in practice (hence assert) and an indicator that the message had been truncated (which is a good thing). So we were still pretending writes can't fail.
Yes
>
> Now you are saying, lets drop the pretense - the buffer is limited in size and callers have to (or can if they want to) deal with that. The question then is how? The usual way would be for writes to fail, report an error and caller then does something to resize the buffer or abort the writing. That would be very disruptive to the existing code. Instead you've added a way for a failing write to record that it failed in a side-channel: is_truncated(). The callers can then query this side-channel and decide what to do. Now that I write that all out this isn't as bad as I initially thought, but I still find it somewhat grating from an API perspective.
Yes. Minus the grating, I think having an observable "buffer full" signal would make sense at least for stringStream, which can be run over an existing static buffer. I agree that bufferedStream feels more like it should not limit output but flush occasionally. IMHO the real error is not honoring the flush() request.
>
> An alternative, for these problem cases, would be to go back to a buffer that attempts to grow as needed until it hits system limits. Are the actual buffers involved isolated so that we can choose which one(s) to change to be unlimited? If so are these commands ones that would likely lead to the OOM issues that caused the limit to be imposed in the first place?
Well, yes. Its the one buffer in the attach listener. We could just crank up the buffer maximum; we could also do that on a per-command base, for commands that we know may print a lot of output.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16474#issuecomment-1805244255
More information about the hotspot-runtime-dev
mailing list