RFR: JDK-8319307: DCmds should not assert on truncation and should report truncation [v2]

David Holmes dholmes at openjdk.org
Wed Nov 8 07:57:56 UTC 2023


On Wed, 8 Nov 2023 06:48:03 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> I don't think I agree with this.
>> 
>> If DCmds are regularly asserting, or being truncated, then we should be using a larger buffer shouldn't we? That is what the assert was intended to be used for.
>> 
>> Having clients of the stream actively check for truncation also seems wrong to me. The basic model of our `bufferedStream` is that it is presumed large enough for all usecases (hence the assert when not the case) and users don't have to care. Otherwise the normal API approach here would be for `write()` to return a value indicating truncation so that the caller could consider resizing the buffer.
>
>> If DCmds are regularly asserting, or being truncated, then we should be using a larger buffer shouldn't we? That is what the assert was intended to be used for.
> 
> The original bufferedStream did not have a limit. We added a limit with JDK-8220394 because we had cases where VMs were OOM-killed due to an error in printing code. But even beyond errors in printing code, a limit seems reasonable. E.g. when running with ZGC, I had VM.info reports that were larger than 2 GB due to the fragmented memory mapping. Do we want to cache that all?
> 
> If you have a limit, all the rest follows:
> - it makes sense to signal the truncation to the caller
> - it makes sense (though it is optional) to stop printing once the bufferedStream cannot receive the output anymore.
> 
> If we don't want a limit, we can either restore bufferedStream to its original state pre-JDK8220394 and live with the danger. Or maybe we can think about caching large output differently. Possibly writing temporary files to /tmp. Ideally, the jcmd communication protocol would allow streaming the output back to jcmd, but that is harder to do, especially not if one wants to remain compatible across JDK versions.
> 
>> Having clients of the stream actively check for truncation also seems wrong to me. The basic model of our bufferedStream is that it is presumed large enough for all usecases (hence the assert when not the case) and users don't have to care. Otherwise the normal API approach here would be for write() to return a value indicating truncation so that the caller could consider resizing the buffer.
> 
> The truncation check is optional, but reasonable IMHO, see ratio above.
> 
>> I don't think clients should be aware of, or worrying about truncation, this way. It seems very intrusive.
> 
> Hmm, I'd have thought you agree with this change since when discussing JDK-8220394 you proposed a truncation message:
> 
> https://mail.openjdk.org/pipermail/hotspot-runtime-dev/2019-May/034431.html

@tstuefe  thanks for the history lesson. So it was assumed writes can't fail and messages can't be truncated, which was true until we hit actual system limits. So we imposed a cap that we didn't expect to reach in practice (hence assert) and an indicator that the message had been truncated (which is a good thing). So we were still pretending writes can't fail.

Now you are saying, lets drop the pretense - the buffer is limited in size and callers have to (or can if they want to) deal with that. The question then is how? The usual way would be for writes to fail, report an error and caller then does something to resize the buffer or abort the writing. That would be very disruptive to the existing code. Instead you've added a way for a failing write to record that it failed in a side-channel: is_truncated(). The callers can then query this side-channel and decide what to do. Now that I write that all out this isn't as bad as I initially thought, but I still find it somewhat grating from an API perspective.

An alternative, for these problem cases, would be to go back to a buffer that attempts to grow as needed until it hits system limits. Are the actual buffers involved isolated so that we can choose which one(s) to change to be unlimited? If so are these commands ones that would likely lead to the OOM issues that caused the limit to be imposed in the first place?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16474#issuecomment-1801263601


More information about the hotspot-runtime-dev mailing list