8220394: bufferedStream does not honor size limit

Tue May 21 09:12:29 UTC 2019

Hi David,

On Tue, May 21, 2019 at 10:01 AM David Holmes <david.holmes at oracle.com>
wrote:

> Hi Thomas,
>
> On 21/05/2019 5:26 pm, Thomas Stüfe wrote:
> > Hi David,
> >
> > thank you for taking a look at this! Remarks inline:
> >
> > On Tue, May 21, 2019 at 8:45 AM David Holmes <david.holmes at oracle.com
> > <mailto:david.holmes at oracle.com>> wrote:
> >
> >     Hi Thomas,
> >
> >     Sorry but I don't grok this stop-gap fix. I get that bufferedStream
> >     is a
> >     strange entity - and to me it doesn't look or act like a buffered
> >     stream! - but other code seems to recognise this e.g. see:
> >
> >     ./share/code/codeHeapState.cpp
> >
> >     and look what it does to effect actual buffering between a
> >     bufferedStream and a real outputStream!
> >
> >     If there is code that misuses bufferedStream (and it seems the only
> >     other direct uses are in the attach listener and JFR) then I'd rather
> >     see those fixed.
> >
> >
> > Definitely.
> >
> >     But looking at the attach listener code it also seems
> >     to recognise that bufferedStream is just a "buffer" which it copies
> to
> >     the final destination ie. writing the content to the socket.
> >
> >
> > Yes, but the problem is that until it finally sends the output of an
> > attach op back it completely buffers the whole output in the stream
> > internal buffer. That is a problem especially if you have faulty
> > printing code.
> >
> > I encountered the problem when I was working on a diagnostic command and
> > had an error resulting in infinite printing. The process was growing
> > until the OOM killer killed it.
>
> That seems like a good debugging tool to me. If the output had simply
> been truncated then you may not have known where the problem was ;-)
>

I agree. That is why I assert in debug case.

But since bugs like these are likely to happen in situations which are not
critical to VM life (e.g. someone calls jcmd on a running server VM at a
customer site, and triggers an error in a diagnostic command) I rather not
crash the VM. I rather have truncated output in this case or an error
message. Therefore, for release, I truncate.

I know its not perfect. But I prefer that to killing the VM outright or
making the machine swap.

> > The intention of my patch is not to fix those usages - that should be
> > done separately, e.g. we should teach the attach listener to send output
> > piecemeal, or temporarily buffer to a local file. But that would be a
> > much larger fix, and only fixes this one occurrence.
>
> Hmmm. To me this is working as intended, with an assumption that the
> amount of data buffered will be finite and reasonable. It's not clear
> that you can change the way things work to send the data back piecemeal.
>

Lets hope its fixable. It is already today a problem if you have a command
which produces lots of output - you may hit timeouts on the receiving side.
But that is an issue for another day.

>
> > What I wanted was just an assert at a size which reasonably could be
> > called an error. Would make analyzing these issues much easier. And
> > since this is an error which very likely depends on runtime conditions
> > (e.g. you may hit an error in a jcmd at a customer site you never saw in
> > lab) it makes sense to have a solution for release builds too, hence the
> > silent truncation.
> >
> > Do you have a better proposal?
>
> No. I see what you are trying to do now. I think the commentary is a bit
> excessive and don't necessarily agree there is a fundamental problem
> with the users that needs to be fixed but ...
>

Okay, I may have gotten a bit overexcited with the comment, I will slim it
down :)

>
> One concern:
>
> 969     if (buffer_pos + len > MAX2(buffer_max, 100 * M)) {
>
> As this is already within a guard that "buffer_pos + len > buffer_max"
> you only need to check against 100*M - if flush() is a no-op. If flush()
> is not a no-op then this will be incorrect if buffer_max > 100MB and len
>  > 100M. Maybe save the original buffer_pos and only do your check if it
> has not changed - ie flush() was a no-op?

Good point. I'll fix it.

..Thomas

>
> Thanks,
> David
> -----
>
> >
> >     So where is the code that misuses bufferedStream?
> >
> >
> > It was jcmd.
> >
> >     Thanks,
> >     David
> >     -----
> >
> >
> > Cheers, Thomas
> >
> >
> >     On 21/05/2019 4:11 pm, Thomas Stüfe wrote:
> >      > Ping?
> >      >
> >      > Thanks, Thomas
> >      >
> >      > On Tue, May 14, 2019 at 12:25 PM Thomas Stüfe
> >     <thomas.stuefe at gmail.com <mailto:thomas.stuefe at gmail.com>>
> >      > wrote:
> >      >
> >      >>
> >      >>     Hi all,
> >      >>
> >      >>
> >      >> Bug: https://bugs.openjdk.java.net/browse/JDK-8220394
> >      >> cr:
> >      >>
> >
> http://cr.openjdk.java.net/~stuefe/webrevs/8220394-bufferedstream-does-not-honor-size-limit/webrev.00/webrev/
> >      >>
> >      >> I redid the patch for this in a minimal form. The first version
> >     I posted
> >      >> in March but after consideration I found it too invasive.
> >      >>
> >      >> Short story: bufferedStream is misused as a
> >      >> outputStream-with-dynamic-buffer in a number of places - misused
> >     since we
> >      >> have stringStream for that.
> >      >>
> >      >> It has something looking like a maximum buffer size cap but that
> is
> >      >> actually a flush-trigger for child classes of bufferesStream.
> >      >> bufferedStream::flush itself is a noop.
> >      >>
> >      >> That means that printing to this stream may cause high memory
> >     footprint or
> >      >> native OOM since the upper limit is not honored. In runaway
> >     printing coding
> >      >> this can tear down the process.
> >      >>
> >      >> This patch is a stopgap - when we reach the buffer limit - but
> >     not below
> >      >> 100M - we will assert(debug) or truncate (release).
> >      >>
> >      >> I am careful here since I do not know if there are situations
> >     where more
> >      >> than buffer-limit bytes are written today and suddenly enforcing
> >     this limit
> >      >> now would cause errors. I think 100M is safe to be considered
> >     "too much".
> >      >>
> >      >> The real correct solution should be that all callers use
> >     stringStream and
> >      >> handle truncation. This patch is small and could be, if
> >     necessary, ported
> >      >> down to older releases easily.
> >      >>
> >      >> Thanks, Thomas
> >      >>
> >      >>
> >      >>
> >
>