Unified Logging for network

Wed Nov 4 01:32:04 UTC 2020

Hi Thomas,

On 2020/11/03 17:09, Thomas Stüfe wrote:
> Hi Yasumasa,
> 
> I don't argue that such a feature would not be useful. Of course it would!
> 
> But as with any other added feature, it will come at the cost of complexity. It will have to be maintained, tests will have to be written and run. That increases technical debt for us all.
> 
> That is not a reason not to do it, but a reason to think before doing it and to explore alternatives.
> 
> --
> 
> The fact that a logging call could now do network IO fills me with deep unease. It violates the principle of least surprise. Logging should be as basic as possible, in order to be usable anywhere in code.
> 
> - as has been said before, it would introduce unpredictable timing behavior. The fact that we already have this today is not much consolation :(
> 
> - regarding the "the user should know what he is doing" argument: unfortunately many don't, so a balance has to be found to limit the support load from such cases

I agree with you. That is why I want to hear various opinions before submitting an RFE :)
I want to find that balance in this discussion.

> - AFAICS we do not do network IO anywhere in hotspot today. That code would have to be written and tested. Reusing some other code - e.g. from the corelibs - is out of the question for such a low-level API, since you don't want to risk circularities.

The debug VM has networkStream, which is used for the Ideal Graph Visualizer. I guess we could use it (or a similar implementation) for this purpose.
We cannot use the Java Socket API because UL may log at a safepoint (e.g. GC logs).
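To give an idea of what such native code would involve, below is a minimal sketch of a TCP log sink built only on POSIX sockets (getaddrinfo, connect, send), i.e. without the Java Socket API. It is an illustration under assumptions, not HotSpot's networkStream: the class SocketLogOutput and its methods are hypothetical, and MSG_NOSIGNAL assumes Linux (macOS would need SO_NOSIGPIPE instead).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical sketch, not HotSpot's networkStream: a blocking log sink
// written directly against POSIX sockets, so no Java code is involved.
// Assumes Linux (MSG_NOSIGNAL suppresses SIGPIPE on a closed peer).
class SocketLogOutput {
 public:
  SocketLogOutput() = default;
  SocketLogOutput(const SocketLogOutput&) = delete;
  SocketLogOutput& operator=(const SocketLogOutput&) = delete;
  ~SocketLogOutput() { close_connection(); }

  // Resolve host:port and open a TCP connection. Returns false on failure.
  bool connect_to(const char* host, const char* port) {
    assert(fd_ < 0);
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP
    addrinfo* res = nullptr;
    if (getaddrinfo(host, port, &hints, &res) != 0) return false;
    for (addrinfo* ai = res; ai != nullptr && fd_ < 0; ai = ai->ai_next) {
      int fd = ::socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
      if (fd < 0) continue;
      if (::connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
        fd_ = fd;
      } else {
        ::close(fd);
      }
    }
    freeaddrinfo(res);
    return fd_ >= 0;
  }

  // Take ownership of an already-connected stream socket.
  void adopt_fd(int fd) {
    assert(fd_ < 0);
    fd_ = fd;
  }

  // Send one line terminated by '\n'. This blocks while the peer is slow
  // or the socket buffer is full: the timing hazard raised in this thread.
  bool write_line(const std::string& line) {
    if (fd_ < 0) return false;
    const std::string msg = line + "\n";
    std::size_t off = 0;
    while (off < msg.size()) {
      ssize_t n = ::send(fd_, msg.data() + off, msg.size() - off, MSG_NOSIGNAL);
      if (n < 0) {
        if (errno == EINTR) continue;  // interrupted by a signal: retry
        return false;                  // e.g. peer closed the connection
      }
      off += static_cast<std::size_t>(n);
    }
    return true;
  }

  void close_connection() {
    if (fd_ >= 0) {
      ::close(fd_);
      fd_ = -1;
    }
  }

 private:
  int fd_ = -1;
};
```

A real UL output would additionally have to decide what to do when the connection drops or the peer stops reading; this sketch only reports failure to the caller.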

> - But now we have a complete network stack below the innocuous logging call. This imposes further restrictions on where we can log: e.g. even if it was possible before, logging from signal handlers would now be impossible, and these restrictions are not documented or tested anywhere. To me this makes UL more and more questionable, and I already tend to shun it when possible in favour of plain tty printing.

I think UL should be used for most HotSpot logs; however, some critical errors (e.g. in a signal handler) should be printed via tty.
UL is useful for log management (e.g. log rotation).

> I argued yesterday against Ioi's concurrent log draining, but it becomes more attractive the more I think about it.
> 
> Only, could the same not be achieved with piping stdout/err to a separate tool like netcat, as Leo suggested?
> 
> That solution exists today. If netcat does not do it for you, this could also be a separate utility, which could even be part of the JDK. Conceptually this would be much the same as a separate thread printing out UL, with the pipe size being the buffer size. Or communication could happen via shared memory...
> 
> This would have two distinct advantages over doing network IO in UL:
> - we see the whole stderr output (e.g. output from the libc, or any third party tools)
> - we see output also from a VM which crashed and burned. E.g. any last words from hs-err reporting.
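The concurrent-draining idea (log into a memory buffer and let a worker thread do the output), together with the question of whether to discard or hold when the buffer is full, can be made concrete with a small bounded queue. This is only a sketch under assumptions: the names AsyncLogBuffer, OverflowPolicy and the Sink callback are hypothetical, it uses std::mutex rather than anything HotSpot provides, and the sink stands in for whatever slow output (a pipe, a socket) would actually be used.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical sketch: bounded in-memory log buffer drained by one worker.
enum class OverflowPolicy { kDiscard, kBlock };

class AsyncLogBuffer {
 public:
  using Sink = std::function<void(const std::string&)>;

  AsyncLogBuffer(std::size_t capacity, OverflowPolicy policy, Sink sink)
      : capacity_(capacity), policy_(policy), sink_(std::move(sink)) {
    assert(capacity_ > 0);
  }
  ~AsyncLogBuffer() { stop(); }

  void start() { worker_ = std::thread([this] { drain_loop(); }); }

  // Called by the logging thread: no IO here, but it must take a lock.
  bool log(std::string msg) {
    std::unique_lock<std::mutex> lock(mu_);
    if (stopping_) return false;
    if (queue_.size() >= capacity_) {
      if (policy_ == OverflowPolicy::kDiscard) {
        ++dropped_;  // lose the message, keep the caller fast
        return false;
      }
      // kBlock: wait for the worker, which brings the latency back.
      not_full_.wait(lock, [this] { return queue_.size() < capacity_ || stopping_; });
      if (stopping_) return false;
    }
    queue_.push_back(std::move(msg));
    not_empty_.notify_one();
    return true;
  }

  // Ask a started worker to drain what is queued, then join it.
  void stop() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopping_ = true;
    }
    not_empty_.notify_all();
    not_full_.notify_all();
    if (worker_.joinable()) worker_.join();
  }

  std::size_t dropped() const {
    std::lock_guard<std::mutex> lock(mu_);
    return dropped_;
  }

 private:
  void drain_loop() {
    std::unique_lock<std::mutex> lock(mu_);
    for (;;) {
      not_empty_.wait(lock, [this] { return !queue_.empty() || stopping_; });
      if (queue_.empty()) return;  // stopping and fully drained
      std::string msg = std::move(queue_.front());
      queue_.pop_front();
      not_full_.notify_one();
      lock.unlock();
      sink_(msg);  // the slow output happens outside the lock
      lock.lock();
    }
  }

  const std::size_t capacity_;
  const OverflowPolicy policy_;
  Sink sink_;
  mutable std::mutex mu_;
  std::condition_variable not_empty_, not_full_;
  std::deque<std::string> queue_;
  std::size_t dropped_ = 0;
  bool stopping_ = false;
  std::thread worker_;
};
```

With kBlock, a slow drainer stalls the logging thread again, which is the trade-off discussed in this thread; kDiscard keeps log() cheap at the cost of losing messages, and the dropped() counter at least makes the loss visible.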

We can use netcat/logstash/fluentd today; however, I don't use stdout/stderr because the application, libraries, and frameworks might print messages (including exception stack traces) to them, which makes it difficult to parse log lines.
I also don't want to use UL files for this, because it would have to be a single log file (not rotated), so the file might become large.

Cheers,

Yasumasa

> Cheers, Thomas
> 
> 
> On Tue, Nov 3, 2020 at 3:12 AM Yasumasa Suenaga <suenaga at oss.nttdata.com> wrote:
> 
>     Hi,
> 
>     I agree this proposal might cause performance issues. However, I think that is the user's responsibility.
>     If this proposal is implemented, I think in most cases logs would be sent to a local log shipper process (fluentd or logstash on 127.0.0.1), because HotSpot does not emit logs as JSON. A log shipper may also support message buffering and message queue persistence.
>     With a log shipper we can (partly) avoid performance/reliability issues.
> 
>     Even with the current implementation, performance issues occur when the disk is very slow (e.g. when the storage is broken).
> 
> 
>     Cheers,
> 
>     Yasumasa
> 
> 
>     On 2020/11/03 6:31, Thomas Stüfe wrote:
>      > Hi Ioi,
>      >
>      > I dimly remember proposals like this from the past. The main problem I see is
>      > how large you would dimension the buffer, and what you do if the buffer
>      > cannot be drained rapidly enough. Discard log output? Hold? The former
>      > sounds bad, the latter negates the advantages of such a buffer.
>      >
>      > Then, access to such a buffer would probably have to be synchronized,
>      > whereas today AFAIK the log calls do not have to be.
>      >
>      > Cheers, Thomas
>      >
>      > On Mon 2. Nov 2020 at 22:18, Ioi Lam <ioi.lam at oracle.com> wrote:
>      >
>      >> For performance, maybe the implementation can log into a memory buffer,
>      >> and use a worker thread to send the output over the network? That way we
>      >> can minimize the overhead per log_xxx() call.
>      >>
>      >> I agree that using "-Xlog:foo=debug:network=xyz.com:1234" would be quite
>      >> handy when you have lots of containers. You don't need to enable remote
>      >> access to the container's file system just to get to the log file.
>      >>
>      >> Thanks
>      >> - Ioi
>      >>
>      >> On 11/2/20 11:10 AM, Kirk Pepperdine wrote:
>      >>> Hi Thomas,
>      >>>
>      >>> I appreciate Yasumasa’s desire to be able to redirect UL output to
>      >>> somewhere other than… I also appreciate that the highly granular nature
>      >>> of how UL messages are currently structured can be, and indeed is, an
>      >>> issue. That said, I’d also like the ability to push the data somewhere
>      >>> other than a file on disk.
>      >>>
>      >>> To the point of granularity, UL might benefit from some message
>      >>> coarsening. This might also help with other logging-related performance
>      >>> issues that I’ve noted here and there. Quite frankly, dealing with logs in
>      >>> containers isn’t a wonderful experience. And while I firmly believe that
>      >>> there is more that containers can do to ease this, being able to redirect
>      >>> output to something other than a log file does feel like it would be
>      >>> helpful. That said, I’m also concerned about the potential performance
>      >>> impact, but I think for the things that one would generally log, it
>      >>> should be minimal.
>      >>>
>      >>> Kind regards,
>      >>> Kirk Pepperdine
>      >>>
>      >>>
>      >>>> On Nov 2, 2020, at 4:26 AM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>      >>>>
>      >>>> Hi Yasumasa,
>      >>>>
>      >>>> one problem I see is that this could introduce a surprising amount of
>      >>>> lag into log() calls which look inconspicuous, thereby distorting timing
>      >>>> behavior or even creating timeout effects. We already have that problem
>      >>>> to some degree today when logging to network shares.
>      >>>>
>      >>>> Another thing: log output can be very fine-grained, which would create a
>      >>>> lot of network traffic.
>      >>>>
>      >>>> Such an addition may also open some security questions.
>      >>>>
>      >>>> From a more philosophical standpoint, I like the "do one thing and do
>      >>>> it right" Unix way, and this seems more like something an outside tool
>      >>>> should be doing, which could also aggregate log output better. But I
>      >>>> admit that argument is weak.
>      >>>>
>      >>>> Cheers, Thomas
>      >>>>
>      >>>>
>      >>>>
>      >>>> On Mon, Nov 2, 2020 at 12:21 PM Yasumasa Suenaga <suenaga at oss.nttdata.com> wrote:
>      >>>>
>      >>>>> Hi all,
>      >>>>>
>      >>>>> Currently we can only output UL to stdout and/or a file. I think it would
>      >>>>> be useful if we could also output it to a TCP socket.
>      >>>>>
>      >>>>> For example, some systems gather all logs into document-oriented
>      >>>>> databases (e.g. Elasticsearch) and/or cloud monitoring platforms (e.g.
>      >>>>> CloudWatch). If HotSpot could output UL to a TCP socket, we could send all
>      >>>>> logs to them via a TCP input plugin (Fluentd, Logstash).
>      >>>>>
>      >>>>> I think it would be useful for container platforms. What do you think?
>      >>>>> If it is worth working on, I will file a CSR and a JBS ticket, and will
>      >>>>> also create a patch.
>      >>>>>
>      >>
>      >>
>