RFR: 8143157: Convert TraceVMOperation to Unified Logging

Fri Nov 20 00:10:23 UTC 2015

Hi Coleen,

On 20/11/2015 1:27 AM, Coleen Phillimore wrote:
> On 11/18/15 5:06 PM, David Holmes wrote:
>> On 19/11/2015 4:41 AM, Rachel Protacio wrote:
>>> On 11/18/2015 12:32 PM, Max Ockner wrote:
>>>> I think the issue is that doit() happens regardless of what is being
>>>> logged. I don't think we can reorder the logging statements relative
>>>> to doit(). In that case, we would be forced to split into 2
>>>> conditionals for the logging.
>>>>
>>>> Still I don't think it is necessary to call log_is_enabled twice, and
>>>> I don't think it is necessary to always define the outputStream. If
>>>> this turns out to be a problem, we could flip things around a bit:
>>>>
>>>>   void VM_Operation::evaluate() {
>>>>     ResourceMark rm;
>>>> !   bool enabled = false;
>>>> !   if (log_is_enabled(Debug, vmoperation)) {
>>>> !     outputStream* debugstream =
>>>> LogHandle(vmoperation)::debug_stream();
>>>> !     enabled = true;
>>>> !   if (enabled) {
>>>> !     print_on_error(debugstream);
>>>> !     debugstream->cr();
>>>>     }
>>>>     doit();
>>>> !   if (enabled) {
>>>> !     print_on_error(debugstream);
>>>> !     debugstream->print_cr("]");
>>>>     }
>>>>   }
>>>>
>>> I'm fine with moving the definition into a conditional (but I can't move
>>> the declaration for scope reasons). It will look like this
>>>
>>>     outputStream* debugstream;
>>>     if (log_is_enabled(Debug, vmoperation)) {
>>>          debugstream = LogHandle(vmoperation)::debug_stream();
>>
>> You may run afoul of the static analyzer when you use debugstream in
>> the logging clause after the doit(). It won't know that log_is_enabled
>> has to return the same answer both times. The use of the enabled flag
>> would address that.
>>
>>> To be honest, I'm not sure whether evaluating log_is_enabled twice is
>>> worse than adding a boolean check and cluttering the code and making it
>>> less readable. The argument for changing log_is_enabled() to a boolean
>>> could be extrapolated to say that it should be a boolean for every check
>>> for every tag. And that might be a good idea. But I feel like the
>>> problem with it here is that it comes across as confusing to add
>>> "enabled", which only actually needs to be used once. If we want it
>>> changed, maybe we should change the macro? Or put it outside the
>>> function, even, to make it only evaluate once?
>>
>> I would argue that within a function log_is_enabled should only be
>> called once per tag/level combination. I don't see it impeding
>> readability at all - quite the opposite. (The new logging code in the
>> current example, looks worse to me, than the original, from a
>> clutter/readability perspective.)
>
> Something like what both you and Rachel said?
>
>     outputStream* debugstream;
>     if (log_is_enabled(Debug, vmoperations)) {
>           debugstream = LogHandle(vmoperation)::debug_stream();
>           log output
>           enabled = true;
>    }
>    vm_operation
>    if (enabled) {
>        log output again
>    }

Yes. Given what we have heard about dynamically enabling/disabling 
logging at runtime, using a single enabling flag in this kind of 
situation is important I think.

>>>> On 11/18/2015 4:22 AM, David Holmes wrote:
>>>>> On 18/11/2015 5:50 AM, Rachel Protacio wrote:
>>>>>> Please review the following small logging enhancement.
>>>>>>
>>>>>> Summary: The former -XX:+TraceVMOperation flag is updated to the
>>>>>> unified
>>>>>> logging framework and is now replaced with -Xlog:vmoperation in
>>>>>> product
>>>>>> mode.
>>>>>>
>>>>>> Open webrev: http://cr.openjdk.java.net/~rprotacio/8143157/
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8143157
>>>>>> Testing: Passes jtreg, JPRT, RBT quick tests, and refworkload
>>>>>> performance tests.
>>>>>
>>>>> Meta-question: the logging framework is safe to be called when at a
>>>>> safepoint isn't it?
>>> I haven't seen any problems, but I also don't know what would cause it
>>> to be unsafe. I'd appreciate any guidance in figuring it out!
>>
>> The main thing would be locking of course.
>
> I think the only locking is the tty which the GC uses so if it's not
> safe for safepoint use, then there's another bigger problem with the
> framework.

But can't this hook into non-tty based logging as directed by the user? 
And can that hit any log-file rotation paths that use locking?

Just want to be sure :)

>>>>>
>>>>> src/share/vm/runtime/vm_operations.cpp
>>>>>
>>>>>   void VM_Operation::evaluate() {
>>>>>     ResourceMark rm;
>>>>> !   outputStream* debugstream =
>>>>> LogHandle(vmoperation)::debug_stream();
>>>>> !   if (log_is_enabled(Debug, vmoperation)) {
>>>>> !     debugstream->print("[");
>>>>> !     print_on_error(debugstream);
>>>>> !     debugstream->cr();
>>>>>     }
>>>>>     doit();
>>>>> !   if (log_is_enabled(Debug, vmoperation)) {
>>>>> !     print_on_error(debugstream);
>>>>> !     debugstream->print_cr("]");
>>>>>     }
>>>>>   }
>>>>>
>>>>> Why are you calling print_on_error twice?
>>> The problem with switching VMOperation to logging is that in the current
>>> system, the output goes straight to tty and so can be concatenated.
>>> However, since the output triggered by the flag needs to be printed in a
>>> log stream (which is not necessarily tty and which has bracketed
>>> decorations), the "[<VM operation info>" portion comes out in that
>>> stream, which could then be broken be other output, making it confusing
>>> to have a closing "]" sitting around later. Coleen and I agreed that it
>>> would improve output readability to be able to match the closing bracket
>>> by restating the VM operation info. In other words:
>>>
>>>     [VM_Operation (0x00007fa3b44ce700): G1CollectFull, mode: safepoint,
>>>     requested by thread 0x00007fa3ac018000
>>>     <code from doit() and not from TraceVMOperation>
>>>     ]
>>>
>>> becomes
>>>
>>>     [0.257s][debug  ][vmoperation] [VM_Operation (0x00007f74e589c700):
>>>     G1CollectFull, mode: safepoint, requested by thread
>>> 0x00007f74dc018000
>>>     <code from doit() and not from TraceVMOperation>
>>>     [0.272s][debug  ][vmoperation] VM_Operation (0x00007f74e589c700):
>>>     G1CollectFull, mode: safepoint, requested by thread
>>> 0x00007f74dc018000]
>>>
>>> rather than
>>>
>>>     [0.257s][debug  ][vmoperation] [VM_Operation (0x00007f74e589c700):
>>>     G1CollectFull, mode: safepoint, requested by thread
>>> 0x00007f74dc018000
>>>     <code from doit() and not from TraceVMOperation>
>>>     [0.272s][debug  ][vmoperation] ]
>>>
>>> It should be easier for users to parse this.
>>
>> Okay but in that case each part should have the trailing ]
>>
>> Or, perhaps better, given all the other decoration why not just drop
>> the [ ] bracketing altogether? The intent to was capture all output
>> that occurred while the VMOperation was active, but that is now much
>> clearer in the output anyway. Though perhaps a "start" and "end"
>> indicator would aid parsing - especially when nested VMOperations
>> might occur.
>
> Either way!   The '[' on the first line and the ']' on the second log
> line match up but we could drop both or keep them.  This doesn't look
> like a big deal!    "start" and "end" are too noisy.

Not a big deal, just trying to make it look neat and easy to match up 
the pre- and post- operation log lines. The decorations are already 
noisy so I don't think start/end make it any worse.

>>
>>>>>
>>>>> Why is the only logging level for this tag the "debug" level? I think
>>>>> I may be missing part of the way UL works here - can you enable
>>>>> logging both by explicit tag (ie vmoperation) and the level (ie
>>>>> debug)?
>>>>>
>>> You enable the logging with "-Xlog:vmoperation=debug". If you leave of
>>> the "=<level>" portion, it is by default parsed as "=info". We don't
>>> believe this vmoperation logging needs to come out by default in the
>>> case where someone asks for "-Xlog:all", so we put it one level below.
>>
>> <sigh> So does -Xlog:all signify all tags at info level or ???
>>
>> I think this is real conceptual problem with the UL framework. I
>> should be able to define the info/trace/debug levels for a given tag,
>> without having to think about how they interact with a "log all" request.
>
> We picked debug level.  It seems like debug level.  Do you think it
> should be Info level?  Is it important information?

In the current TraceXXX logging scheme it is the tag, the XXX, that is 
the primary focus. All of our tags are independent. We can add "levels" 
to that logging by either adding in other flags (like Verbose or 
WizardMode) or by using a numeric rather than boolean flag type.

In the UL framework, as discussed in the emails with Kirk, the primary 
focus is on the levels, and then the tags. This is an inversion of the 
current scheme. That in itself would not be a problem when just 
considering non-product tracing flags - particularly if, as discussed in 
another thread, the distinction between product and non-product levels 
is itself raised to be the first-level distinction. For each flag/tag 
you can just consider: is this a first level of detail (info), a second 
level of detail (debug), or a third level of detail (trace) for this tag?

However, when you change a non-product tag to product, this 
level-oriented focus can become problematic: the framework wants the 
user to be able to enable all info logging (for example), but the "info" 
for tag XXX isn't considered anywhere as near as important as the info 
for some other tags so "we" don't want it to show up in this case, so 
instead of using "info" we use "debug", so that it is hidden. As per the 
current case.

So now we are not using the levels to classify the detail of a 
particular subsystem, but rather we are using levels to provide a 
secondary hierarchy of subsystems/tags:

- level 1: all tags that have "info" logging statements
- level 2: all "debug" logging from level 1 tags plus all tags that 
start logging at "debug" level
- level 3: all "trace" logging from level 1 and 2 tags, plus all 
remaining tags that start logging at "trace" level

I think this is problematic because we have to consider logging 
holistically instead of just thinking about the subsystem being logged. 
I also think the rules are very ad-hoc and subjective, which isn't 
conducive to getting a clean and consistent approach to logging.

>>
>>>>> And I know I sound like a broken record but I'm concerned about the
>>>>> overhead of the logging actions when it is not enabled ie:
>>>>>
>>>>> outputStream* debugstream = LogHandle(vmoperation)::debug_stream();
>>>>>
>>>>> always gets executed - and we evaluate is_enabled twice.
>>>>>
>>> See my reply to Max above.
>>>>> ---
>>>>>
>>>>> test/runtime/logging/VMOperationTestMain.java
>>>>>
>>>>> Can you add a comment explaining what the logic is attempting to do
>>>>> please. I'm curious how many times the loop executes before you get a
>>>>> safepoint-based GC (and how it varies for different GC's).
>>>>>
>>>>> Style nit: while(  -> while (
>>> Yes, will fix.
>>>>>
>>>>>> A compatability request has been accepted with regard to this change.
>>>>>
>>>>> I'll record my objections again to the conversion of develop flags to
>>>>> product. <sigh>
>>> Thanks for voicing your concern. The justification for this tag being
>>> product is that it could be useful to helping diagnose user problems, it
>>> does not slow performance when off, and it minimally increases the size
>>> of the code. If any of these prove to be false in aggregated logging
>>> options, we can move it to develop level during code freeze.
>>
>> You could make that argument for every piece of logging and end up
>> with a very simple flat logging system. :) It is all very subjective
>> of course.
>>
>
> Yes, we do intend on making this assessment and argument for each
> logging level because we don't want a simple flat logging system. The
> code that we are logging isn't simple nor flat.   I can cut/paste the
> decision graph for you again.

See above.

Thanks,
David

> Thanks,
> Coleen
>
>> Thanks,
>> David
>>
>>> Thanks,
>>> Rachel
>>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Thank you very much,
>>>>>> Rachel
>>>>
>>>
>