Performance costs of untriggered UL log points?

Thomas Stüfe thomas.stuefe at gmail.com
Thu Feb 8 14:26:17 UTC 2018


Hi Kirk,

On Thu, Feb 8, 2018 at 3:12 PM, Kirk Pepperdine <kirk.pepperdine at gmail.com>
wrote:

> Hi Thomas,
>
> Our experience with logging is that frequency really really matters. Any
> efficient algorithm can become a problem if it’s called way too frequently.
> At issue with UL is that the message sizes tend to be very small (very high
> noise for the amount of signal) resulting in a very noticeable jump in
> logging frequency.


My point was that I have _untriggered_ logging points in the VM (so,
logging is switched off) causing performance degradation. Logging=on does
cost performance, sure. But I would expect logging=off to ideally cost
nothing.

Which is obviously not quite true, because at each logging point the VM
needs to check whether logging is enabled. But still, that check should be
really cheap.
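
To make this concrete, here is a minimal standalone sketch (not the actual UL
implementation; MY_LOG_TRACE and g_level are made-up names) of what such a
guard amounts to. When the level is off, each log point costs one load, one
compare and a not-taken branch, and the argument expressions are never
evaluated:

  #include <atomic>
  #include <cstdio>

  // Sketch only, not the real UL macro: one global log level per "tag set",
  // checked at every log point.
  enum MyLogLevel { Off = 0, Error, Warning, Info, Debug, Trace };

  static std::atomic<int> g_level(Off);   // hypothetical level for one tag set

  #define MY_LOG_TRACE(...)                                           \
    do {                                                              \
      if (g_level.load(std::memory_order_relaxed) >= Trace) {         \
        std::printf(__VA_ARGS__);   /* args evaluated only if on */   \
      }                                                               \
    } while (0)

  void do_work(int processed) {
    // With g_level == Off this is just a load, a compare and an untaken branch.
    MY_LOG_TRACE("processed %d items\n", processed);
  }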

If the check were not that cheap, one would now have to be careful where to
place log points and, for example, avoid placing them inside loops.
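
For hot loops the check can of course also be hoisted out by hand. If I
remember the UL helpers correctly, something along these lines should work
(untested sketch; the chunk-walking loop and names are made up):

  #include "logging/log.hpp"   // LogTarget and friends; HotSpot-internal

  void walk_chunks(Metachunk* first) {             // hypothetical loop
    LogTarget(Trace, gc, metaspace, freelist) lt;
    const bool log_on = lt.is_enabled();           // checked once, outside the loop
    for (Metachunk* c = first; c != NULL; c = c->next()) {
      // ... real per-chunk work ...
      if (log_on) {
        lt.print("chunk " PTR_FORMAT ", size " SIZE_FORMAT_HEX,
                 p2i(c), c->word_size() * sizeof(MetaWord));
      }
    }
  }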

Kind Regards, Thomas


> In this case you might look at this as a frequency issue rather than the
> cost of a single action. I believe your findings confirm this.
>
> Kind regards,
> Kirk Pepperdine
>
>
> > On Feb 8, 2018, at 2:42 PM, Marcus Larsson <marcus.larsson at oracle.com>
> > wrote:
> >
> > Hi Thomas,
> >
> >
> > On 2018-02-07 17:23, Thomas Stüfe wrote:
> >> Hi guys,
> >>
> >> Short question: when creating UL, did anyone measure what an untriggered
> >> instrumentation point costs?
> >
> > No, AFAIK no such detailed measurements were done.
> > However, no regressions were found with logging disabled (or even
> > enabled) on the larger benchmark suites.
> >
> >>
> >> The other day I was comparing the performance of a patch of mine with the
> >> unpatched JDK. To my dismay, my patched version was consistently ~5-10%
> >> slower. But I had logging points in the function in question, for example:
> >>
> >> log_trace(gc, metaspace, freelist)(
> >>     "Coalescation (%s): created coalesced chunk at %p, size " SIZE_FORMAT_HEX ".",
> >>     (is_class() ? "class space" : "metaspace"),
> >>     p_new_chunk, p_new_chunk->word_size() * sizeof(MetaWord));
> >>
> >> The function ran ~3000 times during my test - so not an indecent number
> >> of times, but something one might conceivably want to do. Logging was
> >> switched off. I found that commenting out this log line made the
> >> difference to the unpatched JDK disappear (runtime went down from ~90ms
> >> to ~80ms).
> >
> > I'm assuming this was on a product/fastdebug build? It sounds a bit much
> > to me, but I've never really measured the overhead on that level, so I
> > can't really say if it is expected or not.
> >
> >>
> >> Of course I am happy that my patch is performing well after all, but I am
> >> surprised too. I would have naively assumed that tracing that is not
> >> switched on costs next to nothing.
> >>
> >> I am aware that one needs to access some global information in order to
> >> find out the current tracing level, but maybe there is still room for
> >> improvement?
> >>
> >> In our port we have a very old and simple tracing system which is
> >> triggered by a simple global bit array. Accessing that is quite cheap; at
> >> least I have never noticed any negative effects like those described above.
> >>
> >> Kind Regards, Thomas
> >
> > Right now, an is_level() check boils down to a single compare
> > instruction followed by a jump (at least on my workstation (Linux x64)).
> > Different tag sets use different addresses, but otherwise it seems pretty
> > optimal to me. A bitmap or something like a single global word with on/off
> > status for multiple tag sets would improve cache locality, but it seems
> > overly complicated IMHO.
> >
> > Thanks,
> > Marcus
>
>

