Performance costs of untriggered UL log points?

Thomas Stüfe thomas.stuefe at gmail.com
Thu Feb 8 17:13:30 UTC 2018


Hi Kirk,

On Thu, Feb 8, 2018 at 4:17 PM, Kirk Pepperdine <kirk.pepperdine at gmail.com>
wrote:

> Hi Thomas,
>
> I completely understand and agree with your assessment and it is a concern
> that I raised several years ago. The Log4J model used creates huge
> performance regressions for many of the applications I’ve run across. My
> fear was that the same issues would be seen in UL. Your case appears to confirm
> this.
>
>
Well, I still may be wrong, and even if I am not, it may be possible to
improve this. I like UL; it is very useful. I would regret having to limit
my use of it.

Kind Regards, Thomas


> Kind regards,
> Kirk Pepperdine
>
> On Feb 8, 2018, at 3:26 PM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>
> Hi Kirk,
>
> On Thu, Feb 8, 2018 at 3:12 PM, Kirk Pepperdine <kirk.pepperdine at gmail.com
> > wrote:
>
>> Hi Thomas,
>>
>> Our experience with logging is that frequency really, really matters. Even
>> an efficient algorithm can become a problem if it’s called way too
>> frequently. The issue with UL is that the message sizes tend to be very
>> small (very high noise for the amount of signal), resulting in a very
>> noticeable jump in logging frequency.
>
>
> My point was that I have _untriggered_ logging points in the VM (so,
> logging is switched off) causing performance degradation. Logging=on does
> cost performance, sure. But I would expect logging=off to ideally cost
> nothing.
>
> Which is obviously not true, because at each logging point the VM needs to
> check whether logging is enabled. Still, that check should be really,
> really cheap.
>
> Otherwise, one now would have to be careful where to place log points, and
> avoid placing them inside loops, for example.
>
> Kind Regards, Thomas
>
>
> In this case you might look at this as a frequency issue rather than the
>> cost of a single action. I believe your findings confirm this.
>>
>> Kind regards,
>> Kirk Pepperdine
>>
>>
>> > On Feb 8, 2018, at 2:42 PM, Marcus Larsson <marcus.larsson at oracle.com>
>> wrote:
>> >
>> > Hi Thomas,
>> >
>> >
>> > On 2018-02-07 17:23, Thomas Stüfe wrote:
>> >> Hi guys,
>> >>
>> >> A short question: when creating UL, did anyone measure what an
>> >> untriggered instrumentation point costs?
>> >
>> > No, AFAIK no such detailed measurements were done.
>> > That said, no regressions were found with logging disabled (or even
>> > enabled) on the larger benchmark suites.
>> >
>> >>
>> >> The other day I was comparing the performance of a patch of mine with
>> >> the unpatched JDK. To my dismay, my patched version was consistently
>> >> ~5-10% slower. But I had logging points in that function, for example:
>> >>
>> >> log_trace(gc, metaspace, freelist)("Coalescation (%s): created "
>> >>     "coalesced chunk at %p, size " SIZE_FORMAT_HEX ".",
>> >>     (is_class() ? "class space" : "metaspace"),
>> >>     p_new_chunk, p_new_chunk->word_size() * sizeof(MetaWord));
>> >>
>> >> The function ran ~3000 times during my test - not an indecent number
>> >> of times, but something one might conceivably want to do. Logging was
>> >> switched off. I found that commenting out this log line made the
>> >> difference to the unpatched JDK disappear (runtime went down from
>> >> ~90ms to ~80ms).
>> >
>> > I'm assuming this was on a product/fastdebug build? It sounds a bit
>> > much to me, but I've never really measured the overhead on that level,
>> > so I can't really say if it is expected or not.
>> >
>> >>
>> >> Of course I am happy that my patch is performing well after all, but
>> >> I am surprised too. I would have naively assumed that tracing which is
>> >> not switched on costs next to nothing.
>> >>
>> >> I am aware that one needs to access some global information in order to
>> >> find out the current tracing level, but maybe there is still room for
>> >> improvement?
>> >>
>> >> In our port we have a very old and simple tracing system which is
>> >> triggered by a simple global bit array. Accessing that is quite cheap;
>> >> at least I never noticed any negative effects like those described
>> >> above.
>> >>
>> >> Kind Regards, Thomas
>> >
>> > Right now, an is_level() check boils down to a single compare
>> > instruction followed by a jump (at least on my workstation, Linux x64).
>> > Different tag sets use different addresses, but otherwise it seems
>> > pretty optimal to me. A bitmap, or something like a single global word
>> > with on/off status for multiple tag sets, would improve cache locality,
>> > but it seems overly complicated IMHO.
>> >
>> > Thanks,
>> > Marcus
>>
>>
>
>


More information about the hotspot-runtime-dev mailing list