Performance costs of untriggered UL log points?
Marcus Larsson
marcus.larsson at oracle.com
Thu Feb 8 13:42:56 UTC 2018
Hi Thomas,
On 2018-02-07 17:23, Thomas Stüfe wrote:
> Hi guys,
>
> short question, when creating UL, did anyone measure what an untriggered
> instrumentation point does cost?
No, AFAIK no such detailed measurements were done.
Although, no regressions were found with logging disabled (or even
enabled) on the larger benchmark suites.
>
> I was the other day comparing performance of a patch of mine with the
> unpatched jdk. To my dismay my patched version was consistently ~5-10%
> slower. But I had logging points in that function, for example:
>
> 1865 log_trace(gc, metaspace, freelist)("Coalescation (%s): created
> coalesced chunk at %p, size " SIZE_FORMAT_HEX ".",
> 1866 (is_class() ? "class space" : "metaspace"),
> 1867 p_new_chunk, p_new_chunk->word_size() *
> sizeof(MetaWord));
>
> The function did run ~3000 times during my test - so not an indecent amount
> of times, but something which one might conceivably want to do. Logging was
> switched off. I found that commenting out this log line would make the
> difference to the unpatched jdk disappear (runtime went down from ~90ms to
> ~80ms).
I'm assuming this was on a product/fastdebug build? It sounds a bit much
to me, but I've never really measured the overhead on that level, so I
can't really say if it is expected or not.
>
> Of course I am happy that my patch is performing well after all, but I am
> surprised too. I would have naively assumed that tracing not switched on
> costs next to nothing.
>
> I am aware that one needs to access some global information in order to
> find out the current tracing level, but maybe there is still room for
> improvement?
>
> In our port we have a very old and simple tracing system which is triggered
> by a simple global bit array. Accessing that is quite cheap, at least I
> never noticed any negative effects like described above.
>
> Kind Regards, Thomas
Right now, an is_level() check boils down to a single compare
instruction followed by a jump (at least on my workstation (Linux x64)).
Different tag sets use different addresses, but otherwise it seems
pretty optimal to me. A bitmap or something like a single global word
with on/off status for multiple tag sets would improve cache locality,
but it seems overly complicated IMHO.
Thanks,
Marcus
More information about the hotspot-runtime-dev
mailing list