Performance costs of untriggered UL log points?

Thu Feb 8 13:54:16 UTC 2018

Hi Marcus,

On Thu, Feb 8, 2018 at 2:42 PM, Marcus Larsson <marcus.larsson at oracle.com>
wrote:

> Hi Thomas,
>
>
> On 2018-02-07 17:23, Thomas Stüfe wrote:
>
>> Hi guys,
>>
>> short question, when creating UL, did anyone measure what an untriggered
>> instrumentation point does cost?
>>
>
> No, AFAIK no such detailed measurements were done.
> Although, no regressions were found with logging disabled (or even
> enabled) on the larger benchmark suites.
>
>
>> I was the other day comparing performance of a patch of mine with the
>> unpatched jdk. To my dismay my patched version was consistently ~5-10%
>> slower. But I had logging points in that function, for example:
>>
>> 1865 log_trace(gc, metaspace, freelist)("Coalescation (%s): created
>> coalesced chunk at %p, size " SIZE_FORMAT_HEX ".",
>> 1866                (is_class() ? "class space" : "metaspace"),
>> 1867                 p_new_chunk, p_new_chunk->word_size() *
>> sizeof(MetaWord));
>>
>> The function did run ~3000 times during my test - so not an indecent
>> amount
>> of times, but something which one might conceivably want to do. Logging
>> was
>> switched off. I found that commenting out this log line would make the
>> difference to the unpatched jdk disappear (runtime went down from ~90ms to
>> ~80ms).
>>
>
> I'm assuming this was on a product/fastdebug build? It sounds a bit much
> to me, but I've never really measured the overhead on that level, so I
> can't really say if it is expected or not.
>
>
Yes, this was a release build under linux x64.

>
>> Of course I am happy that my patch is performing well after all, but I am
>> surprised too. I would have naively assumed that tracing not switched on
>> costs next to nothing.
>>
>> I am aware that one needs to access some global information in order to
>> find out the current tracing level, but maybe there is still room for
>> improvement?
>>
>> In our port we have a very old and simple tracing system which is
>> triggered
>> by a simple global bit array. Accessing that is quite cheap, at least I
>> never noticed any negative effects like described above.
>>
>> Kind Regards, Thomas
>>
>
> Right now, an is_level() check boils down to a single compare instruction
> followed by a jump (at least on my workstation (Linux x64)). Different tag
> sets use different addresses, but otherwise it seems pretty optimal to me.
> A bitmap or something like a single global word with on/off status for
> multiple tag sets would improve cache locality, but it seems overly
> complicated IMHO.
>
>
Ok.. maybe something went wrong. I'll take a look at the generated code.

Thanks, Thomas

> Thanks,
> Marcus
>