RFR: 8185525: [Event Request] Add Tracing event for DictionarySizes

Wed Apr 17 03:05:51 UTC 2019

On 2019-04-12 22:13, gerard ziemski wrote:
>
>
> On 4/12/19 12:45 PM, Erik Gahlin wrote:
>> On 2019-04-10 22:03, gerard ziemski wrote:
>>>
>>>
>>> On 4/10/19 1:12 PM, coleen.phillimore at oracle.com wrote:
>>>>>>
>>>>>> I noticed that events are only emitted if we are able to take the 
>>>>>> resize lock. Can this be fixed? What prevents us from always 
>>>>>> getting the data? That's how other periodic events work and 
>>>>>> losing data sometimes may lead to subtle bugs that are hard to 
>>>>>> understand and replicate in systems that rely on the information. 
>>>>>> Could we retry on a failure?
>>>>> Good observation. If the resize lock is taken, then it's not 
>>>>> likely that whoever owns it will be done soon, so retrying is most 
>>>>> likely not going to succeed right away. Is it OK to tie up JFR 
>>>>> periodic thread for some time? If so, how long? 
>> There is no general upper limit for periodic events.
>>
>> If we need to wait for a safepoint, we need to do it. That said, 
>> events that can induce significant latencies or CPU overhead (even in 
>> pathological cases) are off in default.jfc and only enabled in 
>> profile.jfc, or not at all.
>>
>> As I understand it, the events themselves don't cause latencies and 
>> the tables are not expanded that often, so I think it would be okay 
>> to emit them.  If you think otherwise, I would try to scan 
>> concurrently, even if it means we are slightly off.
>>
>>>>>
>>>>>
>>>>> If the lock is taken, then it means that someone is scanning 
>>>>> through the entire table, or the table is being resized. Either 
>>>>> way, we're not losing data, but are just temporarily blind - I 
>>>>> don't see a problem here for long-running apps; they will start 
>>>>> receiving events eventually (they are emitted every 10 sec by default)
>> A user can set period "everyChunk" which means events are guaranteed 
>> to be in the recording.
>>
>> I think we should try to avoid breaking that contract. When event 
>> streaming is in place, we can implement requestable events where a 
>> user can demand an event programmatically from Java. If they 
>> sometimes don't get an event, it will break their code in a subtle way.
>
> No problem, I removed the resize_lock around the JFR table statistics, 
> so we might get slightly incorrect stats every now and then, but we 
> will be emitting the events on schedule: 
> http://cr.openjdk.java.net/~gziemski/8185525_rev7
Is it sufficient to just remove the lock to make it "work"?

I think it could be OK to use stale data, or perhaps count a value 
twice, but are there other issues that need to be fixed as well? Robbin 
may have more information on this.

An alternative approach would be to use the last known data, if we are 
not able to take the lock. It would be old, but not out of whack.
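
A minimal sketch of that fallback, assuming a simple try-lock and a cached 
snapshot. TryLock, Table and TableStats are hypothetical names made up for 
the illustration, not HotSpot types, and the sketch assumes only one thread 
(the periodic thread) ever asks for statistics:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the statistics type; one field is enough here.
struct TableStats {
  std::size_t number_of_buckets = 0;
};

// Hypothetical non-blocking lock standing in for the resize lock.
class TryLock {
 public:
  // Returns true if the lock was free and is now held by the caller.
  bool try_lock() { return !_held.exchange(true, std::memory_order_acquire); }
  void unlock() { _held.store(false, std::memory_order_release); }
 private:
  std::atomic<bool> _held{false};
};

// Hypothetical table that remembers the last statistics it could compute.
class Table {
 public:
  explicit Table(std::size_t buckets) : _buckets(buckets) {}

  // Never blocks: refreshes the snapshot if the resize lock is free,
  // otherwise returns the last known (possibly old) snapshot.
  TableStats statistics() {
    if (_resize_lock.try_lock()) {
      _last_known.number_of_buckets = _buckets;
      _resize_lock.unlock();
    }
    return _last_known;
  }

  TryLock& resize_lock() { return _resize_lock; }

  // Caller is expected to hold the resize lock.
  void set_buckets(std::size_t n) { _buckets = n; }

 private:
  TryLock _resize_lock;
  std::size_t _buckets;
  TableStats _last_known;
};
```

With this shape an event is always emitted on schedule; while a resize holds 
the lock the values are simply those of the previous successful scan.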

That said, it would be interesting to have some numbers on what the cost 
would be to wait for the lock.

>
> Last question: what is the recommended way to programmatically tell if 
> JFR is ON? I'm wondering whether I should collect the add/remove rates 
> for the tables only if JFR is ON. As it is right now, we collect them 
> always. It's just an atomic increment, but still, it's work only JFR 
> events need.

You can use the JFR_ONLY macro to compile the code out when the VM is not 
built with JFR. If you want to check if a recording is running, you can 
use Jfr::is_recording(), but perhaps Jfr::is_enabled() is more 
accurate/correct if a recording is started/stopped repeatedly?
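
To illustrate the two levels of gating, here is a rough sketch. It does not 
use the real HotSpot macro or the Jfr class: SKETCH_INCLUDE_JFR, 
SKETCH_JFR_ONLY, sketch_recording and RateCounter are all hypothetical 
stand-ins that only mimic the idea (compile-time removal plus a runtime 
check before the atomic increment):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical build switch; defaults to "feature compiled in".
#ifndef SKETCH_INCLUDE_JFR
#define SKETCH_INCLUDE_JFR 1
#endif

// Hypothetical macro modelled on the idea of JFR_ONLY: the argument is
// kept when the feature is compiled in and dropped otherwise.
#if SKETCH_INCLUDE_JFR
#define SKETCH_JFR_ONLY(code) code
#else
#define SKETCH_JFR_ONLY(code)
#endif

// Hypothetical stand-in for a runtime "is a recording active" query.
static std::atomic<bool> sketch_recording{false};

// Hypothetical add-rate counter for a table.
struct RateCounter {
  std::atomic<std::size_t> added{0};

  // Counts an insertion only when the feature is built in AND active.
  void on_add() {
    SKETCH_JFR_ONLY(
      if (sketch_recording.load(std::memory_order_relaxed)) {
        added.fetch_add(1, std::memory_order_relaxed);
      }
    )
  }
};
```

Whether the extra branch is cheaper than an unconditional atomic increment 
would need to be measured; the sketch only shows where the checks would go.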

I looked at jfrPeriodic.cpp, and it seems to me that things could be 
simplified, i.e.

template<typename T>
static void emit_table_statistics(TableStatistics& statistics) {
    T event;
    event.set_bucketCount(statistics._number_of_buckets);
    ...
    event.commit();
}
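
For illustration only, a self-contained version of the same idea with mock 
types. MockEvent, its tag parameter and the two aliases are hypothetical; 
the real event classes and the remaining fields elided above are not 
modelled:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the statistics type; only one field is modelled.
struct TableStatistics {
  std::size_t _number_of_buckets;
};

// Hypothetical mock event. Each Tag yields a distinct event type with the
// same setter/commit interface, which is all the template relies on.
template <int Tag>
struct MockEvent {
  // Records what the most recently committed event of this type carried.
  static std::size_t& last_bucket_count() {
    static std::size_t value = 0;
    return value;
  }
  void set_bucketCount(std::size_t n) { _bucket_count = n; }
  void commit() { last_bucket_count() = _bucket_count; }
 private:
  std::size_t _bucket_count = 0;
};

using MockSymbolTableEvent = MockEvent<0>;  // hypothetical alias
using MockStringTableEvent = MockEvent<1>;  // hypothetical alias

// One function body serves every event type exposing these setters.
template <typename T>
static void emit_table_statistics(const TableStatistics& statistics) {
  T event;
  event.set_bucketCount(statistics._number_of_buckets);
  event.commit();
}
```

A caller would then write emit_table_statistics<MockSymbolTableEvent>(stats) 
for one table and emit_table_statistics<MockStringTableEvent>(stats) for 
another, instead of repeating the setter calls per event type.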

Thanks
Erik