RFR: 8185525: [Event Request] Add Tracing event for DictionarySizes

Wed Apr 10 17:06:17 UTC 2019

Thank you Erik for more feedback.

New webrev:  http://cr.openjdk.java.net/~gziemski/8185525_rev5

On 4/9/19 3:50 PM, Erik Gahlin wrote:
> Thanks Gerard,
>
> In metadata.xml (and possible elsewhere) can you change the fields
>
> "varianceOfBucketCount" to "bucketCountVariance"
> "stdDevOfBucketCount" to "bucketCountStandardDeviation"

I changed those, but I also changed:

"maximumBucketCount" to "bucketCountMaximum"
"averageBucketCount" to "bucketCountAverage"

to be fully consistent.

>
> I noticed that events are only emitted if we are able to take the 
> resize lock. Can this be fixed? What prevents us from always getting 
> the data? That's how other periodic events work and losing data 
> sometimes may lead to subtle bugs that hard to understand and 
> replicate in systems that rely on the information. Could we retry on a 
> failure?
Good observation. If the resize lock is taken, then it's not likely that 
whoever owns it will be done soon, so retrying is most likely not going 
to succeed right away. Is it OK to tie up JFR periodic thread for some 
time? If so, how long?

If the lock is taken, then it means that someone is scanning through the 
entire table, or the table is being resized. Either way, we're not 
loosing data, but are just temporarily blind - I don't see a problem 
here for a long running apps, they will start receiving events 
eventually (which happen every 10 sec by default)

>
> If it is very problematic to fix, it may be OK to skip the events, but 
> then tests would need to be updated to take that into account 
> (retrying). Otherwise we may get intermittent failures.
At the startup of our jtreg JFR test, no one, besides us, should take 
the lock, so if we don't get the event, because someone else is holding 
it (too small system hash table that gets resized up immediately after 
VM starts up), we probably would want to know about it, so a failure 
here might be in fact welcome.

cheers

>
> Thanks
> Erik
>
>> hi Erik,
>>
>>
>> On 4/3/19 12:44 PM, Erik Gahlin wrote:
>>> Hi Gerard,
>>>
>>> Here are some comments about the metadata (to make it consistent 
>>> with other events).
>>>
>>> The events should not be in the "Java Application" category since 
>>> they are JVM events. You could perhaps put them in "Java Virtual 
>>> Machine, Runtime, Tables". Some comments about the names and labels 
>>> of fields.
>>>
>>> - Label: Number of buckets => Bucket Count
>>> - Label: Number of entries => Entry Count
>>> - Label: Total footprint => Total Footprint
>>>
>>> Could you remove descriptions that are exactly the same as the label.
>>>
>>> - Label: Maximum bucket size => Maximum Bucket Size
>>> - Label: Average bucket size => Average Bucket Size
>>> - Label: Variance of bucket  size => Bucket Size Variance
>>> - Name: stdDevOfBucketSize => bucketSizeStandardDeviation
>>> - Label: Standard deviation of bucket size => Bucket Size Standard 
>>> Deviation"
>>>
>>> Instead of using the word "size", it may make more sense to use the 
>>> word "count" here as well, i.e "Average Bucket Count", or maybe I'm 
>>> missing something? Is there a difference?
>>>
>>> I wonder how useful standard deviation and variance is? If support 
>>> engineers are looking at a recording, or JMC adds a rule for the 
>>> events, what would a good or bad value be? Is it possible to use the 
>>> information for troubleshooting?
>>
>> While I'm working on all the above changes you suggested, we can 
>> discuss the standard devation and variance.
>>
>> I added them because they are part of the jcmd "VM.symboltable 
>> -verbose" command, so we are consistent.
>>
>> Now, regarding how useful they are, I always understood them as a 
>> sign of imbalanced table distribution, and without a proper 
>> histogram, this is the best description of the histogram shape. In 
>> reality, however, I think that if they identify an issue, then we 
>> might have a very curious distribution (some sort of hash table 
>> attack), or we have an issue with our hash function for the 
>> particular usage case.
>>
>> Still, I'd personally elect to keep them.
>>
>> Let me ask you a different question though, Is it expensive to have 2 
>> doubles as part of an event (5 events per second)? And if so, is 
>> there currently (or planned) granularity for controlling not just 
>> which events to record, but also which attributes?
>>
>>>
>>> - Name: addRate => insertionRate
>>> - Label: Rate of addition =>  Insertation Rate
>>> - Name: removeRate => removalRate
>>> - Label: Rate of removal => Removal Rate
>>
>> Will do.
>>
>>>
>>> I'm missing unit tests for the events. Could you please add in 
>>> /test/jdk/jdk/jfr/event/runtime. They can be sanity tests. i.e the 
>>> average not exceeding max, no negative values etc.
>>
>> Working on it, do we need separate test per each event (table), or 
>> just one table will suffice (ex. StringTable)?
>>
>> Thank you for the feedback!
>>
>>
>> cheers
>>>
>>> Thanks!
>>> Erik
>>>
>>>> Hi all,
>>>>
>>>> Please review this feature, which adds tracing events for the 
>>>> internal hash tables.
>>>>
>>>> The following attributes are implemented:
>>>>
>>>> <Field type="ulong" name="numberOfBuckets" label="Number of 
>>>> buckets" description="Number of buckets" />
>>>> <Field type="ulong" name="numberOfEntries" label="Number of 
>>>> entries" description="Number of all entries" />
>>>> <Field type="ulong" contentType="bytes" name="totalFootprint" 
>>>> label="Total footprint" description="Total memory footprint (the 
>>>> table itself plus all of the entries)" />
>>>> <Field type="ulong" name="maximumBucketSize" label="Maximum bucket 
>>>> size" description="The maximum bucket length (entries in a single 
>>>> bucket)" />
>>>> <Field type="double" name="averageBucketSize" label="Average bucket 
>>>> size" description="The average bucket length (entries in a bucket)" 
>>>> /> <Field type="double" name="varianceOfBucketSize" label="Variance 
>>>> of bucket sizes" description="How far bucket lengths are spread out 
>>>> from their average value" />
>>>> <Field type="double" name="stdDevOfBucketSize" label="Standard 
>>>> deviation of bucket sizes" description="How far bucket lengths are 
>>>> spread out from their mean (expected) value" />
>>>> <Field type="double" name="addRate" label="Rate of addition" 
>>>> description="How many items were added since last event (per 
>>>> second)" />
>>>> <Field type="double" name="removeRate" label="Rate of removal" 
>>>> description="How many items were removed since last event (per 
>>>> second)" />
>>>>
>>>> This event was implemented for the following system tables:
>>>>
>>>> SymbolTable
>>>> StringTable
>>>> Placeholder Table
>>>> LoaderConstraints Table
>>>> ProtectionDomainCache Table
>>>>
>>>> Webrev:  http://cr.openjdk.java.net/~gziemski/8185525_rev1/
>>>> Bug:     https://bugs.openjdk.java.net/browse/JDK-8185525
>>>> Testing: Mach5 tier1,2,3 (another Mach5 tier1,2,3,4,5,6,7 in 
>>>> progress…)
>>>>
>>>>
>>>> Cheers
>>>>
>>>
>>>
>>
>
>