RFR: 8185525: [Event Request] Add Tracing event for DictionarySizes

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Wed Apr 10 18:12:46 UTC 2019



On 4/10/19 1:06 PM, gerard ziemski wrote:
> Thank you Erik for more feedback.
>
> New webrev:  http://cr.openjdk.java.net/~gziemski/8185525_rev5
>
>
> On 4/9/19 3:50 PM, Erik Gahlin wrote:
>> Thanks Gerard,
>>
>> In metadata.xml (and possible elsewhere) can you change the fields
>>
>> "varianceOfBucketCount" to "bucketCountVariance"
>> "stdDevOfBucketCount" to "bucketCountStandardDeviation"
>
> I changed those, but I also changed:
>
> "maximumBucketCount" to "bucketCountMaximum"
> "averageBucketCount" to "bucketCountAverage"
>
> to be fully consistent.
>
>
>>
>> I noticed that events are only emitted if we are able to take the 
>> resize lock. Can this be fixed? What prevents us from always getting 
>> the data? That's how other periodic events work and losing data 
>> sometimes may lead to subtle bugs that hard to understand and 
>> replicate in systems that rely on the information. Could we retry on 
>> a failure?
> Good observation. If the resize lock is taken, then it's not likely 
> that whoever owns it will be done soon, so retrying is most likely not 
> going to succeed right away. Is it OK to tie up JFR periodic thread 
> for some time? If so, how long?
>
> If the lock is taken, then it means that someone is scanning through 
> the entire table, or the table is being resized. Either way, we're not 
> loosing data, but are just temporarily blind - I don't see a problem 
> here for a long running apps, they will start receiving events 
> eventually (which happen every 10 sec by default)

Robbin was talking about allowing scanning the table while resizing, ie. 
not having the resize_lock, if we can accept that there might be some 
entries double counted.

Coleen
>
>>
>> If it is very problematic to fix, it may be OK to skip the events, 
>> but then tests would need to be updated to take that into account 
>> (retrying). Otherwise we may get intermittent failures.
> At the startup of our jtreg JFR test, no one, besides us, should take 
> the lock, so if we don't get the event, because someone else is 
> holding it (too small system hash table that gets resized up 
> immediately after VM starts up), we probably would want to know about 
> it, so a failure here might be in fact welcome.
>
>
> cheers
>
>
>
>>
>> Thanks
>> Erik
>>
>>> hi Erik,
>>>
>>>
>>> On 4/3/19 12:44 PM, Erik Gahlin wrote:
>>>> Hi Gerard,
>>>>
>>>> Here are some comments about the metadata (to make it consistent 
>>>> with other events).
>>>>
>>>> The events should not be in the "Java Application" category since 
>>>> they are JVM events. You could perhaps put them in "Java Virtual 
>>>> Machine, Runtime, Tables". Some comments about the names and labels 
>>>> of fields.
>>>>
>>>> - Label: Number of buckets => Bucket Count
>>>> - Label: Number of entries => Entry Count
>>>> - Label: Total footprint => Total Footprint
>>>>
>>>> Could you remove descriptions that are exactly the same as the label.
>>>>
>>>> - Label: Maximum bucket size => Maximum Bucket Size
>>>> - Label: Average bucket size => Average Bucket Size
>>>> - Label: Variance of bucket  size => Bucket Size Variance
>>>> - Name: stdDevOfBucketSize => bucketSizeStandardDeviation
>>>> - Label: Standard deviation of bucket size => Bucket Size Standard 
>>>> Deviation"
>>>>
>>>> Instead of using the word "size", it may make more sense to use the 
>>>> word "count" here as well, i.e "Average Bucket Count", or maybe I'm 
>>>> missing something? Is there a difference?
>>>>
>>>> I wonder how useful standard deviation and variance is? If support 
>>>> engineers are looking at a recording, or JMC adds a rule for the 
>>>> events, what would a good or bad value be? Is it possible to use 
>>>> the information for troubleshooting?
>>>
>>> While I'm working on all the above changes you suggested, we can 
>>> discuss the standard devation and variance.
>>>
>>> I added them because they are part of the jcmd "VM.symboltable 
>>> -verbose" command, so we are consistent.
>>>
>>> Now, regarding how useful they are, I always understood them as a 
>>> sign of imbalanced table distribution, and without a proper 
>>> histogram, this is the best description of the histogram shape. In 
>>> reality, however, I think that if they identify an issue, then we 
>>> might have a very curious distribution (some sort of hash table 
>>> attack), or we have an issue with our hash function for the 
>>> particular usage case.
>>>
>>> Still, I'd personally elect to keep them.
>>>
>>> Let me ask you a different question though, Is it expensive to have 
>>> 2 doubles as part of an event (5 events per second)? And if so, is 
>>> there currently (or planned) granularity for controlling not just 
>>> which events to record, but also which attributes?
>>>
>>>>
>>>> - Name: addRate => insertionRate
>>>> - Label: Rate of addition =>  Insertation Rate
>>>> - Name: removeRate => removalRate
>>>> - Label: Rate of removal => Removal Rate
>>>
>>> Will do.
>>>
>>>>
>>>> I'm missing unit tests for the events. Could you please add in 
>>>> /test/jdk/jdk/jfr/event/runtime. They can be sanity tests. i.e the 
>>>> average not exceeding max, no negative values etc.
>>>
>>> Working on it, do we need separate test per each event (table), or 
>>> just one table will suffice (ex. StringTable)?
>>>
>>> Thank you for the feedback!
>>>
>>>
>>> cheers
>>>>
>>>> Thanks!
>>>> Erik
>>>>
>>>>> Hi all,
>>>>>
>>>>> Please review this feature, which adds tracing events for the 
>>>>> internal hash tables.
>>>>>
>>>>> The following attributes are implemented:
>>>>>
>>>>> <Field type="ulong" name="numberOfBuckets" label="Number of 
>>>>> buckets" description="Number of buckets" />
>>>>> <Field type="ulong" name="numberOfEntries" label="Number of 
>>>>> entries" description="Number of all entries" />
>>>>> <Field type="ulong" contentType="bytes" name="totalFootprint" 
>>>>> label="Total footprint" description="Total memory footprint (the 
>>>>> table itself plus all of the entries)" />
>>>>> <Field type="ulong" name="maximumBucketSize" label="Maximum bucket 
>>>>> size" description="The maximum bucket length (entries in a single 
>>>>> bucket)" />
>>>>> <Field type="double" name="averageBucketSize" label="Average 
>>>>> bucket size" description="The average bucket length (entries in a 
>>>>> bucket)" /> <Field type="double" name="varianceOfBucketSize" 
>>>>> label="Variance of bucket sizes" description="How far bucket 
>>>>> lengths are spread out from their average value" />
>>>>> <Field type="double" name="stdDevOfBucketSize" label="Standard 
>>>>> deviation of bucket sizes" description="How far bucket lengths are 
>>>>> spread out from their mean (expected) value" />
>>>>> <Field type="double" name="addRate" label="Rate of addition" 
>>>>> description="How many items were added since last event (per 
>>>>> second)" />
>>>>> <Field type="double" name="removeRate" label="Rate of removal" 
>>>>> description="How many items were removed since last event (per 
>>>>> second)" />
>>>>>
>>>>> This event was implemented for the following system tables:
>>>>>
>>>>> SymbolTable
>>>>> StringTable
>>>>> Placeholder Table
>>>>> LoaderConstraints Table
>>>>> ProtectionDomainCache Table
>>>>>
>>>>> Webrev: http://cr.openjdk.java.net/~gziemski/8185525_rev1/
>>>>> Bug:     https://bugs.openjdk.java.net/browse/JDK-8185525
>>>>> Testing: Mach5 tier1,2,3 (another Mach5 tier1,2,3,4,5,6,7 in 
>>>>> progress…)
>>>>>
>>>>>
>>>>> Cheers
>>>>>
>>>>
>>>>
>>>
>>
>>
>



More information about the hotspot-runtime-dev mailing list