RFR 8242263: Diagnose synchronization on primitive wrappers

Mon Aug 10 20:03:16 UTC 2020

Hi Erik,

Thanks for looking into this and for the offline discussions.

On 8/10/20 10:19 AM, Erik Gahlin wrote:
>
> Hi Patricio,
>
> I have tried to review the JFR changes, but I need more information on 
> how the feature is going to be used.
>
> Is this something temporary that will be used gain insight for the 
> Valhalla project, or do you think this will be needed longer term? If 
> it is longer term, the bar is higher on what can be accepted.
>
> If this is a temporary feature, we could mark the event as 
> experimental in metadata.xml, similar to what we did with GC events 
> during the development of ZGC. Events that are experimental do not 
> show up by default in visualization tools such as JMC and can be 
> removed when they are not needed, or we have a better solution.
>
Right, it's temporary so we can mark it as experimental. I also changed 
the category name to be "Java Virtual Machine, Diagnostics".

> For events to be enabled in default.jfc, they should not cause more 
> than 1% overhead, not even in pathological cases. To me, it seems this 
> could happen if you make a loop where it is triggered all the time.
>
> For events in profile.jfc, the overhead should still be low (1-2%), 
> but the target is the typical application. For example, allocation 
> profiling is only enabled in profile.jfc. Some application that are 
> allocation intensive could cause a higher overhead than 1%, but that 
> is OK because that configuration is only to be used for a short period 
> of time. That said, there is still a budget on how much space in a 
> recording an event can take up, so it also depends on how important 
> the information is for the user. To me, it seems this event will not 
> be that important for the average user, which makes me think it should 
> be disabled by default.
>
After our off-list discussion I left it as enabled. I also removed the 
threshold setting and added the startTime=false in metadata.xml.

> To complicate things, I noticed that aa command line flag is also used 
> to enable the event. This is something we have worked hard to avoid 
> when it comes to JFR events. All configuration should happen using 
> configuration files. We had problem with this in the past where users 
> gets confused why their events are not enabled. When we ported JFR to 
> Hotspot, we got rid of those of those flags/events and it is a much 
> better situation. Again, if this a temporary event to discover usage 
> pattern for a release or two, it might be OK, but if we believe this 
> event to stick around, I think we should look into alternatives, or 
> not use JFR at all for this.
>
Yes, I see your point. Since we want to diagnose synchronization on 
these classes but we don't want to affect performance the flag has to be 
specified at startup otherwise the diagnostic instrumentation is 
disabled, so I don't see a clear way to enable the events from JFR 
alone. In any case, this will be a temporary event.

Here is v5. Let me know if you are okay with the changes.

Inc: http://cr.openjdk.java.net/~pchilanomate/8242263/v5/inc/webrev/
Full: http://cr.openjdk.java.net/~pchilanomate/8242263/v5/webrev/

Thanks Erik!

Patricio
>
> Thanks
>
> Erik
>
>> On 28 Jul 2020, at 21:16, Patricio Chilano 
>> <patricio.chilano.mateo at oracle.com 
>> <mailto:patricio.chilano.mateo at oracle.com>> wrote:
>>
>> Hi all,
>>
>> Please review the following change that adds diagnostic capabilities 
>> when synchronizing on primitive wrapper classes.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8242263
>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8242263/v1/webrev/
>>
>> The new flag allows to identify synchronization on these classes and 
>> to take one of the following actions: exit the VM with fatal error, 
>> log a warning message, or issue a JFR event. The implementation uses 
>> a simple approach where a check is added at the beginning of the 
>> monitorenter generated code when the flag is enabled to check whether 
>> the object is of a primitive wrapper class. If it is, we jump to the 
>> slow path, otherwise we just continue as always. The extra 
>> instructions will be: load the klass of the object, load the 
>> _access_flags field for that klass, AND with a constant, and branch 
>> based on the result. The code will only be generated whenever the new 
>> opt-in diagnostic flag is enabled so performance won't be affected 
>> when off.
>>
>> In addition to the purpose described in the description of the bug, 
>> this flag will also be useful when trying to diagnose possible 
>> synchronization issues if these classes ever become inline types as 
>> part of the Valhalla project.
>>
>> I added test SyncOnPrimitiveWrapperTest.java that tests for the exit 
>> and logging cases. I added test TestSyncOnPrimitiveWrapperEvent.java 
>> to test for the JFR event case.
>> I tested the patch running tiers1-6 in mach5 with the flag set to 
>> DiagnoseSyncOnPrimitiveWrappers=2.
>> I checked it builds with arm32 and ppc but can't run any tests on 
>> those platforms, so it would be good if somebody can run the new test 
>> included in the patch.
>>
>> Let me know if you think I should run or add any more tests.
>>
>> Thanks!
>>
>> Patricio
>