RFR 8242263: Diagnose synchronization on primitive wrappers

Mon Aug 10 23:25:01 UTC 2020

Hi Dan,

On 8/10/20 5:39 PM, Daniel D. Daugherty wrote:
> > Inc: http://cr.openjdk.java.net/~pchilanomate/8242263/v5/inc/webrev/
>
> src/hotspot/share/jfr/metadata/metadata.xml
>     No comments.
>
> src/hotspot/share/runtime/synchronizer.cpp
>     L577:     char* newline = (char*)strchr(ss.base(), '\n');
>     L578:     if (newline != NULL) {
>     L579:       *newline = '\0';
>     L580:     }
>     L581:     fatal("Synchronizing on object " INTPTR_FORMAT " of 
> klass %s %s", p2i(obj()), obj->klass()->external_name(), base);
>         Does this work as expected on Win* where <CR><LF> is used? I 
> don't
>         remember where we convert <LF> => <CR><LF> on Win*...
Yes, I tested it in Windows and the output is the same as in Linux and 
MacOS.
I actually forgot about the '\r\n' vs '\n' difference so after you 
noticed it, I thought about it and I was in fact expecting the output to 
be slightly different. I thought the \r would make the cursor go back to 
the beginning of the line, so that the next line would be written there 
(which is just "#\r\n"), so effectively we would lose the separating 
line between the "fatal error: ... " line and the "JRE version: ..." 
line. Not a big deal. But the output is actually the same as in Linux. 
Looking at ostream.cpp I see '\n' hardcoded when adding or checking for 
new lines. I don't find references to '\r' in other files that would 
indicate they are used in stringStream (or any other outputStream object 
for that matter). The only references to os::line_separator() are in 
vmError.cpp. So it looks like any newlines in stringStream are in fact 
encoded using '\n'.

> src/jdk.jfr/share/conf/jfr/default.jfc
>     No comments.
>
> src/jdk.jfr/share/conf/jfr/profile.jfc
>     No comments.
>
> test/hotspot/jtreg/runtime/Monitor/SyncOnPrimitiveWrapperTest.java
>     No comments.
>
> Thumbs up. No need to see a new webrev if you end up making changes
> for Win* above.
Great, thanks for reviewing this Dan!

Patricio
> Dan
>
>
> On 8/10/20 4:03 PM, Patricio Chilano wrote:
>> Hi Erik,
>>
>> Thanks for looking into this and for the offline discussions.
>>
>> On 8/10/20 10:19 AM, Erik Gahlin wrote:
>>>
>>> Hi Patricio,
>>>
>>> I have tried to review the JFR changes, but I need more information 
>>> on how the feature is going to be used.
>>>
>>> Is this something temporary that will be used gain insight for the 
>>> Valhalla project, or do you think this will be needed longer term? 
>>> If it is longer term, the bar is higher on what can be accepted.
>>>
>>> If this is a temporary feature, we could mark the event as 
>>> experimental in metadata.xml, similar to what we did with GC events 
>>> during the development of ZGC. Events that are experimental do not 
>>> show up by default in visualization tools such as JMC and can be 
>>> removed when they are not needed, or we have a better solution.
>>>
>> Right, it's temporary so we can mark it as experimental. I also 
>> changed the category name to be "Java Virtual Machine, Diagnostics".
>>
>>> For events to be enabled in default.jfc, they should not cause more 
>>> than 1% overhead, not even in pathological cases. To me, it seems 
>>> this could happen if you make a loop where it is triggered all the 
>>> time.
>>>
>>> For events in profile.jfc, the overhead should still be low (1-2%), 
>>> but the target is the typical application. For example, allocation 
>>> profiling is only enabled in profile.jfc. Some application that are 
>>> allocation intensive could cause a higher overhead than 1%, but that 
>>> is OK because that configuration is only to be used for a short 
>>> period of time. That said, there is still a budget on how much space 
>>> in a recording an event can take up, so it also depends on how 
>>> important the information is for the user. To me, it seems this 
>>> event will not be that important for the average user, which makes 
>>> me think it should be disabled by default.
>>>
>> After our off-list discussion I left it as enabled. I also removed 
>> the threshold setting and added the startTime=false in metadata.xml.
>>
>>> To complicate things, I noticed that aa command line flag is also 
>>> used to enable the event. This is something we have worked hard to 
>>> avoid when it comes to JFR events. All configuration should happen 
>>> using configuration files. We had problem with this in the past 
>>> where users gets confused why their events are not enabled. When we 
>>> ported JFR to Hotspot, we got rid of those of those flags/events and 
>>> it is a much better situation. Again, if this a temporary event to 
>>> discover usage pattern for a release or two, it might be OK, but if 
>>> we believe this event to stick around, I think we should look into 
>>> alternatives, or not use JFR at all for this.
>>>
>> Yes, I see your point. Since we want to diagnose synchronization on 
>> these classes but we don't want to affect performance the flag has to 
>> be specified at startup otherwise the diagnostic instrumentation is 
>> disabled, so I don't see a clear way to enable the events from JFR 
>> alone. In any case, this will be a temporary event.
>>
>> Here is v5. Let me know if you are okay with the changes.
>>
>> Inc: http://cr.openjdk.java.net/~pchilanomate/8242263/v5/inc/webrev/
>> Full: http://cr.openjdk.java.net/~pchilanomate/8242263/v5/webrev/
>>
>> Thanks Erik!
>>
>> Patricio
>>>
>>> Thanks
>>>
>>> Erik
>>>
>>>> On 28 Jul 2020, at 21:16, Patricio Chilano 
>>>> <patricio.chilano.mateo at oracle.com 
>>>> <mailto:patricio.chilano.mateo at oracle.com>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Please review the following change that adds diagnostic 
>>>> capabilities when synchronizing on primitive wrapper classes.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8242263
>>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8242263/v1/webrev/
>>>>
>>>> The new flag allows to identify synchronization on these classes 
>>>> and to take one of the following actions: exit the VM with fatal 
>>>> error, log a warning message, or issue a JFR event. The 
>>>> implementation uses a simple approach where a check is added at the 
>>>> beginning of the monitorenter generated code when the flag is 
>>>> enabled to check whether the object is of a primitive wrapper 
>>>> class. If it is, we jump to the slow path, otherwise we just 
>>>> continue as always. The extra instructions will be: load the klass 
>>>> of the object, load the _access_flags field for that klass, AND 
>>>> with a constant, and branch based on the result. The code will only 
>>>> be generated whenever the new opt-in diagnostic flag is enabled so 
>>>> performance won't be affected when off.
>>>>
>>>> In addition to the purpose described in the description of the bug, 
>>>> this flag will also be useful when trying to diagnose possible 
>>>> synchronization issues if these classes ever become inline types as 
>>>> part of the Valhalla project.
>>>>
>>>> I added test SyncOnPrimitiveWrapperTest.java that tests for the 
>>>> exit and logging cases. I added test 
>>>> TestSyncOnPrimitiveWrapperEvent.java to test for the JFR event case.
>>>> I tested the patch running tiers1-6 in mach5 with the flag set to 
>>>> DiagnoseSyncOnPrimitiveWrappers=2.
>>>> I checked it builds with arm32 and ppc but can't run any tests on 
>>>> those platforms, so it would be good if somebody can run the new 
>>>> test included in the patch.
>>>>
>>>> Let me know if you think I should run or add any more tests.
>>>>
>>>> Thanks!
>>>>
>>>> Patricio
>>>
>>
>