RFR 8242263: Diagnose synchronization on primitive wrappers

Daniel D. Daugherty daniel.daugherty at oracle.com
Wed Aug 5 21:47:35 UTC 2020


I'm peeking ahead to the next webrev... :-)

 > http://cr.openjdk.java.net/~pchilanomate/8242263/v3/webrev/

General comments:
   - check files for copyright year updates.

src/hotspot/share/runtime/synchronizer.hpp
     No comments.

src/hotspot/share/runtime/synchronizer.cpp
     L507:   const markWord mark = obj->mark();
     L508:
     L509:   if (DiagnoseSyncOnPrimitiveWrappers != 0 && 
obj->klass()->is_box()) {
     L510:     return false;
     L511:   }
         The new bailout on L509-511 can move above L507.

     L573:     fatal("Synchronizing on object " INTPTR_FORMAT " of klass 
%s", p2i(obj()), obj->klass()->external_name());
         This external_name() call does not have a ResourceMark.

src/hotspot/share/logging/logTag.hpp
     No comments.

src/hotspot/share/oops/klass.hpp
     No comments.

src/hotspot/share/utilities/accessFlags.hpp
     No comments.

src/hotspot/share/runtime/globals.hpp
     L814:              "0: off "
         Missing a ';' after "off".

     L816:              "2: log message to stdout.
         Perhaps add "(by default)" after "stdout" or
         don't say where log output is at all.

src/hotspot/share/runtime/arguments.cpp
     L4197: LogConfiguration::configure_stdout(LogLevel::Info, true, 
LOG_TAGS(primitivewrappers));
         Hmmm... maybe ignore my comments about L816 in globals.hpp
         since it looks like logging output is configured to 'stdout'.
         I'm assuming that other log options to put output elsewhere
         are overridden by this code?

src/hotspot/share/classfile/systemDictionary.cpp
     L2188:     for (int i = T_BOOLEAN; i < T_LONG+1; i++) {
         nit - s/T_LONG+1/T_LONG + 1/

     L2191: _box_klasses[i]->set_prototype_header(markWord::prototype());
         I assume we're keeping the prototype_header field when Biased 
Locking
         goes away? The reason I ask:

         static markWord prototype() {
           return markWord( no_hash_in_place | no_lock_in_place );
         }

         is because without Biased Locking, do we really need the prototype
         anymore? The initial markWord won't need possible variants...

src/hotspot/share/jfr/metadata/metadata.xml
     L69:   <Event name="SyncOnPrimitiveWrapper" category="Java Application"
         Is the category "Java Application" because it's the application
         code that did something "wrong" here? Where "application" is 
loosely
         defined to include the possibility of auto generated code, library
         code and the like? Or perhaps "application" because something 
"above"
         the "Java Virtual Machine, Runtime" did the "wrong" thing here?

src/jdk.jfr/share/conf/jfr/default.jfc
     No comments.

src/jdk.jfr/share/conf/jfr/profile.jfc
     No comments.

test/lib/jdk/test/lib/jfr/EventNames.java
     No comments.

src/hotspot/cpu/aarch64/aarch64.ad
     L3517:       __ tbnz(tmp, exact_log2(JVM_ACC_IS_BOX_CLASS), cont);
<snip>
     L3578:     __ bind(cont);
     L3579:     // flag == EQ indicates success
     L3580:     // flag == NE indicates failure
         If tbnz() branches to "cont" when we have a box class, what's
         the flag value set to (EQ or NE)? And what set that flag value?
         The reason I ask is I don't think tbnz() sets condition codes...

src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp
     No comments.

src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp
     No comments.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
     int MacroAssembler::biased_locking_enter(Register lock_reg,
                                              Register obj_reg,
                                              Register swap_reg,
                                              Register tmp_reg,
                                              bool swap_reg_contains_mark,
                                              Label& done,
                                              Label* slow_case,
BiasedLockingCounters* counters) {
         I think you've changed the only callers of biased_locking_enter()
         that cared about the return value with this changeset so it can
         be changed to a void function.

src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp
     No comments.

src/hotspot/cpu/arm/c2_MacroAssembler_arm.cpp
     L96:      tbnz(Rscratch, exact_log2(JVM_ACC_IS_BOX_CLASS), done);
<snip>
     L131:   bind(done);
     L132:
     L133:   // At this point flags are set as follows:
     L134:   //  EQ -> Success
     L135:   //  NE -> Failure, branch to slow path
         If tbnz() branches to "done" when we have a box class, what's
         the flag value set to (EQ or NE)? And what set that flag value?
         The reason I ask is I don't think tbnz() sets condition codes...

src/hotspot/cpu/arm/interp_masm_arm.cpp
     No comments.

src/hotspot/cpu/arm/macroAssembler_arm.cpp
     int MacroAssembler::biased_locking_enter(Register obj_reg, Register 
swap_reg, Register tmp_reg,
                                              bool swap_reg_contains_mark,
                                              Register tmp2,
                                              Label& done, Label& slow_case,
BiasedLockingCounters* counters) {
         I think you've changed the only callers of biased_locking_enter()
         that cared about the return value with this changeset so it can
         be changed to a void function.

src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp
src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp
src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
     No comments on the PPC code.

src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp
src/hotspot/cpu/s390/interp_masm_s390.cpp
src/hotspot/cpu/s390/macroAssembler_s390.cpp
     No comments on the S390 code.

src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp
     L58:     load_klass(hdr, obj, rklass_decode_tmp);
         What will this do with a 'noreg' value? (I need a refresher.)

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
     No comments.

src/hotspot/cpu/x86/interp_masm_x86.cpp
     No comments.

src/hotspot/cpu/x86/macroAssembler_x86.cpp
     int MacroAssembler::biased_locking_enter(Register lock_reg,
                                              Register obj_reg,
                                              Register swap_reg,
                                              Register tmp_reg,
                                              Register tmp_reg2,
                                              bool swap_reg_contains_mark,
                                              Label& done,
                                              Label* slow_case,
BiasedLockingCounters* counters) {
         I think you've changed the only caller of biased_locking_enter()
         that cared about the return value with this changeset so it can
         be changed to a void function.

test/hotspot/jtreg/runtime/Monitor/SyncOnPrimitiveWrapperTest.java
     L30:  * @test SyncOnPrimitiveWrapperTest
         No parameter to @test directive.

     L136:         private static long sharedCounter = 0L;
         Since you don't do anything with sharedCounter other than 
increment it,
         can the compilers optimize it away? If it can be optimized 
away, does
         that mean that:

             L142:                 synchronized (obj) {

         can also be optimized away?

         I don't think that:

             L161:                 synchronized (sharedLock1) {

         can be optimized away because it is shared by multiple threads.

test/jdk/jdk/jfr/event/runtime/TestSyncOnPrimitiveWrapperEvent.java
     Similar questions about 'counter' being optimized away.
     Similar question about "synchronized (obj) {" being optimized away.


Dan


On 8/5/20 9:01 AM, Patricio Chilano wrote:
> Hi Martin,
>
> On 8/5/20 4:47 AM, Doerr, Martin wrote:
>> Hi Patricio,
>>
>> using 8 Byte load instructions for jint fields is a terrible coding 
>> style!
>> Someone else may see it and use an 8 Byte store. Will result in great 
>> fun for debugging!
>>
>> And on Big Endian you will simply access the wrong bits.
> Ah, of course! Those 32 bits will be in the wrong place when doing the 
> test.
>
>> Note that Big Endian Platforms are AIX, old linux ppc, s390, SPARC. I 
>> don't think you have tested on them.
>>
>>> We could remove the nested synchronized statements in the run() method
>>> so that we don't generate that much logging. We could also lower
>>> LOOP_COUNT. I think the issue is also because we are running LogTest
>>> with multiple flag combinations. Not sure what we should touch first.
>>> Maybe the synchronized statements, have only one or two and test that
>>> first?
>> Sounds like helpful ideas. Please go ahead and strip things down.
> Great, I will send v3 later with those changes.
>
> Thanks Martin!
>
> Patricio
>> Thanks for taking care of it.
>>
>> Best regards,
>> Martin
>>
>>
>>> -----Original Message-----
>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>> Sent: Dienstag, 4. August 2020 20:26
>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>> <david.holmes at oracle.com>; hotspot-runtime-dev at openjdk.java.net;
>>> hotspot-jfr-dev at openjdk.java.net
>>> Subject: Re: RFR 8242263: Diagnose synchronization on primitive 
>>> wrappers
>>>
>>> Hi Martin,
>>>
>>> On 8/4/20 1:35 PM, Doerr, Martin wrote:
>>>> Hi Patricio,
>>>>
>>>>> Ok, I'll change it to movl + testl and test it out before sending v3.
>>>> Thanks. I forgot to mention arm + aarch64.
>>>> Aarch64 uses ldrw + tbnz.
>>>> Arm 32 bit uses ldr_u32 + tbnz.
>>>>
>>>> I remember having seen the same mistake ��
>>>> And nobody noticed it on little Endian platforms.
>>> Ok, I can use a tbnz instead of test and then a branch on arm32 and 
>>> aarch64.
>>> Shouldn't a normal ldr in arm32 work fine?
>>> Also in 64 bits (either x64 or aarch64) I don't see the issue of 
>>> using a
>>> 64 bit load, besides the fact that we only care about the first 32 
>>> bits.
>>> Regardless of the endianness, aren't we masking out the upper part when
>>> we do AND/TEST or if we test a bit in the 0-31 bit range? Otherwise it
>>> seems it should have failed for one of those platforms in my tests.
>>>
>>>>> I see that some tests use @run driver/timeout=xxxx. Maybe you can
>>>>> specify that and see if that fixes it? Let me know if that works 
>>>>> and I
>>>>> can add it to the test.
>>>> That seems to have an effect. But now I'm not patient enough to 
>>>> wait for
>>> the test to finish.
>>>> Maybe the problem is that I'm using slow debug builds.
>>>> But is there a chance to make the test quicker without losing coverage
>>> significantly?
>>>> I think the test is too slow this way to run it on a regular basis. 
>>>> We'd need to
>>> spend dedicated machines for it.
>>> We could remove the nested synchronized statements in the run() method
>>> so that we don't generate that much logging. We could also lower
>>> LOOP_COUNT. I think the issue is also because we are running LogTest
>>> with multiple flag combinations. Not sure what we should touch first.
>>> Maybe the synchronized statements, have only one or two and test that
>>> first?
>>>
>>>
>>> Thanks,
>>> Patricio
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>>> Sent: Dienstag, 4. August 2020 17:47
>>>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>>>> <david.holmes at oracle.com>; hotspot-runtime-dev at openjdk.java.net;
>>>>> hotspot-jfr-dev at openjdk.java.net
>>>>> Subject: Re: RFR 8242263: Diagnose synchronization on primitive 
>>>>> wrappers
>>>>>
>>>>> Hi Martin,
>>>>>
>>>>> Thanks for fixing PPC and taking care of s390!
>>>>>
>>>>> On 8/4/20 11:18 AM, Doerr, Martin wrote:
>>>>>> Hi Patricio,
>>>>>>
>>>>>> I suggest to use movl + testl for checking the access flag as for 
>>>>>> other
>>> access
>>>>> flags on x64.
>>>>> Ok, I'll change it to movl + testl and test it out before sending v3.
>>>>>
>>>>>> New version for PPC64 and s390 see below.
>>>>>>
>>>>>> The test SyncOnPrimitiveWrapperTest produces hs_err files as
>>> expected.
>>>>> However, I'm getting timeout issues:
>>>>>> timed out (timeout set to 120000ms, elapsed time including timeout
>>>>> handling was 122372ms)
>>>>>> Can we provide more time?
>>>>> I see that some tests use @run driver/timeout=xxxx. Maybe you can
>>>>> specify that and see if that fixes it? Let me know if that works 
>>>>> and I
>>>>> can add it to the test.
>>>>>
>>>>> Thanks,
>>>>> Patricio
>>>>>> Best regards,
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>> diff -r 77852e129401 src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp
>>>>>> --- a/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp Tue Aug 04
>>>>> 10:03:57 2020 +0200
>>>>>> +++ b/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp     Tue Aug 04
>>>>> 16:04:31 2020 +0200
>>>>>> @@ -107,8 +107,8 @@
>>>>>>
>>>>>>       if (DiagnoseSyncOnPrimitiveWrappers != 0) {
>>>>>>         load_klass(Rscratch, Roop);
>>>>>> -    ld(Rscratch, in_bytes(Klass::access_flags_offset()), Rscratch);
>>>>>> -    andi(Rscratch, Rscratch, JVM_ACC_IS_BOX_CLASS);
>>>>>> +    lwz(Rscratch, in_bytes(Klass::access_flags_offset()), 
>>>>>> Rscratch);
>>>>>> +    testbitdi(CCR0, R0, Rscratch, 
>>>>>> exact_log2(JVM_ACC_IS_BOX_CLASS));
>>>>>>         bne(CCR0, slow_case);
>>>>>>       }
>>>>>>
>>>>>> diff -r 77852e129401 src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp
>>>>>> --- a/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp Tue Aug 04
>>>>> 10:03:57 2020 +0200
>>>>>> +++ b/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp        Tue Aug 04
>>>>> 16:04:31 2020 +0200
>>>>>> @@ -912,8 +912,8 @@
>>>>>>
>>>>>>         if (DiagnoseSyncOnPrimitiveWrappers != 0) {
>>>>>>           load_klass(tmp, object);
>>>>>> -      ld(tmp, in_bytes(Klass::access_flags_offset()), tmp);
>>>>>> -      andi(tmp, tmp, JVM_ACC_IS_BOX_CLASS);
>>>>>> +      lwz(tmp, in_bytes(Klass::access_flags_offset()), tmp);
>>>>>> +      testbitdi(CCR0, R0, tmp, exact_log2(JVM_ACC_IS_BOX_CLASS));
>>>>>>           bne(CCR0, slow_case);
>>>>>>         }
>>>>>>
>>>>>> diff -r 77852e129401 src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
>>>>>> --- a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp Tue Aug 04
>>>>> 10:03:57 2020 +0200
>>>>>> +++ b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp        Tue Aug 04
>>>>> 16:04:31 2020 +0200
>>>>>> @@ -2838,8 +2838,8 @@
>>>>>>
>>>>>>       if (DiagnoseSyncOnPrimitiveWrappers != 0) {
>>>>>>         load_klass(temp, oop);
>>>>>> -    ld(temp, in_bytes(Klass::access_flags_offset()), temp);
>>>>>> -    andi(temp, temp, JVM_ACC_IS_BOX_CLASS);
>>>>>> +    lwz(temp, in_bytes(Klass::access_flags_offset()), temp);
>>>>>> +    testbitdi(CCR0, R0, temp, exact_log2(JVM_ACC_IS_BOX_CLASS));
>>>>>>         bne(CCR0, cont);
>>>>>>       }
>>>>>>
>>>>>> diff -r 77852e129401
>>> src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp
>>>>>> --- a/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp   Tue Aug 04
>>>>> 10:03:57 2020 +0200
>>>>>> +++ b/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp   Tue Aug 04
>>>>> 16:04:31 2020 +0200
>>>>>> @@ -91,6 +91,12 @@
>>>>>>       // Save object being locked into the BasicObjectLock...
>>>>>>       z_stg(obj, Address(disp_hdr,
>>> BasicObjectLock::obj_offset_in_bytes()));
>>>>>> +  if (DiagnoseSyncOnPrimitiveWrappers != 0) {
>>>>>> +    load_klass(Z_R1_scratch, obj);
>>>>>> +    testbit(Address(Z_R1_scratch, Klass::access_flags_offset()),
>>>>> exact_log2(JVM_ACC_IS_BOX_CLASS));
>>>>>> +    z_btrue(slow_case);
>>>>>> +  }
>>>>>> +
>>>>>>       if (UseBiasedLocking) {
>>>>>>         biased_locking_enter(obj, hdr, Z_R1_scratch, 
>>>>>> Z_R0_scratch, done,
>>>>> &slow_case);
>>>>>>       }
>>>>>> diff -r 77852e129401 src/hotspot/cpu/s390/interp_masm_s390.cpp
>>>>>> --- a/src/hotspot/cpu/s390/interp_masm_s390.cpp Tue Aug 04 10:03:57
>>>>> 2020 +0200
>>>>>> +++ b/src/hotspot/cpu/s390/interp_masm_s390.cpp Tue Aug 04
>>> 16:04:31
>>>>> 2020 +0200
>>>>>> @@ -1000,6 +1000,12 @@
>>>>>>       // Load markWord from object into displaced_header.
>>>>>>       z_lg(displaced_header, oopDesc::mark_offset_in_bytes(), 
>>>>>> object);
>>>>>>
>>>>>> +  if (DiagnoseSyncOnPrimitiveWrappers != 0) {
>>>>>> +    load_klass(Z_R1_scratch, object);
>>>>>> +    testbit(Address(Z_R1_scratch, Klass::access_flags_offset()),
>>>>> exact_log2(JVM_ACC_IS_BOX_CLASS));
>>>>>> +    z_btrue(slow_case);
>>>>>> +  }
>>>>>> +
>>>>>>       if (UseBiasedLocking) {
>>>>>>         biased_locking_enter(object, displaced_header, Z_R1, 
>>>>>> Z_R0, done,
>>>>> &slow_case);
>>>>>>       }
>>>>>> diff -r 77852e129401 src/hotspot/cpu/s390/macroAssembler_s390.cpp
>>>>>> --- a/src/hotspot/cpu/s390/macroAssembler_s390.cpp Tue Aug 04
>>>>> 10:03:57 2020 +0200
>>>>>> +++ b/src/hotspot/cpu/s390/macroAssembler_s390.cpp      Tue Aug 04
>>>>> 16:04:31 2020 +0200
>>>>>> @@ -3358,6 +3358,12 @@
>>>>>>       // Load markWord from oop into mark.
>>>>>>       z_lg(displacedHeader, 0, oop);
>>>>>>
>>>>>> +  if (DiagnoseSyncOnPrimitiveWrappers != 0) {
>>>>>> +    load_klass(Z_R1_scratch, oop);
>>>>>> +    testbit(Address(Z_R1_scratch, Klass::access_flags_offset()),
>>>>> exact_log2(JVM_ACC_IS_BOX_CLASS));
>>>>>> +    z_btrue(done);
>>>>>> +  }
>>>>>> +
>>>>>>       if (try_bias) {
>>>>>>         biased_locking_enter(oop, displacedHeader, temp, Z_R0, 
>>>>>> done);
>>>>>>       }
>>>>>>
>



More information about the hotspot-runtime-dev mailing list