RFR 8242263: Diagnose synchronization on primitive wrappers
Patricio Chilano
patricio.chilano.mateo at oracle.com
Thu Aug 6 18:48:36 UTC 2020
Hi Dan,
On 8/5/20 6:47 PM, Daniel D. Daugherty wrote:
> I'm peeking ahead to the next webrev... :-)
>
> > http://cr.openjdk.java.net/~pchilanomate/8242263/v3/webrev/
: )
> General comments:
> - check files for copyright year updates.
Updated accessFlags.hpp, c1_MacroAssembler_aarch64.cpp,
c1_MacroAssembler_arm.cpp and c1_MacroAssembler_x86.cpp.
> src/hotspot/share/runtime/synchronizer.hpp
> No comments.
>
> src/hotspot/share/runtime/synchronizer.cpp
> L507: const markWord mark = obj->mark();
> L508:
> L509: if (DiagnoseSyncOnPrimitiveWrappers != 0 &&
> obj->klass()->is_box()) {
> L510: return false;
> L511: }
> The new bailout on L509-511 can move above L507.
Moved.
> L573: fatal("Synchronizing on object " INTPTR_FORMAT " of klass
> %s", p2i(obj()), obj->klass()->external_name());
> This external_name() call does not have a ResourceMark.
Good catch! I had one in a previous version but then I changed the
conditionals and lost it for the fatal error case. The test worked okay
because for the main JavaThread there is a ResourceMark in
jni_invoke_static() (jni.cpp).
> src/hotspot/share/logging/logTag.hpp
> No comments.
>
> src/hotspot/share/oops/klass.hpp
> No comments.
>
> src/hotspot/share/utilities/accessFlags.hpp
> No comments.
>
> src/hotspot/share/runtime/globals.hpp
> L814: "0: off "
> Missing a ';' after "off".
Fixed.
> L816: "2: log message to stdout.
> Perhaps add "(by default)" after "stdout" or
> don't say where log output is at all.
>
> src/hotspot/share/runtime/arguments.cpp
> L4197: LogConfiguration::configure_stdout(LogLevel::Info, true,
> LOG_TAGS(primitivewrappers));
> Hmmm... maybe ignore my comments about L816 in globals.hpp
> since it looks like logging output is configured to 'stdout'.
> I'm assuming that other log options to put output elsewhere
> are overridden by this code?
Right. So the logging is done under UL with the tag primitivewrappers.
If that tag is specified in Xlog then this conditional will be skipped
because !log_is_enabled(Info, primitivewrappers) will be false.
> src/hotspot/share/classfile/systemDictionary.cpp
> L2188: for (int i = T_BOOLEAN; i < T_LONG+1; i++) {
> nit - s/T_LONG+1/T_LONG + 1/
Fixed.
> L2191: _box_klasses[i]->set_prototype_header(markWord::prototype());
> I assume we're keeping the prototype_header field when Biased
> Locking
> goes away? The reason I ask:
>
> static markWord prototype() {
> return markWord( no_hash_in_place | no_lock_in_place );
> }
>
> is because without Biased Locking, do we really need the
> prototype
> anymore? The initial markWord won't need possible variants...
Yes, I think it can go away unless somebody finds another use for it.
Maybe Valhalla.
> src/hotspot/share/jfr/metadata/metadata.xml
> L69: <Event name="SyncOnPrimitiveWrapper" category="Java
> Application"
> Is the category "Java Application" because it's the application
> code that did something "wrong" here? Where "application" is
> loosely
> defined to include the possibility of auto generated code,
> library
> code and the like? Or perhaps "application" because something
> "above"
> the "Java Virtual Machine, Runtime" did the "wrong" thing here?
I don't know what the right category should be really. I saw the events
JavaMonitorEnter, JavaMonitorWait and JavaMonitorInflate and thought
this event should fall in the same category they do.
> src/jdk.jfr/share/conf/jfr/default.jfc
> No comments.
>
> src/jdk.jfr/share/conf/jfr/profile.jfc
> No comments.
>
> test/lib/jdk/test/lib/jfr/EventNames.java
> No comments.
>
> src/hotspot/cpu/aarch64/aarch64.ad
> L3517: __ tbnz(tmp, exact_log2(JVM_ACC_IS_BOX_CLASS), cont);
> <snip>
> L3578: __ bind(cont);
> L3579: // flag == EQ indicates success
> L3580: // flag == NE indicates failure
> If tbnz() branches to "cont" when we have a box class, what's
> the flag value set to (EQ or NE)? And what set that flag value?
> The reason I ask is I don't think tbnz() sets condition codes...
Right, it doesn't set condition codes, so I kept the version I had (more
on that below).
> src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp
> No comments.
>
> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp
> No comments.
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
> int MacroAssembler::biased_locking_enter(Register lock_reg,
> Register obj_reg,
> Register swap_reg,
> Register tmp_reg,
> bool swap_reg_contains_mark,
> Label& done,
> Label* slow_case,
> BiasedLockingCounters* counters) {
> I think you've changed the only callers of biased_locking_enter()
> that cared about the return value with this changeset so it can
> be changed to a void function.
Ok, this is what I mentioned to David in a previous email. Done.
> src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp
> No comments.
>
> src/hotspot/cpu/arm/c2_MacroAssembler_arm.cpp
> L96: tbnz(Rscratch, exact_log2(JVM_ACC_IS_BOX_CLASS), done);
> <snip>
> L131: bind(done);
> L132:
> L133: // At this point flags are set as follows:
> L134: // EQ -> Success
> L135: // NE -> Failure, branch to slow path
> If tbnz() branches to "done" when we have a box class, what's
> the flag value set to (EQ or NE)? And what set that flag value?
> The reason I ask is I don't think tbnz() sets condition codes...
Right. Same as above, I kept the version I had.
> src/hotspot/cpu/arm/interp_masm_arm.cpp
> No comments.
>
> src/hotspot/cpu/arm/macroAssembler_arm.cpp
> int MacroAssembler::biased_locking_enter(Register obj_reg,
> Register swap_reg, Register tmp_reg,
> bool swap_reg_contains_mark,
> Register tmp2,
> Label& done, Label&
> slow_case,
> BiasedLockingCounters* counters) {
> I think you've changed the only callers of biased_locking_enter()
> that cared about the return value with this changeset so it can
> be changed to a void function.
Done.
> src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp
> src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp
> src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
> No comments on the PPC code.
>
> src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp
> src/hotspot/cpu/s390/interp_masm_s390.cpp
> src/hotspot/cpu/s390/macroAssembler_s390.cpp
> No comments on the S390 code.
>
> src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp
> L58: load_klass(hdr, obj, rklass_decode_tmp);
> What will this do with a 'noreg' value? (I need a refresher.)
When not in 64 bit mode that register just won't be used.
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
> No comments.
>
> src/hotspot/cpu/x86/interp_masm_x86.cpp
> No comments.
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp
> int MacroAssembler::biased_locking_enter(Register lock_reg,
> Register obj_reg,
> Register swap_reg,
> Register tmp_reg,
> Register tmp_reg2,
> bool swap_reg_contains_mark,
> Label& done,
> Label* slow_case,
> BiasedLockingCounters* counters) {
> I think you've changed the only caller of biased_locking_enter()
> that cared about the return value with this changeset so it can
> be changed to a void function.
Done.
> test/hotspot/jtreg/runtime/Monitor/SyncOnPrimitiveWrapperTest.java
> L30: * @test SyncOnPrimitiveWrapperTest
> No parameter to @test directive.
Removed.
> L136: private static long sharedCounter = 0L;
> Since you don't do anything with sharedCounter other than
> increment it,
> can the compilers optimize it away? If it can be optimized
> away, does
> that mean that:
>
> L142: synchronized (obj) {
>
> can also be optimized away?
>
> I don't think that:
>
> L161: synchronized (sharedLock1) {
>
> can be optimized away because it is shared by multiple threads.
>
> test/jdk/jdk/jfr/event/runtime/TestSyncOnPrimitiveWrapperEvent.java
> Similar questions about 'counter' being optimized away.
> Similar question about "synchronized (obj) {" being optimized away.
I'm not sure if the compiler will optimize it away. Seems unlikely given
the counter we are incrementing is not just local to some thread.
Ok, below is v3 which has the following changes:
- Use a 32 bit load for the _access_flags field, instead of 64
- Martin's implementation for s390 and fix for PPC
- Faster LogTest version
- Above changes based on Dan review
I'm retesting in mach5 tiers1-6 (which tests x64 and aarch64 only) again
with -XX:DiagnoseSyncOnPrimitiveWrappers=2. I checked it builds in
arm32, ppc and s390.
Full: http://cr.openjdk.java.net/~pchilanomate/8242263/v3/webrev
Inc: http://cr.openjdk.java.net/~pchilanomate/8242263/v3/inc/webrev
@Martin:
Please check if the test doesn't timeout for you now with the changes I
made to LogTest.
Also, I tried to use tbnz in aarch64 and arm32 instead of tst + br
(except for c2 since we actually need to set the condition flags), but
for c1 I was getting an assertion in the compiler thread:
guarantee(chk == -1 || chk == 0) failed: Field too big for insn
This is the stack when that happens:
MacroAssembler::pd_patch_instruction_size(unsigned char*, unsigned
char*)+0x398
AbstractAssembler::bind(Label&)+0x118
MonitorEnterStub::emit_code(LIR_Assembler*)+0x28
LIR_Assembler::emit_slow_case_stubs()+0x54
Compilation::emit_code_body()+0x17c
The change was simply:
diff --git a/src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp
b/src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp
--- a/src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp
@@ -78,7 +78,6 @@
if (DiagnoseSyncOnPrimitiveWrappers != 0) {
load_klass(hdr, obj);
- ldr(hdr, Address(hdr, Klass::access_flags_offset()));
- tst(hdr, JVM_ACC_IS_BOX_CLASS);
- br(Assembler::NE, slow_case);
+ ldrw(hdr, Address(hdr, Klass::access_flags_offset()));
+ tbnz(hdr, exact_log2(JVM_ACC_IS_BOX_CLASS), slow_case);
}
So the issue must be related to where we want to jump.
Thanks,
Patricio
> Dan
>
>
> On 8/5/20 9:01 AM, Patricio Chilano wrote:
>> Hi Martin,
>>
>> On 8/5/20 4:47 AM, Doerr, Martin wrote:
>>> Hi Patricio,
>>>
>>> using 8 Byte load instructions for jint fields is a terrible coding
>>> style!
>>> Someone else may see it and use an 8 Byte store. Will result in
>>> great fun for debugging!
>>>
>>> And on Big Endian you will simply access the wrong bits.
>> Ah, of course! Those 32 bits will be in the wrong place when doing
>> the test.
>>
>>> Note that Big Endian Platforms are AIX, old linux ppc, s390, SPARC.
>>> I don't think you have tested on them.
>>>
>>>> We could remove the nested synchronized statements in the run() method
>>>> so that we don't generate that much logging. We could also lower
>>>> LOOP_COUNT. I think the issue is also because we are running LogTest
>>>> with multiple flag combinations. Not sure what we should touch first.
>>>> Maybe the synchronized statements, have only one or two and test that
>>>> first?
>>> Sounds like helpful ideas. Please go ahead and strip things down.
>> Great, I will send v3 later with those changes.
>>
>> Thanks Martin!
>>
>> Patricio
>>> Thanks for taking care of it.
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>> Sent: Dienstag, 4. August 2020 20:26
>>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>>> <david.holmes at oracle.com>; hotspot-runtime-dev at openjdk.java.net;
>>>> hotspot-jfr-dev at openjdk.java.net
>>>> Subject: Re: RFR 8242263: Diagnose synchronization on primitive
>>>> wrappers
>>>>
>>>> Hi Martin,
>>>>
>>>> On 8/4/20 1:35 PM, Doerr, Martin wrote:
>>>>> Hi Patricio,
>>>>>
>>>>>> Ok, I'll change it to movl + testl and test it out before sending
>>>>>> v3.
>>>>> Thanks. I forgot to mention arm + aarch64.
>>>>> Aarch64 uses ldrw + tbnz.
>>>>> Arm 32 bit uses ldr_u32 + tbnz.
>>>>>
>>>>> I remember having seen the same mistake
>>>>> And nobody noticed it on little Endian platforms.
>>>> Ok, I can use a tbnz instead of test and then a branch on arm32 and
>>>> aarch64.
>>>> Shouldn't a normal ldr in arm32 work fine?
>>>> Also in 64 bits (either x64 or aarch64) I don't see the issue of
>>>> using a
>>>> 64 bit load, besides the fact that we only care about the first 32
>>>> bits.
>>>> Regardless of the endianness, aren't we masking out the upper part
>>>> when
>>>> we do AND/TEST or if we test a bit in the 0-31 bit range? Otherwise it
>>>> seems it should have failed for one of those platforms in my tests.
>>>>
>>>>>> I see that some tests use @run driver/timeout=xxxx. Maybe you can
>>>>>> specify that and see if that fixes it? Let me know if that works
>>>>>> and I
>>>>>> can add it to the test.
>>>>> That seems to have an effect. But now I'm not patient enough to
>>>>> wait for
>>>> the test to finish.
>>>>> Maybe the problem is that I'm using slow debug builds.
>>>>> But is there a chance to make the test quicker without losing
>>>>> coverage
>>>> significantly?
>>>>> I think the test is too slow this way to run it on a regular
>>>>> basis. We'd need to
>>>> spend dedicated machines for it.
>>>> We could remove the nested synchronized statements in the run() method
>>>> so that we don't generate that much logging. We could also lower
>>>> LOOP_COUNT. I think the issue is also because we are running LogTest
>>>> with multiple flag combinations. Not sure what we should touch first.
>>>> Maybe the synchronized statements, have only one or two and test that
>>>> first?
>>>>
>>>>
>>>> Thanks,
>>>> Patricio
>>>>> Best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
>>>>>> Sent: Dienstag, 4. August 2020 17:47
>>>>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
>>>>>> <david.holmes at oracle.com>; hotspot-runtime-dev at openjdk.java.net;
>>>>>> hotspot-jfr-dev at openjdk.java.net
>>>>>> Subject: Re: RFR 8242263: Diagnose synchronization on primitive
>>>>>> wrappers
>>>>>>
>>>>>> Hi Martin,
>>>>>>
>>>>>> Thanks for fixing PPC and taking care of s390!
>>>>>>
>>>>>> On 8/4/20 11:18 AM, Doerr, Martin wrote:
>>>>>>> Hi Patricio,
>>>>>>>
>>>>>>> I suggest to use movl + testl for checking the access flag as
>>>>>>> for other
>>>> access
>>>>>> flags on x64.
>>>>>> Ok, I'll change it to movl + testl and test it out before sending
>>>>>> v3.
>>>>>>
>>>>>>> New version for PPC64 and s390 see below.
>>>>>>>
>>>>>>> The test SyncOnPrimitiveWrapperTest produces hs_err files as
>>>> expected.
>>>>>> However, I'm getting timeout issues:
>>>>>>> timed out (timeout set to 120000ms, elapsed time including timeout
>>>>>> handling was 122372ms)
>>>>>>> Can we provide more time?
>>>>>> I see that some tests use @run driver/timeout=xxxx. Maybe you can
>>>>>> specify that and see if that fixes it? Let me know if that works
>>>>>> and I
>>>>>> can add it to the test.
>>>>>>
>>>>>> Thanks,
>>>>>> Patricio
>>>>>>> Best regards,
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>> diff -r 77852e129401 src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp
>>>>>>> --- a/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp Tue Aug 04
>>>>>> 10:03:57 2020 +0200
>>>>>>> +++ b/src/hotspot/cpu/ppc/c1_MacroAssembler_ppc.cpp Tue Aug 04
>>>>>> 16:04:31 2020 +0200
>>>>>>> @@ -107,8 +107,8 @@
>>>>>>>
>>>>>>> if (DiagnoseSyncOnPrimitiveWrappers != 0) {
>>>>>>> load_klass(Rscratch, Roop);
>>>>>>> - ld(Rscratch, in_bytes(Klass::access_flags_offset()),
>>>>>>> Rscratch);
>>>>>>> - andi(Rscratch, Rscratch, JVM_ACC_IS_BOX_CLASS);
>>>>>>> + lwz(Rscratch, in_bytes(Klass::access_flags_offset()),
>>>>>>> Rscratch);
>>>>>>> + testbitdi(CCR0, R0, Rscratch,
>>>>>>> exact_log2(JVM_ACC_IS_BOX_CLASS));
>>>>>>> bne(CCR0, slow_case);
>>>>>>> }
>>>>>>>
>>>>>>> diff -r 77852e129401 src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp
>>>>>>> --- a/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp Tue Aug 04
>>>>>> 10:03:57 2020 +0200
>>>>>>> +++ b/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp Tue Aug 04
>>>>>> 16:04:31 2020 +0200
>>>>>>> @@ -912,8 +912,8 @@
>>>>>>>
>>>>>>> if (DiagnoseSyncOnPrimitiveWrappers != 0) {
>>>>>>> load_klass(tmp, object);
>>>>>>> - ld(tmp, in_bytes(Klass::access_flags_offset()), tmp);
>>>>>>> - andi(tmp, tmp, JVM_ACC_IS_BOX_CLASS);
>>>>>>> + lwz(tmp, in_bytes(Klass::access_flags_offset()), tmp);
>>>>>>> + testbitdi(CCR0, R0, tmp, exact_log2(JVM_ACC_IS_BOX_CLASS));
>>>>>>> bne(CCR0, slow_case);
>>>>>>> }
>>>>>>>
>>>>>>> diff -r 77852e129401 src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
>>>>>>> --- a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp Tue Aug 04
>>>>>> 10:03:57 2020 +0200
>>>>>>> +++ b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp Tue Aug 04
>>>>>> 16:04:31 2020 +0200
>>>>>>> @@ -2838,8 +2838,8 @@
>>>>>>>
>>>>>>> if (DiagnoseSyncOnPrimitiveWrappers != 0) {
>>>>>>> load_klass(temp, oop);
>>>>>>> - ld(temp, in_bytes(Klass::access_flags_offset()), temp);
>>>>>>> - andi(temp, temp, JVM_ACC_IS_BOX_CLASS);
>>>>>>> + lwz(temp, in_bytes(Klass::access_flags_offset()), temp);
>>>>>>> + testbitdi(CCR0, R0, temp, exact_log2(JVM_ACC_IS_BOX_CLASS));
>>>>>>> bne(CCR0, cont);
>>>>>>> }
>>>>>>>
>>>>>>> diff -r 77852e129401
>>>> src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp
>>>>>>> --- a/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp Tue Aug 04
>>>>>> 10:03:57 2020 +0200
>>>>>>> +++ b/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp Tue Aug 04
>>>>>> 16:04:31 2020 +0200
>>>>>>> @@ -91,6 +91,12 @@
>>>>>>> // Save object being locked into the BasicObjectLock...
>>>>>>> z_stg(obj, Address(disp_hdr,
>>>> BasicObjectLock::obj_offset_in_bytes()));
>>>>>>> + if (DiagnoseSyncOnPrimitiveWrappers != 0) {
>>>>>>> + load_klass(Z_R1_scratch, obj);
>>>>>>> + testbit(Address(Z_R1_scratch, Klass::access_flags_offset()),
>>>>>> exact_log2(JVM_ACC_IS_BOX_CLASS));
>>>>>>> + z_btrue(slow_case);
>>>>>>> + }
>>>>>>> +
>>>>>>> if (UseBiasedLocking) {
>>>>>>> biased_locking_enter(obj, hdr, Z_R1_scratch,
>>>>>>> Z_R0_scratch, done,
>>>>>> &slow_case);
>>>>>>> }
>>>>>>> diff -r 77852e129401 src/hotspot/cpu/s390/interp_masm_s390.cpp
>>>>>>> --- a/src/hotspot/cpu/s390/interp_masm_s390.cpp Tue Aug 04 10:03:57
>>>>>> 2020 +0200
>>>>>>> +++ b/src/hotspot/cpu/s390/interp_masm_s390.cpp Tue Aug 04
>>>> 16:04:31
>>>>>> 2020 +0200
>>>>>>> @@ -1000,6 +1000,12 @@
>>>>>>> // Load markWord from object into displaced_header.
>>>>>>> z_lg(displaced_header, oopDesc::mark_offset_in_bytes(),
>>>>>>> object);
>>>>>>>
>>>>>>> + if (DiagnoseSyncOnPrimitiveWrappers != 0) {
>>>>>>> + load_klass(Z_R1_scratch, object);
>>>>>>> + testbit(Address(Z_R1_scratch, Klass::access_flags_offset()),
>>>>>> exact_log2(JVM_ACC_IS_BOX_CLASS));
>>>>>>> + z_btrue(slow_case);
>>>>>>> + }
>>>>>>> +
>>>>>>> if (UseBiasedLocking) {
>>>>>>> biased_locking_enter(object, displaced_header, Z_R1,
>>>>>>> Z_R0, done,
>>>>>> &slow_case);
>>>>>>> }
>>>>>>> diff -r 77852e129401 src/hotspot/cpu/s390/macroAssembler_s390.cpp
>>>>>>> --- a/src/hotspot/cpu/s390/macroAssembler_s390.cpp Tue Aug 04
>>>>>> 10:03:57 2020 +0200
>>>>>>> +++ b/src/hotspot/cpu/s390/macroAssembler_s390.cpp Tue Aug 04
>>>>>> 16:04:31 2020 +0200
>>>>>>> @@ -3358,6 +3358,12 @@
>>>>>>> // Load markWord from oop into mark.
>>>>>>> z_lg(displacedHeader, 0, oop);
>>>>>>>
>>>>>>> + if (DiagnoseSyncOnPrimitiveWrappers != 0) {
>>>>>>> + load_klass(Z_R1_scratch, oop);
>>>>>>> + testbit(Address(Z_R1_scratch, Klass::access_flags_offset()),
>>>>>> exact_log2(JVM_ACC_IS_BOX_CLASS));
>>>>>>> + z_btrue(done);
>>>>>>> + }
>>>>>>> +
>>>>>>> if (try_bias) {
>>>>>>> biased_locking_enter(oop, displacedHeader, temp, Z_R0,
>>>>>>> done);
>>>>>>> }
>>>>>>>
>>
>
More information about the hotspot-runtime-dev
mailing list