[aarch64-port-dev ] 8153107: Unbalanced recursive locking

Andrey Petushkov andrey.petushkov at gmail.com
Mon Jun 18 13:29:16 UTC 2018


Hi Dan,

The problem is that I’m not aware of any reproducer for any of the platforms part of main openjdk. 
Also Andrew Haley as well as Derek cannot help here I think since they are working with cpu/aarch64 port code, which did not have mentioned problem at all. 
Me myself is dealing with cpu/aarch32 port which is not part of main openjdk and jdk10 version of it is not yet even pushed into project-aarch32 openjdk repos. There is jdk9 version but there is no ready made reproducer there. The jdk10 version (on review now) has 100% reproducer for this problem, the runtime/CreateMirror/ArraysNewInstanceBug.java hotspot jtreg test, however I’m not sure anyone wants to try that out. Maybe Ed Nevill could help as a more or less independent party. Ed?

Thank you,
Andrey

> On 16 Jun 2018, at 23:08, Daniel D. Daugherty <daniel.daugherty at oracle.com> wrote:
> 
> Hi Andrey,
> 
> I updated the bug report with the Mach5 results so the only thing I need
> now is confirmation that ARM testing went fine. So whenever you or aph
> or both have some ARM test results we should be good to go.
> 
> In particular it would be good to verify that your test that reproduces
> this hang no longer reproduces the hang in some relatively high number
> of iterations. So if your test repros the hang in 5/100 runs, I would
> be happy if the hang didn't repro in a 1000 runs.
> 
> Dan
> 
> 
> On 6/16/18 12:11 PM, Andrey Petushkov wrote:
>> Hi Dan,
>> 
>> Thank you for sponsorship and yes,
>> > I don't really know ARM assembly, but I'm guessing that the 'eq' 
>> > parameter
>> >     controls whether the store of Rscratch into "Address(Rbox, ...)" 
>> > occurs or
>> >     not
>> You guess is perfectly correct!
>> 
>> Thanks again,
>> Andrey
>>  
>> 
>> On Fri, Jun 15, 2018 at 9:00 PM Daniel D. Daugherty <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>> wrote:
>> Andrey,
>> 
>> I'm about to send the following changeset thru Mach5 testing:
>> 
>> $ hg log -v -r tip
>> changeset:   50592:dc9e3b56c8ab
>> tag:         tip
>> user:        apetushkov
>> date:        Fri Jun 15 13:57:32 2018 -0400
>> files:       src/hotspot/cpu/arm/macroAssembler_arm.cpp 
>> src/hotspot/share/runtime/sharedRuntime.cpp
>> description:
>> 8153107: enabling ObjectSynchronizer::quick_enter() on ARM64 causes hangs
>> Summary: Always set the markword for recursive monitors in 
>> MacroAssembler::fast_lock().
>> Reviewed-by: aph, drwhite, dcubed
>> 
>> 
>> I don't expect any problems, but I'm paranoid so...
>> 
>> Dan
>> 
>> 
>> On 6/15/18 1:29 PM, Daniel D. Daugherty wrote:
>> > > http://cr.openjdk.java.net/~apetushkov/8153107/ <http://cr.openjdk.java.net/%7Eapetushkov/8153107/>
>> >
>> > src/hotspot/share/runtime/sharedRuntime.cpp
>> >     No comments.
>> >
>> > src/hotspot/cpu/arm/macroAssembler_arm.cpp
>> >     Looks correct. Here's my analysis:
>> >
>> >     The X64 version of this code does this:
>> >
>> >     src/hotspot/cpu/x86/macroAssembler_x86.cpp:
>> >
>> >       // Recursive locking.
>> >       // The object is stack-locked: markword contains stack pointer 
>> > to BasicLock.
>> >       // Locked by current thread if difference with current SP is 
>> > less than one page.
>> >       subptr(tmpReg, rsp);
>> >       // Next instruction set ZFlag == 1 (Success) if difference is 
>> > less then one page.
>> >       andptr(tmpReg, (int32_t) (NOT_LP64(0xFFFFF003) LP64_ONLY(7 - 
>> > os::vm_page_size())) );
>> >       movptr(Address(boxReg, 0), tmpReg);
>> >       if (counters != NULL) {
>> >         cond_inc32(Assembler::equal,
>> > ExternalAddress((address)counters->fast_path_entry_count_addr()));
>> >       }
>> >       jmp(DONE_LABEL);
>> >
>> >     This code path always updates "Address(boxReg, 0)" with the
>> >     "tmpReg" value so the markword always gets updated with a zero
>> >     or non-zero value.
>> >
>> >     The previous AMD64 code did this:
>> >
>> >       L3011: #ifdef AARCH64
>> >       L3012:   intptr_t mask = ((intptr_t)3) - 
>> > ((intptr_t)os::vm_page_size());
>> >       L3013:   Assembler::LogicalImmediate imm(mask, false);
>> >       L3014:   mov(Rscratch, SP);
>> >       L3015:   sub(Rscratch, Rmark, Rscratch);
>> >       L3016:   ands(Rscratch, Rscratch, imm);
>> >       L3017:   b(done, ne); // exit with failure
>> >       L3018:   str(Rscratch, Address(Rbox, 
>> > BasicLock::displaced_header_offset_in_bytes())); // set to zero
>> >       L3019:   b(done);
>> >       L3020:
>> >       L3021: #else
>> >       L3022:   // -1- test low 2 bits
>> >       L3023:   movs(Rscratch, AsmOperand(Rmark, lsl, 30));
>> >       L3024:   // -2- test (hdr - SP) if the low two bits are 0
>> >       L3025:   sub(Rscratch, Rmark, SP, eq);
>> >       L3026:   movs(Rscratch, AsmOperand(Rscratch, lsr, 
>> > exact_log2(os::vm_page_size())), eq);
>> >       L3027:   // If still 'eq' then recursive locking OK
>> >       L3028:   str(Rscratch, Address(Rbox, 
>> > BasicLock::displaced_header_offset_in_bytes()), eq); // set to zero
>> >       L3029:   b(done);
>> >       L3030: #endif
>> >
>> >     For the AARCH64 version, the key bug is the "b(done, ne); // exit 
>> > with failure"
>> >     which results in the markword not getting set to a specific value.
>> >     Someone mentioned that we're getting a random value here and that
>> >     contributes to the intermittent nature of the hang. Removing L3017
>> >     so that we always set the markword makes perfect sense.
>> >
>> >     For the non-AARCH64 version, the key bug is a little more obscure
>> >     (at least to me): "str(Rscratch, Address(Rbox, ...), eq); // set 
>> > to zero".
>> >     I don't really know ARM assembly, but I'm guessing that the 'eq' 
>> > parameter
>> >     controls whether the store of Rscratch into "Address(Rbox, ...)" 
>> > occurs or
>> >     not. Removing the "eq" parameter in the fix so that we always set the
>> >     markword makes perfect sense.
>> >
>> >
>> > So thumbs up on the fix and I can sponsor it. Please provide me with an
>> > importable changeset and I'll take care of that after confirmation that
>> > the ARM folks have tested the fix and are happy with it.
>> >
>> > Dan
>> >
>> >
>> > On 6/14/18 6:08 PM, White, Derek wrote:
>> >> Hi Andrey,
>> >>
>> >> It took me looking at the code three times, but I finally saw what 
>> >> you mean about condition code register. OK, looks fine to me.
>> >>
>> >> You still need a (R)eviewer’s OK and then a sponsor.
>> >>
>> >>
>> >>    *   Derek
>> >>
>> >> From: Andrey Petushkov [mailto:andrey.petushkov at gmail.com <mailto:andrey.petushkov at gmail.com>]
>> >> Sent: Thursday, June 14, 2018 4:55 PM
>> >> To: White, Derek <Derek.White at cavium.com <mailto:Derek.White at cavium.com>>
>> >> Cc: Andrew Haley <aph at redhat.com <mailto:aph at redhat.com>>; 
>> >> hotspot-runtime-dev at openjdk.java.net <mailto:hotspot-runtime-dev at openjdk.java.net>; 
>> >> aarch64-port-dev at openjdk.java.net <mailto:aarch64-port-dev at openjdk.java.net>; AArch32 Port Project 
>> >> <aarch32-port-dev at openjdk.java.net <mailto:aarch32-port-dev at openjdk.java.net>>
>> >> Subject: Re: [aarch64-port-dev ] 8153107: Unbalanced recursive locking
>> >>
>> >>
>> >> External Email
>> >> Hi Derek,
>> >>
>> >> that shall be ands since it expected to leave the result in the flags 
>> >> register, see the cmpFastLock instruct in aarch64.ad <http://aarch64.ad/><http://aarch64.ad <http://aarch64.ad/>>
>> >> to my taste this version is error-prone, since it's quite hard to 
>> >> deduce that contract FastLock node involves writing of non 0 value 
>> >> into DHW of stack lock when it has failed. however I don't insist. so 
>> >> if everyone agrees could someone please commit. sorry, I don't have 
>> >> necessary status
>> >>
>> >>
>> >> Regards,
>> >> Andrey
>> >>
>> >> On Thu, Jun 14, 2018 at 9:13 PM White, Derek 
>> >> <Derek.White at cavium.com <mailto:Derek.White at cavium.com><mailto:Derek.White at cavium.com <mailto:Derek.White at cavium.com>>> wrote:
>> >> Hi Andrey,
>> >>
>> >> I like this version.
>> >>
>> >> My only suggestion is minor - the "ands" at line 3016 could be a 
>> >> simple "and". It causes no harm but could be confusing to a reader:
>> >>
>> >> 3016   ands(Rscratch, Rscratch, imm);
>> >>
>> >> No need to see a new webrev.
>> >> - Derek
>> >>
>> >>> -----Original Message-----
>> >>> From: hotspot-runtime-dev 
>> >>> [mailto:hotspot-runtime-dev- <mailto:hotspot-runtime-dev-><mailto:hotspot-runtime-dev- <mailto:hotspot-runtime-dev->>
>> >>> bounces at openjdk.java.net <mailto:bounces at openjdk.java.net><mailto:bounces at openjdk.java.net <mailto:bounces at openjdk.java.net>>] On Behalf 
>> >>> Of Andrey Petushkov
>> >>> Sent: Thursday, June 14, 2018 8:25 AM
>> >>> To: Andrew Haley <aph at redhat.com <mailto:aph at redhat.com><mailto:aph at redhat.com <mailto:aph at redhat.com>>>
>> >>> Cc: 
>> >>> hotspot-runtime-dev at openjdk.java.net <mailto:hotspot-runtime-dev at openjdk.java.net><mailto:hotspot-runtime-dev at openjdk.java.net <mailto:hotspot-runtime-dev at openjdk.java.net>>; 
>> >>> aarch64-port-
>> >>> dev at openjdk.java.net <mailto:dev at openjdk.java.net><mailto:dev at openjdk.java.net <mailto:dev at openjdk.java.net>>; AArch32 Port 
>> >>> Project <aarch32-port-
>> >>> dev at openjdk.java.net <mailto:dev at openjdk.java.net><mailto:dev at openjdk.java.net <mailto:dev at openjdk.java.net>>>
>> >>> Subject: Re: [aarch64-port-dev ] 8153107: Unbalanced recursive locking
>> >>>
>> >>> External Email
>> >>>
>> >>> Hm, strange. It displays well for me in the archives page. Anyway, 
>> >>> I've put
>> >>> the webrev at http://cr.openjdk.java.net/~apetushkov/8153107/ <http://cr.openjdk.java.net/%7Eapetushkov/8153107/>
>> >>>
>> >>> On Thu, Jun 14, 2018 at 3:15 PM Andrew Haley 
>> >>> <aph at redhat.com <mailto:aph at redhat.com><mailto:aph at redhat.com <mailto:aph at redhat.com>>> wrote:
>> >>>
>> >>>> On 06/14/2018 12:59 PM, Andrey Petushkov wrote:
>> >>>>> So then if you prefer to leave the different logic in shared code
>> >>>>> between quick and slow paths I believe the fix for cpu/arm
>> >>>>> implementation (and removal of unnecessary workaround for
>> >>> cpu/aarch64) should look like this:
>> >>>> Something awful happened to the formatting of your mail.  Can you put
>> >>>> it up somewhere we can see the patch?
>> >>>>
>> >>>> -- 
>> >>>> Andrew Haley
>> >>>> Java Platform Lead Engineer
>> >>>> Red Hat UK Ltd. <https://www.redhat.com <https://www.redhat.com/>>
>> >>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332
>> >>>> <https://maps.google.com/?q=6035+332&entry=gmail&source=g <https://maps.google.com/?q=6035+332&entry=gmail&source=g>>F A671
>> >>>>
>> >
>> >
>> 
> 



More information about the hotspot-runtime-dev mailing list