RFR(L) Contended Locking reorder and cache line bucket (8049737)

Fri Sep 26 23:31:54 UTC 2014

Greetings,

I recently resync'ed the bits for this bucket with RT_Baseline
after the latest round of code cleanups. Here's the URL:

http://cr.openjdk.java.net/~dcubed/8049737-webrev/2-jdk9-hs-rt/

These are the bits that we're running through performance testing.
The testing is done and we're sifting through the results.

Looks like I also forgot to include the internal approvals I
received on the second internal code review round so I'll include
those for the record. I'm putting them at the bottom since I think
the mail-list processor will strip attachments.

Dan

On 8/19/14 12:13 PM, Daniel D. Daugherty wrote:
> Greetings,
>
> The Contended Locking reorder and cache line bucket is ready for
> public review. The Contended Locking JEP is now committed to JDK9 so
> it's time to start rolling out the buckets of changes...
>
> Here's the JEP link:
>
> JDK-8046133 JEP 143: Improve Contended Locking
> https://bugs.openjdk.java.net/browse/JDK-8046133
>
> The reorder and cache line bucket work is being tracked by the
> following bug ID:
>
> JDK-8049737 Contended Locking reorder and cache line bucket
> https://bugs.openjdk.java.net/browse/JDK-8049737
>
> Here is the webrev URL:
>
> http://cr.openjdk.java.net/~dcubed/8049737-webrev/1-jdk9-hs-rt/
>
> These changes have been through two rounds of internal review
> while waiting for the JEP process to finish. Here's the current
> commit message:
>
> 8049737: Contended Locking reorder and cache line bucket
> Summary: JEP-143/JDK-8046133 - optimization #1 - reorder and cache 
> line bucket.
> Reviewed-by: shade, dice, dholmes, dsimms
> Contributed-by: dave.dice at oracle.com, karen.kinnear at oracle.com, 
> daniel.daugherty at oracle.com
>
>
> Testing:
>
> - JPRT test jobs
> - new MonitorCacheLineStresser test runs
> - CallTimerGrid runs with product and fastdebug bits
> - Inflate2 24 hour runs with product bits
> - Aurora Adhoc nsk.sajdi and vm.parallel_class_loading
> - Aurora Adhoc RT/SVC runs
> - performance testing is in progress
>
> Thanks, in advance, for any comments, questions or suggestions.
>
> Dan

Aleksey's e-mail:

On 7/24/14 8:26 AM, Daniel D. Daugherty wrote:
> Thanks for the fast re-review! Replies embedded below...
>
>
> On 7/24/14 4:41 AM, Aleksey Shipilev wrote:
>> On 07/24/2014 06:41 AM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> These bits are ready for another preliminary review. Thanks to Aleksey,
>>> David H, Dave D and Mr Simms for chiming in on round 0.
>>>
>>> Here is the webrev URL for code review round 1:
>>>
>>> http://javaweb.us.oracle.com/~ddaugher/8049737-webrev/1-jdk9-hs-rt/
>> Happy to see these changes.
>
> Happy to make them. This bucket looks soooo much better now. Thanks
> again for your thorough round 0 review.
>
>
>> Should we put this review on public lists now?
>
> The JEP is not yet in the right state yet for an OpenJDK review.
>
>
>> === src/cpu/sparc/vm/globalDefinitions_sparc.hpp
>> === src/cpu/x86/vm/globalDefinitions_x86.hpp
>> * I still disagree about the heuristics here...
>
> Acknowledged.
>
>
>> === src/share/vm/runtime/objectMonitor.hpp
>> * The formatting of field comments is a bit off.
>
> Yes. I have a manual format fixes pass planned that I will likely
> get done before these bits make it through performance testing.
> I also have an automated indent fixes pass planned if I can finish
> getting the tool working right.
>
>
>> === src/share/vm/runtime/synchronizer.cpp
>> * Do you want to use a macro instead of SharedGLobals::_pad_prefix?
>
> I went back and forth on this one... I could use:
>
> DEFINE_PAD_MINUS_SIZE(0, DEFAULT_CACHE_LINE_SIZE, 0);
>
> but the _pad_prefix definition seems so much more clear:
>
> 415 struct SharedGlobals {
> 416 char _pad_prefix[DEFAULT_CACHE_LINE_SIZE];
>
>
>> * Sorry for not asking this earlier:
>>
>> 115 // gBlockList is really PaddedEnd<ObjectMonitor> *, but we don't
>> 116 // want to expose the PaddedEnd template more than necessary.
>> 117 ObjectMonitor * ObjectSynchronizer::gBlockList = NULL;
>>
>> How much of exposure it would be anyway? gBlockList is not used
>> globally, only a minute exposure in vmStructs.cpp and SA.
>
> It's not just gBlockList. It ripples into the ObjectMonitor *
> fields in ObjectMonitor itself and then all the local variables
> that access those fields, and... If I remember correctly, it was
> about 2-3X the changes that you see with the casts now. It was
> uglier (in my opinion) than the casts and I don't like the casts.
>
> Background:
> We don't trust malloc() and friends to give us ObjectMonitors that
> are aligned on cache line boundaries and are properly spaced between
> adjacent instances. So we allocate our own block of memory, chop it
> up, link the instances all together via the FreeNext field and we
> have to live with managing this memory manually.
>
> The fact that we do our own memory management is a little more
> exposed than I would like already and I didn't want to make that
> wart any bigger.
>
>
>> Wouldn't
>> declaring the proper type in this declaration avoid the potentially
>> unsafe casts in synchronizer.cpp?
>
> Yes, using the proper type is always better and safer because you
> might miss a cast. However, it's not just this declaration as I
> mentioned above.
>
>
>> * Also, why do we need the casts to PaddedEnd<ObjectMonitor>* in
>> monitors_iterate(),
>
> Pointer arithmetic here:
>
> 794 for (int i = _BLOCKSIZE - 1; i > 0; i--) {
> 795 mid = (ObjectMonitor *)(block + i);
>
>
>> oops_do(),
>
> Pointer arithmetic here:
>
> 820 for (int i = 1; i < _BLOCKSIZE; i++) {
> 821 ObjectMonitor* mid = (ObjectMonitor *)&block[i];
>
>> verify(),
>
> Pointer arithmetic here:
>
> 1669 for (int i = 1; i < _BLOCKSIZE; i++) {
> 1670 mid = (ObjectMonitor *)(block + i);
>
>
>> verify_objmon_isinpool(),
>
> Pointer arithmetic here:
>
> 1690 monitor < (ObjectMonitor *)&block[_BLOCKSIZE]) {
>
>> etc?
>> FWIW, I just reverted your changes in monitors_iterate(), and it
>> compiles fine with my GCC.
>
> Yes, it will compile and then it will do the wrong thing (TM). :-)
>
>
>> * deflate_idle_monitors() should split "else for" into "else { \n for
>> ..."? It reads as "else if"...
>
> That is a strange style and there are a few other instances of
> strange style in the monitor subsystem. I will put this on the
> list for my manual format fixes pass.
>
> Thanks again for the re-review.
>
> Dan
>
>
>>
>> Thanks,
>> -Aleksey.
>>
>
>
>

Mr Simms:

On 7/25/14 7:20 AM, Daniel D. Daugherty wrote:
 > Mr Simms,
 >
 > Thanks for the re-review!
 >
 > Dan
 >
 >
 > On 7/25/14 4:28 AM, David Simms wrote:
 >>
 >> I'm pretty happy with this round Dan, and in general actually,
 >>
 >> Re: compile-time vs runtime-time, we'll be getting quicker 
evaluation with this route.
 >>
 >> Cheers
 >> /Mr. Simms

Dave Dice:

On 7/25/14 7:18 AM, Daniel D. Daugherty wrote:
 > Thanks! Current commit message looks like:
 >
 > 8049737: Contended Locking reorder and cache line bucket
 > Summary: JEP-143/JDK-8046133 - optimization #1 - reorder and cache 
line bucket.
 > Reviewed-by: shade, dice, dholmes, dsimms
 > Contributed-by: dave.dice at oracle.com, karen.kinnear at oracle.com, 
daniel.daugherty at oracle.com
 >
 > So you get mentioned on two lines... :-)
 >
 > Dan
 >
 >
 > On 7/24/14 9:55 PM, Dave Dice wrote:
 >> Hi Dan,
 >>
 >> Yup, I’ve read it. It looks good. Consider it a +1 vote from me, if 
still need another vote.
 >>
 >> Regards
 >> Dave