RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller size on SPARCv9 and X64

Daniel D. Daugherty daniel.daugherty at oracle.com
Thu Dec 19 17:51:17 UTC 2019


Hi David and Claes,

Thanks for the fast reviews! Sorry for my delay in responding. I've been
working on the Async Monitor Deflation wiki updates...

Since Claes was kind enough to reply to David's post and included all
of David's comments, I'm going to reply to both here.


On 12/18/19 10:03 AM, Claes Redestad wrote:
> Hi,
>
> On 2019-12-18 06:50, David Holmes wrote:
>> Hi Dan,
>>
>> On 18/12/2019 7:34 am, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I'm extracting another standalone fix from the Async Monitor Deflation
>>> project (JDK-8153224) and sending it out for review (and testing)
>>> separately. This is an XXS sized fix, but since it involves cache
>>> line sizes I don't think a trivial review is appropriate.

First, David's part:


>>
>> Can you elaborate on why a smaller cache-line-size assumption is 
>> beneficial here?

I originally heard about switching from a 128-byte to a 64-byte
cache-line-size at the June 2019 Runtime offsite. I don't remember who
brought it up, but they were saying that Intel no longer recommends using
a 128-byte cache-line-size (and hasn't recommended it for some time).
IIRC, the person also said that their Intel contact was a bit surprised
about the 128-byte cache-line-size recommendation.

When I talked about Async Monitor Deflation in June I mentioned that I
was going to try out 64 byte cache-line-size for ObjectMonitors and so
far I haven't seen any performance issues.


>> Do you have performance numbers for the bug report?

I've been doing Aurora SPECjbb2015 runs (tuned G1, 10 iterations) between
my baseline (jdk-14+26) and the various fixes. For this fix (8235931)
relative to my first baseline run:

     criticalJOPS  -0.40%  (Non-significant)
     maxJOPS       +0.23%  (Non-significant)

I reran the baseline a second time using the same bits and for this fix
(8235931) relative to the second run:

     criticalJOPS  +6.96%  (Non-significant)
     maxJOPS       +9.96%  (Non-significant)

When I compare the two baseline runs:

     criticalJOPS   -6.65%  (Non-significant)
     maxJOPS        -8.86%  (Non-significant)

So even with 10 iterations, these SPECjbb2015 runs are all over the
place. And Aurora is agreeing with that by flagging the results as
"Non-significant" because the 'p' values are too large.

I'm doing another set of Aurora SPECjbb2015 runs and I'm bumping the
iterations from 10 to 15. In the recent past, Per told me I might
have to do 20 runs of SPECjbb2015 to shake out the variations.
We'll see what turns up with this latest set of runs...


Second, Claes' part:

> I did numerous experiments on this during work on JEP-143 and saw no
> improvement on either "real" benchmarks or stress tests that try to
> explicitly provoke cross-monitor false sharing on various fields in
> ObjectMonitors. I've not re-run those experiments on new hardware, so
> YMMV. Should probably check if they're still usable..

As always, I'm interested in your ObjectMonitor results.


> Specifically, for padding values below 64 we could provoke false sharing
> effects on stress tests, with negligible movement on "real" benchmarks.
> For values above 64 we just waste space.

Yup. You've said that a few times, and that's one of the reasons
I've included this change in the Async Monitor Deflation project.


> There exists a legend that you need double the physical cache line
> padding to not provoke the wrath of false sharing in the presence of
> aggressive prefetching, which is the reason why the
> "DEFAULT_CACHE_LINE_SIZE" is set to twice the typical physical cache
> line size in bytes. If we explicitly think we need to pad two lines,
> then the constant DEFAULT_CACHE_LINE_SIZE should be named differently.

I wouldn't call it "a legend". When we made this change years ago, Dice
had very specific advice from someone at Intel to make this change. He
could probably still dig up the details...


> The patch in question looks good to me,

Thanks!


> although I'd prefer something like:
>
> CACHE_LINE_SIZE 64
> DOUBLE_CACHE_LINE_SIZE 128

I think we should make that change on a more HotSpot wide basis with
a different bug ID. Also, do you see those values (64 and 128) being
appropriate for all the CPUs/platforms? Or do you foresee those values
being defined per-CPU/platform?


> .. and then use whichever is deemed more appropriate on a case by case
> basis. (Hint: it's probably CACHE_LINE_SIZE)

I accept that hint, which is why I'm going with OM_CACHE_LINE_SIZE
so that ObjectMonitors can have their own case... :-)


>
> $.02
>
> /Claes

And back to David:
>
>> Otherwise the mechanics of the change look fine, except I would 
>> suggest that this:
>>
>> #ifndef OM_CACHE_LINE_SIZE
>> // Use DEFAULT_CACHE_LINE_SIZE if not already specified for
>> // the current build platform.
>> #define OM_CACHE_LINE_SIZE DEFAULT_CACHE_LINE_SIZE
>> #endif
>>
>> should go in the shared globalDefinitions.hpp file.

I intentionally didn't want to expose OM_CACHE_LINE_SIZE in
globalDefinitions.hpp. I'm trying really hard to stay out of that file.
Also, is globalDefinitions.hpp processed before or after the CPU/platform
specific files?

Obviously I need to specify OM_CACHE_LINE_SIZE in the two CPU files:

     src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
     src/hotspot/cpu/x86/globalDefinitions_x86.hpp

and both halves of the ObjectMonitor subsystem need the above logic:

     src/hotspot/share/runtime/objectMonitor.hpp
     src/hotspot/share/runtime/synchronizer.hpp

but we don't need to expose OM_CACHE_LINE_SIZE elsewhere. Of course,
if we end up adding CACHE_LINE_SIZE and DOUBLE_CACHE_LINE_SIZE to
the system *and* if we can get all the CPUs/platforms to agree to use
CACHE_LINE_SIZE for ObjectMonitors, then OM_CACHE_LINE_SIZE can be
retired.

Thanks again for the fast reviews!

Dan


>>
>> Thanks,
>> David
>> -----
>>
>>>      JDK-8235931 add OM_CACHE_LINE_SIZE and use smaller size on 
>>> SPARCv9 and X64
>>>      https://bugs.openjdk.java.net/browse/JDK-8235931
>>>
>>> Here's the webrev URL:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8235931-webrev/0-for-jdk15/
>>>
>>> Folks that have reviewed JDK-8153224 will recognize these changes as
>>> a subset of the cache line changes from the Async Monitor Deflation
>>> project. It's a subset because there are more fields that need to be
>>> on separate cache lines in the Async Monitor Deflation project.
>>>
>>> These changes are being tested in a Mach5 Tier[1-3] run that's still
>>> running (JDK14 ATR has priority). I've also run these changes thru
>>> a set of 10 SPECjbb2015 runs; I'm waiting for my various SPECjbb2015
>>> runs to finish so I can do the analysis.
>>>
>>> Thanks, in advance, for comments, questions or suggestions.
>>>
>>> Dan


