RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller size on SPARCv9 and X64

Doerr, Martin martin.doerr at sap.com
Fri Dec 20 14:24:24 UTC 2019


Hi,

Just a comment on
> CACHE_LINE_SIZE 64
> DOUBLE_CACHE_LINE_SIZE 128

I think CACHE_LINE_SIZE is too unspecific.
There are L1I, L1D, L2, L3, ... caches, and they can have different cache line sizes.
In addition, the sizes can differ between processor models.

So we should have a better name than just "CACHE_LINE_SIZE".

PPC and s390 already have large cache lines (real hardware line sizes, not
values inflated for performance reasons).
I guess we don't want to use even more padding on these platforms,
so we will probably want to set both constants to the same value there (this needs to be evaluated).
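
As a rough illustration of what I mean, per-platform definitions could end up
looking something like this (the names and values here are purely hypothetical,
not a concrete proposal):

    // globalDefinitions_ppc.hpp (hypothetical):
    // POWER already has 128 byte L1 cache lines, so the "doubled" value
    // would probably just equal the real line size.
    #define CACHE_LINE_SIZE        128
    #define DOUBLE_CACHE_LINE_SIZE 128

    // globalDefinitions_x86.hpp (hypothetical):
    // 64 byte physical lines; the doubled value covers adjacent-line prefetch.
    #define CACHE_LINE_SIZE        64
    #define DOUBLE_CACHE_LINE_SIZE 128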

Best regards,
Martin


> -----Original Message-----
> From: hotspot-runtime-dev <hotspot-runtime-dev-bounces at openjdk.java.net>
> On Behalf Of Daniel D. Daugherty
> Sent: Thursday, December 19, 2019 18:51
> To: Claes Redestad <claes.redestad at oracle.com>; David Holmes
> <david.holmes at oracle.com>; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller
> size on SPARCv9 and X64
> 
> Hi David and Claes,
> 
> Thanks for the fast reviews! Sorry for my delay in responding. I've been
> working on the Async Monitor Deflation wiki updates...
> 
> Since Claes was kind enough to reply to David's post and included all
> of David's comments, I'm going to reply to both here.
> 
> 
> On 12/18/19 10:03 AM, Claes Redestad wrote:
> > Hi,
> >
> > On 2019-12-18 06:50, David Holmes wrote:
> >> Hi Dan,
> >>
> >> On 18/12/2019 7:34 am, Daniel D. Daugherty wrote:
> >>> Greetings,
> >>>
> >>> I'm extracting another standalone fix from the Async Monitor Deflation
> >>> project (JDK-8153224) and sending it out for review (and testing)
> >>> separately. This is an XXS sized fix, but since it involves cache
> >>> line sizes I don't think a trivial review is appropriate.
> 
> First, David's part:
> 
> 
> >>
> >> Can you elaborate on why a smaller cache-line-size assumption is
> >> beneficial here?
> 
> I originally heard about switching from 128 byte to 64 byte cache-line-size
> at the June 2019 Runtime offsite. I don't remember who brought it up, but
> they were saying that Intel no longer recommends using 128 byte
> cache-line-size (and hasn't recommended that for some time). IIRC, the
> person also said that their Intel contact was a bit surprised about the
> 128 byte cache-line-size recommendation.
> 
> When I talked about Async Monitor Deflation in June, I mentioned that I
> was going to try out a 64 byte cache-line-size for ObjectMonitors, and so
> far I haven't seen any performance issues.
> 
> 
> >> Do you have performance numbers for the bug report?
> 
> I've been doing Aurora SPECjbb2015 runs (tuned G1, 10 iterations) between
> my baseline (jdk-14+26) and the various fixes. For this fix (8235931)
> relative to my first baseline run:
> 
>      criticalJOPS  -0.40%  (Non-significant)
>      maxJOPS       +0.23%  (Non-significant)
> 
> I reran the baseline a second time using the same bits and for this fix
> (8236035) relative to the second run:
> 
>      criticalJOPS  +6.96%  (Non-significant)
>      maxJOPS       +9.96%  (Non-significant)
> 
> When I compare the two baseline runs:
> 
>      criticalJOPS   -6.65%  (Non-significant)
>      maxJOPS        -8.86%  (Non-significant)
> 
> So even with 10 iterations, these SPECjbb2015 runs are all over the
> place. And Aurora is agreeing with that by flagging the results as
> "Non-significant" because the 'p' values are too large.
> 
> I'm doing another set of Aurora SPECjbb2015 runs and I'm bumping the
> iterations from 10 to 15. In the recent past, Per told me I might
> have to do 20 runs of SPECjbb2015 to shake out the variations.
> We'll see what turns up with this latest set of runs...
> 
> 
> Second, Claes' part:
> 
> > I did numerous experiments on this during work on JEP-143 and saw no
> > improvement on either "real" benchmarks or stress tests that try to
> > explicitly provoke cross-monitor false sharing on various fields in
> > ObjectMonitors. I've not re-run those experiments on new hardware, so
> > YMMV. Should probably check if they're still usable..
> 
> As always, I'm interested in your ObjectMonitor results.
> 
> 
> > Specifically, for padding values below 64 we could provoke false sharing
> > effects on stress tests, with negligible movement on "real" benchmarks.
> > For values above 64 we just waste space.
> 
> Yup. You've said that a few times and that's one of the reasons that
> I've included this change in the Async Monitor Deflation project.
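> 
> For anyone following along, here's a rough, self-contained sketch of the
> kind of per-field padding we're talking about (illustrative only; this is
> not the actual ObjectMonitor layout, which uses HotSpot's padding helpers):
> 
>      // Hypothetical value for this sketch; the point is that padding
>      // below the real line size can still allow false sharing, while
>      // padding above it just wastes space.
>      #define OM_CACHE_LINE_SIZE 64
> 
>      struct ExampleMonitor {
>        volatile intptr_t _header;
>        // Pad out the rest of the line so _owner starts on its own line.
>        char _pad0[OM_CACHE_LINE_SIZE - sizeof(intptr_t)];
>        void* volatile _owner;   // CAS'ed by contending threads
>        char _pad1[OM_CACHE_LINE_SIZE - sizeof(void*)];
>        volatile intptr_t _contentions;
>      };
> 
>      // Note: the object itself also has to be allocated on a cache line
>      // boundary for the padding to fully separate the fields.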
> 
> 
> > There exists a legend that you need double the physical cache line
> > padding to not provoke the wrath of false sharing in the presence of
> > aggressive prefetching, which is the reason why the
> > "DEFAULT_CACHE_LINE_SIZE" is set to twice the typical physical cache
> > line size in bytes. If we explicitly think we need to pad two lines,
> > then the constant DEFAULT_CACHE_LINE_SIZE should be named differently.
> 
> I wouldn't call it "a legend". When we made this change years ago, Dice
> had very specific advice from someone at Intel to do so. He could
> probably still dig up the details...
> 
> 
> > The patch in question looks good to me,
> 
> Thanks!
> 
> 
> > although I'd prefer something like:
> >
> > CACHE_LINE_SIZE 64
> > DOUBLE_CACHE_LINE_SIZE 128
> 
> I think we should make that change on a more HotSpot-wide basis under
> a different bug ID. Also, do you see those values (64 and 128) being
> appropriate for all the CPUs/platforms? Or do you foresee those values
> being defined per-CPU/platform?
> 
> 
> > .. and then use whichever is deemed more appropriate on a case by case
> > basis. (Hint: it's probably CACHE_LINE_SIZE)
> 
> I accept that hint, which is why I'm going with OM_CACHE_LINE_SIZE
> so that ObjectMonitors can have their own case... :-)
> 
> 
> >
> > $.02
> >
> > /Claes
> 
> And back to David:
> >
> >> Otherwise the mechanics of the change look fine, except I would
> >> suggest that this:
> >>
> >> #ifndef OM_CACHE_LINE_SIZE
> >> // Use DEFAULT_CACHE_LINE_SIZE if not already specified for
> >> // the current build platform.
> >> #define OM_CACHE_LINE_SIZE DEFAULT_CACHE_LINE_SIZE
> >> #endif
> >>
> >> should go in the shared globalDefinitions.hpp file.
> 
> I intentionally didn't want to expose OM_CACHE_LINE_SIZE in
> globalDefinitions.hpp. I'm trying really hard to stay out of that file.
> Also, is globalDefinitions.hpp processed before or after the
> CPU/platform-specific files?
> 
> Obviously I need to specify OM_CACHE_LINE_SIZE in the two CPU files:
> 
>      src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
>      src/hotspot/cpu/x86/globalDefinitions_x86.hpp
> 
> and both halves of the ObjectMonitor subsystem need the above logic:
> 
>      src/hotspot/share/runtime/objectMonitor.hpp
>      src/hotspot/share/runtime/synchronizer.hpp
> 
> but we don't need to expose OM_CACHE_LINE_SIZE elsewhere. Of course,
> if we end up adding CACHE_LINE_SIZE and DOUBLE_CACHE_LINE_SIZE to
> the system *and* if we can get all the CPUs/platforms to agree to use
> CACHE_LINE_SIZE for ObjectMonitors, then OM_CACHE_LINE_SIZE can be
> retired.
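> 
> To make the layering concrete, the overall shape is roughly this (the 64
> below is illustrative; the authoritative values are in the webrev):
> 
>      // globalDefinitions_x86.hpp (and the sparc equivalent):
>      // Use a smaller cache-line-size for ObjectMonitors on this platform.
>      #define OM_CACHE_LINE_SIZE 64
> 
>      // objectMonitor.hpp and synchronizer.hpp:
>      #ifndef OM_CACHE_LINE_SIZE
>      // Use DEFAULT_CACHE_LINE_SIZE if not already specified for
>      // the current build platform.
>      #define OM_CACHE_LINE_SIZE DEFAULT_CACHE_LINE_SIZE
>      #endif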
> 
> Thanks again for the fast reviews!
> 
> Dan
> 
> 
> >>
> >> Thanks,
> >> David
> >> -----
> >>
> >>>      JDK-8235931 add OM_CACHE_LINE_SIZE and use smaller size on
> >>> SPARCv9 and X64
> >>>      https://bugs.openjdk.java.net/browse/JDK-8235931
> >>>
> >>> Here's the webrev URL:
> >>>
> >>> http://cr.openjdk.java.net/~dcubed/8235931-webrev/0-for-jdk15/
> >>>
> >>> Folks that have reviewed JDK-8153224 will recognize these changes as
> >>> a subset of the cache line changes from the Async Monitor Deflation
> >>> project. It's a subset because there are more fields that need to be
> >>> on separate cache lines in the Async Monitor Deflation project.
> >>>
> >>> These changes are being tested in a Mach5 Tier[1-3] run that's still
> >>> running (JDK14 ATR has priority). I've also run these changes thru
> >>> a set of 10 SPECjbb2015 runs; I'm waiting for my various SPECjbb2015
> >>> runs to finish so I can do the analysis.
> >>>
> >>> Thanks, in advance, for comments, questions or suggestions.
> >>>
> >>> Dan


