RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller size on SPARCv9 and X64
Doerr, Martin
martin.doerr at sap.com
Mon Jan 20 10:46:08 UTC 2020
Hi Dan,
> It's not clear whether you reviewed the entire patch or not.
Now I have reviewed the entire patch. Sorry for the delay.
> > I hope you don't mind that I have no interest in taking on the broader
> > naming difficulties using 8235931...
Perfectly OK. I appreciate cleaning things up in a separate change.
I don't like having the "#define OM_CACHE_LINE_SIZE DEFAULT_CACHE_LINE_SIZE" twice.
Maybe padded.hpp would be a good place? It's included from both synchronizer.hpp and objectMonitor.hpp.
synchronizer.cpp:
I'm not sure whether the padding change within SharedGlobals is beneficial, and I don't see how it's related to the ObjectMonitors.
The remaining parts make sense to me. Thanks for sharing performance results.
Best regards,
Martin
> -----Original Message-----
> From: Daniel D. Daugherty <daniel.daugherty at oracle.com>
> Sent: Friday, 17 January 2020 22:20
> To: Doerr, Martin <martin.doerr at sap.com>; Claes Redestad
> <claes.redestad at oracle.com>; David Holmes <david.holmes at oracle.com>;
> hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller
> size on SPARCv9 and X64
>
> Martin,
>
> It's not clear whether you reviewed the entire patch or not.
>
> Currently I'm planning to list dholmes and redestad as reviewers. Do
> you also wish to be listed as a reviewer?
>
> Dan
>
>
> On 1/17/20 1:06 PM, Daniel D. Daugherty wrote:
> > Hi Martin,
> >
> > Thanks for chiming in on this review thread.
> >
> > For the purposes of this fix (8235931), I'm going to stick with
> > OM_CACHE_LINE_SIZE and we'll have to hash out a broader definition
> > of "cache line size" via a different bug ID.
> >
> > I hope you don't mind that I have no interest in taking on the broader
> > naming difficulties using 8235931...
> >
> > Dan
> >
> >
> > On 12/20/19 9:24 AM, Doerr, Martin wrote:
> >> Hi,
> >>
> >> Just a comment on
> >>> CACHE_LINE_SIZE 64
> >>> DOUBLE_CACHE_LINE_SIZE 128
> >> I think CACHE_LINE_SIZE is too unspecific.
> >> There are L1I, L1D, L2, L3, ... caches, and they can have different
> >> cache line sizes.
> >> In addition, they can change between processor models.
> >>
> >> So we should have a better name than just "CACHE_LINE_SIZE".
> >>
> >> PPC and s390 already have large cache lines (which are real, not
> >> faked for performance considerations).
> >> I guess we don't want to use even more padding on these platforms.
> >> So we will probably want to set both constants to the same value
> >> (needs to get evaluated).
> >>
> >> Best regards,
> >> Martin
> >>
> >>
> >>> -----Original Message-----
> >>> From: hotspot-runtime-dev <hotspot-runtime-dev-
> >>> bounces at openjdk.java.net> On Behalf Of Daniel D. Daugherty
> >>> Sent: Thursday, 19 December 2019 18:51
> >>> To: Claes Redestad <claes.redestad at oracle.com>; David Holmes
> >>> <david.holmes at oracle.com>; hotspot-runtime-dev at openjdk.java.net
> >>> Subject: Re: RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use
> >>> smaller size on SPARCv9 and X64
> >>>
> >>> Hi David and Claes,
> >>>
> >>> Thanks for the fast reviews! Sorry for my delay in responding. I've
> >>> been
> >>> working on the Async Monitor Deflation wiki updates...
> >>>
> >>> Since Claes was kind enough to reply to David's post and included all
> >>> of David's comments, I'm going to reply to both here.
> >>>
> >>>
> >>> On 12/18/19 10:03 AM, Claes Redestad wrote:
> >>>> Hi,
> >>>>
> >>>> On 2019-12-18 06:50, David Holmes wrote:
> >>>>> Hi Dan,
> >>>>>
> >>>>> On 18/12/2019 7:34 am, Daniel D. Daugherty wrote:
> >>>>>> Greetings,
> >>>>>>
> >>>>>> I'm extracting another standalone fix from the Async Monitor
> >>>>>> Deflation
> >>>>>> project (JDK-8153224) and sending it out for review (and testing)
> >>>>>> separately. This is an XXS sized fix, but since it involves cache
> >>>>>> line sizes I don't think a trivial review is appropriate.
> >>> First, David's part:
> >>>
> >>>
> >>>>> Can you elaborate on why a smaller cache-line-size assumption is
> >>>>> beneficial here?
> >>> I originally heard about switching from 128 byte to 64 byte
> >>> cache-line-size
> >>> at the June 2019 Runtime offsite. I don't remember who brought it
> >>> up, but
> >>> they were saying that Intel no longer recommends using 128 byte
> >>> cache-line-size
> >>> (and hasn't recommended that for some time). IIRC, the person also said
> >>> that
> >>> their Intel contact was a bit surprised about the 128 byte
> >>> cache-line-size
> >>> recommendation.
> >>>
> >>> When I talked about Async Monitor Deflation in June I mentioned that I
> >>> was going to try out 64 byte cache-line-size for ObjectMonitors and so
> >>> far I haven't seen any performance issues.
> >>>
> >>>
> >>>>> Do you have performance numbers for the bug report?
> >>> I've been doing Aurora SPECjbb2015 runs (tuned G1, 10 iterations)
> >>> between
> >>> my baseline (jdk-14+26) and the various fixes. For this fix (8235931)
> >>> relative to my first baseline run:
> >>>
> >>> criticalJOPS -0.40% (Non-significant)
> >>> maxJOPS +0.23% (Non-significant)
> >>>
> >>> I reran the baseline a second time using the same bits, and for this fix
> >>> (8235931) relative to the second run:
> >>>
> >>> criticalJOPS +6.96% (Non-significant)
> >>> maxJOPS +9.96% (Non-significant)
> >>>
> >>> When I compare the two baseline runs:
> >>>
> >>> criticalJOPS -6.65% (Non-significant)
> >>> maxJOPS -8.86% (Non-significant)
> >>>
> >>> So even with 10 iterations, these SPECjbb2015 runs are all over the
> >>> place. And Aurora is agreeing with that by flagging the results as
> >>> "Non-significant" because the 'p' values are too large.
> >>>
> >>> I'm doing another set of Aurora SPECjbb2015 runs and I'm bumping the
> >>> iterations from 10 to 15. In the recent past, Per told me I might
> >>> have to do 20 runs of SPECjbb2015 to shake out the variations.
> >>> We'll see what turns up with this latest set of runs...
> >>>
> >>>
> >>> Second, Claes' part:
> >>>
> >>>> I did numerous experiments on this during work on JEP-143 and saw no
> >>>> improvement on either "real" benchmarks or stress tests that try to
> >>>> explicitly provoke cross-monitor false sharing on various fields in
> >>>> ObjectMonitors. I've not re-run those experiments on new hardware, so
> >>>> YMMV. Should probably check if they're still usable...
> >>> As always, I'm interested in your ObjectMonitor results.
> >>>
> >>>
> >>>> Specifically, for padding values below 64 we could provoke false
> >>>> sharing
> >>>> effects on stress tests, with negligible movement on "real"
> >>>> benchmarks.
> >>>> For values above 64 we just waste space.
> >>> Yup. You've said that a few times and that's one of the reasons that
> >>> I've included this change in the Async Monitor Deflation project.
> >>>
> >>>
> >>>> There exists a legend that you need double the physical cache line
> >>>> padding to not provoke the wrath of false sharing in the presence of
> >>>> aggressive prefetching, which is the reason why the
> >>>> "DEFAULT_CACHE_LINE_SIZE" is set to twice the typical physical cache
> >>>> line size in bytes. If we explicitly think we need to pad two lines,
> >>>> then the constant DEFAULT_CACHE_LINE_SIZE should be named
> >>> differently.
> >>>
> >>> I wouldn't call it "a legend". When we made this change years ago, Dice
> >>> had very specific advice from someone at Intel to make this change. He
> >>> could probably still dig up the details...
> >>>
> >>>
> >>>> The patch in question looks good to me,
> >>> Thanks!
> >>>
> >>>
> >>>> although I'd prefer something like:
> >>>>
> >>>> CACHE_LINE_SIZE 64
> >>>> DOUBLE_CACHE_LINE_SIZE 128
> >>> I think we should make that change on a more HotSpot wide basis with
> >>> a different bug ID. Also, do you see those values (64 and 128) being
> >>> appropriate for all the CPUs/platforms? Or do you foresee those values
> >>> being defined per-CPU/platform?
> >>>
> >>>
> >>>> .. and then use whichever is deemed more appropriate on a
> >>>> case-by-case basis. (Hint: it's probably CACHE_LINE_SIZE)
> >>> I accept that hint which is why I'm going with OM_CACHE_LINE_SIZE
> >>> so that ObjectMonitors can have their own case... :-)
> >>>
> >>>
> >>>> $.02
> >>>>
> >>>> /Claes
> >>> And back to David:
> >>>>> Otherwise the mechanics of the change look fine, except I would
> >>>>> suggest that this:
> >>>>>
> >>>>> #ifndef OM_CACHE_LINE_SIZE
> >>>>> // Use DEFAULT_CACHE_LINE_SIZE if not already specified for
> >>>>> // the current build platform.
> >>>>> #define OM_CACHE_LINE_SIZE DEFAULT_CACHE_LINE_SIZE
> >>>>> #endif
> >>>>>
> >>>>> should go in the shared globalDefinitions.hpp file.
> >>> I intentionally didn't want to expose OM_CACHE_LINE_SIZE in
> >>> globalDefinitions.hpp. I'm trying really hard to stay out of that file.
> >>> Also, is globalDefinitions.hpp processed before or after the
> >>> CPU/platform
> >>> specific files?
> >>>
> >>> Obviously I need to specify OM_CACHE_LINE_SIZE in the two CPU files:
> >>>
> >>> src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
> >>> src/hotspot/cpu/x86/globalDefinitions_x86.hpp
> >>>
> >>> and both halves of the ObjectMonitor subsystem need the above logic:
> >>>
> >>> src/hotspot/share/runtime/objectMonitor.hpp
> >>> src/hotspot/share/runtime/synchronizer.hpp
> >>>
> >>> but we don't need to expose OM_CACHE_LINE_SIZE elsewhere. Of course,
> >>> if we end up adding CACHE_LINE_SIZE and DOUBLE_CACHE_LINE_SIZE to
> >>> the system *and* if we can get all the CPUs/platforms to agree to use
> >>> CACHE_LINE_SIZE for ObjectMonitors, then OM_CACHE_LINE_SIZE can be
> >>> retired.
> >>>
> >>> Thanks again for the fast reviews!
> >>>
> >>> Dan
> >>>
> >>>
> >>>>> Thanks,
> >>>>> David
> >>>>> -----
> >>>>>
> >>>>>> JDK-8235931 add OM_CACHE_LINE_SIZE and use smaller size on
> >>>>>> SPARCv9 and X64
> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8235931
> >>>>>>
> >>>>>> Here's the webrev URL:
> >>>>>>
> >>>>>> http://cr.openjdk.java.net/~dcubed/8235931-webrev/0-for-jdk15/
> >>>>>>
> >>>>>> Folks that have reviewed JDK-8153224 will recognize these changes as
> >>>>>> a subset of the cache line changes from the Async Monitor Deflation
> >>>>>> project. It's a subset because there are more fields that need to be
> >>>>>> on separate cache lines in the Async Monitor Deflation project.
> >>>>>>
> >>>>>> These changes are being tested in a Mach5 Tier[1-3] run that's still
> >>>>>> running (JDK14 ATR has priority). I've also run these changes thru
> >>>>>> a set of 10 SPECjbb2015 runs; I'm waiting for my various SPECjbb2015
> >>>>>> runs to finish so I can do the analysis.
> >>>>>>
> >>>>>> Thanks, in advance, for comments, questions or suggestions.
> >>>>>>
> >>>>>> Dan
> >
> >