RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller size on SPARCv9 and X64

Doerr, Martin martin.doerr at sap.com
Wed Jan 22 10:12:30 UTC 2020


Hi Dan,

> I own the problem of updating OM_CACHE_LINE_SIZE to whatever symbol
> we decide replaces DEFAULT_CACHE_LINE_SIZE in the future so my preference is to
> leave things as they are until we reach that point. Can you live with that?
Yes.

I'd slightly prefer to leave SharedGlobals unchanged, but I guess it doesn't really matter.
So I can live with your version, too.

Best regards,
Martin


> -----Original Message-----
> From: Daniel D. Daugherty <daniel.daugherty at oracle.com>
> Sent: Tuesday, 21 January 2020 16:22
> To: Doerr, Martin <martin.doerr at sap.com>; Claes Redestad
> <claes.redestad at oracle.com>; David Holmes <david.holmes at oracle.com>;
> hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller
> size on SPARCv9 and X64
> 
> On 1/20/20 5:46 AM, Doerr, Martin wrote:
> > Hi Dan,
> >
> >> It's not clear whether you reviewed the entire patch or not.
> > Now I have reviewed the entire patch. Sorry for the delay.
> 
> Thanks for the review!
> 
> 
> >>> I hope you don't mind that I have no interest in taking on the broader
> >>> naming difficulties using 8235931...
> > Perfectly ok. I appreciate cleaning things up in a separate change.
> >
> > I don't like having the "#define OM_CACHE_LINE_SIZE DEFAULT_CACHE_LINE_SIZE" twice.
> > Maybe padded.hpp would be a good place? It's included from both
> > synchronizer.hpp and objectMonitor.hpp.
> 
> I don't think any of us (David H, you or me) like OM_CACHE_LINE_SIZE being
> defined in two places, but I'm trying to avoid polluting non-Monitor places
> with an OM_... symbol. That just doesn't feel right.
> 
> I own the problem of updating OM_CACHE_LINE_SIZE to whatever symbol we decide
> replaces DEFAULT_CACHE_LINE_SIZE in the future so my preference is to leave
> things as they are until we reach that point. Can you live with that?
> 
> 
> > synchronizer.cpp:
> > Not sure if the padding change within SharedGlobals is beneficial and I
> > don't see how it's related to the ObjectMonitors.
> 
> It seemed strange to change from DEFAULT_CACHE_LINE_SIZE to OM_CACHE_LINE_SIZE
> in one part of synchronizer.cpp, but not in all parts... Also:
> 
> L1666: void ObjectSynchronizer::finish_deflate_idle_monitors(DeflateMonitorCounters* counters) {
> L1692:   GVars.stw_random = os::random();
> L1693:   GVars.stw_cycle++;
> 
> so two of the three fields are touched by ObjectMonitor code...
> 
> 
> > The remaining parts make sense to me.
> 
> Thanks!
> 
> 
> > Thanks for sharing performance results.
> 
> You're welcome! I'm doing another round of perf testing after rebasing
> everything to jdk-14+32... this time 25 runs per config... I'm hoping to
> get more stable results, but I'm not optimistic...
> 
> Dan
> 
> 
> >
> > Best regards,
> > Martin
> >
> >
> >> -----Original Message-----
> >> From: Daniel D. Daugherty <daniel.daugherty at oracle.com>
> >> Sent: Friday, 17 January 2020 22:20
> >> To: Doerr, Martin <martin.doerr at sap.com>; Claes Redestad
> >> <claes.redestad at oracle.com>; David Holmes <david.holmes at oracle.com>;
> >> hotspot-runtime-dev at openjdk.java.net
> >> Subject: Re: RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller
> >> size on SPARCv9 and X64
> >>
> >> Martin,
> >>
> >> It's not clear whether you reviewed the entire patch or not.
> >>
> >> Currently I'm planning to list dholmes and redestad as reviewers. Do
> >> you also wish to be listed as a reviewer?
> >>
> >> Dan
> >>
> >>
> >> On 1/17/20 1:06 PM, Daniel D. Daugherty wrote:
> >>> Hi Martin,
> >>>
> >>> Thanks for chiming in on this review thread.
> >>>
> >>> For the purposes of this fix (8235931), I'm going to stick with
> >>> OM_CACHE_LINE_SIZE and we'll have to hash out a broader definition
> >>> of "cache line size" via a different bug ID.
> >>>
> >>> I hope you don't mind that I have no interest in taking on the broader
> >>> naming difficulties using 8235931...
> >>>
> >>> Dan
> >>>
> >>>
> >>> On 12/20/19 9:24 AM, Doerr, Martin wrote:
> >>>> Hi,
> >>>>
> >>>> Just a comment on
> >>>>> CACHE_LINE_SIZE 64
> >>>>> DOUBLE_CACHE_LINE_SIZE 128
> >>>> I think CACHE_LINE_SIZE is too unspecific.
> >>>> There are L1I L1D, L2, L3, ... caches and they can have different
> >>>> cache line sizes.
> >>>> In addition, they can change between processor models.
> >>>>
> >>>> So we should have a better name than just "CACHE_LINE_SIZE".
> >>>>
> >>>> PPC and s390 already have large cache lines (which are real, not
> >>>> faked for performance considerations).
> >>>> I guess we don't want to use even more padding on these platforms.
> >>>> So we will probably want to set both constants to the same value
> >>>> (needs to get evaluated).
> >>>>
> >>>> Best regards,
> >>>> Martin
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: hotspot-runtime-dev <hotspot-runtime-dev-bounces at openjdk.java.net>
> >>>>> On Behalf Of Daniel D. Daugherty
> >>>>> Sent: Thursday, 19 December 2019 18:51
> >>>>> To: Claes Redestad <claes.redestad at oracle.com>; David Holmes
> >>>>> <david.holmes at oracle.com>; hotspot-runtime-dev at openjdk.java.net
> >>>>> Subject: Re: RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller
> >>>>> size on SPARCv9 and X64
> >>>>>
> >>>>> Hi David and Claes,
> >>>>>
> >>>>> Thanks for the fast reviews! Sorry for my delay in responding. I've
> >>>>> been
> >>>>> working on the Async Monitor Deflation wiki updates...
> >>>>>
> >>>>> Since Claes was kind enough to reply to David's post and included all
> >>>>> of David's comments, I'm going to reply to both here.
> >>>>>
> >>>>>
> >>>>> On 12/18/19 10:03 AM, Claes Redestad wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 2019-12-18 06:50, David Holmes wrote:
> >>>>>>> Hi Dan,
> >>>>>>>
> >>>>>>> On 18/12/2019 7:34 am, Daniel D. Daugherty wrote:
> >>>>>>>> Greetings,
> >>>>>>>>
> >>>>>>>> I'm extracting another standalone fix from the Async Monitor
> >>>>>>>> Deflation
> >>>>>>>> project (JDK-8153224) and sending it out for review (and testing)
> >>>>>>>> separately. This is an XXS sized fix, but since it involves cache
> >>>>>>>> line sizes I don't think a trivial review is appropriate.
> >>>>> First, David's part:
> >>>>>
> >>>>>
> >>>>>>> Can you elaborate on why a smaller cache-line-size assumption is
> >>>>>>> beneficial here?
> >>>>> I originally heard about switching from 128 byte to 64 byte
> >>>>> cache-line-size
> >>>>> at the June 2019 Runtime offsite. I don't remember who brought it
> >>>>> up, but
> >>>>> they were saying that Intel no longer recommends using 128 byte
> >>>>> cache-line-size
> >>>>> (and hasn't recommended that for some time). IIRC, the person also said
> >>>>> that their Intel contact was a bit surprised about the 128 byte
> >>>>> cache-line-size recommendation.
> >>>>>
> >>>>> When I talked about Async Monitor Deflation in June I mentioned that I
> >>>>> was going to try out 64 byte cache-line-size for ObjectMonitors and so
> >>>>> far I haven't seen any performance issues.
> >>>>>
> >>>>>
> >>>>>>> Do you have performance numbers for the bug report?
> >>>>> I've been doing Aurora SPECjbb2015 runs (tuned G1, 10 iterations)
> >>>>> between
> >>>>> my baseline (jdk-14+26) and the various fixes. For this fix (8235931)
> >>>>> relative to my first baseline run:
> >>>>>
> >>>>>        criticalJOPS  -0.40%  (Non-significant)
> >>>>>        maxJOPS       +0.23%  (Non-significant)
> >>>>>
> >>>>> I reran the baseline a second time using the same bits and for this fix
> >>>>> (8236035) relative to the second run:
> >>>>>
> >>>>>        criticalJOPS  +6.96%  (Non-significant)
> >>>>>        maxJOPS       +9.96%  (Non-significant)
> >>>>>
> >>>>> When I compare the two baseline runs:
> >>>>>
> >>>>>        criticalJOPS   -6.65%  (Non-significant)
> >>>>>        maxJOPS        -8.86%  (Non-significant)
> >>>>>
> >>>>> So even with 10 iterations, these SPECjbb2015 runs are all over the
> >>>>> place. And Aurora is agreeing with that by flagging the results as
> >>>>> "Non-significant" because the 'p' values are too large.
> >>>>>
> >>>>> I'm doing another set of Aurora SPECjbb2015 runs and I'm bumping the
> >>>>> iterations from 10 to 15. In the recent past, Per told me I might
> >>>>> have to do 20 runs of SPECjbb2015 to shake out the variations.
> >>>>> We'll see what turns up with this latest set of runs...
> >>>>>
> >>>>>
> >>>>> Second, Claes' part:
> >>>>>
> >>>>>> I did numerous experiments on this during work on JEP-143 and saw no
> >>>>>> improvement on either "real" benchmarks or stress tests that try to
> >>>>>> explicitly provoke cross-monitor false sharing on various fields in
> >>>>>> ObjectMonitors. I've not re-run those experiments on new hardware, so
> >>>>>> YMMV. Should probably check if they're still usable...
> >>>>> As always, I'm interested in your ObjectMonitor results.
> >>>>>
> >>>>>
> >>>>>> Specifically, for padding values below 64 we could provoke false
> >>>>>> sharing
> >>>>>> effects on stress tests, with negligible movement on "real"
> >>>>>> benchmarks.
> >>>>>> For values above 64 we just waste space.
> >>>>> Yup. You've said that a few times and that's one of the reasons that
> >>>>> I've included this change in the Async Monitor Deflation project.
> >>>>>
> >>>>>
> >>>>>> There exists a legend that you need double the physical cache line
> >>>>>> padding to not provoke the wrath of false sharing in the presence of
> >>>>>> aggressive prefetching, which is the reason why the
> >>>>>> "DEFAULT_CACHE_LINE_SIZE" is set to twice the typical physical
> cache
> >>>>>> line size in bytes. If we explicitly think we need to pad two lines,
> >>>>>> then the constant DEFAULT_CACHE_LINE_SIZE should be named
> >>>>> differently.
> >>>>>
> >>>>> I wouldn't call it "a legend". When we made this change years ago, Dice
> >>>>> had very specific advice from someone at Intel to make this change. He
> >>>>> could probably still dig up the details...
> >>>>>
> >>>>>
> >>>>>> The patch in question looks good to me,
> >>>>> Thanks!
> >>>>>
> >>>>>
> >>>>>> although I'd prefer something like:
> >>>>>>
> >>>>>> CACHE_LINE_SIZE 64
> >>>>>> DOUBLE_CACHE_LINE_SIZE 128
> >>>>> I think we should make that change on a more HotSpot-wide basis with
> >>>>> a different bug ID. Also, do you see those values (64 and 128) being
> >>>>> appropriate for all the CPUs/platforms? Or do you foresee those values
> >>>>> being defined per-CPU/platform?
> >>>>>
> >>>>>
> >>>>>> ... and then use whichever is deemed more appropriate on a case-by-case
> >>>>>> basis. (Hint: it's probably CACHE_LINE_SIZE)
> >>>>> I accept that hint which is why I'm going with OM_CACHE_LINE_SIZE
> >>>>> so that ObjectMonitors can have their own case... :-)
> >>>>>
> >>>>>
> >>>>>> $.02
> >>>>>>
> >>>>>> /Claes
> >>>>> And back to David:
> >>>>>>> Otherwise the mechanics of the change look fine, except I would
> >>>>>>> suggest that this:
> >>>>>>>
> >>>>>>> #ifndef OM_CACHE_LINE_SIZE
> >>>>>>> // Use DEFAULT_CACHE_LINE_SIZE if not already specified for
> >>>>>>> // the current build platform.
> >>>>>>> #define OM_CACHE_LINE_SIZE DEFAULT_CACHE_LINE_SIZE
> >>>>>>> #endif
> >>>>>>>
> >>>>>>> should go in the shared globalDefinitions.hpp file.
> >>>>> I intentionally didn't want to expose OM_CACHE_LINE_SIZE in
> >>>>> globalDefinitions.hpp. I'm trying really hard to stay out of that file.
> >>>>> Also, is globalDefinitions.hpp processed before or after the
> >>>>> CPU/platform
> >>>>> specific files?
> >>>>>
> >>>>> Obviously I need to specify OM_CACHE_LINE_SIZE in the two CPU files:
> >>>>>
> >>>>>        src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
> >>>>>        src/hotspot/cpu/x86/globalDefinitions_x86.hpp
> >>>>>
> >>>>> and both halves of the ObjectMonitor subsystem need the above logic:
> >>>>>
> >>>>>        src/hotspot/share/runtime/objectMonitor.hpp
> >>>>>        src/hotspot/share/runtime/synchronizer.hpp
> >>>>>
> >>>>> but we don't need to expose OM_CACHE_LINE_SIZE elsewhere. Of course,
> >>>>> if we end up adding CACHE_LINE_SIZE and DOUBLE_CACHE_LINE_SIZE to
> >>>>> the system *and* if we can get all the CPUs/platforms to agree to use
> >>>>> CACHE_LINE_SIZE for ObjectMonitors, then OM_CACHE_LINE_SIZE can be
> >>>>> retired.
> >>>>>
> >>>>> Thanks again for the fast reviews!
> >>>>>
> >>>>> Dan
> >>>>>
> >>>>>
> >>>>>>> Thanks,
> >>>>>>> David
> >>>>>>> -----
> >>>>>>>
> >>>>>>>>        JDK-8235931 add OM_CACHE_LINE_SIZE and use smaller size on
> >>>>>>>> SPARCv9 and X64
> >>>>>>>>        https://bugs.openjdk.java.net/browse/JDK-8235931
> >>>>>>>>
> >>>>>>>> Here's the webrev URL:
> >>>>>>>>
> >>>>>>>> http://cr.openjdk.java.net/~dcubed/8235931-webrev/0-for-jdk15/
> >>>>>>>>
> >>>>>>>> Folks that have reviewed JDK-8153224 will recognize these changes as
> >>>>>>>> a subset of the cache line changes from the Async Monitor Deflation
> >>>>>>>> project. It's a subset because there are more fields that need to be
> >>>>>>>> on separate cache lines in the Async Monitor Deflation project.
> >>>>>>>>
> >>>>>>>> These changes are being tested in a Mach5 Tier[1-3] run that's still
> >>>>>>>> running (JDK14 ATR has priority). I've also run these changes thru
> >>>>>>>> a set of 10 SPECjbb2015 runs; I'm waiting for my various SPECjbb2015
> >>>>>>>> runs to finish so I can do the analysis.
> >>>>>>>>
> >>>>>>>> Thanks, in advance, for comments, questions or suggestions.
> >>>>>>>>
> >>>>>>>> Dan
> >>>


