RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller size on SPARCv9 and X64
Daniel D. Daugherty
daniel.daugherty at oracle.com
Wed Jan 22 14:42:02 UTC 2020
On 1/22/20 5:12 AM, Doerr, Martin wrote:
> Hi Dan,
>
>> I own the problem of updating OM_CACHE_LINE_SIZE to whatever symbol
>> we decide replaces DEFAULT_CACHE_LINE_SIZE in the future so my preference is to
>> leave things as they are until we reach that point. Can you live with that?
> Yes.
Thanks!
> I'd slightly prefer to leave SharedGlobals unchanged, but I guess it doesn't really matter.
> So I can live with your version, too.
Thanks for your flexibility and thanks again for the review!
Dan
>
> Best regards,
> Martin
>
>
>> -----Original Message-----
>> From: Daniel D. Daugherty <daniel.daugherty at oracle.com>
>> Sent: Tuesday, 21 January 2020 16:22
>> To: Doerr, Martin <martin.doerr at sap.com>; Claes Redestad
>> <claes.redestad at oracle.com>; David Holmes <david.holmes at oracle.com>;
>> hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller
>> size on SPARCv9 and X64
>>
>> On 1/20/20 5:46 AM, Doerr, Martin wrote:
>>> Hi Dan,
>>>
>>>> It's not clear whether you reviewed the entire patch or not.
>>> Now I have reviewed the entire patch. Sorry for the delay.
>> Thanks for the review!
>>
>>
>>>>> I hope you don't mind that I have no interest in taking on the broader
>>>>> naming difficulties using 8235931...
>>> Perfectly ok. I appreciate cleaning things up in a separate change.
>>>
>>> I don't like having the "#define OM_CACHE_LINE_SIZE DEFAULT_CACHE_LINE_SIZE" twice.
>>> Maybe padded.hpp would be a good place? It's included from both
>>> synchronizer.hpp and objectMonitor.hpp.
>>
>> I don't think any of us (David H, you or me) like OM_CACHE_LINE_SIZE being
>> defined in two places, but I'm trying to avoid polluting non-Monitor places
>> with an OM_... symbol. That just doesn't feel right.
>>
>> I own the problem of updating OM_CACHE_LINE_SIZE to whatever symbol we
>> decide replaces DEFAULT_CACHE_LINE_SIZE in the future so my preference is
>> to leave things as they are until we reach that point. Can you live with that?
>>
>>
>>> synchronizer.cpp:
>>> Not sure if the padding change within SharedGlobals is beneficial and I
>>> don't see how it's related to the ObjectMonitors.
>>
>> It seemed strange to change from DEFAULT_CACHE_LINE_SIZE to OM_CACHE_LINE_SIZE
>> in one part of synchronizer.cpp, but not in all parts... Also:
>>
>> L1666: void ObjectSynchronizer::finish_deflate_idle_monitors(DeflateMonitorCounters* counters) {
>> L1692: GVars.stw_random = os::random();
>> L1693: GVars.stw_cycle++;
>>
>> so two of the three fields are touched by ObjectMonitor code...
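>>
>> For reference, the padding shape under discussion looks roughly like this
>> (a simplified sketch of synchronizer.cpp's SharedGlobals; the real code
>> uses the DEFINE_PAD_MINUS_SIZE macro from padded.hpp and may differ in
>> detail):
>>
>>   #ifndef OM_CACHE_LINE_SIZE
>>   #define OM_CACHE_LINE_SIZE 64  // smaller size used on SPARCv9 and X64
>>   #endif
>>
>>   struct SharedGlobals {
>>     char _pad_prefix[OM_CACHE_LINE_SIZE];
>>     // Highly shared, mostly-read fields; sole occupants of their cache line.
>>     volatile int stw_random;  // updated by finish_deflate_idle_monitors()
>>     volatile int stw_cycle;   // updated by finish_deflate_idle_monitors()
>>     char _pad1[OM_CACHE_LINE_SIZE - 2 * sizeof(int)];
>>     // Hot read-write field, sequestered on its own cache line.
>>     volatile int hc_sequence;
>>     char _pad2[OM_CACHE_LINE_SIZE - sizeof(int)];
>>   };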
>>
>>
>>> The remaining parts make sense to me.
>> Thanks!
>>
>>
>>> Thanks for sharing performance results.
>> You're welcome! I'm doing another round of perf testing after rebasing
>> everything to jdk-14+32... this time 25 runs per config... I'm hoping to
>> get more stable results, but I'm not optimistic...
>>
>> Dan
>>
>>
>>> Best regards,
>>> Martin
>>>
>>>
>>>> -----Original Message-----
>>>> From: Daniel D. Daugherty <daniel.daugherty at oracle.com>
>>>> Sent: Friday, 17 January 2020 22:20
>>>> To: Doerr, Martin <martin.doerr at sap.com>; Claes Redestad
>>>> <claes.redestad at oracle.com>; David Holmes <david.holmes at oracle.com>;
>>>> hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller
>>>> size on SPARCv9 and X64
>>>>
>>>> Martin,
>>>>
>>>> It's not clear whether you reviewed the entire patch or not.
>>>>
>>>> Currently I'm planning to list dholmes and redestad as reviewers. Do
>>>> you also wish to be listed as a reviewer?
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 1/17/20 1:06 PM, Daniel D. Daugherty wrote:
>>>>> Hi Martin,
>>>>>
>>>>> Thanks for chiming in on this review thread.
>>>>>
>>>>> For the purposes of this fix (8235931), I'm going to stick with
>>>>> OM_CACHE_LINE_SIZE and we'll have to hash out a broader definition
>>>>> of "cache line size" via a different bug ID.
>>>>>
>>>>> I hope you don't mind that I have no interest in taking on the broader
>>>>> naming difficulties using 8235931...
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 12/20/19 9:24 AM, Doerr, Martin wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Just a comment on
>>>>>>> CACHE_LINE_SIZE 64
>>>>>>> DOUBLE_CACHE_LINE_SIZE 128
>>>>>> I think CACHE_LINE_SIZE is too unspecific.
>>>>>> There are L1I, L1D, L2, L3, ... caches and they can have different
>>>>>> cache line sizes.
>>>>>> In addition, they can change between processor models.
>>>>>>
>>>>>> So we should have a better name than just "CACHE_LINE_SIZE".
>>>>>>
>>>>>> PPC and s390 already have large cache lines (which are real, not
>>>>>> faked for performance considerations).
>>>>>> I guess we don't want to use even more padding on these platforms.
>>>>>> So we will probably want to set both constants to the same value
>>>>>> (needs to get evaluated).
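>>>>>>
>>>>>> Concretely, such a platform might end up with something like this
>>>>>> (a hypothetical sketch, not actual code):
>>>>>>
>>>>>>   // e.g. in globalDefinitions_ppc.hpp: the physical line really is
>>>>>>   // 128 bytes, so we don't want to pay for 256 bytes of padding.
>>>>>>   #define CACHE_LINE_SIZE        128
>>>>>>   #define DOUBLE_CACHE_LINE_SIZE 128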
>>>>>>
>>>>>> Best regards,
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: hotspot-runtime-dev <hotspot-runtime-dev-bounces at openjdk.java.net>
>>>>>>> On Behalf Of Daniel D. Daugherty
>>>>>>> Sent: Thursday, 19 December 2019 18:51
>>>>>>> To: Claes Redestad <claes.redestad at oracle.com>; David Holmes
>>>>>>> <david.holmes at oracle.com>; hotspot-runtime-dev at openjdk.java.net
>>>>>>> Subject: Re: RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller
>>>>>>> size on SPARCv9 and X64
>>>>>>>
>>>>>>> Hi David and Claes,
>>>>>>>
>>>>>>> Thanks for the fast reviews! Sorry for my delay in responding. I've
>>>>>>> been
>>>>>>> working on the Async Monitor Deflation wiki updates...
>>>>>>>
>>>>>>> Since Claes was kind enough to reply to David's post and included all
>>>>>>> of David's comments, I'm going to reply to both here.
>>>>>>>
>>>>>>>
>>>>>>> On 12/18/19 10:03 AM, Claes Redestad wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 2019-12-18 06:50, David Holmes wrote:
>>>>>>>>> Hi Dan,
>>>>>>>>>
>>>>>>>>> On 18/12/2019 7:34 am, Daniel D. Daugherty wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> I'm extracting another standalone fix from the Async Monitor
>>>>>>>>>> Deflation
>>>>>>>>>> project (JDK-8153224) and sending it out for review (and testing)
>>>>>>>>>> separately. This is an XXS sized fix, but since it involves cache
>>>>>>>>>> line sizes I don't think a trivial review is appropriate.
>>>>>>> First, David's part:
>>>>>>>
>>>>>>>
>>>>>>>>> Can you elaborate on why a smaller cache-line-size assumption is
>>>>>>>>> beneficial here?
>>>>>>> I originally heard about switching from 128 byte to 64 byte
>>>>>>> cache-line-size at the June 2019 Runtime offsite. I don't remember
>>>>>>> who brought it up, but they were saying that Intel no longer
>>>>>>> recommends using 128 byte cache-line-size (and hasn't recommended
>>>>>>> that for some time). IIRC, the person also said that their Intel
>>>>>>> contact was a bit surprised about the 128 byte cache-line-size
>>>>>>> recommendation.
>>>>>>>
>>>>>>> When I talked about Async Monitor Deflation in June I mentioned that I
>>>>>>> was going to try out 64 byte cache-line-size for ObjectMonitors and so
>>>>>>> far I haven't seen any performance issues.
>>>>>>>
>>>>>>>
>>>>>>>>> Do you have performance numbers for the bug report?
>>>>>>> I've been doing Aurora SPECjbb2015 runs (tuned G1, 10 iterations) between
>>>>>>> my baseline (jdk-14+26) and the various fixes. For this fix (8235931)
>>>>>>> relative to my first baseline run:
>>>>>>>
>>>>>>> criticalJOPS -0.40% (Non-significant)
>>>>>>> maxJOPS +0.23% (Non-significant)
>>>>>>>
>>>>>>> I reran the baseline a second time using the same bits and for this fix
>>>>>>> (8236035) relative to the second run:
>>>>>>>
>>>>>>> criticalJOPS +6.96% (Non-significant)
>>>>>>> maxJOPS +9.96% (Non-significant)
>>>>>>>
>>>>>>> When I compare the two baseline runs:
>>>>>>>
>>>>>>> criticalJOPS -6.65% (Non-significant)
>>>>>>> maxJOPS -8.86% (Non-significant)
>>>>>>>
>>>>>>> So even with 10 iterations, these SPECjbb2015 runs are all over the
>>>>>>> place. And Aurora is agreeing with that by flagging the results as
>>>>>>> "Non-significant" because the 'p' values are too large.
>>>>>>>
>>>>>>> I'm doing another set of Aurora SPECjbb2015 runs and I'm bumping the
>>>>>>> iterations from 10 to 15. In the recent past, Per told me I might
>>>>>>> have to do 20 runs of SPECjbb2015 to shake out the variations.
>>>>>>> We'll see what turns up with this latest set of runs...
>>>>>>>
>>>>>>>
>>>>>>> Second, Claes' part:
>>>>>>>
>>>>>>>> I did numerous experiments on this during work on JEP-143 and saw no
>>>>>>>> improvement on either "real" benchmarks or stress tests that try to
>>>>>>>> explicitly provoke cross-monitor false sharing on various fields in
>>>>>>>> ObjectMonitors. I've not re-run those experiments on new hardware, so
>>>>>>>> YMMV. Should probably check if they're still usable...
>>>>>>> As always, I'm interested in your ObjectMonitor results.
>>>>>>>
>>>>>>>
>>>>>>>> Specifically, for padding values below 64 we could provoke false sharing
>>>>>>>> effects on stress tests, with negligible movement on "real" benchmarks.
>>>>>>>> For values above 64 we just waste space.
>>>>>>> Yup. You've said that a few times and that's one of the reasons that
>>>>>>> I've included this change in the Async Monitor Deflation project.
>>>>>>>
>>>>>>>
>>>>>>>> There exists a legend that you need double the physical cache line
>>>>>>>> padding to not provoke the wrath of false sharing in the presence of
>>>>>>>> aggressive prefetching, which is the reason why the
>>>>>>>> "DEFAULT_CACHE_LINE_SIZE" is set to twice the typical physical cache
>>>>>>>> line size in bytes. If we explicitly think we need to pad two lines,
>>>>>>>> then the constant DEFAULT_CACHE_LINE_SIZE should be named differently.
>>>>>>>
>>>>>>> I wouldn't call it "a legend". When we made this change years ago, Dice
>>>>>>> had very specific advice from someone at Intel to make this change. He
>>>>>>> could probably still dig up the details...
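>>>>>>>
>>>>>>> For anyone following along, the false sharing effect being discussed
>>>>>>> can be sketched in isolation like this (a standalone illustration
>>>>>>> using Claes' suggested names, not HotSpot code):
>>>>>>>
>>>>>>>   #include <atomic>
>>>>>>>   #include <cstddef>
>>>>>>>
>>>>>>>   constexpr std::size_t CACHE_LINE_SIZE        = 64;  // physical line
>>>>>>>   constexpr std::size_t DOUBLE_CACHE_LINE_SIZE = 128; // the "legend" size
>>>>>>>
>>>>>>>   // Two counters hammered by different threads. Without the alignas
>>>>>>>   // padding they can share one cache line, so each write by one
>>>>>>>   // thread invalidates the other thread's cached copy.
>>>>>>>   struct Counters {
>>>>>>>     alignas(CACHE_LINE_SIZE) std::atomic<long> a;
>>>>>>>     alignas(CACHE_LINE_SIZE) std::atomic<long> b; // own 64-byte line
>>>>>>>   };
>>>>>>>   static_assert(sizeof(Counters) >= 2 * CACHE_LINE_SIZE,
>>>>>>>                 "each counter gets its own cache line");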
>>>>>>>
>>>>>>>
>>>>>>>> The patch in question looks good to me,
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>>> although I'd prefer something like:
>>>>>>>>
>>>>>>>> CACHE_LINE_SIZE 64
>>>>>>>> DOUBLE_CACHE_LINE_SIZE 128
>>>>>>> I think we should make that change on a more HotSpot-wide basis with
>>>>>>> a different bug ID. Also, do you see those values (64 and 128) being
>>>>>>> appropriate for all the CPUs/platforms? Or do you foresee those values
>>>>>>> being defined per-CPU/platform?
>>>>>>>
>>>>>>>
>>>>>>>> ... and then use whichever is deemed more appropriate on a case by case
>>>>>>>> basis. (Hint: it's probably CACHE_LINE_SIZE)
>>>>>>> I accept that hint which is why I'm going with OM_CACHE_LINE_SIZE
>>>>>>> so that ObjectMonitors can have their own case... :-)
>>>>>>>
>>>>>>>
>>>>>>>> $.02
>>>>>>>>
>>>>>>>> /Claes
>>>>>>> And back to David:
>>>>>>>>> Otherwise the mechanics of the change look fine, except I would
>>>>>>>>> suggest that this:
>>>>>>>>>
>>>>>>>>> #ifndef OM_CACHE_LINE_SIZE
>>>>>>>>> // Use DEFAULT_CACHE_LINE_SIZE if not already specified for
>>>>>>>>> // the current build platform.
>>>>>>>>> #define OM_CACHE_LINE_SIZE DEFAULT_CACHE_LINE_SIZE
>>>>>>>>> #endif
>>>>>>>>>
>>>>>>>>> should go in the shared globalDefinitions.hpp file.
>>>>>>> I intentionally didn't want to expose OM_CACHE_LINE_SIZE in
>>>>>>> globalDefinitions.hpp. I'm trying really hard to stay out of that file.
>>>>>>> Also, is globalDefinitions.hpp processed before or after the
>>>>>>> CPU/platform-specific files?
>>>>>>>
>>>>>>> Obviously I need to specify OM_CACHE_LINE_SIZE in the two CPU files:
>>>>>>> src/hotspot/cpu/sparc/globalDefinitions_sparc.hpp
>>>>>>> src/hotspot/cpu/x86/globalDefinitions_x86.hpp
>>>>>>>
>>>>>>> and both halves of the ObjectMonitor subsystem need the above logic:
>>>>>>> src/hotspot/share/runtime/objectMonitor.hpp
>>>>>>> src/hotspot/share/runtime/synchronizer.hpp
>>>>>>>
>>>>>>> but we don't need to expose OM_CACHE_LINE_SIZE elsewhere. Of course,
>>>>>>> if we end up adding CACHE_LINE_SIZE and DOUBLE_CACHE_LINE_SIZE to
>>>>>>> the system *and* if we can get all the CPUs/platforms to agree to use
>>>>>>> CACHE_LINE_SIZE for ObjectMonitors, then OM_CACHE_LINE_SIZE can be
>>>>>>> retired.
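>>>>>>>
>>>>>>> So the overall arrangement is roughly (simplified; see the webrev
>>>>>>> for the actual patch):
>>>>>>>
>>>>>>>   // In globalDefinitions_x86.hpp (and the sparc equivalent):
>>>>>>>   // use the smaller 64 byte cache-line-size for ObjectMonitors.
>>>>>>>   #define OM_CACHE_LINE_SIZE 64
>>>>>>>
>>>>>>>   // In objectMonitor.hpp and synchronizer.hpp:
>>>>>>>   #ifndef OM_CACHE_LINE_SIZE
>>>>>>>   // Use DEFAULT_CACHE_LINE_SIZE if not already specified for
>>>>>>>   // the current build platform.
>>>>>>>   #define OM_CACHE_LINE_SIZE DEFAULT_CACHE_LINE_SIZE
>>>>>>>   #endif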
>>>>>>>
>>>>>>> Thanks again for the fast reviews!
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> JDK-8235931 add OM_CACHE_LINE_SIZE and use smaller size on
>>>>>>>>>> SPARCv9 and X64
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8235931
>>>>>>>>>>
>>>>>>>>>> Here's the webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8235931-webrev/0-for-jdk15/
>>>>>>>>>> Folks that have reviewed JDK-8153224 will recognize these changes as
>>>>>>>>>> a subset of the cache line changes from the Async Monitor Deflation
>>>>>>>>>> project. It's a subset because there are more fields that need to be
>>>>>>>>>> on separate cache lines in the Async Monitor Deflation project.
>>>>>>>>>>
>>>>>>>>>> These changes are being tested in a Mach5 Tier[1-3] run that's still
>>>>>>>>>> running (JDK14 ATR has priority). I've also run these changes thru
>>>>>>>>>> a set of 10 SPECjbb2015 runs; I'm waiting for my various SPECjbb2015
>>>>>>>>>> runs to finish so I can do the analysis.
>>>>>>>>>>
>>>>>>>>>> Thanks, in advance, for comments, questions or suggestions.
>>>>>>>>>>
>>>>>>>>>> Dan