RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE

Tue Oct 11 10:05:15 UTC 2016

Hi,

On 2016-10-11 02:03, Coleen Phillimore wrote:
>
> Hi,
>
> Was the linear allocation in mutex.cpp the cause of the false sharing 
> that you observed?  I think I like this change better than the 
> original, because I've wondered myself why the name string was so 
> long.  So with this, we could make Monitor's smaller if they're 
> embedded in metadata or other structures.

Music to my ears!

I even think most embedded uses would see improvements if _name was 
removed entirely (or "simply" turned into a const char * so that it's 
not copied and embedded into the Monitor/Mutex)

>
> Thanks,
> Coleen
>
> On 10/10/16 2:00 PM, Doerr, Martin wrote:
>> Hi Claes,
>>
>> thank you very much for your explanations.
>>
>> I agree with you that it would be better to pad where the Monitors 
>> are used. It would still fulfill the purpose of this RFE without 
>> disturbing other usages.
>>
>> So I could introduce:
>> class PaddedMonitor : public Monitor {
>>    enum {
>>      CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE - 
>> (int)sizeof(Monitor),
>>      PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0
>>    };
>>    char _padding[PADDING_LEN];
>> };
>> and similarly PaddedMutex and replace all of the ones which get 
>> allocated in a linear fashion (mutexLocker.cpp mutex_init()).

Sure!

Some compilers may take issue with cases where PADDING_LEN == 0 (since 
char _padding[0] is technically illegal C++, but works on gcc etc) so 
maybe that special case will have to be (somewhat excessively):

PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 
DEFAULT_CACHE_LINE_SIZE

We took a look at if it'd be feasible to express class PaddedMonitor : 
public PaddedEnd<Monitor>, but it appears that'd require variadic 
template arguments (C++11) to get right (since we'd need PaddedEnd to
transitively publish constructors of Monitor).

Thanks!

/Claes

>>
>> Would you agree with this change?
>>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>> Sent: Freitag, 7. Oktober 2016 12:35
>> To: Doerr, Martin <martin.doerr at sap.com>; 
>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; 
>> David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>; 
>> Coleen Phillimore (coleen.phillimore at oracle.com) 
>> <coleen.phillimore at oracle.com>
>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to 
>> DEFAULT_CACHE_LINE_SIZE
>>
>> Hi,
>>
>> I'm concerned that this might be an easy-but-wrong fix to a complex
>> problem, and acknowledging that there are already use cases where the
>> _name field is contra-productive. This change adds complexity that
>> makes it even less likely such uses will be optimized for in the
>> future.
>>
>> There are Padded* types put in place to deal with these concerns
>> explicitly rather than implicitly *where it matters*, which allows us
>> the choice of applying padding or not on a per use-case basis (which
>> means we can also remove the _name field for those use cases that don't
>> care about either, which might be most outside of the global lists).
>>
>> I am very concerned about false sharing, but I have no data to support
>> that this change has any measurable benefit in practice: I even did an
>> experiment years ago now where I turned _name into a pointer to not pad
>> at all and saw nothing exceeding noise levels on any benchmark.
>>
>> Thanks!
>>
>> /Claes
>>
>> On 2016-10-07 12:18, Doerr, Martin wrote:
>>> Hi Claes,
>>>
>>> what the change basically does is that the _name[] field gets 
>>> enlarged by 8 bytes on platforms with 128 byte 
>>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed 
>>> by the C++ compiler.
>>> What exactly is your concern about the footprint overhead?
>>> Are you not concerned about the risk of false sharing?
>>>
>>> Best regards,
>>> Martin
>>>
>>> -----Original Message-----
>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>> Sent: Freitag, 7. Oktober 2016 12:00
>>> To: Doerr, Martin <martin.doerr at sap.com>; 
>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; 
>>> David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>; 
>>> Coleen Phillimore (coleen.phillimore at oracle.com) 
>>> <coleen.phillimore at oracle.com>
>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to 
>>> DEFAULT_CACHE_LINE_SIZE
>>>
>>> Hi,
>>>
>>> after due consideration I strongly consider this change unacceptable
>>> since it adds footprint overhead to performance critcial compiler and
>>> GC code with little to no data to support this won't cause regressions.
>>>
>>> Changes to Monitor/Mutex needs to be done with more surgical precision
>>> than this.
>>>
>>> If I do have a veto on the matter, here it is.
>>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>> On 2016-10-07 11:34, Doerr, Martin wrote:
>>>> Hi Dan,
>>>>
>>>> thank you very much for reviewing and for investigating the history.
>>>>
>>>> It was not intended to make the functions you mentioned public. 
>>>> I've fixed that.
>>>> I also updated the copyright information.
>>>>
>>>> New webrev is here:
>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/
>>>>
>>>> @Coleen: Please use this one. I have also added reviewer attribution.
>>>>
>>>> Thanks and best regards,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>>>> Sent: Donnerstag, 6. Oktober 2016 23:13
>>>> To: Doerr, Martin <martin.doerr at sap.com>; 
>>>> hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to 
>>>> DEFAULT_CACHE_LINE_SIZE
>>>>
>>>> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>>>>> Hi,
>>>>>
>>>>> the current implementation of Monitor padding (mutex.cpp) assumes 
>>>>> that cache lines are 64 Bytes. There's a platform dependent define 
>>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of 
>>>>> padding is to avoid false sharing.
>>>>>
>>>>> My proposed change is here:
>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>>>> src/share/vm/runtime/mutex.hpp
>>>>         Please update the copyright year before pushing.
>>>>
>>>>         L172:   // The default length of monitor name is chosen to 
>>>> avoid
>>>> false sharing.
>>>>         L173:   enum {
>>>>         L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE -
>>>> sizeof(MonitorBase),
>>>>         L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ?
>>>> CACHE_LINE_PADDING : 64
>>>>         L176:   };
>>>>         L177:   char _name[MONITOR_NAME_LEN];          // Name of 
>>>> mutex
>>>>
>>>>             I have to say that I'm not fond of the fact that 
>>>> MONITOR_NAME_LEN
>>>>             can vary between platforms; I like that it is a minimum 
>>>> of 64 bytes
>>>>             and is still a constant.
>>>>
>>>>             I'm also not happy that the resulting sizeof(Monitor) 
>>>> may not
>>>> be a multiple
>>>>             of the DEFAULT_CACHE_LINE_SIZE. However, I have to 
>>>> mitigate
>>>> that unhappiness
>>>>             with the fact that sizeof(Monitor) hasn't been a 
>>>> multiple of
>>>> the cache line
>>>>             size since at least 2008 and no one complained (that I 
>>>> know of).
>>>>
>>>>             So if I was making this change, I would make 
>>>> MONITOR_NAME_LEN
>>>> 64 bytes
>>>>             (like it was) and add a pad field that would bring up
>>>> sizeof(Monitor)
>>>>             to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, 
>>>> Claes
>>>> would be
>>>>             unhappy with me and anyone embedding a Monitor into 
>>>> another data
>>>>             structure would be unhappy with me, but I'm used to 
>>>> that :-)
>>>>
>>>>             So what you have is fine, especially for JDK9.
>>>>
>>>>         L180:  public:
>>>>         L181: #ifndef PRODUCT
>>>>         L182:   debug_only(static bool contains(Monitor * locks, 
>>>> Monitor *
>>>> lock);)
>>>>         L183:   debug_only(static Monitor * 
>>>> get_least_ranked_lock(Monitor *
>>>> locks);)
>>>>         L184:   debug_only(Monitor *
>>>> get_least_ranked_lock_besides_this(Monitor * locks);)
>>>>         L185: #endif
>>>>         L186:
>>>>         L187:   void set_owner_implementation(Thread*
>>>> owner)                        PRODUCT_RETURN;
>>>>         L188:   void check_prelock_state     (Thread*
>>>> thread)                       PRODUCT_RETURN;
>>>>         L189:   void check_block_state       (Thread* thread)
>>>>
>>>>             These were all "protected" before. Now they are "public".
>>>>             Any particular reason?
>>>>
>>>> Thumbs up on the mechanics of this change. I'm interested in the
>>>> answer to the "protected" versus "public" question, but don't
>>>> considered that query to be a blocker.
>>>>
>>>>
>>>> The rest of this isn't code review, but some of this caught
>>>> my attention.
>>>>
>>>> src/share/vm/runtime/mutex.hpp
>>>>
>>>>         old L84: // The default length of monitor name is chosen to 
>>>> be 64
>>>> to avoid false sharing.
>>>>         old L85: static const int MONITOR_NAME_LEN = 64;
>>>>
>>>> I had to look up the history of this comment:
>>>>
>>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
>>>> changeset:   55:2a8eb116ebbe
>>>> user:        xlu
>>>> date:        Tue Feb 05 23:21:57 2008 -0800
>>>> summary:     6610420: Debug VM crashes during monitor lock rank 
>>>> checking
>>>>
>>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
>>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
>>>> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
>>>> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
>>>> @@ -82,6 +82,9 @@ class ParkEvent ;
>>>>      // *in that order*.  If their implementations change such that 
>>>> these
>>>>      // assumptions are violated, a whole lot of code will break.
>>>>
>>>> +// The default length of monitor name is choosen to be 64 to avoid
>>>> false sharing.
>>>> +static const int MONITOR_NAME_LEN = 64;
>>>> +
>>>>      class Monitor : public CHeapObj {
>>>>
>>>>       public:
>>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>>>>        volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>>>>        ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>>>>        volatile bool     _snuck;              // Used for sneaky 
>>>> locking
>>>> (evil).
>>>> -  const char * _name;                    // Name of mutex
>>>>        int NotifyCount ;                      // diagnostic assist
>>>> -  double pad [8] ;                       // avoid false sharing
>>>> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>>
>>>>        // Debugging fields for naming, deadlock detection, etc. 
>>>> (some only
>>>> used in debug mode)
>>>>      #ifndef PRODUCT
>>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>>>>         int  ILocked () ;
>>>>
>>>>       protected:
>>>> -   static void ClearMonitor (Monitor * m) ;
>>>> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>>>>         Monitor() ;
>>>>
>>>> So the original code had an 8-double pad for avoiding false sharing.
>>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the
>>>> time that Dice determined that 8-double value, the result was to pad
>>>> the size of Monitor to an even multiple of a particular cache line
>>>> size.
>>>>
>>>> Xiobin changed the 'name' field to be an array so that the name
>>>> chars could serve double duty as the cache line pad... pun intended.
>>>> Unfortunately that pad doesn't make sure that the resulting Monitor
>>>> size is a multiple of the cache line size.
>>>>
>>>> Dan
>>>>
>>>>
>>>>> Please review. If will also need a sponsor.
>>>>>
>>>>> Thanks and best regards,
>>>>> Martin
>>>>>
>