RFR: JDK-8283674: Pad ObjectMonitor allocation size to cache line size
Daniel D. Daugherty
dcubed at openjdk.java.net
Mon Mar 28 19:48:48 UTC 2022
On Fri, 25 Mar 2022 09:02:28 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
> See discussion under [1].
>
> Since the libc malloc allocator may place ObjectMonitor instances adjacent to each other, we should pad the size of ObjectMonitor to fill a whole cache line to prevent false sharing between adjacent OMs.
>
> [1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2022-March/054187.html
The padding work on ObjectMonitors was done with the following fix:
JDK-8049737 Contended Locking reorder and cache line bucket
https://bugs.openjdk.java.net/browse/JDK-8049737
Prior to that fix, the blocks of ObjectMonitors were allocated like this:
src/share/vm/runtime/synchronizer.cpp:
// 3: allocate a block of new ObjectMonitors
// Both the local and global free lists are empty -- resort to malloc().
// In the current implementation objectMonitors are TSM - immortal.
assert(_BLOCKSIZE > 1, "invariant");
ObjectMonitor * temp = new ObjectMonitor[_BLOCKSIZE];
After that fix, the allocation was "a bit" more complicated:
src/share/vm/runtime/synchronizer.cpp:
// 3: allocate a block of new ObjectMonitors
// Both the local and global free lists are empty -- resort to malloc().
// In the current implementation objectMonitors are TSM - immortal.
// Ideally, we'd write "new ObjectMonitor[_BLOCKSIZE], but we want
// each ObjectMonitor to start at the beginning of a cache line,
// so we use align_size_up().
// A better solution would be to use C++ placement-new.
// BEWARE: As it stands currently, we don't run the ctors!
assert(_BLOCKSIZE > 1, "invariant");
size_t neededsize = sizeof(PaddedEnd<ObjectMonitor>) * _BLOCKSIZE;
PaddedEnd<ObjectMonitor> * temp;
size_t aligned_size = neededsize + (DEFAULT_CACHE_LINE_SIZE - 1);
void* real_malloc_addr = (void *)NEW_C_HEAP_ARRAY(char, aligned_size,
                                                  mtInternal);
temp = (PaddedEnd<ObjectMonitor> *)
         align_size_up((intptr_t)real_malloc_addr,
                       DEFAULT_CACHE_LINE_SIZE);
So the new allocation logic handled both aligning the start of the ObjectMonitor
on a cache-line AND padding the ObjectMonitor's size out to a cache-line
boundary.
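To make the two effects concrete, here is a minimal standalone sketch of the same idea. It is not the HotSpot code: `Monitor`, `PaddedToLine`, `Block`, `allocate_block` and `kCacheLine` are hypothetical names made up for illustration, the 64-byte line size is an assumption (HotSpot uses the platform-dependent DEFAULT_CACHE_LINE_SIZE), and plain malloc() stands in for NEW_C_HEAP_ARRAY.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumed cache line size for this sketch only.
constexpr std::size_t kCacheLine = 64;

// Hypothetical stand-in for ObjectMonitor; the real class has many more fields.
struct Monitor {
  void* header;
  void* owner;
  int   recursions;
};

// Round value up to the next multiple of alignment (must be a power of two).
constexpr std::uintptr_t align_up(std::uintptr_t value, std::uintptr_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Pads T out to a multiple of kCacheLine, similar in spirit to PaddedEnd<T>.
// This simple form assumes sizeof(T) is not already a multiple of kCacheLine,
// because a zero-length array member is not valid C++.
template <typename T>
struct PaddedToLine {
  T    value;
  char pad[align_up(sizeof(T), kCacheLine) - sizeof(T)];
};

static_assert(sizeof(PaddedToLine<Monitor>) % kCacheLine == 0,
              "each element must occupy whole cache lines");

struct Block {
  void*                  raw;    // address returned by malloc(); pass this to free()
  PaddedToLine<Monitor>* first;  // first element, aligned to a cache line
};

// Over-allocate by (kCacheLine - 1) bytes, then round the start address up.
// As in the original code, no constructors are run on the elements.
Block allocate_block(std::size_t count) {
  std::size_t needed = sizeof(PaddedToLine<Monitor>) * count;
  void* raw = std::malloc(needed + (kCacheLine - 1));
  if (raw == nullptr) {
    return {nullptr, nullptr};
  }
  std::uintptr_t aligned = align_up(reinterpret_cast<std::uintptr_t>(raw), kCacheLine);
  return {raw, reinterpret_cast<PaddedToLine<Monitor>*>(aligned)};
}
```

With both pieces in place, every element of the block starts on a cache-line boundary and no two elements share a line. Dropping either piece breaks that: without the end padding, elements after the first straddle or share lines; without the start alignment, every element is shifted by the same misalignment.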
The changes in JDK-8049737 were also extensively performance tested by @cl4es.
I checked my email archive for that fix and I have private IMs back-and-forth with
Claes where he talked about his perf testing for JDK-8049737. He tested:
- the intra-field padding (not mentioned in this thread)
- the cache-line alignment for the start of the ObjectMonitor
- the padding at the end of the ObjectMonitor
together and separately trying to determine whether any of the padding was not
necessary. Claes determined that we should move forward with all three of the
above types of padding and alignment, i.e. all three resulted in improvements on
the hardware that he used at the time. The improvements varied with the number of
CPUs and the number of cores per CPU and whether all of the cores on a CPU
shared the same L1 cache or not (on Intel CPUs). Not that it matters today, but
the work on JDK-8049737 scaled more nicely on SPARC CPUs all the way up to
48 cores. On Intel chips I think the improvements dropped off after 4 or 8 cores,
but I'm not sure about that. It has been too long since we did this work.
So back in 2014, when this work was done, we determined via performance testing
that adding padding to the end of the ObjectMonitor was helpful (along with the
other 2 padding/alignment changes).
Of course, in 2020, when we did the work for JDK-8253064, we dropped the padding
at the end of ObjectMonitor because our testing didn't reveal a need for it.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7955