Small question about JDK-8253064 and ObjectMonitor allocation
David Holmes
david.holmes at oracle.com
Mon Jan 31 07:35:13 UTC 2022
On 31/01/2022 3:54 pm, Thomas Stüfe wrote:
> Hi David,
>
> Thank you for the answer!
>
> On Mon, Jan 31, 2022 at 6:23 AM David Holmes
> <david.holmes at oracle.com> wrote:
>
> Hi Thomas,
>
> On 31/01/2022 2:32 pm, Thomas Stüfe wrote:
> > Hi,
> >
> > I have a small question about a detail of JDK-8253064.
> >
> > IIUC, before this patch, the VM kept thread-local freelists of
> > pre-allocated ObjectMonitors to reduce allocation contention. Now we
> > just malloc monitors right away.
> >
> > I looked through the issue and the associated PR, but could find no
> > information on why this was done. Dan describes what he did very
> > well: https://github.com/openjdk/jdk/pull/642#issuecomment-720753946,
> > but not why.
> >
> > I assume that the complexity and memory overhead of the free lists
> > was not worth it? That you found that malloc() is "uncontended"
> > enough on our platforms?
>
The issue was not about freelists and contention; it was about requiring
type-stable memory: once a piece of memory was allocated as an
ObjectMonitor, it remained an ObjectMonitor forever after. This allowed
the old monitor code to stay safe in the face of various race
conditions. Over time that code changed substantially and the need for
type-stable memory for ObjectMonitors disappeared, so we finally got rid
of it and just moved to a direct allocation scheme.
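As a rough sketch of the distinction (illustrative names only, not the
actual HotSpot code): under type-stable memory, freed monitor memory is
only ever recycled as monitor memory, so a thread racing on a stale
pointer still dereferences a well-formed monitor; with direct
allocation, the memory really is returned to the allocator.

#include <mutex>
#include <vector>

// Illustrative stand-in for HotSpot's ObjectMonitor; fields elided.
struct Monitor {
  void* object = nullptr;  // the Java object this monitor was inflated for
};

// Old scheme (sketch): memory used for a Monitor is never handed back;
// it is parked on a free list and recycled only as a Monitor. A racing
// reader holding a stale Monitor* therefore still sees a well-formed
// (if possibly recycled) Monitor.
class TypeStableMonitorPool {
  std::mutex lock_;
  std::vector<Monitor*> free_list_;
 public:
  Monitor* alloc() {
    std::lock_guard<std::mutex> g(lock_);
    if (!free_list_.empty()) {
      Monitor* m = free_list_.back();
      free_list_.pop_back();
      return m;                 // recycled: still a Monitor
    }
    return new Monitor();       // grows the pool; never delete'd
  }
  void release(Monitor* m) {
    std::lock_guard<std::mutex> g(lock_);
    free_list_.push_back(m);    // keep the memory type-stable
  }
};

// New scheme (sketch): plain allocation and deallocation. Safety
// against racing readers must now come from the deflation protocol
// itself rather than from type-stable memory.
Monitor* alloc_direct(void* obj) { return new Monitor{obj}; }
void release_direct(Monitor* m)  { delete m; }

The point of the old scheme was the invariant, not the free list
itself; once the deflation protocol no longer relied on that invariant,
the pool became pure overhead.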
>
>
> I think I understand, but I was specifically concerned with the
> question of allocation contention for ObjectMonitors. That is somewhat
> independent of the question of where OMs are allocated.
>
> Can lock inflation happen in clusters, or does that not occur in
> practice?
>
> AFAIU the old code managed OM storage itself, used global data
> structures to do so, and guarded access with a mutex. To reduce
> contention, it used a surprisingly large thread-local freelist of up
> to 1024 entries. It looks as if contention was once a real problem.
You can always create a benchmark to show contention in the monitor
inflation code. I don't recall now whether this was a real issue or a
microbenchmark issue. As the code stated:
ObjectMonitor * ATTR ObjectSynchronizer::omAlloc (Thread * Self) {
    // A large MAXPRIVATE value reduces both list lock contention
    // and list coherency traffic, but also tends to increase the
    // number of objectMonitors in circulation as well as the STW
    // scavenge costs. As usual, we lean toward time in space-time
    // tradeoffs.
    const int MAXPRIVATE = 1024 ;
so general performance was a consideration.
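For context, a simplified sketch of the pattern the old omAlloc() used
(illustrative names and structure, not the actual HotSpot code): each
thread pops from a private list for free, and only touches the
mutex-guarded global list when its private list is empty or overflows
the private cap.

#include <mutex>

struct MonitorNode {
  MonitorNode* next = nullptr;  // monitor payload elided
};

// Global free list shared by all threads, guarded by a lock.
static std::mutex g_list_lock;
static MonitorNode* g_free_list = nullptr;

// Per-thread private free list: pops and pushes here take no lock.
static thread_local MonitorNode* t_free_list = nullptr;
static thread_local int t_free_count = 0;

static const int kMaxPrivate  = 1024;  // cf. MAXPRIVATE above
static const int kRefillBatch = 32;    // hypothetical batch size

MonitorNode* om_alloc() {
  // Fast path: thread-local list, no contention.
  if (t_free_list != nullptr) {
    MonitorNode* m = t_free_list;
    t_free_list = m->next;
    t_free_count--;
    return m;
  }
  // Slow path: take the global lock once and refill a whole batch,
  // amortizing the lock acquisition over many allocations.
  {
    std::lock_guard<std::mutex> g(g_list_lock);
    for (int i = 0; i < kRefillBatch && g_free_list != nullptr; i++) {
      MonitorNode* m = g_free_list;
      g_free_list = m->next;
      m->next = t_free_list;
      t_free_list = m;
      t_free_count++;
    }
  }
  if (t_free_list == nullptr) {
    return new MonitorNode();          // stand-in for block allocation
  }
  MonitorNode* m = t_free_list;
  t_free_list = m->next;
  t_free_count--;
  return m;
}

void om_release(MonitorNode* m) {
  if (t_free_count < kMaxPrivate) {    // cap the private list
    m->next = t_free_list;
    t_free_list = m;
    t_free_count++;
  } else {
    std::lock_guard<std::mutex> g(g_list_lock);
    m->next = g_free_list;             // overflow to the global list
    g_free_list = m;
  }
}

The larger the private cap, the more rarely the global lock is taken,
at the cost of more monitors sitting idle per thread, which is exactly
the space-time tradeoff the comment above describes.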
> OTOH the new code just uses malloc, which may also lock depending on
> the malloc allocator internals and the libc settings used. Therefore I
> wonder whether OM allocation is still a problem, not a problem with
> real-life malloc, or maybe never really was a problem and the old
> code was just overly cautious?
Whenever we make significant changes to a subsystem we always
investigate the performance profile of the changes. We're prepared to
accept some performance loss in exchange for a good improvement in code
complexity/maintainability etc., but if a significant performance issue
arose we would revisit it. See for example the discussion in:
https://bugs.openjdk.java.net/browse/JDK-8263864
and related.
Cheers,
David
-----
> Thanks, Thomas