RFR: 8343840: Rewrite the ObjectMonitor lists

Tue Feb 25 13:19:16 UTC 2025

On Wed, 19 Feb 2025 20:55:28 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> I've combined two `ObjectMonitor`'s lists, `EntryList` and `cxq`, into one list. The `entry_list`.
>> 
>> This way c2 no longer has to check both `EntryList` and `cxq` in order to opt out if the "conceptual entry list" is empty, which also means that the constant question about if it's safe to first check the `EntryList` and then `cxq` will be a thing of the past.
>> 
>> In the current multi-queue design new threads where always added to the `cxq`, then `ObjectMonitor::exit` would choose a successor from the head of `EntryList`. When the `EntryList` was empty and `cxq` was not, `ObjectMonitor::exit` whould detached the singly linked `cxq` list, and add the elements to the doubly linked `EntryList`. The element that was first added to `cxq` whould be at the tail of the `EntryList`. This way you ended up working through the contending threads in LIFO-chunks.
>> 
>> The new list-design is as much a multi-queue as the current. Conceptually it can be looked upon as if the old singly linked `cxq` list doesn't end with a null pointer, but instead has a link that points to the head of the doubly linked `entry_list`.
>> 
>> You always add to the `entry_list` by Compare And Exchange to the head. The most common case is that you remove from the tail (the successor is chosen in strict FIFO order). The head is volatile, but the interior is stable.
>> 
>> The first contending thread that "pushes" itself onto `entry_list`, will be the last thread in the list. Each newly pushed thread in `entry_list` will be linked trough its next pointer, and have its prev pointer set to null, thus pushing new threads onto `entry_list` will form a singly linked list. The list is always in the right order (via the next-pointers) and is never moved to another list.
>> 
>> Since we choose the successor in FIFO order, the exiting thread needs to find the tail of the `entry_list`. This is done by walking from the `entry_list` head. While walking the list we assign the prev pointers of each thread, essentially forming a doubly linked list. The tail pointer is cached in `entry_list_tail` so that we don't need to walk from the `entry_list` head each time we need to find the tail (successor).
>> 
>> Performance wise the new design seems to be equal to the old design, even though c2 generates two less instructions per monitor unlock operation.
>> 
>> However the complexity of the source has been reduced by removing the `TS_CXQ` state and adding functions instead of inlining `cmpxchg` here and there, and the fac...
>
> src/hotspot/share/runtime/objectMonitor.cpp line 704:
> 
>> 702: 
>> 703:   for (;;) {
>> 704:     ObjectWaiter* front = Atomic::load_acquire(&_entry_list);
> 
> Technically you don't need a load_acquire here because you do not access any members of front before hitting the cmpxchg that gives you a full fence.. For good code hygiene Atomic::load would suffice.

Fixed

> src/hotspot/share/runtime/objectMonitor.cpp line 723:
> 
>> 721: 
>> 722:   for (;;) {
>> 723:     ObjectWaiter* front = Atomic::load_acquire(&_entry_list);
> 
> Technically you don't need a `load_acquire` here because you do not access any members of `front` before hitting the cmpxchg that gives you a full fence.. For good code hygiene `Atomic::load` would suffice.

Fixed

> src/hotspot/share/runtime/objectMonitor.cpp line 1264:
> 
>> 1262:     return w;
>> 1263:   }
>> 1264:   w = Atomic::load_acquire(&_entry_list);
> 
> Suggestion:
> 
>   // Need acquire here to match the implicit release of the cmpxchg that updated _entry_list, so we
>   // can access w->_next.
>   w = Atomic::load_acquire(&_entry_list);

Fixed

> src/hotspot/share/runtime/objectMonitor.cpp line 1303:
> 
>> 1301:   // Check if we are unlinking the last element in the _entry_list.
>> 1302:   // This is by far the most common case.
>> 1303:   if (currentNode->_next == nullptr) {
> 
> The direct checks of `_next` and _prev` for null/non-null do not work with your use of `set_bad_pointers`. If you actually intend to keep `set_bad_pointers` in the final code then you should be using accessors e.g.
> 
> ObjectWaiter* next() {
>   assert (_next != 0xBAD, "corrupted list!");
>   return _next;
> }

Fixed

> src/hotspot/share/runtime/objectMonitor.cpp line 1306:
> 
>> 1304:     assert(_entry_list_tail == nullptr || _entry_list_tail == currentNode, "invariant");
>> 1305: 
>> 1306:     ObjectWaiter* v = Atomic::load_acquire(&_entry_list);
> 
> Again technically you do not need `load_acquire` here because you do not access any fields of `v` when `v` could be other than the current node. `Atomic::load` will suffice.

Fixed

> src/hotspot/share/runtime/objectMonitor.cpp line 1315:
> 
>> 1313:       }
>> 1314:       // The CAS above can fail from interference IFF a contending
>> 1315:       // thread "pushed" itself onto entry_list.
> 
> Suggestion:
> 
>       // The CAS above can fail from interference IFF a contending
>       // thread "pushed" itself onto entry_list. So fall-through to
>       // building the doubly-linked list.
>       assert(currentNode->prev == nullptr, "invariant");

Fixed

> src/hotspot/share/runtime/objectMonitor.cpp line 1334:
> 
>> 1332:   }
>> 1333: 
>> 1334:   assert(currentNode->_next != nullptr, "invariant");
> 
> Suggestion:
> 
>   else {  //  currentNode->_next != nullptr
>   
>     // If we get here it means the current thread enqueued itself on the EntryList but was then able to
>     // "steal" the lock before the chosen successor was able to. Consequently currentNode must be an
>     // interior node in the EntryList, or the head.

Added the comment but left out the suggested "else" and kept the assert. I know that the if statement above always ends in a return, but if that is changed this feels safer.

> src/hotspot/share/runtime/objectMonitor.cpp line 1337:
> 
>> 1335:   assert(currentNode != _entry_list_tail, "invariant");
>> 1336: 
>> 1337:   if (currentNode->_prev == nullptr) {
> 
> Suggestion:
> 
>   // Check if we are in the singly-linked portion of the EntryList. If we are the head then we try to remove
>   // ourselves, else we convert to the doubly-linked list.
>   if (currentNode->_prev == nullptr) {

Fixed

> src/hotspot/share/runtime/objectMonitor.cpp line 1347:
> 
>> 1345:   // else we convert to the doubly-linked list.
>> 1346:   if (currentNode->_prev == nullptr) {
>> 1347:     ObjectWaiter* v = Atomic::load_acquire(&_entry_list);
> 
> Again no `load_acquire` needed.

Fixed

> src/hotspot/share/runtime/objectMonitor.cpp line 1352:
> 
>> 1350:         // The CAS above can fail from interference IFF a contending
>> 1351:         // thread "pushed" itself onto entry_list, in which case
>> 1352:         // currentNode must now be in the interior of the list.
> 
> Suggestion:
> 
>         // currentNode must now be in the interior of the list. Fall-through
>         // to building the doubly-linked list.

Fixed

> src/hotspot/share/runtime/objectMonitor.cpp line 1353:
> 
>> 1351:         // thread "pushed" itself onto entry_list, in which case
>> 1352:         // currentNode must now be in the interior of the list.
>> 1353:         assert(_entry_list != currentNode, "invariant");
> 
> Not sure you really need this. The fact the cmpxchg failed means we can't be the head of the list. Also by reading it again you are potentially finding a different head to that which existed when the cmpxchg failed.

You are right I don't really need it, but sometimes I feel that comments can rotten, but asserts can't.
I guess I put this one in so that it's easier to see what state the currentNode is in (not head) without reading through the logic that end up in the else-statement.

> src/hotspot/share/runtime/objectMonitor.cpp line 1362:
> 
>> 1360:   }
>> 1361: 
>> 1362:   // We now assume we are unlinking currentNode from the interior of a
> 
> Suggestion:
> 
>   // We now know we are unlinking currentNode from the interior of a

Fixed

> src/hotspot/share/runtime/objectMonitor.cpp line 1534:
> 
>> 1532:     ObjectWaiter* w = nullptr;
>> 1533: 
>> 1534:     w = _entry_list;
> 
> Use `Atomic::load` for consistency and good code hygiene.

Fixed

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963144747
PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963135003
PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963050591
PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1967825628
PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963136473
PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963132242
PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1961646077
PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1961647021
PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963137807
PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1969341568
PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1961659147
PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963133824
PR Review Comment: https://git.openjdk.org/jdk/pull/23421#discussion_r1963141844