RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4]

Tue Nov 7 20:03:26 UTC 2023

On Wed, 1 Nov 2023 16:34:23 GMT, Oli Gillespie <ogillespie at openjdk.org> wrote:

>> Coleen -
>> 
>> I think NBQ is a reasonable choice for use here. But it's not a complete
>> solution on its own. It imposes documented requirements on clients. I don't
>> think we have a different data structure for this purpose (thread-safe FIFO
>> without locks), so any alternative would need to be invented, and would be
>> solving the same problem as NBQ and the surrounding client-provided code.
>> 
>> Oliver -
>> 
>> The current usage is not safe. The reuse can occur through the allocator. For
>> example, one thread starts a pop. Another thread steals that pop, then deletes
>> the object. Later, an allocation gets a new node at the same address as the
>> deleted node. That newly allocated node makes its way through the queue to
>> eventually become visible to that first thread's still in-progress pop. (So
>> this is an SMR bug. You generally can't delete an object while some other
>> thread might be looking at it.)
>> 
>> GlobalCounter is not a locking mechanism. It is an RCU-style synchronization
>> mechanism, so related to but different from RWLocks. In particular, readers
>> (threads in a critical section) never block due to this mechanism - only
>> write_synchronize blocks.
>> 
>> A problem with using GlobalCounter in that simplistic way is that once the
>> queue is "full", the one-in-one-out policy is going to have every allocation
>> hit GlobalCounter::write_synchronize (a potentially somewhat expensive
>> operation, since it needs to iterate over all threads), at least until the
>> queue is bulk drained. Switching over to a one-in-N-out policy could ameliate
>> that by batching the synchronizes over several nodes, and also remove the need
>> for complete bulk draining. Have min/max queue size and switching between
>> insert-only and one-in-N-out policies depending on the current size seems like
>> a possible solution.
>
> Thanks for all the details, I hadn't considered the SMR angle. I'll think about alternatives.

In a brief conversation with Kim, where I bemoaned that there should be a simpler solution, he suggested maybe a fixed ring buffer with a xchg to replace the n'th element, like:

const uint N = ...;
Symbol* volatile _delay_queue[N]; // initialize to nullptr
volatile uint _index = 0;

void add(Symbol* s) {
  ... increment refcount for s 
  uint i = Atomic::add(&_index, 1u) % N;
  Symbol* old = Atomic::xchg(&_delay_queue[i], s);
  if (old != nullptr) {
    ... decrement refcount for old, possibly deleting it
  }
}

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1385484881