RFR: Scan remembered

Mon Jan 11 17:48:26 UTC 2021

Thank you for the quick and detailed response.  I will work on addressing the various issues you've raised.  Below, I embed a response to one of your questions.

On 1/11/21, 2:59 AM, "shenandoah-dev on behalf of Roman Kennke" <shenandoah-dev-retn at openjdk.java.net on behalf of rkennke at openjdk.java.net> wrote:

> So, if I understand correctly, you intend to piggy-back the field-address onto the SATB queue, is that right? 
> It would mean that we need to double the size of the items in the buffer. And it would also mean that SATB 
> to be turned on all the time, not just during marking. Is that correct? If that is so, I wonder if it may be better
>  to do like G1 does and use a separate queue and barrier for this? It looks to me that it is doing exactly what
>  we need here.

Here are a few aspects of the current design, some of which might change as we gather feedback from initial implementation efforts:

1. The thought was to use one bit within each pointer pushed into the SATB buffer to distinguish between the traditional SATB contents (reference values overwritten) and the new entries (addresses of the fields overwritten).  Alternatively, we might push traditional SATB values from the start of the SATB buffer and push overwritten field addresses from the end of the SATB buffer.
2. The traditional SATB write barrier is only enabled while either young-gen or old-gen (or both) concurrent marking is active.
3. A lighter-weight pre-write barrier is active at other times.  This lighter-weight barrier does not need to fetch the overwritten value before performing the write, so its cost is much lower than the cost of the traditional SATB pre-write barrier.
4. We have prototyped an implementation of the "enhanced pre-write barrier", though we have not yet implemented the C1 and C2 versions of this barrier.  The code is roughly:
      if (SATB buffer has fewer than 2 available slots)
        take-the-slow-path();
      else {
        push_to_satb_as_address(address_of_overwritten_field);
        if (SATB is enabled) {
          satb_value = fetch_overwritten_value();
          push_to_satb_as_overwritten_value(satb_value);
        }
      }
5. We have looked at (but not fully digested) the G1 separate queue and write barrier.  Indeed, it is very similar to what we need and, ultimately, we may borrow from and/or reuse that implementation.  However, we are also thinking that it might be more efficient to piggy-back on the existing SATB infrastructure, in that this approach requires fewer dedicated registers (and/or less access to dedicated thread-local fields), fewer checks along common paths (one buffer-overflow check instead of two), and smaller inlined code for fast paths.  We have one pointer to the SATB buffer, and one count (or index) of how much room remains within the SATB buffer.  Given that old-gen and young-gen marking may happen concurrently, and that a single old-gen concurrent mark may span the time required for multiple complete young-gen GC cycles, we anticipate that the genshen SATB write barrier is likely to be enabled for a higher overall percentage of total execution time than is typical with traditional Shenandoah.  This remains to be measured on aggressively scheduled workloads.  (A genshen design objective is to support higher utilization of memory at the same allocation rate, and without pauses.  This probably means GC concurrent marking is active a larger percentage of the time.)
6. We realize this increases the amount of data traffic funneled through the SATB buffer, so we anticipate wanting to increase the size of each SATB buffer.
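
To make points 1 and 4 a little more concrete, here is a small stand-alone C++ sketch of the tag-bit variant.  It is only an illustration under stated assumptions, not the prototype code: the names (SatbBuffer, kAddressTag, enhanced_pre_write_barrier, flush) are hypothetical, the low bit is assumed to be free because entries are at least 2-byte aligned, and the slow path is assumed to simply retire the full buffer and then record the entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tag: a set low bit marks an entry as the address of an
// overwritten field; a clear low bit marks a traditional SATB entry
// (the overwritten reference value). Assumes >= 2-byte alignment.
constexpr std::uintptr_t kAddressTag = 1;

inline bool is_address_entry(std::uintptr_t e) { return (e & kAddressTag) != 0; }
inline std::uintptr_t untag(std::uintptr_t e) { return e & ~kAddressTag; }

// Simplified per-thread buffer: one storage area plus one fill index.
struct SatbBuffer {
    std::vector<std::uintptr_t> slots;
    std::size_t used = 0;
    bool marking_active = false;  // stands in for "SATB is enabled"
    std::size_t flushes = 0;      // counts slow-path invocations

    explicit SatbBuffer(std::size_t capacity) : slots(capacity, 0) {
        assert(capacity >= 2);  // one barrier may push up to two entries
    }

    std::size_t available() const { return slots.size() - used; }

    // Placeholder slow path: a real collector would hand the full buffer
    // to the GC; this sketch only counts the event and resets the index.
    void flush() {
        ++flushes;
        used = 0;
    }
};

// Runs before a reference store into *field.
inline void enhanced_pre_write_barrier(SatbBuffer& buf, void** field) {
    // Single overflow check covering both possible pushes.
    if (buf.available() < 2) {
        buf.flush();
    }
    // Always record which field is about to be overwritten.
    buf.slots[buf.used++] = reinterpret_cast<std::uintptr_t>(field) | kAddressTag;
    // Only while marking is active: load and record the old value.
    if (buf.marking_active) {
        void* old_value = *field;
        buf.slots[buf.used++] = reinterpret_cast<std::uintptr_t>(old_value);
    }
}
```

A consumer draining such a buffer would test each entry with is_address_entry() and route it either to remembered-set processing or to SATB marking.  Note that the load of the old value sits entirely inside the marking-active branch, which is where the lighter-weight barrier of point 3 saves its cost.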

We welcome additional thoughts and recommendations.