First cut at a card table for Shenandoah

Mon Jul 27 21:57:07 UTC 2020

Hi Bernd,

I applied your patch locally to experiment. With a release build I was getting
wildly varying performance results that were not consistent from one run to the
next. With a fastdebug build I hit this assertion 100% of the time when
running some DaCapo benchmarks:

#  Internal Error (../../src/hotspot/share/opto/node.cpp:268), pid=28283, tid=16131
#  assert((int)num_edges > 0) failed: need non-zero edge count for loop progress

When I ran with -XX:-EliminateAllocations the assertion went away and, as you mentioned,
performance stabilized. Looking at your code changes, I noticed you made
ShenandoahBarrierSetC2 a subclass of CardTableBarrierSetC2. When an object is scalar
replaced (-XX:+EliminateAllocations), the GC barriers that act directly on the object
are removed by the `eliminate_gc_barrier` calls. ShenandoahBarrierSetC2 already had
an implementation of `eliminate_gc_barrier`, so the superclass implementation in
CardTableBarrierSetC2 is never reached. I modified the Shenandoah implementation as
follows, which resolved the performance and assertion issues for me.

void ShenandoahBarrierSetC2::eliminate_gc_barrier(PhaseMacroExpand* macro, Node* n) const {
  if (is_shenandoah_wb_pre_call(n)) {
    shenandoah_eliminate_wb_pre(n, &macro->igvn());
  }
  if (n->Opcode() == Op_CastP2X) {
    CardTableBarrierSetC2::eliminate_gc_barrier(macro, n);
  }
}

I believe a few other APIs would also need to delegate to the superclass implementation,
but this was the only change I needed for my runs to complete successfully.
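
To show the pitfall in isolation, here is a minimal standalone sketch. It is not
HotSpot code: NodeKind, EliminationLog, CardTableLikeBarrierSet, OverridingBarrierSet
and barriers_removed are hypothetical stand-ins that I made up for illustration. The
only point is that an override which does not delegate to its base class silently
skips the base class's cleanup for the same call.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the HotSpot types.
enum class NodeKind { PreWriteBarrierCall, CardMarkCast, Other };

// Records which barrier removals actually ran.
struct EliminationLog {
  std::vector<std::string> steps;
};

// Plays the role of the card-table base class: it only knows how to
// remove the card-mark part of a barrier.
class CardTableLikeBarrierSet {
public:
  virtual ~CardTableLikeBarrierSet() = default;
  virtual void eliminate_gc_barrier(NodeKind n, EliminationLog& log) const {
    if (n == NodeKind::CardMarkCast) {
      log.steps.push_back("card mark removed");
    }
  }
};

// Plays the role of a subclass that already had its own override.
// With delegate == false the base-class removal is never reached.
class OverridingBarrierSet : public CardTableLikeBarrierSet {
public:
  explicit OverridingBarrierSet(bool delegate) : _delegate(delegate) {}

  void eliminate_gc_barrier(NodeKind n, EliminationLog& log) const override {
    if (n == NodeKind::PreWriteBarrierCall) {
      log.steps.push_back("pre-write barrier removed");
    }
    if (_delegate && n == NodeKind::CardMarkCast) {
      // Explicitly qualified call: runs the base-class version.
      CardTableLikeBarrierSet::eliminate_gc_barrier(n, log);
    }
  }

private:
  bool _delegate;
};

// Feeds one node of each barrier kind through the barrier set and
// returns how many barriers were actually removed.
std::size_t barriers_removed(const CardTableLikeBarrierSet& bs) {
  EliminationLog log;
  bs.eliminate_gc_barrier(NodeKind::PreWriteBarrierCall, log);
  bs.eliminate_gc_barrier(NodeKind::CardMarkCast, log);
  return log.steps.size();
}
```

In this sketch, barriers_removed(OverridingBarrierSet(false)) counts only the
pre-write barrier, while barriers_removed(OverridingBarrierSet(true)) counts both,
which mirrors the before/after behaviour of the change above.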

Cheers,
Charlie Gracie

On 2020-07-27, 2:02 PM, "shenandoah-dev on behalf of Mathiske, Bernd" <shenandoah-dev-retn at openjdk.java.net on behalf of mathiske at amazon.com> wrote:

    Aditya, Thomas, Roman,

    Thank you for providing these hints, which were helpful to rule out possible root causes!
    Looking at all this and at some initial profiling results, 
    Volker Simonis suggested that -XX:-EliminateAllocations might help. And it does!
    When I use this flag, performance is "back to normal" in the short benchmark runs I have conducted so far.
    I'll run some more extensive tests, with repetitions, and report some numbers, soon.

    Bernd

    On 7/23/20, 4:32 AM, "shenandoah-dev on behalf of Roman Kennke" <shenandoah-dev-retn at openjdk.java.net on behalf of rkennke at redhat.com> wrote:

        On Thu, 2020-07-23 at 12:31 +0200, Thomas Schatzl wrote:
        > Hi,
        >
        > On 22.07.20 22:59, Roman Kennke wrote:
        > > I am not very familiar with all this stuff.
        > >
        > > You should check if the C2 optimizations for card-table-barriers kick
        > > in. IIRC, there was something that elides those barriers on stores into
        > > new objects altogether, which make up the majority of stores.
        > >
        >
        >    if you are talking about eliding write barriers for new objects
        > because they are "always" allocated in young gen, and no generational
        > collector is interested in young->old references, there is no such thing
        > afaik.
        >
        > No collector guarantees this "always" property: e.g. CMS may directly
        > decide to put new objects into old gen for a few reasons, and for
        > parallel (and g1) it e.g. can happen that a gc right after allocating
        > that object (when e.g. transitioning from native slow-path code) will
        > move that object into old gen. Or simply when the object is large.
        >
        > See e.g. https://bugs.openjdk.java.net/browse/JDK-8191342
        >
        > That would still require the compiler to only apply that optimization if
        > it can prove that the object is "small enough" to fit into young gen in
        > any case (it is probably easy to get conservative enough values for that
        > from somewhere).
        >

        Thanks Thomas for clarification! :-)

        Roman