First cut at a card table for Shenandoah
Mathiske, Bernd
mathiske at amazon.com
Mon Jul 27 23:08:57 UTC 2020
Charlie,
This is highly appreciated. You pinpointed the mistake I made: not checking all facets of inheritance here. And yes, Op_CastP2X is implicated in the super class. Great progress. I'll check the other inheritance avenues.
Many thanks!
Bernd
On 7/27/20, 2:58 PM, "Charlie Gracie" <Charlie.Gracie at microsoft.com> wrote:
Hi Bernd,
I applied your patch locally to play around, and with a release build I was getting
some wild performance results that were not consistent from one run to the
next. When I ran with a fastdebug build I got this assertion 100% of the time
running some DaCapo benchmarks:
# Internal Error (../../src/hotspot/share/opto/node.cpp:268), pid=28283, tid=16131
# assert((int)num_edges > 0) failed: need non-zero edge count for loop progress
When I ran with -XX:-EliminateAllocations the assertion went away and, as you mentioned,
performance stabilized. Looking at your code changes I noticed you made
ShenandoahBarrierSetC2 a subclass of CardTableBarrierSetC2. When an object is scalar
replaced (-XX:+EliminateAllocations), the GC barriers that happen directly on the object
are removed by the `eliminate_gc_barrier` calls. ShenandoahBarrierSetC2 already had
an implementation of `eliminate_gc_barrier`, so the super class implementation in
CardTableBarrierSetC2 was being skipped. I modified the Shenandoah impl as follows,
which resolved the performance and assertion issues for me:
void ShenandoahBarrierSetC2::eliminate_gc_barrier(PhaseMacroExpand* macro, Node* n) const {
  // Eliminate the Shenandoah SATB pre-barrier runtime call, as before.
  if (is_shenandoah_wb_pre_call(n)) {
    shenandoah_eliminate_wb_pre(n, &macro->igvn());
  }
  // New: the card-table post-barrier hangs off a CastP2X node, so delegate
  // to the super class so it gets eliminated as well.
  if (n->Opcode() == Op_CastP2X) {
    CardTableBarrierSetC2::eliminate_gc_barrier(macro, n);
  }
}
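(As an aside, the underlying C++ behaviour is just ordinary virtual dispatch: once a subclass overrides a virtual method, the base-class work only happens if the override delegates to it explicitly. A tiny self-contained sketch, with made-up class names that are unrelated to the HotSpot sources:)

#include <iostream>

struct CardTableLike {
  virtual ~CardTableLike() {}
  virtual void eliminate_barrier(int node_id) const {
    std::cout << "base: eliminate card mark for node " << node_id << "\n";
  }
};

struct ShenandoahLike : CardTableLike {
  void eliminate_barrier(int node_id) const override {
    std::cout << "derived: eliminate SATB barrier for node " << node_id << "\n";
    // Without this explicit call the base-class elimination is silently skipped.
    CardTableLike::eliminate_barrier(node_id);
  }
};

int main() {
  ShenandoahLike bs;
  bs.eliminate_barrier(42); // prints both lines only because the override delegates
  return 0;
}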
I believe a few other APIs would also need to check with the super class implementation, but
for my runs to complete successfully this was the only change I needed to make.
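For illustration, the same delegation pattern applied to another override might look roughly like the sketch below. I have not checked which of the other hooks actually need it, and the Shenandoah-specific part is simplified, so please treat the method choice and body as a guess rather than a drop-in change:

// Hypothetical sketch (not part of the patch): also consult the card-table
// super class when classifying barrier nodes.
bool ShenandoahBarrierSetC2::is_gc_barrier_node(Node* node) const {
  // Shenandoah SATB pre-barrier runtime call (simplified check).
  if (is_shenandoah_wb_pre_call(node)) {
    return true;
  }
  // Defer to CardTableBarrierSetC2 for the card-mark (post-barrier) nodes.
  return CardTableBarrierSetC2::is_gc_barrier_node(node);
}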
Cheers,
Charlie Gracie
On 2020-07-27, 2:02 PM, "shenandoah-dev on behalf of Mathiske, Bernd" <shenandoah-dev-retn at openjdk.java.net on behalf of mathiske at amazon.com> wrote:
Aditya, Thomas, Roman,
Thank you for providing these hints, which were helpful to rule out possible root causes!
Looking at all this and at some initial profiling results,
Volker Simonis suggested that -XX:-EliminateAllocations might help. And it does!
When I use this flag, performance is "back to normal" in the short benchmark runs I have conducted so far.
I'll run some more extensive tests, with repetitions, and report some numbers soon.
Bernd
On 7/23/20, 4:32 AM, "shenandoah-dev on behalf of Roman Kennke" <shenandoah-dev-retn at openjdk.java.net on behalf of rkennke at redhat.com> wrote:
On Thu, 2020-07-23 at 12:31 +0200, Thomas Schatzl wrote:
> Hi,
>
> On 22.07.20 22:59, Roman Kennke wrote:
> > I am not very familiar with all this stuff.
> >
> > You should check if the C2 optimizations for card-table barriers kick
> > in. IIRC, there was something that elides those barriers on stores
> > into new objects altogether, which make up the majority of stores.
>
> If you are talking about eliding write barriers for new objects because
> they are "always" allocated in young gen, and no generational collector
> is interested in young->old references, there is no such thing afaik.
>
> No collector guarantees this "always" property: e.g. CMS may directly
> decide to put new objects into old gen for a few reasons, and for
> parallel (and g1) it can e.g. happen that a gc right after allocating
> that object (when e.g. transitioning from native slow-path code) will
> move that object into old gen. Or simply when the object is large.
>
> See e.g. https://bugs.openjdk.java.net/browse/JDK-8191342
>
> That would still require the compiler to only apply that optimization
> if it can prove that the object is "small enough" to fit into young gen
> in any case (it is probably easy to get conservative enough values for
> that from somewhere).
>
Thanks, Thomas, for the clarification! :-)
Roman