Against hidden read barriers in intrinsics
Roman Kennke
roman at kennke.org
Wed Jun 20 14:32:19 UTC 2018
Am 20.06.2018 um 16:23 schrieb Andrew Haley:
> We have some hand-coded AArch64 intrinsics, and they're being
> converted to the modular GC interface. However, I've noticed a
> dangerous coding practice that we should stop before it starts.
> In mainline jdk's MacroAssembler::arrays_equals, we have
>
> cmpoop(a1, a2);
> br(EQ, SAME);
> ...
>
> As of today, MacroAssembler::cmpoop() does a Brooks pointer
> dereference on *both* operands, and the following code relies on that
> for correctness. So, it's correct even for Shenandoah, but is a
> ticking bomb. If any programmer comes along and moves some code
> before that innocuous-looking cmpoop(), it'd be looking at stale
> copies of the data.
>
> When Shendandoah is merged into mainline we must not do this. Let us
> instead EXPLICITLY do
>
> resolve_for_read(IN_HEAP, a1);
> resolve_for_read(IN_HEAP, a2);
>
> ...
> cmp(a1, a2);
>
> This has the additional advantage that the read barrier is done as
> early as possible.
>
Hmm, but this would not be correct. There are two correct ways to do
object comparison with Shenandoah:
resolve_for_write(IN_HEAP, a1);
resolve_for_write(IN_HEAP, a2);
cmp(a1, a2);
I.e. *ensure* that both objects a1 and a2 are stable in to-space, then
compare. This is a fairly big hammer though: especially in the case of
potentially large arrays, this might cause copying those arrays before
comparing them. The saner alternative is:
cmp(a1, a2);
je(done);
resolve_for_read(IN_HEAP, a1);
resolve_for_read(IN_HEAP, a2);
bind(done);
I.e. it only employs read-barriers, and it only does so on the
false-path. The logic behind this is a little tricky: we want to avoid
false negatives. If we know that a1!=a2, it can only be because:
- they are truly !=
- we're comparing from- and to-space copy of the same object
we want to avoid the latter case. However, in this case we know that one
copy is already in to-space. If they are the same object, doing
read-barrier is sufficient to ensure we don't get false negatives. I
believe Aarch64 we need additional load-load-membar:
cmp(a1, a2);
je(done);
membar();
resolve_for_read(IN_HEAP, a1);
resolve_for_read(IN_HEAP, a2);
cmp(a1, a2);
bind(done);
to prevent re-ordering the RBs before the cmp.
Do you think it's better to write out all this, possibly with #if
INCLUDE_SHENANDOAHGC, in all the places where it needs to be? Or call
some abstraction layer into cmpoop() to do the right thing? Maybe better
document cmpoop() what it might or might not do, and what the user might
or might not do with it?
Roman
More information about the shenandoah-dev
mailing list