Busy-loop on Reference.get and concurrent cycle

Aleksey Shipilev shade at redhat.com
Wed Sep 6 10:20:02 UTC 2017


Hi,

I think we have a problem with code that busy-waits on Reference.get, like:

 WeakReference<Object> ref;

 while (ref.get() != null) {
   // burn!
 }

The trouble is that Shenandoah normally does only concurrent cycles. During a concurrent cycle, we
enable SATB for the concurrent mark to capture the destructive writes that disconnect parts of the
reachable heap. Everything is good so far.

Now enter Reference.get(). When reference processing is enabled, concurrent mark does not follow
through weak references, but instead records them for processing later. This is done to avoid
making referents strongly reachable prematurely; otherwise weak references would degrade to
strong references.

Now enters a peculiar bug. Suppose at the start of marking the referent was weakly reachable. While
concurrent mark is happening, the mutator gets that referent and stores it somewhere in a strongly
reachable object, thus making it strongly reachable. But, as far as concurrent mark and reference
processing are concerned, that object is still *weakly* reachable, and thus is subject to
reclamation, which gives you a dangling pointer. Regular SATB does not capture this, because it
records the *old* values before the store, not the new ones. G1 discovered this and fixed it by
adding the *result* of Reference.get() to the SATB queue [1], thus guaranteeing (with some overkill)
that the *new* pointer is discovered via SATB. Shenandoah followed suit. The dangling pointer danger
is averted: the polled referent is always kept alive, in case we store it somewhere.
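
To make the failure mode concrete, here is a toy model of one marking cycle. It is only a sketch
under simplifying assumptions, not G1 or Shenandoah code: the names Node, drain and
referentSurvives are hypothetical, the "concurrent" mutator action is simulated sequentially after
the holder object has already been scanned, and the SATB queue is a plain deque.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SatbReferenceGetModel {
    /** Hypothetical toy heap object: strong fields plus one weak referent slot. */
    static final class Node {
        final List<Node> strong = new ArrayList<>();
        Node weakReferent; // recorded, but never traced, during marking
    }

    /** Marks everything reachable through strong fields from the queued nodes. */
    static void drain(Deque<Node> queue, Set<Node> marked) {
        while (!queue.isEmpty()) {
            Node n = queue.pop();
            if (marked.add(n)) {          // already-marked nodes are not rescanned
                queue.addAll(n.strong);
            }
        }
    }

    /** Simulates one concurrent mark; returns true if the referent ends up marked. */
    static boolean referentSurvives(boolean barrierOnGet) {
        Node root = new Node();
        Node weakRef = new Node();
        Node referent = new Node();
        root.strong.add(weakRef);
        weakRef.weakReferent = referent;  // only weakly reachable at mark start

        Set<Node> marked = new HashSet<>();
        Deque<Node> markQueue = new ArrayDeque<>();
        Deque<Node> satbQueue = new ArrayDeque<>();
        markQueue.push(root);
        drain(markQueue, marked);         // root and weakRef are scanned now

        // Mutator, during marking: Object o = ref.get(); root.field = o;
        Node polled = weakRef.weakReferent;
        if (barrierOnGet) {
            satbQueue.push(polled);       // the fix: enqueue the *result* of get()
        }
        // The regular SATB pre-write barrier would enqueue the OLD field value;
        // the field was empty, so nothing is recorded for this store.
        root.strong.add(polled);

        drain(satbQueue, marked);         // final mark drains the SATB queue
        return marked.contains(referent);
    }

    public static void main(String[] args) {
        System.out.println("no barrier on get():   referent marked = " + referentSurvives(false));
        System.out.println("with barrier on get(): referent marked = " + referentSurvives(true));
    }
}
```

Without the barrier the referent is left unmarked although root now points to it, which is the
dangling pointer case; with the barrier it is marked through the SATB queue.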

Now it doubles back to Shenandoah. Since the *only* normal cycle we do is the concurrent cycle,
SATB is always enabled when we process references. This means that if the mutator calls
Reference.get before the concurrent mark is finished, the referent is deemed alive. In the code
above, where the application is running in a busy loop, that is virtually guaranteed. Note that it
is irrelevant for the current code that the mutator only calls get() for the null-check: the SATB
code fires on the referent read.
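
The consequence can be sketched with an even smaller toy model. Again this is an illustration under
assumptions, not collector code: ToyWeakRef, concurrentCycle and fullStwCycle are hypothetical
names, and the referent is assumed to have no strong paths at all, so only the barrier on get() can
keep it alive.

```java
final class ReferenceGetLivelockModel {
    /** Hypothetical stand-in for a weak reference whose referent has no strong paths. */
    static final class ToyWeakRef {
        Object referent = new Object();
        boolean satbActive;        // set by the collector while concurrent mark runs
        boolean referentEnqueued;  // set by the barrier in get()

        Object get() {
            if (satbActive && referent != null) {
                referentEnqueued = true; // fires on the read, even for a mere null-check
            }
            return referent;
        }
    }

    /** Concurrent cycle: the mutator polls get() while marking runs. Returns true if cleared. */
    static boolean concurrentCycle(ToyWeakRef ref, int pollsDuringMark) {
        ref.referentEnqueued = false;
        ref.satbActive = true;
        for (int i = 0; i < pollsDuringMark; i++) {
            if (ref.get() == null) {
                break;
            }
        }
        ref.satbActive = false;
        if (!ref.referentEnqueued) {
            ref.referent = null;   // reference processing clears the reference
        }
        return ref.referent == null;
    }

    /** Full STW cycle: the mutator is stopped and SATB is off, so nothing keeps the referent alive. */
    static boolean fullStwCycle(ToyWeakRef ref) {
        ref.referent = null;
        return true;
    }

    public static void main(String[] args) {
        ToyWeakRef ref = new ToyWeakRef();
        int cleared = 0;
        for (int cycle = 0; cycle < 1000; cycle++) {
            if (concurrentCycle(ref, 1)) {
                cleared++;
            }
        }
        System.out.println("cleared by concurrent cycles: " + cleared);
        System.out.println("cleared by full STW cycle: " + fullStwCycle(ref));
    }
}
```

A single poll per marking phase is enough to keep the referent alive in every concurrent cycle of
this model; a cycle with no polls, or the STW cycle, clears it.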

Thus, we never reclaim the referent or clear the reference, and the whole thing livelocks. (I have
a jtreg test like that.) The only way to get out of this now is to do a Full STW GC, which marks
without SATB enabled. I am at a loss as to how to fix this properly in Shenandoah. Ideas?
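
For anyone who wants to poke at this outside jtreg, a standalone reproducer could look roughly like
the sketch below. This is not the jtreg test mentioned above; the class name, the allocator thread
and the 10 second timeout are assumptions made for illustration. The timeout is there so the
program terminates even when the reference is never cleared. Running it with -XX:+UseShenandoahGC
requires a JDK build that includes Shenandoah.

```java
import java.lang.ref.WeakReference;

final class BusyLoopOnReferenceGet {
    static volatile Object sink; // keeps the allocations below from being optimized away

    /** Allocates the referent in its own frame so that no local variable keeps it alive. */
    static WeakReference<Object> makeRef() {
        return new WeakReference<>(new Object());
    }

    /** Spins on ref.get() until it returns null or the timeout elapses; returns true if cleared. */
    static boolean spinUntilCleared(WeakReference<Object> ref, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (ref.get() != null) {
            // burn, but give up eventually
            if (System.nanoTime() - deadline > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        WeakReference<Object> ref = makeRef();

        // Produce garbage so that the collector has a reason to start cycles.
        Thread allocator = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                sink = new byte[64 * 1024];
            }
        });
        allocator.setDaemon(true);
        allocator.start();

        boolean cleared = spinUntilCleared(ref, 10_000);
        allocator.interrupt();
        System.out.println(cleared
                ? "reference was cleared"
                : "timed out: reference was still not cleared");
    }
}
```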

Thanks,
-Aleksey

[1] https://bugs.openjdk.java.net/browse/JDK-7009266



More information about the shenandoah-dev mailing list