Probably a bug

Kirill A. Korinsky kirill at korins.ky
Mon Feb 17 14:59:41 UTC 2020


Good day,

I'd like to ask for advice, because it looks like I've discovered something that might be a Shenandoah bug.

I haven't got any proof that it is inside Shenandoah, nor a simple test case to reproduce it.

It appears inside Akka, and you can read my investigation with the Akka team here: https://github.com/akka/akka/issues/28601

In summary:
 - it appears as an infinite loop inside an Akka queue, a lock-free linked queue that is implemented via getObjectVolatile(), getAndSet() and a few more atomic/unsafe calls.
 - if I enable any debugging option such as -XX:+ShenandoahVerify, the bug disappears => I can't provide any hs_err_log :(
 - it exists on OpenJDK 8 from Fedora 31 and in shipilev/openjdk-shenandoah:8-fastdebug
 - it is very difficult to trigger and very fragile. In real life, it appears only on one, larger cluster; in my synthetic test case, it requires bootstrapping an application and using an unreachable Akka system
 - to trigger this bug I need a lot of garbage in the heap, which is produced by bootstrapping an application while it builds its index. The index is 0.5 GB to 1 GB in size (the heap is 2 GB); its size depends on a DB that is continuously updated, and the bug is reproducible at any possible size of the index.
 - if I switch to G1, for example, it disappears.
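
To make the shape of the problem concrete, here is a minimal sketch of this kind of queue. It is not Akka's actual code: the class and method names are hypothetical, and it uses AtomicReference and a volatile field instead of the Unsafe calls mentioned above. It only shows where a consumer could spin forever if the link written by a producer never became visible to it.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Simplified multi-producer, single-consumer linked queue (hypothetical names). */
final class MpscQueueSketch<T> {
    private static final class Node<T> {
        T value;
        volatile Node<T> next; // written by a producer, read by the consumer
        Node(T value) { this.value = value; }
    }

    // Producer end: every producer swaps its new node in here.
    private final AtomicReference<Node<T>> head;
    // Consumer end: only the single consumer thread touches this field.
    private Node<T> tail;

    MpscQueueSketch() {
        Node<T> stub = new Node<>(null);
        head = new AtomicReference<>(stub);
        tail = stub;
    }

    /** May be called from any thread. */
    void add(T value) {
        Node<T> n = new Node<>(value);
        Node<T> prev = head.getAndSet(n); // step 1: publish n as the newest node
        prev.next = n;                    // step 2: link the previous node to n
    }

    /** Must be called from one consumer thread only; returns null when empty. */
    T poll() {
        Node<T> next = tail.next;
        if (next == null) {
            if (head.get() == tail) {
                return null; // no producer has published anything new
            }
            // A producer has finished step 1 but its step 2 is not visible yet.
            // If that link never shows up, this loop never terminates.
            do {
                next = tail.next;
            } while (next == null);
        }
        T value = next.value;
        next.value = null; // the consumed node becomes the new stub
        tail = next;
        return value;
    }
}
```

The loop in poll() relies on the write in step 2 eventually becoming visible to the consumer. A real race in the queue logic and a missed GC barrier could both show up as this loop never finishing, which is why I can't tell the two apart from the symptoms alone.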

Right now I see two possible sources of this bug:
 - a very strange race condition inside Akka.
 - a bug inside Shenandoah that is related to a missed barrier or something deeper.

To eliminate or confirm the Shenandoah possibility, I need some advice on how to do it, because I can't prepare easy-to-reproduce code :(

-- 
wbr, Kirill



More information about the shenandoah-dev mailing list