RFR(S): 8231612: 100% cpu on arm32 in Service Thread

Mon Nov 18 22:51:12 UTC 2019

> On Nov 18, 2019, at 8:10 AM, Aleksei Voitylov <aleksei.voitylov at bell-sw.com> wrote:
> 
> Hi,
> 
> please review this build-only workaround for a GCC bug [1]. The problem
> surfaced on ARM32 when new code [2] got inlined with old code. In GCC at
> RTL level there is no good distinction between pointer and int, which
> makes it hard for the aliasing code to correctly recognize base
> addresses for objects, which leads to incorrect aliasing. As a
> consequence one of the loads in this code was completely eliminated on
> ARM32 by DSE. All GCC versions I could reach are affected.
> 
> Thanks to the GCC community, the fix will appear in GCC 10 but before it
> becomes mainstream we need a workaround. The workaround is to disable
> -ftree-pre optimization on ARM32 which triggers the bug. The problem
> does not surface on x86 in this code.
> 
> issue: https://bugs.openjdk.java.net/browse/JDK-8231612
> webrev: http://cr.openjdk.java.net/~avoitylov/webrev.8231612.02/
> 
> Testing on ARM32: the problem described in 8231612 (100% CPU utilization
> in Service Thread) no longer appears, the load is not eliminated as
> observed by disassembly.
> Testing on x86_64: linux-x86_64-server-release builds OK after this change.
> 
> Thanks,
> 
> -Aleksei
> 
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462
> [2] https://bugs.openjdk.java.net/browse/JDK-8226366

This change to limit the optimization of OopStorage.cpp doesn't seem
sufficient to me, based on my reading the gcc bug thread.  From that
thread it sounds like any use of our CmpxchgByteUsingInt helper may be
miscompiled.  (And who knows what else...)

There are certainly other uses of that helper.  On that platform I
think it gets used for any Atomic::cmpxchg of a bool value; at least,
that's where I *think* OopStorage is hitting it.  (I'm guessing the
problem is the cmpxchg in has_cleanup_work_and_reset.)

Just egrepping for "cmpxchg\((true|false)" finds

./share/gc/g1/g1RemSet.cpp:      bool marked_as_dirty = Atomic::cmpxchg(true, &_contains[region], false) == false;
./share/gc/g1/g1RemSet.cpp:    return !Atomic::cmpxchg(true, &_collection_set_iter_state[region], false);
./share/gc/g1/g1RemSet.cpp:        !Atomic::cmpxchg(true, &_fast_reclaim_handled, false)) {
./share/gc/z/zRootsIterator.cpp:  if (!_claimed && Atomic::cmpxchg(true, &_claimed, false) == false) {
./share/gc/z/zRootsIterator.cpp:  if (!_claimed && Atomic::cmpxchg(true, &_claimed, false) == false) {
./share/gc/shared/oopStorage.cpp:  return Atomic::cmpxchg(false, &needs_cleanup_requested, true);
./share/gc/shared/ptrQueue.cpp:      Atomic::cmpxchg(true, &_transfer_lock, false)) {
./share/services/memTracker.cpp:  if (Atomic::cmpxchg(true, &g_final_report_did_run, false) == false) {

There may be others with non-literal new-value arguments.  And there
may be non-bool byte-sized cmpxchg's, though I don't offhand know of any.

The two in zRootsIterator don't matter for arm32 (ZGC isn't supported
on 32bit platforms), but is this problem limited to arm32?  The gcc
bug thread suggests it isn't affecting x86_32/64, and might be limited
to arm(32?).

Also, the latest comment in the gcc thread says the proposed fix
doesn't work, so maybe not (yet) fixed in gcc 10.