DoubleAccumulator and RISC-V weirdness

Aleksey Shipilev shade at redhat.com
Wed Aug 24 17:23:42 UTC 2022


(cc-ing risc-port-dev@)

Hi Doug,

A (little?) puzzle for you. We in RISC-V land are facing a weird bug. On this test:

import java.util.concurrent.atomic.DoubleAccumulator;
import java.util.function.DoubleBinaryOperator;

public class Serial {
     public static void main(String[] args) {
         for (int i = 0; i < 12000; i++) {
             test(i);
         }
     }

     static void test(int i) {
         DoubleBinaryOperator plus = (x, y) -> x + y;
         DoubleAccumulator a = new DoubleAccumulator(plus, 13.9d);
         DoubleAccumulator b = new DoubleAccumulator(plus, 13.9d);

         a.accumulate(17.5d);
         b.accumulate(17.5d);
         if (a.get() != b.get())
             throw new RuntimeException("Unexpected value, iter: " + i);

	/*
	System.out.println("==== a before: " + a.get());
	a.debug();
	System.out.println("==== b before: " + b.get());
	b.debug();

	System.out.println("!!! RESET");
	*/

         a.reset();
         b.reset();

         /*
         System.out.println("==== a after: " + a.get());
	a.debug();
         System.out.println("==== b after: " + b.get());
	b.debug();
         */

         if (a.get() != b.get()) {
             throw new RuntimeException("Unexpected value after reset, iter: " + i);
         }
     }

}

...and RISC-V machines, we have:

$ ~/test-jdk/bin/java Serial
Exception in thread "main" java.lang.RuntimeException: Unexpected value after reset, iter: 3777
	at Serial.test(Serial.java:28)
	at Serial.main(Serial.java:8)

I cannot find anything obviously wrong with RISC-V port CASes implementation, so I started 
suspecting DoubleAccumulator itself.

Here is what I know:
  - DoubleAccumulator/Striped64 use weakCompareAndSetRelease;
  - RISC-V implements weak CASes with LR/SC;
  - On our existing hardware implementations, LR/SC seem to be so weak, they spuriously fail often;
  - Replacing weak CASes implementations with strong ones (= introducing retry loops) fixes the test;

If I dump stuff from the DoubleAccumulator itself by using:

     /**
      * Show me what you got.
      */
     public void debug() {
         System.out.println("Base: " + longBitsToDouble(base));
         System.out.println("Cells: ");
         Cell[] cs = cells;
         if (cs != null) {
             for (Cell c : cells) {
                 if (c != null) {
                     System.out.println(" " + longBitsToDouble(c.value));
                 } else {
                     System.out.println(" null");
                 }
             }
         } else {
             System.out.println("  Nothing.");
         }
         System.out.println();
     }

...then "a" and "b" are structurally different, and they do produce different sums after the reset:

==== a before: 31.4
Base: 31.4
Cells:
   Nothing.

==== b before: 31.4
Base: 13.9
Cells:
  null
  17.5

!!! RESET

==== a after: 13.9
Base: 13.9
Cells:
   Nothing.

==== b after: 27.8
Base: 13.9
Cells:
  null
  13.9


So what I think happens is following:
  - In DoubleAccumulator.accumulate(), we tried to weak-CAS-add the value to base;
  - That weak-CAS spuriously failed (even in single-threaded mode!), we proceeded to create the cell;
  - DoubleAccumulator.reset() came, reset both base and the cell to identity;
  - Then we asked for DoubleAccumulator.get(), and got twice the value (base + cell);

This looks to be a bona-fide DoubleAccumulator logic bug, that we were just able to detect when weak 
CAS spuriously failed in single-threaded workload.

Shouldn't we reset cells to 0, rather than to "identity"?

diff --git a/src/java.base/share/classes/java/util/concurrent/atomic/DoubleAccumulator.java 
b/src/java.base/share/classes/java/util/concurrent/atomic/DoubleAccumulator.java
index d04b8253ee7..4bb638348a6 100644
--- a/src/java.base/share/classes/java/util/concurrent/atomic/DoubleAccumulator.java
+++ b/src/java.base/share/classes/java/util/concurrent/atomic/DoubleAccumulator.java
@@ -160,7 +160,7 @@ public class DoubleAccumulator extends Striped64 implements Serializable {
          if (cs != null) {
              for (Cell c : cs)
                  if (c != null)
-                    c.reset(identity);
+                    c.reset(0L);
          }
      }

This fixes the test on RISC-V too. On x86_64, java/util/concurrent tests also pass.

Thoughts? If you agree this is a bug, I can PR the fix :)

-- 
Thanks,
-Aleksey



More information about the riscv-port-dev mailing list