DoubleAccumulator and RISC-V weirdness
Aleksey Shipilev
shade at redhat.com
Wed Aug 24 17:23:42 UTC 2022
(cc-ing risc-port-dev@)
Hi Doug,
A (little?) puzzle for you. We in RISC-V land are facing a weird bug. On this test:
import java.util.concurrent.atomic.DoubleAccumulator;
import java.util.function.DoubleBinaryOperator;
public class Serial {
public static void main(String[] args) {
for (int i = 0; i < 12000; i++) {
test(i);
}
}
static void test(int i) {
DoubleBinaryOperator plus = (x, y) -> x + y;
DoubleAccumulator a = new DoubleAccumulator(plus, 13.9d);
DoubleAccumulator b = new DoubleAccumulator(plus, 13.9d);
a.accumulate(17.5d);
b.accumulate(17.5d);
if (a.get() != b.get())
throw new RuntimeException("Unexpected value, iter: " + i);
/*
System.out.println("==== a before: " + a.get());
a.debug();
System.out.println("==== b before: " + b.get());
b.debug();
System.out.println("!!! RESET");
*/
a.reset();
b.reset();
/*
System.out.println("==== a after: " + a.get());
a.debug();
System.out.println("==== b after: " + b.get());
b.debug();
*/
if (a.get() != b.get()) {
throw new RuntimeException("Unexpected value after reset, iter: " + i);
}
}
}
...and RISC-V machines, we have:
$ ~/test-jdk/bin/java Serial
Exception in thread "main" java.lang.RuntimeException: Unexpected value after reset, iter: 3777
at Serial.test(Serial.java:28)
at Serial.main(Serial.java:8)
I cannot find anything obviously wrong with RISC-V port CASes implementation, so I started
suspecting DoubleAccumulator itself.
Here is what I know:
- DoubleAccumulator/Striped64 use weakCompareAndSetRelease;
- RISC-V implements weak CASes with LR/SC;
- On our existing hardware implementations, LR/SC seem to be so weak, they spuriously fail often;
- Replacing weak CASes implementations with strong ones (= introducing retry loops) fixes the test;
If I dump stuff from the DoubleAccumulator itself by using:
/**
* Show me what you got.
*/
public void debug() {
System.out.println("Base: " + longBitsToDouble(base));
System.out.println("Cells: ");
Cell[] cs = cells;
if (cs != null) {
for (Cell c : cells) {
if (c != null) {
System.out.println(" " + longBitsToDouble(c.value));
} else {
System.out.println(" null");
}
}
} else {
System.out.println(" Nothing.");
}
System.out.println();
}
...then "a" and "b" are structurally different, and they do produce different sums after the reset:
==== a before: 31.4
Base: 31.4
Cells:
Nothing.
==== b before: 31.4
Base: 13.9
Cells:
null
17.5
!!! RESET
==== a after: 13.9
Base: 13.9
Cells:
Nothing.
==== b after: 27.8
Base: 13.9
Cells:
null
13.9
So what I think happens is following:
- In DoubleAccumulator.accumulate(), we tried to weak-CAS-add the value to base;
- That weak-CAS spuriously failed (even in single-threaded mode!), we proceeded to create the cell;
- DoubleAccumulator.reset() came, reset both base and the cell to identity;
- Then we asked for DoubleAccumulator.get(), and got twice the value (base + cell);
This looks to be a bona-fide DoubleAccumulator logic bug, that we were just able to detect when weak
CAS spuriously failed in single-threaded workload.
Shouldn't we reset cells to 0, rather than to "identity"?
diff --git a/src/java.base/share/classes/java/util/concurrent/atomic/DoubleAccumulator.java
b/src/java.base/share/classes/java/util/concurrent/atomic/DoubleAccumulator.java
index d04b8253ee7..4bb638348a6 100644
--- a/src/java.base/share/classes/java/util/concurrent/atomic/DoubleAccumulator.java
+++ b/src/java.base/share/classes/java/util/concurrent/atomic/DoubleAccumulator.java
@@ -160,7 +160,7 @@ public class DoubleAccumulator extends Striped64 implements Serializable {
if (cs != null) {
for (Cell c : cs)
if (c != null)
- c.reset(identity);
+ c.reset(0L);
}
}
This fixes the test on RISC-V too. On x86_64, java/util/concurrent tests also pass.
Thoughts? If you agree this is a bug, I can PR the fix :)
--
Thanks,
-Aleksey
More information about the riscv-port-dev
mailing list