Numeric (and accumulators)

Sat Sep 15 06:23:12 PDT 2012

On 09/14/12 17:10, Brian Goetz wrote:

> These guys also need companion classes that are mutable,

Like for example Atomic{Integer,Long}? :-)

It has always been a little weird that JDK provides
immutable ones (Number}, and atomically mutable ones (AtomicX},
but not plain mutable ones. I think the main reason is that
the ugly singleton array alternative has always been available.
But the same combinatorics-reducing forces that lead
to Numeric also lead to  finally supplying these as well.
We defined Atomic{Integer,Long} to extend Number
(thus not needing a Numeric interface). We could
similarly add plain MutableInteger and MutableLong
(plus MutableDouBle and even AtomicDouble, despite the
concurrency oddities of AtomicDouble (bitwise "=="
used in CAS and floating-point "==" are not the same for double).

But there's a much better move here -- take the
opportunity to provide standardized forms of structured
accumulators. We can define classes that are updatable only via
a lambda supplied in constructor. This not only nice matches
the semantics of reductions and other bulk-op combinations,
but lends itself to encapsulation such that concurrency
style/semantics can be chosen via a factory.
Here's what it might look like for "Long":

public abstract class LongAccumulator implements Numeric<Long> {
     // Nested decl until function names straightened out
     public static interface LongByLongToLong { long apply(long a, long b); }

     // factories for standard implementations
     public static LongAccumulator sequentialAccumulator
         (long initialValue, LongByLongToLong updateFunction);
     public static LongAccumulator lockedAccumulator
         (long initialValue, LongByLongToLong updateFunction);
     public static LongAccumulator optimisticAccumulator
         (long initialValue, LongByLongToLong updateFunction);

     public void update(long x); // cannot return value
     public long get();
     public void reset(long initialValue);
     public long getThenReset(long initialValue);

     // plus implement Numeric....
     public long getLong();
     public int getInt();
     public short getShort();
     public byte getByte();
     public double getDouble();
     public float getFloat();
     public Number getNumber();

     // More factories for convenience/efficency

     public static LongAccumulator sequentialSum();
     public static LongAccumulator lockedSum();
     public static LongAccumulator optimisticSum();

     public static LongAccumulator sequentialMinimum();
     public static LongAccumulator lockedMinimum();
     public static LongAccumulator optimisticMinimum();

     public static LongAccumulator sequentialMinimum();
     public static LongAccumulator lockedMaximum();
     public static LongAccumulator optimisticMaximum();

     protected LongAccumulator(); // extensibility hook
}

(As is often the case, there seems to be no perfect
naming convention factory methods. Suggestions for improvment
would be welcome.)

The "optimistic" versions has the property
that the update function could be invoked more than once
(on CAS failure). Rather than pure CAS based though, these
should be a refactoring of the jsr166e LongAdder etc classes,
to build in better scalability. They turn out to be very useful
in cases where you need to do combinations that cannot easily
be arranged as structured parallel reductions -- they add some
per-update overhead (CAS vs plain write) but self-adjust space
to reduce contention among threads to near-optimal
levels (at the expense of dynamically adding a non-trivial
amount of space overhead, but only when needed -- basically they
save you from having to create multiple reduction targets yourself).

In some testing I've done using them; as in:
   for each element x in parallel { adder.update(x); }
is only around 25% slower than clever reduction schemes even
on machines with lots of cores and updates. This is worth
avoiding when possible, but the option is worth providing
for the cases where people don't know of an appropriate reduction.

The "protected" no-op ctor, still allows people to add
further variants like plain CAS-based "atomic".

Additionally, we'd need a non-numeric generics-based version.
A pure interface one would be easy, nut I think that even
here, an abstract class version works out better.
With interfaces, you'd be tempted to make
LongAccumulator etc extend it, which hits enough overload/override
snags to be unworkable. Additionally, it wouldn't
support such convenient use of pre-supplied forms.
So:

public abstract class Accumulator<T> {
     // Nested decl until function names straightened out
     public static interface BiFun<A,B,T> { T apply(A a, B b); }

     public void update(T x);
     public T get();
     public void reset(T initialValue);
     public T getThenReset(T initialValue);

     public static <T> Accumulator sequentialAccumulator
         (T initialValue, BiFun<T,T,T> updateFunction);
     public static <T> Accumulator lockedAccumulator
         (T initialValue, BiFun<T,T,T> updateFunction);
     public static <T> Accumulator optimisticAccumulator
         (T initialValue, BiFun<T,T,T> updateFunction);

     protected Accumulator(); // extensibility hook
}

All of these might live in package java.util.functions?

-Doug