Initial request for review of JDK-8006572 DoubleStream.sum()/average() implementations that reduce numerical errors

Wed Nov 20 09:49:46 UTC 2013

Hi Joe,

On Nov 20, 2013, at 9:21 AM, Joe Darcy <joe.darcy at oracle.com> wrote:

> Hi Paul,
> 
> On 11/18/2013 07:38 AM, Paul Sandoz wrote:
>> Hi Joe,
>> 
>> You can use the three arg form of collect for DoublePipeline.sum/average impls, which is already used for average:
>> 
>>     public final OptionalDouble average() {
>>         double[] avg = collect(() -> new double[2],
>>                                (ll, i) -> {
>>                                    ll[0]++;
>>                                    ll[1] += i;
>>                                },
>>                                (ll, rr) -> {
>>                                    ll[0] += rr[0];
>>                                    ll[1] += rr[1];
>>                                });
>>         return avg[0] > 0
>>                ? OptionalDouble.of(avg[1] / avg[0])
>>                : OptionalDouble.empty();
>>     }
>> 
>> That would be the most expedient way.
> 
> Thanks for the tip. For the moment, I'm feeling a bit expedient and used the array-based approach in an iteration of the change:
> 
>    http://cr.openjdk.java.net/~darcy/8006572.3/
> .

In DoublePipeline:

 378     @Override
 379     public final double sum() {
 380         double[] summation = collect(() -> new double[2],
 381                                (ll, d) -> {
 382                                    Collectors.sumWithCompensation(ll, d);
 383                                },
 384                                (ll, rr) -> {
 385                                    Collectors.sumWithCompensation(ll, rr[0]);
 386                                    Collectors.sumWithCompensation(ll, rr[1]);
 387                                });
 388         return summation[0];
 389     }

I think you can replace the lambda expression at #381 with a method reference.

 410     @Override
 411     public final OptionalDouble average() {
 412         double[] avg = collect(() -> new double[3],
 413                                (ll, d) -> {
 414                                    ll[2]++;
 415                                    Collectors.sumWithCompensation(ll, d);
 416                                },
 417                                (ll, rr) -> {
 418                                    Collectors.sumWithCompensation(ll, rr[0]);
 419                                    Collectors.sumWithCompensation(ll, rr[1]);
 420                                    ll[2] += rr[2];
 421                                });
 422         return avg[0] > 0
 423                ? OptionalDouble.of(avg[0] / avg[2])
 424                : OptionalDouble.empty();
 425     }

#422 needs to be "avg[2] > 0" since you changed the location of the count. It suggests we don't have tests for a non-empty stream whose sum is zero :-)

Other non-test code looks good.

> This allows one of the kernels of the compensated summation logic to be shared in two of the use-sites.
> 
> For the test, all 6 scenarios fail on a JDK *without* the compensated summation code and all 6 pass with it; all the other java/util/streams regression tests pass with the new code too.
> 

I still strongly encourage you to remove the Iterator and replace with Stream.iterate. You can probably half the number of lines of test code.

The following:

  80     private static  Stream<Double> testBoxedStream(double base, double increment, int count) {
  81         TestDoubleIterator tdi = new TestDoubleIterator(base, increment, count);
  82         Double[] tmp = new Double[count];
  83         int i = 0;
  84         while(tdi.hasNext()) {
  85             tmp[i] = tdi.next();
  86             i++;
  87         }
  88         return Stream.of(tmp);
  89     }

can be replaced with a one liner:

  DoubleStream.iterate(base, e -> increment).limit(count).boxed();

You can abstract the creation of DoubleStream with a Suppiler:

  // Factory for double a stream of [base, increment, ..., increment] limited to a size of count
  Supplier<DoubleStream> ds = () -> DoubleStream.iterate(base, e -> increment).limit(count);

so you don't have to keep repeating the passing of the base/increment/count arguments each time:

  failures += compareUlpDifference(expectedSum, ds.get().sum(), 3);
  failures += compareUlpDifference(expectedSum, ds.get().average() getAsDouble(), 3);

  double collectorSum = ds.get().boxed().collect(Collectors.summingDouble(Function.identity());
  ...
  double collectorAvg = ds.get().boxed().collect(Collectors. averagingDouble(Function.identity()));

--

You can even abstract out the compareUlpDifference for the expected and threshold:

  Function<Double, Integer> comp = d -> compareUlpDifference(expected, d, threshold);
  ..
  failures += comp.apply(ds.get().boxed().collect(Collectors. averagingDouble(Function.identity())))

Anyway... not suggesting you do that, just wanted to point out a simple partial function trick that i have found useful that can, when used appropriately, be an effective way to reduce code boilerplate.

--

My preference is to use TestNG and have separate test methods rather than explicitly managing failures and when a failure occurs having to dig into the logs rather than looking at the test report itself (which if really good will have a link back to the source code of the failing test). I don't seem to be getting much traction when i have previously suggested this :-) i doubt i will now!

> Off-list, I've asked a numerically inclined colleague to look over the summation logic and how the compensated summation states are combined.
> 

OK.

Paul.