Initial request for review of JDK-8006572 DoubleStream.sum()/average() implementations that reduce numerical errors
Paul Sandoz
paul.sandoz at oracle.com
Wed Nov 20 09:49:46 UTC 2013
Hi Joe,
On Nov 20, 2013, at 9:21 AM, Joe Darcy <joe.darcy at oracle.com> wrote:
> Hi Paul,
>
> On 11/18/2013 07:38 AM, Paul Sandoz wrote:
>> Hi Joe,
>>
>> You can use the three arg form of collect for DoublePipeline.sum/average impls, which is already used for average:
>>
>> public final OptionalDouble average() {
>> double[] avg = collect(() -> new double[2],
>> (ll, i) -> {
>> ll[0]++;
>> ll[1] += i;
>> },
>> (ll, rr) -> {
>> ll[0] += rr[0];
>> ll[1] += rr[1];
>> });
>> return avg[0] > 0
>> ? OptionalDouble.of(avg[1] / avg[0])
>> : OptionalDouble.empty();
>> }
>>
>> That would be the most expedient way.
>
> Thanks for the tip. For the moment, I'm feeling a bit expedient and used the array-based approach in an iteration of the change:
>
> http://cr.openjdk.java.net/~darcy/8006572.3/
> .
In DoublePipeline:
378 @Override
379 public final double sum() {
380 double[] summation = collect(() -> new double[2],
381 (ll, d) -> {
382 Collectors.sumWithCompensation(ll, d);
383 },
384 (ll, rr) -> {
385 Collectors.sumWithCompensation(ll, rr[0]);
386 Collectors.sumWithCompensation(ll, rr[1]);
387 });
388 return summation[0];
389 }
I think you can replace the lambda expression at #381 with a method reference.
410 @Override
411 public final OptionalDouble average() {
412 double[] avg = collect(() -> new double[3],
413 (ll, d) -> {
414 ll[2]++;
415 Collectors.sumWithCompensation(ll, d);
416 },
417 (ll, rr) -> {
418 Collectors.sumWithCompensation(ll, rr[0]);
419 Collectors.sumWithCompensation(ll, rr[1]);
420 ll[2] += rr[2];
421 });
422 return avg[0] > 0
423 ? OptionalDouble.of(avg[0] / avg[2])
424 : OptionalDouble.empty();
425 }
#422 needs to be "avg[2] > 0" since you changed the location of the count. It suggests we don't have tests for a non-empty stream whose sum is zero :-)
Other non-test code looks good.
> This allows one of the kernels of the compensated summation logic to be shared in two of the use-sites.
>
> For the test, all 6 scenarios fail on a JDK *without* the compensated summation code and all 6 pass with it; all the other java/util/streams regression tests pass with the new code too.
>
I still strongly encourage you to remove the Iterator and replace with Stream.iterate. You can probably half the number of lines of test code.
The following:
80 private static Stream<Double> testBoxedStream(double base, double increment, int count) {
81 TestDoubleIterator tdi = new TestDoubleIterator(base, increment, count);
82 Double[] tmp = new Double[count];
83 int i = 0;
84 while(tdi.hasNext()) {
85 tmp[i] = tdi.next();
86 i++;
87 }
88 return Stream.of(tmp);
89 }
can be replaced with a one liner:
DoubleStream.iterate(base, e -> increment).limit(count).boxed();
You can abstract the creation of DoubleStream with a Suppiler:
// Factory for double a stream of [base, increment, ..., increment] limited to a size of count
Supplier<DoubleStream> ds = () -> DoubleStream.iterate(base, e -> increment).limit(count);
so you don't have to keep repeating the passing of the base/increment/count arguments each time:
failures += compareUlpDifference(expectedSum, ds.get().sum(), 3);
failures += compareUlpDifference(expectedSum, ds.get().average() getAsDouble(), 3);
double collectorSum = ds.get().boxed().collect(Collectors.summingDouble(Function.identity());
...
double collectorAvg = ds.get().boxed().collect(Collectors. averagingDouble(Function.identity()));
--
You can even abstract out the compareUlpDifference for the expected and threshold:
Function<Double, Integer> comp = d -> compareUlpDifference(expected, d, threshold);
..
failures += comp.apply(ds.get().boxed().collect(Collectors. averagingDouble(Function.identity())))
Anyway... not suggesting you do that, just wanted to point out a simple partial function trick that i have found useful that can, when used appropriately, be an effective way to reduce code boilerplate.
--
My preference is to use TestNG and have separate test methods rather than explicitly managing failures and when a failure occurs having to dig into the logs rather than looking at the test report itself (which if really good will have a link back to the source code of the failing test). I don't seem to be getting much traction when i have previously suggested this :-) i doubt i will now!
> Off-list, I've asked a numerically inclined colleague to look over the summation logic and how the compensated summation states are combined.
>
OK.
Paul.
More information about the core-libs-dev
mailing list