Possible long overflow in Collectors.averagingLong and LongStream.average

Tagir F. Valeev amaembo at gmail.com
Sat Aug 15 10:15:32 UTC 2015


Hello!

Just for information: as finite stream size is very unlikely to exceed
Long.MAX_VALUE (and even if exceeds it will take enormous CPU time to
process such stream), the problem can be fixed just by tracking 128
bit sum instead of 64 bit. It's not very difficult to implement it:
just two long values should be used with proper carry. Here's the
sample implementation:

https://gist.github.com/amaembo/1edf2045b687a0ac9db8

Here averageInt and averageLong collectors are implemented which work
exactly like ones available in JDK, but don't overflow. The final
division is implemented using BigDecimal only if long overflow
occurred, thus for non-overflow scenarios the overhead compared to
current implementation is quite low (would be even lower had
Long.compareUnsigned be intrinsified).

Tracking the sum in BigInteger variable would be significantly slower
and require continuous heap allocations as BigInteger is immutable.

With best regards,
Tagir Valeev.

TFV> The following code prints -1 twice, while users would normally expect
TFV> something like 9.223372036854776E18:

TFV> double avg1 = Stream.of(Long.MAX_VALUE, Long.MAX_VALUE).collect(
TFV>                 Collectors.averagingLong(Long::valueOf));
TFV> System.out.println(avg1);

TFV> double avg2 = LongStream.of(Long.MAX_VALUE, Long.MAX_VALUE).average()
TFV>                 .getAsDouble();
TFV> System.out.println(avg2);

TFV> That's because in both cases internally sum is calculated in the long
TFV> variable where it may silently overflow. The documentation for both
TFV> methods says nothing about such possibility. I guess if it's too late
TFV> to fix this behavior, then probably it should be at least properly
TFV> documented?

TFV> With best regards,
TFV> Tagir Valeev.




More information about the core-libs-dev mailing list