RFR: 8153293 - Stream API: Preserve SORTED and DISTINCT characteristics for boxed() and asLongStream() operations
Hello! Please review and sponsor the following patch: http://cr.openjdk.java.net/~tvaleev/webrev/8153293/r1/ The patch preserves more characteristics on primitive stream operations: IntStream/LongStream/DoubleStream.boxed() preserves SORTED and DISTINCT IntStream.asLongStream() preserves SORTED and DISTINCT IntStream.asDoubleStream() and LongStream.asDoubleStream() preserves SORTED (different longs can be converted into the same double, so DISTINCT is not preserved here; not sure whether this is possible for ints) Fixing the boxed() case is especially important as distinct() for primitive streams is implemented like boxed().distinct().unbox, so the actual distinct() operation cannot take the advantage of DISTINCT flag (skip the operation at all) or SORTED flag (switch to more efficient implementation). Here's the small JMH benchmark which measures the performance boost of quite common operation: sort the input numbers and leave only distinct ones: http://cr.openjdk.java.net/~tvaleev/webrev/8153293/jmh/ new Random(1).ints(size).sorted().distinct().toArray() I've got the following results. 9ea+111: Benchmark (size) Mode Cnt Score Error Units SortDistinctTest.sortedDistinct 10 avgt 30 0,612 ± 0,004 us/op SortDistinctTest.sortedDistinct 1000 avgt 30 92,848 ± 1,039 us/op SortDistinctTest.sortedDistinct 100000 avgt 30 32147,205 ± 3487,422 us/op 9ea+111 patched: Benchmark (size) Mode Cnt Score Error Units SortDistinctTest.sortedDistinct 10 avgt 30 0,435 ± 0,001 us/op SortDistinctTest.sortedDistinct 1000 avgt 30 40,555 ± 0,772 us/op SortDistinctTest.sortedDistinct 100000 avgt 30 9031,651 ± 73,956 us/op With best regards, Tagir Valeev.
Hi Tagir, good catch! I like the proposal.
(different longs can be converted into the same double, so DISTINCT is not preserved here; not sure whether this is possible for ints)
I think IntStream.asDoubleStream() can also preserve DISTINCT as different ints can't be mapped to the same double. Math.ulp((double) Integer.MIN_VALUE) ~ 4.7E-7 in contrast to Math.ulp((double) Long.MIN_VALUE) = 2048.0 So there are more than enough doubles in the vicinity of large int values. It's only when ulp get's >= 1.0 that distinct integral values need to be mapped to the same double (that happens between 1.0E15 and 1.0E16 for longs). Please anyone correct me if I'm wrong. Regards, Stefan 2016-04-01 18:25 GMT+02:00 Tagir F. Valeev <amaembo@gmail.com>:
Hello!
Please review and sponsor the following patch: http://cr.openjdk.java.net/~tvaleev/webrev/8153293/r1/
The patch preserves more characteristics on primitive stream operations: IntStream/LongStream/DoubleStream.boxed() preserves SORTED and DISTINCT IntStream.asLongStream() preserves SORTED and DISTINCT IntStream.asDoubleStream() and LongStream.asDoubleStream() preserves SORTED (different longs can be converted into the same double, so DISTINCT is not preserved here; not sure whether this is possible for ints)
Fixing the boxed() case is especially important as distinct() for primitive streams is implemented like boxed().distinct().unbox, so the actual distinct() operation cannot take the advantage of DISTINCT flag (skip the operation at all) or SORTED flag (switch to more efficient implementation).
Here's the small JMH benchmark which measures the performance boost of quite common operation: sort the input numbers and leave only distinct ones: http://cr.openjdk.java.net/~tvaleev/webrev/8153293/jmh/
new Random(1).ints(size).sorted().distinct().toArray()
I've got the following results.
9ea+111:
Benchmark (size) Mode Cnt Score Error Units SortDistinctTest.sortedDistinct 10 avgt 30 0,612 ± 0,004 us/op SortDistinctTest.sortedDistinct 1000 avgt 30 92,848 ± 1,039 us/op SortDistinctTest.sortedDistinct 100000 avgt 30 32147,205 ± 3487,422 us/op
9ea+111 patched:
Benchmark (size) Mode Cnt Score Error Units SortDistinctTest.sortedDistinct 10 avgt 30 0,435 ± 0,001 us/op SortDistinctTest.sortedDistinct 1000 avgt 30 40,555 ± 0,772 us/op SortDistinctTest.sortedDistinct 100000 avgt 30 9031,651 ± 73,956 us/op
With best regards, Tagir Valeev.
On 4 Apr 2016, at 21:28, Stefan Zobel <spliterator@gmail.com> wrote:
Hi Tagir,
good catch! I like the proposal.
Me too. +1. Extra bonus points to test, in addition to the tested characteristics, that the boxed/converted elements remain sorted/distinct :-)
(different longs can be converted into the same double, so DISTINCT is not preserved here; not sure whether this is possible for ints)
I think IntStream.asDoubleStream() can also preserve DISTINCT as different ints can't be mapped to the same double.
Yes, there are 53 bits to play with, which is more than enough to exactly represent the full range of int values. Paul.
Math.ulp((double) Integer.MIN_VALUE) ~ 4.7E-7
in contrast to
Math.ulp((double) Long.MIN_VALUE) = 2048.0
So there are more than enough doubles in the vicinity of large int values. It's only when ulp get's >= 1.0 that distinct integral values need to be mapped to the same double (that happens between 1.0E15 and 1.0E16 for longs). Please anyone correct me if I'm wrong.
Regards, Stefan
Hi Tagir, a few comments below a) IntPipeline: public final mapToObj(IntFunction<? extends U> mapper, int opFlags) should be private, not public b) IntPipeline: asDoubleStream() - as already discussed, 0 should be passed as the opFlags value to the DoublePipeline.StatelessOp constructor c) I think, the new "private final mapToObj(...)" methods can be "private" (dropping the "final") Regards, Stefan 2016-04-01 18:25 GMT+02:00 Tagir F. Valeev <amaembo@gmail.com>:
Hello!
Please review and sponsor the following patch: http://cr.openjdk.java.net/~tvaleev/webrev/8153293/r1/
The patch preserves more characteristics on primitive stream operations: IntStream/LongStream/DoubleStream.boxed() preserves SORTED and DISTINCT IntStream.asLongStream() preserves SORTED and DISTINCT IntStream.asDoubleStream() and LongStream.asDoubleStream() preserves SORTED (different longs can be converted into the same double, so DISTINCT is not preserved here; not sure whether this is possible for ints)
Fixing the boxed() case is especially important as distinct() for primitive streams is implemented like boxed().distinct().unbox, so the actual distinct() operation cannot take the advantage of DISTINCT flag (skip the operation at all) or SORTED flag (switch to more efficient implementation).
Here's the small JMH benchmark which measures the performance boost of quite common operation: sort the input numbers and leave only distinct ones: http://cr.openjdk.java.net/~tvaleev/webrev/8153293/jmh/
new Random(1).ints(size).sorted().distinct().toArray()
I've got the following results.
9ea+111:
Benchmark (size) Mode Cnt Score Error Units SortDistinctTest.sortedDistinct 10 avgt 30 0,612 ± 0,004 us/op SortDistinctTest.sortedDistinct 1000 avgt 30 92,848 ± 1,039 us/op SortDistinctTest.sortedDistinct 100000 avgt 30 32147,205 ± 3487,422 us/op
9ea+111 patched:
Benchmark (size) Mode Cnt Score Error Units SortDistinctTest.sortedDistinct 10 avgt 30 0,435 ± 0,001 us/op SortDistinctTest.sortedDistinct 1000 avgt 30 40,555 ± 0,772 us/op SortDistinctTest.sortedDistinct 100000 avgt 30 9031,651 ± 73,956 us/op
With best regards, Tagir Valeev.
Hi Tagir, another minor issue. The testFlags() methods in IntPrimitiveOpsTests / LongPrimitiveOpsTests each have a duplicated assert: IntPrimitiveOpsTests: assertFalse(IntStreams.of(1, 10).boxed().spliterator() .hasCharacteristics(Spliterator.SORTED)); LongPrimitiveOpsTests: assertFalse(LongStreams.of(1, 10).boxed().spliterator() .hasCharacteristics(Spliterator.SORTED)); The asserts for IntStreams.range(1, 10).asDoubleStream() would have to be changed to account for DISTINCTness, of course. Regards, Stefan 2016-04-01 18:25 GMT+02:00 Tagir F. Valeev <amaembo@gmail.com>:
Hello!
Please review and sponsor the following patch: http://cr.openjdk.java.net/~tvaleev/webrev/8153293/r1/
The patch preserves more characteristics on primitive stream operations: IntStream/LongStream/DoubleStream.boxed() preserves SORTED and DISTINCT IntStream.asLongStream() preserves SORTED and DISTINCT IntStream.asDoubleStream() and LongStream.asDoubleStream() preserves SORTED (different longs can be converted into the same double, so DISTINCT is not preserved here; not sure whether this is possible for ints)
Fixing the boxed() case is especially important as distinct() for primitive streams is implemented like boxed().distinct().unbox, so the actual distinct() operation cannot take the advantage of DISTINCT flag (skip the operation at all) or SORTED flag (switch to more efficient implementation).
Here's the small JMH benchmark which measures the performance boost of quite common operation: sort the input numbers and leave only distinct ones: http://cr.openjdk.java.net/~tvaleev/webrev/8153293/jmh/
new Random(1).ints(size).sorted().distinct().toArray()
I've got the following results.
9ea+111:
Benchmark (size) Mode Cnt Score Error Units SortDistinctTest.sortedDistinct 10 avgt 30 0,612 ± 0,004 us/op SortDistinctTest.sortedDistinct 1000 avgt 30 92,848 ± 1,039 us/op SortDistinctTest.sortedDistinct 100000 avgt 30 32147,205 ± 3487,422 us/op
9ea+111 patched:
Benchmark (size) Mode Cnt Score Error Units SortDistinctTest.sortedDistinct 10 avgt 30 0,435 ± 0,001 us/op SortDistinctTest.sortedDistinct 1000 avgt 30 40,555 ± 0,772 us/op SortDistinctTest.sortedDistinct 100000 avgt 30 9031,651 ± 73,956 us/op
With best regards, Tagir Valeev.
Hello! Thank you, Stefan and Paul for review. Here's updated webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8153293/r2/ Changes: - all new mapToObj are private and not final - IntStream.asDoubleStream preserves distinct now - Tests fixed according to Stefan's suggestions - Additional tests added which test how sorted and distinct actually work, as Paul suggests. Values close to Integer.MAX_VALUE and Long.MAX_VALUE are tested. TreeSet<Long> and TreeSet<Double> is used to produce expected result. With best regards, Tagir Valeev. SZ> Hi Tagir, SZ> another minor issue. The testFlags() methods in IntPrimitiveOpsTests SZ> / LongPrimitiveOpsTests each have a duplicated assert: SZ> IntPrimitiveOpsTests: SZ> assertFalse(IntStreams.of(1, 10).boxed().spliterator() SZ> .hasCharacteristics(Spliterator.SORTED)); SZ> LongPrimitiveOpsTests: SZ> assertFalse(LongStreams.of(1, 10).boxed().spliterator() SZ> .hasCharacteristics(Spliterator.SORTED)); SZ> The asserts for IntStreams.range(1, 10).asDoubleStream() would have SZ> to be changed to account for DISTINCTness, of course. SZ> Regards, SZ> Stefan SZ> 2016-04-01 18:25 GMT+02:00 Tagir F. Valeev <amaembo@gmail.com>:
Hello!
Please review and sponsor the following patch: http://cr.openjdk.java.net/~tvaleev/webrev/8153293/r1/
The patch preserves more characteristics on primitive stream operations: IntStream/LongStream/DoubleStream.boxed() preserves SORTED and DISTINCT IntStream.asLongStream() preserves SORTED and DISTINCT IntStream.asDoubleStream() and LongStream.asDoubleStream() preserves SORTED (different longs can be converted into the same double, so DISTINCT is not preserved here; not sure whether this is possible for ints)
Fixing the boxed() case is especially important as distinct() for primitive streams is implemented like boxed().distinct().unbox, so the actual distinct() operation cannot take the advantage of DISTINCT flag (skip the operation at all) or SORTED flag (switch to more efficient implementation).
Here's the small JMH benchmark which measures the performance boost of quite common operation: sort the input numbers and leave only distinct ones: http://cr.openjdk.java.net/~tvaleev/webrev/8153293/jmh/
new Random(1).ints(size).sorted().distinct().toArray()
I've got the following results.
9ea+111:
Benchmark (size) Mode Cnt Score Error Units SortDistinctTest.sortedDistinct 10 avgt 30 0,612 ± 0,004 us/op SortDistinctTest.sortedDistinct 1000 avgt 30 92,848 ± 1,039 us/op SortDistinctTest.sortedDistinct 100000 avgt 30 32147,205 ± 3487,422 us/op
9ea+111 patched:
Benchmark (size) Mode Cnt Score Error Units SortDistinctTest.sortedDistinct 10 avgt 30 0,435 ± 0,001 us/op SortDistinctTest.sortedDistinct 1000 avgt 30 40,555 ± 0,772 us/op SortDistinctTest.sortedDistinct 100000 avgt 30 9031,651 ± 73,956 us/op
With best regards, Tagir Valeev.
Hi Tagir, thanks! Looks fine to me now - especially the new testSortDistinct() methods. But note: I'm only a participant, not a reviewer of any sort. Thanks, Stefan 2016-04-12 18:37 GMT+02:00 Tagir F. Valeev <amaembo@gmail.com>:
Hello!
Thank you, Stefan and Paul for review. Here's updated webrev:
http://cr.openjdk.java.net/~tvaleev/webrev/8153293/r2/
Changes: - all new mapToObj are private and not final - IntStream.asDoubleStream preserves distinct now - Tests fixed according to Stefan's suggestions - Additional tests added which test how sorted and distinct actually work, as Paul suggests. Values close to Integer.MAX_VALUE and Long.MAX_VALUE are tested. TreeSet<Long> and TreeSet<Double> is used to produce expected result.
With best regards, Tagir Valeev.
On 12 Apr 2016, at 18:37, Tagir F. Valeev <amaembo@gmail.com> wrote:
Hello!
Thank you, Stefan and Paul for review. Here's updated webrev:
Looks good. I will push for you this week. Paul.
participants (3)
-
Paul Sandoz
-
Stefan Zobel
-
Tagir F. Valeev