Collectors.toSet() small performance improvement
Tagir F. Valeev
amaembo at gmail.com
Tue Aug 9 05:31:05 UTC 2016
Hello!
I'd like to propose a simple performance patch for Collectors.toSet():
--- a/src/java.base/share/classes/java/util/stream/Collectors.java Tue Aug 02 07:19:06 2016 +0530
+++ b/src/java.base/share/classes/java/util/stream/Collectors.java Tue Aug 09 11:47:20 2016 +0700
@@ -295,7 +295,12 @@
public static <T>
Collector<T, ?, Set<T>> toSet() {
return new CollectorImpl<>((Supplier<Set<T>>) HashSet::new, Set::add,
- (left, right) -> { left.addAll(right); return left; },
+ (left, right) -> {
+ if (left.size() < right.size()) {
+ right.addAll(left); return right;
+ }
+ left.addAll(right); return left;
+ },
CH_UNORDERED_ID);
}
This will add the smaller set to the larger which may improve parallel
performance when subtasks produced uneven results. In normal case the
additional overhead is insignificant (additional HashSet size
comparison per each combine operation which usually performed
~4*parallelism times).
Here's simple JDK benchmark with results (Core-i7-4702MQ 4 HT cores,
Win7, JDK 9-ea+129 64bit) which confirms the speedup:
http://cr.openjdk.java.net/~tvaleev/patches/toSet/
When data is unevenly distributed, it's significantly faster:
Original:
ToSet.toSetParallel 10000 true avgt 50 88,580 ± 0,874 us/op
ToSet.toSetParallel 100000 true avgt 50 676,243 ± 41,188 us/op
ToSet.toSetParallel 1000000 true avgt 50 9126,399 ± 83,260 us/op
Patched:
ToSet.toSetParallel 10000 true avgt 50 62,121 ± 1,207 us/op
ToSet.toSetParallel 100000 true avgt 50 556,968 ± 37,404 us/op
ToSet.toSetParallel 1000000 true avgt 50 6572,372 ± 183,466 us/op
When data is evenly distributed, no statistically significant changes
observed:
Original:
ToSet.toSetParallel 10000 false avgt 50 177,137 ± 5,941 us/op
ToSet.toSetParallel 100000 false avgt 50 2056,332 ± 87,433 us/op
ToSet.toSetParallel 1000000 false avgt 50 51864,198 ± 560,883 us/op
Patched:
ToSet.toSetParallel 10000 false avgt 50 172,606 ± 7,321 us/op
ToSet.toSetParallel 100000 false avgt 50 1922,478 ± 79,593 us/op
ToSet.toSetParallel 1000000 false avgt 50 52648,100 ± 621,526 us/op
What do you think? Should I open an issue?
With best regards,
Tagir Valeev.
More information about the core-libs-dev
mailing list