accumulate locally and mutatively but combine concurrently (a use case)
Brian Goetz
brian.goetz at oracle.com
Fri May 10 09:51:32 PDT 2013
The second version of freezer() is obviously simpler and less intrusive.
One nice property at the user level that it provides is that it
becomes always possible to return an immutable List, a notable lack in
the current implementation.
A freezy collector would need a combine() function that doesn't try to
be mutative, but that can easily be accomodated.
Further, it would be useful if the freeze function could selectively
freeze only the root vs all nodes. For example, in toList, I might not
bother making the leaves immutable (not only is wrapping wasted work,
but it makes the path to the data longer and makes it impossible for
combiner functions to "scoop out" the innards), but would want to make
the returned root immutable.
We can use a characteristic bit to indicate "freezer is a no-op", which
avoids one of the nasty interactions with being used as a downstream
collector of groupingBy -- groveling through all the map values just to
apply an identity function on each one.
The interaction that we're concerned about is this one:
groupingBy(f, freezingCollector)
Here, the freeze function for groupingBy needs to freeze the values in
the map (again, a bit helps optimize away the noop cases):
freeze(Map m) {
if (downstream.isFreezy())
m.replaceAll(downstream.freeze());
// now do the map freeze
}
I think the key thing that is different that makes this practical where
it wasn't before was the presence of the characteristics set, which
enable us to optimize away some expensive no-ops.
On 5/9/2013 9:51 PM, John Rose wrote:
> How does a Collector represent the following computation:
>
> 1. Somebody passes out stream chunks to several processor threads.
>
> 2a. Each thread accumulates results into a mutable aggregate ArrayList<T>.
> 2b. (Or the aggregate could be something even cleverer that doesn't do log(N) copying of its values.)
> 2c. Each such ArrayList<T> is serially modified by thread-confined side effects.
> 2d. The JIT can understand the loop that does this.
> 2e. At the smallest scale, the writes into the arrays are performed by unrolled loops, even vectorized ones, with minimal per-element overhead.
>
> 3a. Each thread produces a finished ArrayList<T> aggregate of results for roll-up.
> 3b. Each thread "freezes" its result aggregate before handoff. The resulting frozen aggregate is (1) thread-safe and (2) immutable.
> 3c. (For some mutable aggregates, the freezing step could also include operations like right-sizing or balancing.)
> 3d. The freezing operation includes a suitable memory fence operation to settle all side effects to the aggregate.
>
> 4. The guy(s) that passed out stream chunks waits for the resulting aggregate and combines them, using O(log N) pointer pasting operations.
>
> It seems to me that step 2 is well represented by a STRICTLY_MUTATIVE but not CONCURRENT Collector.
>
> And the other steps are well represented by a CONCURRENT and possibly UNORDERED one.
>
> But running step 2 with a CONCURRENT Collector would be bad, since even if each thread had its own accumulator, the accumulator would have to provide for safely concurrent access. The JVM can improve the computation with such things using biased locking and lock coarsening, but they will never be fast like a normal loop into a local array.
>
> Do we need a collector mode which says, "I accumulate locally and mutatively but I combine concurrently"?
>
> Even more: Should the transition from mutative to immutable be represented by a type shift? In Collector<T,M,C>, M is the mutable aggregate, and each thread would have an export step that freeze its M into a C value for subsequent combination (roll-up) across threads:
>
> BiFunction<M, T, M> accumulator();
> Function<M, C> freezer();
> BinaryOperator<C> combiner();
>
> (Note: "freezer" is not a real proposal!)
>
> Having the extra distinction of M vs. C will cause friction in other places, but perhaps it is an important distinction between larval and adult stages[1] of a data structure. The distinction can be erased, but even then there still needs to be (in step 3b above) an explicit operation (a Consumer<R>, the final side effect!) which freezes a value of type R.
>
> BiFunction<R, T, R> accumulator();
> Consumer<R> freezer();
> BinaryOperator<R> combiner();
>
> Given the choice I would prefer the type distinction. Even better would be for someone to point out the right way to use the current Collector API to do steps 1-4 above.
>
> – John
>
> [1] https://blogs.oracle.com/jrose/entry/larval_objects_in_the_vm
>
>
More information about the lambda-dev
mailing list