[foreign-memaccess+abi] RFR: 8264933: Improve stream support in memory segments
Maurizio Cimadamore
mcimadamore at openjdk.java.net
Fri Apr 9 12:08:20 UTC 2021
On Fri, 9 Apr 2021 09:53:44 GMT, Rémi Forax <github.com+828220+forax at openjdk.org> wrote:
> I fail to see how a Spliterator that asks a stream to do as many recursive calls/create as many sub-segments as the number of elements is a good idea?
This is something that has been present since the second iteration of the API, and it can be used quite effectively. We have benchmarks (see ParallelSum) which use fork/join recursive actions to do a parallel sum of the contents of a segment. Provided that the "size of the element" is chosen appropriately, the speedup obtained is rather nice:
Benchmark                                 Mode  Cnt   Score   Error  Units
ParallelSum.segment_serial                avgt   30  86.004 ± 0.941  ms/op
vs:
Benchmark                                 Mode  Cnt   Score   Error  Units
ParallelSum.segment_stream_parallel       avgt   30  45.211 ± 1.105  ms/op
ParallelSum.segment_stream_parallel_bulk  avgt   30  23.057 ± 0.353  ms/op
Here we compare a serial sum of all the elements in a segment (a flat for loop) against a parallel sum using parallel streams. In the first parallel benchmark, the split size is the same as the element size (e.g. 4 bytes); this roughly halves the time per operation (86 ms to 45 ms), but it is not ideal, as too many intermediate segments are created. That is why the spliterator/stream methods accept an element layout: you can specify a "bulk element" (e.g. 1024 ints each) and process these in parallel instead. As you can see, doing so gives roughly another 2x speedup (45 ms to 23 ms).
Considering how little code is needed to write this:
    @Benchmark
    public int segment_stream_parallel_bulk() {
        return segment.parallelStream(ELEM_LAYOUT_BULK).mapToInt(SEGMENT_TO_INT_BULK).sum();
    }
I think this makes sense; of course it's not a silver bullet, and has to be handled with care, but here we assume the audience knows what they are doing.
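To illustrate why the split granularity matters, here is a minimal sketch of the same idea that deliberately avoids the incubating API: it uses a plain java.nio.ByteBuffer in place of a memory segment, and an IntStream over chunk indices in place of the segment spliterator. The class name BulkSumSketch, the method names and the chunk sizes are all assumptions made for this example; this is not the ParallelSum benchmark code.

```java
import java.nio.ByteBuffer;
import java.util.stream.IntStream;

// Hypothetical stand-in for the benchmark: a ByteBuffer plays the role of the segment.
public class BulkSumSketch {

    // Serial baseline: one flat loop over every 4-byte int in the buffer.
    static int serialSum(ByteBuffer buf) {
        int totalInts = buf.limit() / Integer.BYTES;
        int sum = 0;
        for (int i = 0; i < totalInts; i++) {
            sum += buf.getInt(i * Integer.BYTES);
        }
        return sum;
    }

    // Parallel sum where the unit of work is a "bulk element" of intsPerChunk ints.
    // intsPerChunk == 1 mimics splitting down to single elements;
    // a larger value (e.g. 1024) mimics the bulk layout.
    static int parallelSum(ByteBuffer buf, int intsPerChunk) {
        int totalInts = buf.limit() / Integer.BYTES;
        int chunks = (totalInts + intsPerChunk - 1) / intsPerChunk; // round up
        return IntStream.range(0, chunks).parallel().map(c -> {
            int start = c * intsPerChunk;
            int end = Math.min(start + intsPerChunk, totalInts);
            int partial = 0;
            for (int i = start; i < end; i++) {
                // Absolute reads only: the buffer's position is never modified.
                partial += buf.getInt(i * Integer.BYTES);
            }
            return partial;
        }).sum();
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        ByteBuffer buf = ByteBuffer.allocate(n * Integer.BYTES);
        for (int i = 0; i < n; i++) {
            buf.putInt(i * Integer.BYTES, i % 7);
        }
        System.out.println("serial:        " + serialSum(buf));
        System.out.println("parallel (1):  " + parallelSum(buf, 1));
        System.out.println("parallel bulk: " + parallelSum(buf, 1024));
    }
}
```

All three calls compute the same sum; only the amount of work handed to each parallel task differs, which is the knob the element layout exposes in the segment API.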
Apart from parallel processing, turning a segment into a stream of slices is also useful to perform ad-hoc marshalling/unmarshalling - as written in the panama-dev email:
    segment.stream(C_POINTER)
           .map(MemoryAccess::getAddress)
           .map(CLinker::toJavaString)
           .toArray(String[]::new);
In this case, we're not after performance - we just want to express more directly what would otherwise be expressed using a big for loop.
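For comparison, here is a rough sketch of the loop-versus-stream trade-off, again using java.nio.ByteBuffer rather than the incubating API. Dereferencing C pointers is not possible with ByteBuffer alone, so the example swaps in a simpler encoding: fixed-width slots holding NUL-padded ASCII names. The class SliceDecodeSketch, its helper methods and the slot encoding are assumptions made purely for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.stream.IntStream;

// Hypothetical example: decode fixed-width, NUL-padded ASCII slots into Strings.
public class SliceDecodeSketch {

    // Returns an independent view of slot number `index` (slotSize bytes wide).
    static ByteBuffer sliceAt(ByteBuffer buf, int index, int slotSize) {
        ByteBuffer d = buf.duplicate(); // shares content, has its own position/limit
        d.position(index * slotSize);
        d.limit((index + 1) * slotSize);
        return d.slice();
    }

    // Decodes one slot: bytes up to the first NUL (or the end of the slot).
    static String decodeName(ByteBuffer slot) {
        int len = 0;
        while (len < slot.limit() && slot.get(len) != 0) {
            len++;
        }
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) {
            bytes[i] = slot.get(i);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // The "big for loop" version.
    static String[] decodeWithLoop(ByteBuffer buf, int slotSize) {
        int count = buf.limit() / slotSize;
        String[] out = new String[count];
        for (int i = 0; i < count; i++) {
            out[i] = decodeName(sliceAt(buf, i, slotSize));
        }
        return out;
    }

    // The stream-of-slices version, shaped like the segment.stream(...) pipeline.
    static String[] decodeWithStream(ByteBuffer buf, int slotSize) {
        int count = buf.limit() / slotSize;
        return IntStream.range(0, count)
                .mapToObj(i -> sliceAt(buf, i, slotSize))
                .map(SliceDecodeSketch::decodeName)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        int slotSize = 8;
        String[] names = {"foo", "bar", "panama"};
        ByteBuffer buf = ByteBuffer.allocate(names.length * slotSize); // zero-filled
        for (int i = 0; i < names.length; i++) {
            byte[] b = names[i].getBytes(StandardCharsets.US_ASCII);
            for (int j = 0; j < b.length; j++) {
                buf.put(i * slotSize + j, b[j]);
            }
        }
        System.out.println(String.join(",", decodeWithLoop(buf, slotSize)));
        System.out.println(String.join(",", decodeWithStream(buf, slotSize)));
    }
}
```

Both methods produce the same array; the stream version simply states the pipeline (slice, decode, collect) without the index bookkeeping.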
-------------
PR: https://git.openjdk.java.net/panama-foreign/pull/494