[foreign-memaccess] [Rev 01] RFR: Alternative scalable MemoryScope
Maurizio Cimadamore
mcimadamore at openjdk.java.net
Tue May 5 10:10:10 UTC 2020
On Tue, 5 May 2020 10:09:28 GMT, Peter Levart <plevart at openjdk.org> wrote:
>> This is an alternative MemoryScope which is more scalable when used in a scenario where child scope is frequently
>> acquired and closed concurrently from multiple threads (for example in parallel Stream.findAny())
>
> Peter Levart has updated the pull request incrementally with one additional commit since the last revision:
>
> Don't re-use acquires/releases LongAdder(s) for duped scope
My comment here is mostly non-code-related; I've already seen the code and attempted to prove its correctness in this
email:
https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/066190.html
My concerns are:
* some of the happens-before (HB) relationships in the code are quite non-obvious, subtle and hard to reason about
(well, at least they were for me) - this means that, compared to the current approach, there will be a high
maintenance cost associated with this
* In some of the benchmarks provided by Peter, the stream version of findAny is several orders of magnitude
_slower_ than a serial for/each loop anyway. So, are we sure we're solving the right problem here? I.e. is there a use
case where, by reducing contention, we get back performance that would be considered *acceptable* by a
performance-savvy user? Or is this something that just makes something 100x worse as opposed to 1000x worse? Data is
reported below:
w/o patch
Benchmark                             Mode  Cnt     Score     Error  Units
ParallelSum.find_any_stream_parallel  avgt   10  1332.687 ± 733.535  ms/op
ParallelSum.find_any_stream_serial    avgt   10   440.260 ±   3.110  ms/op
ParallelSum.find_first_loop_serial    avgt   10     5.809 ±   0.044  ms/op
w/ patch
Benchmark                             Mode  Cnt     Score     Error  Units
ParallelSum.find_any_stream_parallel  avgt   10    80.280 ±  13.183  ms/op
ParallelSum.find_any_stream_serial    avgt   10   317.388 ±   2.787  ms/op
ParallelSum.find_first_loop_serial    avgt   10     5.790 ±   0.038  ms/op
(full email here: https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-April/066136.html)
A few questions come up naturally here - do these numbers reflect some intrinsic problem with the memory segment
spliterator per se, or do they reflect some more general problem with using short-circuiting operations (such as
`findAny`) which consume almost zero CPU time per element in a stream (and, worse, a parallel stream)?
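To make the comparison concrete, here is a rough sketch of the three shapes being measured. This is not Peter's
benchmark: it uses a plain `int[]` as a stand-in for the memory segment, and the class and method names
(`FindAnySketch`, `makeData`, etc.) are made up for illustration. It only shows how little work each element does
relative to the stream machinery around it; it makes no claim about timings.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

public class FindAnySketch {
    // Hypothetical stand-in for the segment contents: n zeros (n >= 1),
    // with the only matching element placed at the very end.
    static int[] makeData(int n) {
        int[] data = new int[n];
        data[n - 1] = 1;
        return data;
    }

    // Parallel stream with a short-circuiting terminal operation.
    static OptionalInt findAnyStreamParallel(int[] data) {
        return IntStream.of(data).parallel().filter(x -> x != 0).findAny();
    }

    // Same pipeline, but sequential.
    static OptionalInt findAnyStreamSerial(int[] data) {
        return IntStream.of(data).filter(x -> x != 0).findAny();
    }

    // Plain serial loop: one comparison per element, no stream machinery.
    static OptionalInt findFirstLoopSerial(int[] data) {
        for (int x : data) {
            if (x != 0) {
                return OptionalInt.of(x);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        int[] data = makeData(1_000_000);
        System.out.println(findAnyStreamParallel(data));
        System.out.println(findAnyStreamSerial(data));
        System.out.println(findFirstLoopSerial(data));
    }
}
```

All three variants return the same result; the only difference is how much per-element overhead surrounds a
predicate that is essentially free to evaluate.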
Of course, I'm not saying that the patch (and the associated improvement) is unimportant, but I'm wondering
whether the added benefit would be worth the added maintenance cost, given that, from the numbers above, it still
doesn't look like `findAny` is the way to go for processing a segment efficiently anyway.
-------------
PR: https://git.openjdk.java.net/panama-foreign/pull/142