[foreign-memaccess] [Rev 01] RFR: Alternative scalable MemoryScope

Tue May 5 10:31:45 UTC 2020

On Tue, 5 May 2020 10:07:59 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> Peter Levart has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Don't re-use acquires/releases LongAdder(s) fot duped scope
>
> My comment here is mostly non-code-related; I've already seen the code and attempted to prove its correctness in this
> email:
> https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/066190.html
> 
> My concerns are:
> 
> * some of the HB relationships in the code are quite non-obvious, subtle and hard to reason about (well, at least they
>   were for me) - this means that, compared to current approach, there will be an high maintenance cost associated with
>   this
> 
> * In some of the benchmarks provided by Peter, the stream version of findAny is going several orders of magnitude
>   _slower_ than a serial for/each loop anyway. So, are we sure we're solving the right problem here? I.e. is there a use
>   case where, by reducing contention, we get back performances that would be considered *acceptable* for a
>   performance-savy user? Or is this something that just make something 100x worse as opposed to 1000x worse? Data is
>   reported below:
> 
> w/o patch
> ParallelSum.find_any_stream_parallel    avgt   10  1332.687 ± 733.535  ms/op
> ParallelSum.find_any_stream_serial      avgt   10   440.260 ±   3.110  ms/op
> ParallelSum.find_first_loop_serial      avgt   10     5.809 ±   0.044  ms/op
> 
> w/ patch
> ParallelSum.find_any_stream_parallel    avgt   10   80.280 ± 13.183  ms/op
> ParallelSum.find_any_stream_serial      avgt   10  317.388 ±  2.787  ms/op
> ParallelSum.find_first_loop_serial      avgt   10    5.790 ±  0.038  ms/op
> (full email here: https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-April/066136.html)
> 
> Few questions come up naturally here - do these number reflect some intrinsic problem with the memory segment
> spliterator per se, or do they reflect some more general problem with using shortcircuiting operations (such as find
> any) which consume almost zero CPU time in a stream (and, worse, a parallel stream) ?  With this I'm not of course
> saying that the patch (and associated improvement) is not important, but I'm wondering whether the added benefit would
> be worth the added maintenance cost given that, from the numbers above, it still doesn't look like `findAny` is the way
> to go for processing a segment efficiently anyway?

I agree that the main problem here is the constant acquire/close-ing of memory segment in Spliterator.tryAdvance(). It
is interesting to observe a slight performance improvement even in serial stream (according to above results) while the
concurrent contention in current MemoryScope is obvious in parallel stream. The above results are of course "distorted"
by the fact that there are lots of small elements in the testing stream (int == 4 bytes) so performance figures are
dominated by the overhead. But there may be valid compositions of .findAny() or such that use bigger elements so that
the overhead plays a minor role. Still, current MemoryScope could present a problem even in such compositions where the
alternative MemoryScope would be perfectly acceptable. Anyway I was also thinking of "improving" the Spliterator API
with two empty default methods that would present real improvement in such scenarios:

public interface Spliterator<T> {

    /**
     * Before a sequence of {@link #tryAdvance} calls from a thread, this method may
     * be called in the same thread to establish a context that may help in improving the
     * efficiency of subsequent {@code tryAdvance} calls. If this method is called and it returns
     * normally, then it is guaranteed that the sequence of {@code tryAdvance} calls will be
     * followed by a call to {@link #endTryAdvanceSequence} method to end such sequence.
     * The end of sequence of {@code tryAdvance} calls may be announced before all elements of
     * the Spliterator have been exhausted.
     *
     * @implNote the default implementation of this method does nothing
     * @since 15
     */
    default void beginTryAdvanceSequence() {
    }

    /**
     * If {@link #beginTryAdvanceSequence} method is called and the call returns normally before
     * a sequence of {@link #tryAdvance} calls, then this method is guaranteed to be called to end such
     * sequence and give this Spliterator a chance to clean-up what was established in
     * {@code beginTryAdvanceSequence}.
     *
     * @implNote the default implementation of this method does nothing
     * @since 15
     */
    default void endTryAdvanceSequence() {
    }

Would this be an acceptable "addtition" to Spliterator API?

-------------

PR: https://git.openjdk.java.net/panama-foreign/pull/142