RFR (14) 8235837: Memory access API refinements
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Jan 16 19:08:34 UTC 2020
On 16/01/2020 15:15, Maurizio Cimadamore wrote:
> Btw, on my machine I see lots of unrolling, but no vectorization, not
> even for ByteBuffer.
Sorry, I misread your original email: you said that to get
vectorization you updated the benchmark so that it always stores the
same value. I do get vectorization in that case, but I also see
"vmovdqu" generated if I change the memory segment benchmark to do
the same thing, and the perf numbers are again similar:
Benchmark                  Mode  Cnt  Score   Error  Units
LoopOverNew.buffer_loop    avgt   30  0.418 ± 0.004  ms/op
LoopOverNew.segment_loop   avgt   30  0.415 ± 0.001  ms/op
LoopOverNew.unsafe_loop    avgt   30  0.396 ± 0.002  ms/op
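
For reference, the two loop shapes being compared might look roughly like
the sketch below. This is not the actual LoopOverNew benchmark code: the
class name, method names and ELEM_COUNT are made up for illustration, and
it uses a plain direct ByteBuffer only. Whether C2 actually vectorizes
either loop depends on the JDK build and the hardware.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not the real benchmark: all names here are assumptions.
final class StoreLoops {
    static final int ELEM_COUNT = 1_000;          // illustrative size only
    static final int INT_BYTES = Integer.BYTES;   // 4 bytes per int

    // Allocates a fresh off-heap buffer in native byte order.
    static ByteBuffer newBuffer() {
        return ByteBuffer.allocateDirect(ELEM_COUNT * INT_BYTES)
                         .order(ByteOrder.nativeOrder());
    }

    // Each iteration stores a different value (the loop index).
    static void storeIndex(ByteBuffer bb) {
        for (int i = 0; i < ELEM_COUNT; i++) {
            bb.putInt(i * INT_BYTES, i);
        }
    }

    // Each iteration stores the same value: the "always stored the same
    // value" variant discussed above.
    static void storeConstant(ByteBuffer bb) {
        for (int i = 0; i < ELEM_COUNT; i++) {
            bb.putInt(i * INT_BYTES, 42);
        }
    }
}
```

The only difference between the two methods is the stored value, which is
the change that made "vmovdqu" show up on my machine.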
Also, please take into account that the ByteBuffer benchmark is
giving good numbers, but it does so by 'cheating': it uses Unsafe
to force cleanup of the off-heap memory (see the call to
Unsafe::invokeCleaner). If we remove that call (so as not to depend on
Unsafe), then the numbers are quite different:
Benchmark                  Mode  Cnt  Score   Error  Units
LoopOverNew.buffer_loop    avgt   30  2.120 ± 0.667  ms/op
That is ~5x worse. Now, I agree with you that we should strive to
generate the best possible code (since that seems to happen for other
JDK APIs :-) ), but I think that when evaluating the performance of the
new memory API we should also factor in other considerations (such as
the cost of actually allocating a segment vs. a buffer).
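
To make the 'cheating' concrete, here is a minimal sketch of what eager
cleanup of a direct buffer via Unsafe can look like. It is not the
benchmark code: the class and method names are invented, it assumes JDK 9
or later (where sun.misc.Unsafe.invokeCleaner exists), and it relies on
reflective access to the "theUnsafe" field, which newer JDKs flag with
deprecation warnings.

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

// Hypothetical sketch: names are assumptions, not taken from the benchmark.
final class EagerBufferCleanup {
    private static final Unsafe UNSAFE = loadUnsafe();

    // Obtains the Unsafe singleton reflectively (unsupported API).
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Allocates a direct buffer, fills it with ones, sums it, and then
    // releases the off-heap memory immediately instead of waiting for GC.
    static int allocateFillAndFree(int bytes) {
        ByteBuffer bb = ByteBuffer.allocateDirect(bytes);
        try {
            for (int i = 0; i < bytes; i++) {
                bb.put(i, (byte) 1);
            }
            int sum = 0;
            for (int i = 0; i < bytes; i++) {
                sum += bb.get(i);
            }
            return sum;
        } finally {
            // The buffer must not be touched after this call.
            UNSAFE.invokeCleaner(bb);
        }
    }
}
```

Without the invokeCleaner call the off-heap memory is only released once
the buffer object is garbage collected, which is the case the 2.120 ms/op
figure above measures.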
Maurizio