RFR (14) 8235837: Memory access API refinements

Thu Jan 16 19:08:34 UTC 2020

On 16/01/2020 15:15, Maurizio Cimadamore wrote:
> Btw, on my machine I see lots of unrolling, but no vectorization, not 
> even for ByteBuffer. 

Sorry, I misread your original email: you said that to get 
vectorization you updated the benchmark so that it always stores the 
same value. I do indeed get vectorization in that case, but "vmovdqu" 
is also generated if I change the memory segment benchmark to do the 
same thing, and the perf numbers are again similar:

Benchmark                 Mode  Cnt  Score   Error  Units
LoopOverNew.buffer_loop   avgt   30  0.418 ± 0.004  ms/op
LoopOverNew.segment_loop  avgt   30  0.415 ± 0.001  ms/op
LoopOverNew.unsafe_loop   avgt   30  0.396 ± 0.002  ms/op
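
To make the "same value" variant concrete, here is a minimal sketch of 
that loop shape for the ByteBuffer case. It is not the actual benchmark 
code: the class name, ELEM_COUNT and the helper method are made up for 
illustration, and whether C2 vectorizes it will depend on the JDK build 
and the hardware.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: names and sizes are assumptions, not taken from
// the LoopOverNew benchmark.
final class ConstantStoreLoop {
    // Hypothetical element count; the benchmark's real size is not given here.
    static final int ELEM_COUNT = 1_000;

    // Stores the same int value into every slot of the buffer, using
    // absolute puts so the buffer position is left unchanged.
    static void fillConstant(ByteBuffer bb, int value) {
        for (int i = 0; i < ELEM_COUNT; i++) {
            bb.putInt(i * Integer.BYTES, value);
        }
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(ELEM_COUNT * Integer.BYTES)
                                  .order(ByteOrder.nativeOrder());
        fillConstant(bb, 42);
        System.out.println(bb.getInt(0) + " "
                + bb.getInt((ELEM_COUNT - 1) * Integer.BYTES));
    }
}
```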

Also, please do take into account that the ByteBuffer benchmark is 
giving good numbers, but it does so by 'cheating': it uses Unsafe to 
force cleanup of the off-heap memory (see the call to 
Unsafe::invokeCleaner). If we remove that call (so as not to depend on 
Unsafe), then the numbers are quite different:

Benchmark                Mode  Cnt  Score   Error  Units
LoopOverNew.buffer_loop  avgt   30  2.120 ± 0.667  ms/op

That is ~5x worse. Now, I agree with you that we should strive to 
generate the best possible code (since that seems to happen for other 
JDK APIs :-) ), but I think that when evaluating the performance of the 
new memory API we should also factor in other considerations (such as 
the cost of actually allocating a segment vs. a buffer).
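
For readers who have not seen the cleanup trick mentioned above, here 
is a rough sketch of eagerly freeing a direct buffer through 
sun.misc.Unsafe::invokeCleaner. This is an illustration and not the 
benchmark's code: the class and method names are hypothetical, and I go 
through reflection only to avoid a compile-time dependency on Unsafe. It 
assumes a JDK (9 or later) that still ships the method.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Illustrative only: class and method names are assumptions.
final class EagerBufferCleanup {
    // Releases the off-heap memory behind a direct buffer right away,
    // instead of waiting for the GC to run the buffer's cleaner.
    static void free(ByteBuffer direct) {
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            Object unsafe = theUnsafe.get(null);
            Method invokeCleaner =
                    unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
            invokeCleaner.invoke(unsafe, direct);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unsafe cleanup unavailable", e);
        }
    }

    // Allocates a direct buffer, uses it, then frees it eagerly.
    // The buffer must not be accessed after free() returns.
    static int allocateUseFree(int bytes) {
        ByteBuffer bb = ByteBuffer.allocateDirect(bytes);
        bb.putInt(0, 42);
        int value = bb.getInt(0);
        free(bb);
        return value;
    }

    public static void main(String[] args) {
        System.out.println(allocateUseFree(64));
    }
}
```

Without the free() call, the native memory is only released once the 
buffer becomes unreachable and its cleaner runs, which is where the 
allocation-heavy numbers above come from.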

Maurizio