an opencl binding - zcl/panama

Tue Jan 28 23:28:53 UTC 2020

On 28/01/2020 23:12, Michael Zucchi wrote:
>
> On 28/1/20 11:10 pm, Maurizio Cimadamore wrote:
>> So, I took a better look and I have some news.
>>
>> The first thing tripping the benchmark up is this:
>>
>>         int len = (int)(seg.byteSize() >>> 3);
>>
>> If you replace it with:
>>
>>         int len = ((int)seg.byteSize() >>> 3);
>>
>> Or, even better, with:
>>
>>         int len = ((int)seg.byteSize() / 8);
>>
>
> A bit pedantic perhaps, but the first allows a maximum equivalent long
> length of 0x7fffffff entries (matching the long[].length max). The second
> allows 0x1fffffff, and the third 0x0fffffff.
>
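For anyone following along, here is a minimal sketch of why the three forms have different limits. The class and method names (LenCasts, shiftThenCast, castThenShift, castThenDivide) are made up for illustration and are not part of any API; the long parameter stands in for seg.byteSize().

```java
// Illustrative only: shows how the position of the (int) cast changes the result.
final class LenCasts {
    // Shift as a long, then narrow: valid for element counts up to 0x7fffffff.
    static int shiftThenCast(long byteSize) {
        return (int) (byteSize >>> 3);
    }

    // Narrow first, then unsigned int shift: only the low 32 bits of the
    // byte size survive, so the result can never exceed 0x1fffffff.
    static int castThenShift(long byteSize) {
        return (int) byteSize >>> 3;
    }

    // Narrow first, then signed int division: the largest positive result is
    // 0x7fffffff / 8 = 0x0fffffff, and larger byte sizes can go negative.
    static int castThenDivide(long byteSize) {
        return (int) byteSize / 8;
    }

    public static void main(String[] args) {
        long big = 8L * Integer.MAX_VALUE; // bytes needed for 0x7fffffff longs
        System.out.println(Integer.toHexString(shiftThenCast(big)));
        System.out.println(Integer.toHexString(castThenShift(big)));
        System.out.println(castThenDivide(big));
        System.out.println(Integer.toHexString(castThenDivide(Integer.MAX_VALUE)));
    }
}
```

The int-only variants are only safe when the segment is known to be smaller than 2 GiB, which was the case in the benchmark.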
>>
>> Then the segment version comes out on top:
>>
>>   0.497758726 array
>>   0.836574479 bb stream
>>   0.446651107 segment
>>   0.482202441 bb index
>>   2.767206835 bb over segment
>>
>> Of course I'm not suggesting that the code you wrote doesn't make 
>> sense - I think this shows that (a) segments have the potential to be 
>> very fast but (b) we have some work to do on the VM side to smooth 
>> out the performance side of things.
>>
> Nice!  And thanks for the detail.
>
> I guess it means the bulk interface isn't really necessary if it isn't 
> otherwise more convenient or you're doing more than a copy. (with the 
> obvious caveat that bulk copies can hide internal / jvm specific 
> details like the long opcode issue).
>
>
It depends on what you are doing - if all you are doing is really moving
data from A to B, doing it in bulk is much faster than doing it element
by element. For instance, if you want to copy a 1000-element heap array
off heap, and you compare copying element by element against copying in
bulk, you should see a difference. This is why the ByteBuffer API also
has bulk get/put methods.
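
To make the comparison concrete, here is a rough sketch using the existing NIO buffer API rather than the segment API. The class and method names (BulkCopy, offHeap, copyElementwise, copyBulk) are hypothetical, and the System.nanoTime timing is only indicative; a proper measurement would need a harness such as JMH.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

// Illustrative only: two ways to copy a heap long[] into off-heap memory.
final class BulkCopy {
    // Allocate a direct (off-heap) buffer viewed as n longs in native order.
    static LongBuffer offHeap(int n) {
        return ByteBuffer.allocateDirect(n * Long.BYTES)
                .order(ByteOrder.nativeOrder())
                .asLongBuffer();
    }

    // One absolute put per element.
    static void copyElementwise(long[] src, LongBuffer dst) {
        for (int i = 0; i < src.length; i++) {
            dst.put(i, src[i]);
        }
    }

    // A single bulk put of the whole array, starting at position 0.
    static void copyBulk(long[] src, LongBuffer dst) {
        dst.clear();
        dst.put(src);
    }

    public static void main(String[] args) {
        long[] src = new long[1000];
        for (int i = 0; i < src.length; i++) {
            src[i] = i;
        }
        LongBuffer a = offHeap(src.length);
        LongBuffer b = offHeap(src.length);

        long t0 = System.nanoTime();
        copyElementwise(src, a);
        long t1 = System.nanoTime();
        copyBulk(src, b);
        long t2 = System.nanoTime();

        // Unwarmed, single-shot timings; treat them as a hint, not a result.
        System.out.println("element by element: " + (t1 - t0) + " ns");
        System.out.println("bulk:               " + (t2 - t1) + " ns");
    }
}
```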

Maurizio
