an opencl binding - zcl/panama
Michael Zucchi
notzed at gmail.com
Mon Jan 27 05:07:52 UTC 2020
On 27/1/20 10:50 am, Maurizio Cimadamore wrote:
>
> To clarify this point - just to make sure the message is clear - my
> observation about MemoryAddress::copy is not about code clarity - the
> performance model of the two versions is radically different. The
> former copies element one by one - the second makes a bulk transfer.
> If you do a benchmark there's just no comparison between the two
> versions.
>
>
I know it's /way too early/ to talk about performance, but well you did
bring it up and so I did some benchmarks over beer. I don't want to
belabour the point, I was just super-bored and curious.
The break-even point here is about 16 longs, so a loop is currently
better for where I'm using it, and even up to 256 the copy time is
dwarfed by allocateNative() if it is used. And in some other testing I
found varLongPtr.[gs]et(,i) is still a good bit slower than ByteBuffer -
which I believe is the performance target.
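
For illustration only, the two copy styles being compared look roughly
like the sketch below. It uses java.nio direct buffers as a stand-in, not
the Panama MemoryAddress/VarHandle API and not the actual TestCopies
code; the class name CopySketch and the helper allocNative are
hypothetical, and the method bodies are an assumption about the general
shape of a "loop" versus a "bulk" copy.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

public class CopySketch {
    // Hypothetical helper: off-heap buffer holding n longs in native byte order.
    static LongBuffer allocNative(int n) {
        return ByteBuffer.allocateDirect(n * Long.BYTES)
                .order(ByteOrder.nativeOrder())
                .asLongBuffer();
    }

    // Element-by-element copy: one absolute put per long.
    static void copyLoop(long[] src, LongBuffer dst) {
        for (int i = 0; i < src.length; i++)
            dst.put(i, src[i]);
    }

    // Bulk copy: a single relative put of the whole array.
    static void copyBulk(long[] src, LongBuffer dst) {
        dst.clear();                  // reset position to 0
        dst.put(src, 0, src.length);
    }

    public static void main(String[] args) {
        int n = 16;                   // number of longs per copy
        int loops = 1 << 20;
        long[] src = new long[n];
        LongBuffer dst = allocNative(n);   // "pre-alloc" style: allocate once

        long t0 = System.nanoTime();
        for (int l = 0; l < loops; l++)
            copyLoop(src, dst);
        long t1 = System.nanoTime();
        for (int l = 0; l < loops; l++)
            copyBulk(src, dst);
        long t2 = System.nanoTime();

        System.out.printf("%d %.9f copyLoop%n", n, (t1 - t0) * 1e-9);
        System.out.printf("%d %.9f copyBulk%n", n, (t2 - t1) * 1e-9);
    }
}
```

The main method is a crude timing loop with no warm-up control, so its
numbers are only indicative and are not comparable to the figures in
this message.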
And allocateNative() needs to accept zero length. Zero is a valid
size/length for everything else - malloc(), new foo[],
ByteBuffer.allocate*().
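
A small sketch of the Java side of that comparison: zero-length arrays
and zero-capacity buffers are accepted by the standard library. The
class name ZeroLength and the method zeroCapacities are hypothetical,
chosen only for this example.

```java
import java.nio.ByteBuffer;

public class ZeroLength {
    // Allocates three zero-sized objects and returns their lengths/capacities.
    static int[] zeroCapacities() {
        long[] empty = new long[0];                        // zero-length array
        ByteBuffer heap = ByteBuffer.allocate(0);          // zero-capacity heap buffer
        ByteBuffer direct = ByteBuffer.allocateDirect(0);  // zero-capacity direct buffer
        return new int[] { empty.length, heap.capacity(), direct.capacity() };
    }

    public static void main(String[] args) {
        for (int c : zeroCapacities())
            System.out.println(c);   // prints 0 three times
    }
}
```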
code:
https://code.zedzone.space/cvs?p=zcl;a=blob;f=src/notzed.zcl.demo/classes/au/notzed/zcl/test/TestCopies.java;hb=refs/heads/foreign-abi
I know microbenchmarks are troublesome and particularly tricky with the
JVM but I think this should be valid enough to compare, in context. I
will add JMH stuff another time (I'm not that bored).
Results on a Ryzen 3900X @ 65W:
(1<<20 loops of copying n longs to native memory; the columns are n, the
time, and the test name. The names should be obvious enough, or see the
code.)
1 0.003127785 copyLoop pre-alloc
1 0.013420417 copyBulk pre-alloc
1 0.021303411 copyLoop stack
1 0.031355621 copyBulk stack
1 0.069702844 copyLoop
1 0.085855976 copyBulk
2 0.004140896 copyLoop pre-alloc
2 0.013306925 copyBulk pre-alloc
2 0.022433079 copyLoop stack
2 0.031112806 copyBulk stack
2 0.072346833 copyLoop
2 0.087935206 copyBulk
4 0.005569264 copyLoop pre-alloc
4 0.012972718 copyBulk pre-alloc
4 0.024447177 copyLoop stack
4 0.030642624 copyBulk stack
4 0.073238806 copyLoop
4 0.089866427 copyBulk
8 0.007541993 copyLoop pre-alloc
8 0.013031979 copyBulk pre-alloc
8 0.026512729 copyLoop stack
8 0.030515275 copyBulk stack
8 0.075191718 copyLoop
8 0.091164331 copyBulk
16 0.010611670 copyLoop pre-alloc
16 0.013174737 copyBulk pre-alloc
16 0.030881722 copyBulk stack
16 0.031274650 copyLoop stack
16 0.078464155 copyLoop
16 0.092814466 copyBulk
32 0.013133039 copyBulk pre-alloc
32 0.018813391 copyLoop pre-alloc
32 0.030916538 copyBulk stack
32 0.040851569 copyLoop stack
32 0.088000238 copyLoop
32 0.090662230 copyBulk
64 0.013267801 copyBulk pre-alloc
64 0.031037295 copyBulk stack
64 0.034713629 copyLoop pre-alloc
64 0.060002694 copyLoop stack
64 0.099641809 copyBulk
64 0.110758427 copyLoop
128 0.013510577 copyBulk pre-alloc
128 0.031517445 copyBulk stack
128 0.072396035 copyLoop pre-alloc
128 0.105774349 copyLoop stack
128 0.115714109 copyBulk
128 0.160517851 copyLoop
256 0.014341926 copyBulk pre-alloc
256 0.032109265 copyBulk stack
256 0.133029261 copyLoop pre-alloc
256 0.183378902 copyLoop stack
256 0.186902259 copyBulk
256 0.292598473 copyLoop