an opencl binding - zcl/panama
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon Jan 27 23:44:47 UTC 2020
So... float access operations are slow, as described in the bug I
linked, which I think explains why TestMemory is slow. We should be
able to fix that soon.
But you mentioned earlier that:
> And in some other testing I found varLongPtr.[gs]et(,i) is still a
> good bit slower than ByteBuffer
Is _some other testing_ your TestMemory.java? Or is there another test
w/o floats where you observed significantly worse performance numbers
than BBs?
Thanks
Maurizio
On 27/01/2020 23:34, Michael Zucchi wrote:
> On 27/1/20 10:06 pm, Maurizio Cimadamore wrote:
>>
>> On 27/01/2020 05:07, Michael Zucchi wrote:
>>> The break-even point here is about 16 longs, so a loop is currently
>>> better for where I'm using it, and even up to 256 the time is
>>> dwarfed by allocateNative() if used. And in some other testing I
>>> found varLongPtr.[gs]et(,i) is still a good bit slower than
>>> ByteBuffer - which I believe is the performance target.
>>
>> I think VarHandles and BBs should be roughly the same - at least in
>> the Panama branch - but there are some tips and tricks to be mindful
>> of (a short sketch follows the list):
>>
>> * the VarHandle should always be held in a static final field (you
>> follow this guideline in your Native.java)
>> * when accessing arrays, indexed accessors should be preferred to a
>> single-element accessor + MemoryAddress::addOffset
>> * when using indexed accessors it is important that the index being
>> passed is a "long", not an "int" or some other type (you want to make
>> sure the VarHandle invocation is 'exact').
>>
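>> For example, a long handle that follows all three points could look
>> roughly like this (an untested sketch; the class and field names are
>> just illustrative, and exact API names may differ slightly between
>> the Panama branch and the JDK 14 incubator):
>>
>>     import jdk.incubator.foreign.*;
>>     import java.lang.invoke.VarHandle;
>>     import java.nio.ByteOrder;
>>
>>     class LongAccess {
>>       // 1. keep the handle in a static final field
>>       static final VarHandle LONGS = MemoryHandles.withStride(
>>           MemoryHandles.varHandle(long.class, ByteOrder.nativeOrder()), 8);
>>
>>       static long sum(MemorySegment segment, long count) {
>>         MemoryAddress base = segment.baseAddress();
>>         long acc = 0;
>>         // 2. indexed (strided) access, no per-element addOffset
>>         // 3. the index is a long, so the invocation stays exact
>>         for (long i = 0; i < count; i++) {
>>           acc += (long) LONGS.get(base, i);
>>         }
>>         return acc;
>>       }
>>     }
>>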
>> You seem to follow all of this advice. By any chance, is "varLongPtr"
>> a var handle which accesses memory and gets/sets MemoryAddresses? Or
>> does it just retrieve longs (I can't find varLongPtr in the benchmark
>> you linked)? If the former, I'm pretty sure the slowdown is related
>> to this:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8237349?filter=37749
>
> I don't really follow the details of that bug, but I think so, yes.
>
> That was just a bit of pseudo-code. It's an indexed long handle, and
> the types are longs, not addresses, in the test. I always use the
> appropriate one for the type where it's needed (e.g. never read a
> pointer into a long, except for the cl_property stuff which is
> specifically using intptr_t).
>
> In TestMemory.java I have some microbenchmarks that sum a float[]
> using the main mechanisms.
>
> I get:
>
> 0.716024899 array
> 0.720067300 bb stream
> 3.739500995 segment
> 0.716123384 bb index
> 1.934859031 bb over segment
>
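> (Roughly, the per-element variants look like the following; this is a
> simplified sketch rather than the exact TestMemory.java code, and the
> class/method names are just illustrative:)
>
>     import jdk.incubator.foreign.*;
>     import java.lang.invoke.VarHandle;
>     import java.nio.ByteOrder;
>     import java.nio.FloatBuffer;
>
>     class FloatSums {
>       static final VarHandle FLOATS = MemoryHandles.withStride(
>           MemoryHandles.varHandle(float.class, ByteOrder.nativeOrder()), 4);
>
>       // "array": plain indexed loop over a float[]
>       static float sumArray(float[] data) {
>         float acc = 0;
>         for (int i = 0; i < data.length; i++)
>           acc += data[i];
>         return acc;
>       }
>
>       // "bb index": absolute get(i) on a FloatBuffer
>       static float sumBuffer(FloatBuffer fb) {
>         float acc = 0;
>         for (int i = 0; i < fb.limit(); i++)
>           acc += fb.get(i);
>         return acc;
>       }
>
>       // "segment": strided float VarHandle off the segment's base address
>       static float sumSegment(MemorySegment seg, long count) {
>         MemoryAddress base = seg.baseAddress();
>         float acc = 0;
>         for (long i = 0; i < count; i++)
>           acc += (float) FLOATS.get(base, i);
>         return acc;
>       }
>     }
>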
> I was surprised at the FloatBuffer ones; last time I tried (probably
> 5+ years ago) they were, IIRC, half the speed of an array.
>
> And the last one ... that's using
> seg.asByteBuffer().order(...).asFloatBuffer() and calling the "bb
> stream" routine.
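>
> That is, roughly (nativeOrder() below is just a stand-in for whatever
> order the elided "..." actually passes):
>
>     FloatBuffer fb = seg.asByteBuffer()
>         .order(ByteOrder.nativeOrder())   // actual order elided above
>         .asFloatBuffer();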
>
> I know why the bulk operations exist, but they're often a bit of a
> pain to use, so decently performing iterated or strided access is
> still important. For work I often do various signal-processing things
> on data, and I've generally shied away from indexed buffer access and
> would use the bulk operations where I was concerned with performance,
> but it's often messy even if it does allow sharing an implementation.
>
> Z
>