an opencl binding - zcl/panama
Michael Zucchi
notzed at gmail.com
Tue Jan 28 01:13:14 UTC 2020
On 27/1/20 11:07 pm, Maurizio Cimadamore wrote:
>
> On 26/01/2020 04:52, Michael Zucchi wrote:
>>
>> So to get a cl_int (conveniently, cl types match Java types and are
>> platform independent) in C it's trivial:
>>
>> cl_int value;
>> clGetPlatformInfo(plat, name, sizeof(int), &value, NULL);
>> /* use value */
>
> So, here's my thinking on this topic; right now jextract is pretty
> good at dealing with foreign functions - generating a static wrapper
> around each native function declared in the header. It also tries, for
> structs, to generate layouts and pairs of getters/setters (one per
> field).
>
> But there's a gap (which was also present in the old jextract) when it
> comes to primitive data types - many well-behaved libraries (OpenCL
> and OpenGL come to mind - but there are more) define the 'vocabulary'
> of types they are going to work on, using a bunch of typedefs.
> Typically (but I guess not always) these types are defined in such a
> way that they are portable across platforms.
>
Even if they aren't ... an extractor knows the type sizes. This is btw
specifically why I asked about platform-independent memory layouts for an
ABI. At the moment a generator has to work out which 'ABI' type
corresponds to a 64-bit unsigned integer rather than just saying
'64-bit unsigned integer' - which is the information it actually knows.
For an API with defined sizes this means it can't even generate
platform-independent bindings, since it has to specify the ABI sizes every
time. But judging from the jextract output this isn't even being
considered, since it includes SysV as of now.
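Concretely, something like this is what I mean (just a sketch, assuming
MemoryLayout.ofValueBits is still the factory for raw value layouts in the
current builds; the cl_ulong name is mine, and imports from
jdk.incubator.foreign and java.nio are assumed):

    // What the extractor actually knows: cl_ulong is a 64-bit unsigned integer.
    ValueLayout cl_ulong = MemoryLayout.ofValueBits(64, ByteOrder.nativeOrder());
    // What generated bindings currently have to spell out instead: an
    // ABI-specific constant (e.g. something from the SysV set), which bakes
    // one platform's sizes into the generated source.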
> Since jextract drops such typedefs on the floor, you get no benefit
> there. E.g. as the user, you have to work out that CL_INT is really
> C_INT (but is it, really?). Or that "cl_platform_info" is just
> C_POINTER. This is suboptimal IMHO. As we try to wrap struct access
> and function access, I think jextract should similarly auto-generate
> layouts and accessors for these 'basic types'. This means that users
> will be able to do:
>
> segment = MemorySegment.allocateNative(CL_INT);
> cl_int$set(segment, 42);
>
> I'm not saying this will remove _all_ the boilerplate in your
> Native.java - but I think this will go a long way to making bindings
> more usable than they are right now. And, as an added bonus, I believe
> that a client using these bindings will be more portable as well (to
> work on a new platform you probably just have to tweak the set of
> static imports - but the bulk of the code should remain valid, since
> the code speaks in terms of CL_INT and not in terms of plain ABI types).
>
I agree there definitely needs to be some mechanism for relating
type-size information to the Java environment where it isn't fixed. I
know it was just an example, but you can't change the case: CL_INT might
already be a constant (the typedef is cl_int), and being inconsistent
with the C names is a cognitive overhead.
[just some opinion stuff here: I agree the idea is necessary and that
approach is sound, so these pet peeves are just that]
I personally detest the $ syntax but what can you do.
vp = alloc(4);
v = getInt(vp);
vs
vp = alloc(cl_int);
v = cl_int$get(vp);
It doesn't look much longer, but cl_int$get() is much harder to type,
for both the _ and the $.
And that has to be done every time it's used, compared to the
'boilerplate' which was written once and forgotten about. In fact I
would probably /still/ create a bunch of easier-to-type methods to save
me typing and make the code more readable. Java has no pre-processor,
and it's not like the JVM will care.
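Something like this is the kind of thin wrapper I mean - a rough sketch
against the memory-access API as I read the current builds, not the actual
contents of my Native.java (names and exact signatures may differ):

    import jdk.incubator.foreign.MemoryHandles;
    import jdk.incubator.foreign.MemorySegment;
    import java.lang.invoke.VarHandle;
    import java.nio.ByteOrder;

    class Helpers {
        static final VarHandle INT =
                MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());

        // allocate an off-heap block of the given size
        static MemorySegment alloc(long bytes) {
            return MemorySegment.allocateNative(bytes);
        }

        // read/write an int at the start of the segment
        static int getInt(MemorySegment s) {
            return (int) INT.get(s.baseAddress());
        }

        static void setInt(MemorySegment s, int v) {
            INT.set(s.baseAddress(), v);
        }
    }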
Would it also provide indexed accessors of at least 1 dimension?
vp = alloc(4 * length);
v = getInt(vp, i);
vs
vp = alloc(cl_int.byteSize() * length);
v = cl_int$get(vp, i);
vs
vp = alloc(MemoryLayout.ofSequence(length, cl_int));
v = cl_int$get(vp, i);
Presumably the last would be the most 'recommended'?
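For what it's worth, here's roughly how that last variant maps onto the
layout API as I understand it today - a sketch only, with placeholder names
(cl_int etc. are mine, not jextract output):

    import jdk.incubator.foreign.*;
    import java.lang.invoke.VarHandle;

    class IndexedExample {
        public static void main(String[] args) {
            int length = 8;
            // element layout standing in for cl_int
            MemoryLayout cl_int = MemoryLayouts.JAVA_INT;
            SequenceLayout array = MemoryLayout.ofSequence(length, cl_int);
            // indexed accessor: coordinates are (base address, element index)
            VarHandle elem = array.varHandle(int.class,
                    MemoryLayout.PathElement.sequenceElement());

            try (MemorySegment vp = MemorySegment.allocateNative(array)) {
                elem.set(vp.baseAddress(), 3L, 42);
                int v = (int) elem.get(vp.baseAddress(), 3L);
            }
        }
    }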
[end opinion]
Re: your other email about float/long indexed operations - sorry, I can't
remember if I was just thinking of the float[] stuff or had another test
I threw away; I think it was a combination and I just changed the code
to use float later.
Regardless, here's the same for long[]. It's not that much different
apart from ByteBuffer over segment:
0.244465929 array
0.372150498 bb stream
1.705681232 segment
0.246059758 bb index
2.117147168 bb over segment
Maybe the full-buffer processing just hits a particularly optimised
HotSpot path or something. Or my microbenchmark is flawed; it would be
nice to know if so.
And while creating that I also spotted a paste-o bug in the float[] one,
where I was only processing half of the MemorySegment, which demonstrates
the bug you mention:
0.681415255 array
0.684985190 bb stream
7.169770986 segment
0.681508681 bb index
1.883210464 bb over segment
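For reference, 'segment' and 'bb over segment' in the numbers above refer
to loops shaped roughly like the following - a sketch of the access paths
only, not the actual benchmark code (N and seg are placeholders):

    // "segment": indexed access through a layout-derived VarHandle
    VarHandle LONG = MemoryLayout.ofSequence(N, MemoryLayouts.JAVA_LONG)
            .varHandle(long.class, MemoryLayout.PathElement.sequenceElement());
    long sum = 0;
    for (long i = 0; i < N; i++)
        sum += (long) LONG.get(seg.baseAddress(), i);

    // "bb over segment": wrap the segment as a ByteBuffer and index that
    LongBuffer lb = seg.asByteBuffer().order(ByteOrder.nativeOrder()).asLongBuffer();
    for (int i = 0; i < N; i++)
        sum += lb.get(i);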
As I said, I don't really want to belabour this detail so early in
development, and I've got some jextract issues which are more interesting.
I would, however, be generally curious as to the intended and/or
recommended approach to processing array-oriented data that is stored in
a MemorySegment. Copy it all to an array? Copy in chunks? What about
mapping to a Stream of primitives or record types for e.g. numerical
processing?
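For example, is bulk-copying out through the ByteBuffer view the intended
route? Something like this (sketch only, seg is a placeholder):

    // copy the whole segment out through its ByteBuffer view, then work on the array
    float[] data = new float[(int) (seg.byteSize() / Float.BYTES)];
    seg.asByteBuffer().order(ByteOrder.nativeOrder()).asFloatBuffer().get(data);
    // ... process data[] ...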
Cheers,
Z