an opencl binding - zcl/panama

Michael Zucchi notzed at gmail.com
Tue Jan 28 01:13:14 UTC 2020


On 27/1/20 11:07 pm, Maurizio Cimadamore wrote:
>
> On 26/01/2020 04:52, Michael Zucchi wrote:
>>
>> So to get a cl_int (conveniently cl types match java types and are 
>> platform independent) in c it's trivial:
>>
>> cl_int value;
>> clGetPlatformInfo(plat, name,  sizeof(int), &value, NULL);
>> use value
>
> So, here's my thinking on this topic; right now jextract is pretty 
> good at dealing with foreign functions - generating a static wrapper 
> around each native function declared in the header. It also tries, for 
> structs, to generate layouts and pairs of getters/setters (one per 
> field).
>
> But there's a gap (which was also present in the old jextract) when it 
> comes to primitive data types - many well-behaved libraries (opencl 
> and OpenGL come to mind - but there are more) define the 'vocabulary' 
> of types they are going to work on, using a bunch of typedefs. 
> Typically (but I guess not always) these types are defined in a way so 
> that they are portable across platforms.
>
Even if they aren't ... an extractor knows the type sizes.  This is btw 
specifically why I asked about platform-independent MemoryLayouts for an 
ABI.  At the moment a generator has to work out which 'ABI' type 
corresponds to a 64-bit-unsigned-integer rather than just saying 
'64-bit-unsigned-integer' - which is the information it actually knows.

For an API with defined sizes this means it can't even generate 
platform-independent bindings, since it has to specify the ABI sizes 
every time.  But it seems from the jextract output this isn't even being 
considered, since it includes SysV as of now.

> Since jextract drops such typedefs on the floor, you get no benefit 
> there. E.g. as the user, you have to work out that CL_INT is really 
> C_INT (but is it, really?). Or that "cl_platform_info" is just 
> C_POINTER. This is suboptimal IMHO. As we try to wrap struct access 
> and function access, I think jextract should similarly auto-generate 
> layouts and accessors for these 'basic types'. This means that users 
> will be able to do:
>
> segment = MemorySegment.allocateNative(CL_INT);
> cl_int$set(segment, 42);
>
> I'm not saying this will remove _all_ the boilerplate in your 
> Native.java - but I think this will go a long way to make bindings 
> more usable than they are right now. And, as an added bonus, I believe 
> that a client using these bindings will be more portable as well (to 
> work on a new platform you probably just have to tweak the set of 
> static imports - but the bulk of the code should remain valid, since 
> the code speaks in terms of CL_INT and not in terms of plain ABI types).
>

I agree there definitely needs to be some mechanism for relating 
type-size information to the Java environment where it isn't fixed.  I 
know it was just an example, but you can't change the case to CL_INT 
because CL_INT might be a constant, and being inconsistent with C is a 
cognitive overhead.
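
To make that concrete, the sort of generated 'vocabulary' I have in mind 
would look roughly like this.  It's a hand-written sketch only, with 
names of my own choosing, and it uses the java.lang.foreign spelling of 
the segment accessors as a stand-in for the exact incubator API, which 
keeps shifting between builds:

   // hypothetical per-typedef constants and accessors for cl_int
   import java.lang.foreign.*;

   final class CLTypes {
       // cl_int is a 32-bit signed integer on every platform OpenCL
       // supports, so the layout can be stated platform-independently
       static final ValueLayout.OfInt cl_int = ValueLayout.JAVA_INT;

       static int cl_int$get(MemorySegment seg) {
           return seg.get(cl_int, 0);
       }

       static void cl_int$set(MemorySegment seg, int value) {
           seg.set(cl_int, 0, value);
       }
   }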

[just some opinion stuff here: I agree the idea is necessary and that 
approach is sound, so these pet peeves are just that]

I personally detest the $ syntax but what can you do.

  vp = alloc(4);
  v = getInt(vp);

  vp = alloc(cl_int);
  v = cl_int$get(vp);

It doesn't look much longer but cl_int$get() is much harder to type, 
both for the _ and the $.

And that has to be done every time it's used, compared to the 
'boilerplate' which was written once and forgotten about.  In fact /I 
would probably still/ create a bunch of easier-to-type methods to save 
me typing and make the code more readable.  Java has no pre-processor 
and it's not like the JVM will care.
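
For the record, the kind of easier-to-type helpers I mean are nothing 
clever - just thin static wrappers in a Native.java, along these lines 
(same caveat as the sketch above about the API spelling, and the names 
are mine):

   // hand-rolled convenience wrappers, written once and forgotten about
   import java.lang.foreign.*;

   final class Native {
       // the global arena keeps the sketch short; real code would
       // manage allocation lifetimes properly
       static final Arena arena = Arena.global();

       static MemorySegment alloc(long bytes) {
           return arena.allocate(bytes);
       }

       static int getInt(MemorySegment p) {
           return p.get(ValueLayout.JAVA_INT, 0);
       }

       static void setInt(MemorySegment p, int v) {
           p.set(ValueLayout.JAVA_INT, 0, v);
       }
   }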

Would it also provide indexed accessors of at least 1 dimension?

  vp = alloc(4 * length);
  v = getInt(vp, i);
vs
  vp = alloc(cl_int.byteSize() * length);
  v = cl_int$get(vp, i);
vs
  vp = alloc(MemoryLayout.ofSequence(cl_int, length));
  v = cl_int$get(vp, i);

Presumably the last would be the most 'recommended'?
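
For concreteness, the sequence-layout version of that last form is what 
I'd imagine looking something like this (java.lang.foreign spelling 
again, names mine):

   import java.lang.foreign.*;

   class IndexedExample {
       public static void main(String[] args) {
           int length = 16;
           try (Arena arena = Arena.ofConfined()) {
               // one allocation of `length` int-sized elements
               MemorySegment vp = arena.allocate(
                   MemoryLayout.sequenceLayout(length, ValueLayout.JAVA_INT));

               vp.setAtIndex(ValueLayout.JAVA_INT, 3, 42);      // vp[3] = 42
               int v = vp.getAtIndex(ValueLayout.JAVA_INT, 3);  // read it back
               System.out.println(v);
           }
       }
   }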

[end opinion]

re: your other email about float/long indexed operations.  Sorry, I 
can't remember if I was just thinking of the float[] stuff or had 
another test I threw away; I think it was a combination, and I just 
changed the code to use float later.

Regardless, here's the same for long[].  It's not that much different 
apart from the ByteBuffer-over-segment case:

   0.244465929 array
   0.372150498 bb stream
   1.705681232 segment
   0.246059758 bb index
   2.117147168 bb over segment

Maybe the full-buffer processing just hits a particularly optimised 
HotSpot path or something.  Or my microbenchmark is flawed; it would be 
nice to know if so.
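
(For clarity, by 'bb over segment' I mean indexing through a ByteBuffer 
view obtained from the segment rather than using the segment's own 
accessors - roughly the shape below, with a plain sum standing in for 
the real loop body and the java.lang.foreign spelling standing in for 
the incubator API:)

   import java.lang.foreign.*;
   import java.nio.ByteBuffer;
   import java.nio.ByteOrder;

   class BBOverSegment {
       // sum `length` longs through a ByteBuffer view of a native segment
       static long sum(MemorySegment seg, int length) {
           ByteBuffer bb = seg.asByteBuffer().order(ByteOrder.nativeOrder());
           long total = 0;
           for (int i = 0; i < length; i++)
               total += bb.getLong(i * Long.BYTES);
           return total;
       }
   }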

And while creating that I also spotted a paste-o bug in the float[] one 
where I was only processing half the MemorySegment, which demonstrates 
the bug you mention:

   0.681415255 array
   0.684985190 bb stream
   7.169770986 segment
   0.681508681 bb index
   1.883210464 bb over segment

As I said I don't really want to belabour this detail so early in 
development, and I've got some jextract issues which are more interesting.

I would, however, be generally curious as to the intended and/or 
recommended approach to processing array-oriented data that is stored in 
a MemorySegment.  Copy it all to an array?  Copy in chunks?  What about 
mapping to a Stream of primitives or record types for e.g. numerical 
processing?
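
(The sort of alternatives I mean, again sketched with the 
java.lang.foreign spelling and a trivial sum as the stand-in workload:)

   import java.lang.foreign.*;

   class SegmentProcessing {
       // 1. bulk-copy the whole segment to a heap array, then work on that
       static long sumViaArray(MemorySegment seg) {
           long[] data = seg.toArray(ValueLayout.JAVA_LONG);
           long total = 0;
           for (long v : data)
               total += v;
           return total;
       }

       // 2. stream the segment as per-element slices
       static long sumViaStream(MemorySegment seg) {
           return seg.elements(ValueLayout.JAVA_LONG)
                     .mapToLong(e -> e.get(ValueLayout.JAVA_LONG, 0))
                     .sum();
       }
   }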

Cheers,
  Z


