an opencl binding - zcl/panama

Tue Jan 28 08:39:34 UTC 2020

On 28/01/2020 01:13, Michael Zucchi wrote:
> On 27/1/20 11:07 pm, Maurizio Cimadamore wrote:
>>
>> On 26/01/2020 04:52, Michael Zucchi wrote:
>>>
>>> So to get a cl_int (conveniently cl types match java types and are 
>>> platform-independent) in C it's trivial:
>>>
>>> cl_int value;
>>> clGetPlatformInfo(plat, name, sizeof(cl_int), &value, NULL);
>>> /* use value */
>>
>> So, here's my thinking on this topic; right now jextract is pretty 
>> good at dealing with foreign functions - generating a static wrapper 
>> around each native function declared in the header. It also tries, 
>> for structs, to generate layouts and pairs of getters/setters (one 
>> per field).
>>
>> But there's a gap (which was also present in the old jextract) when 
>> it comes to primitive data types - many well-behaved libraries 
>> (OpenCL and OpenGL come to mind - but there are more) define the
>> 'vocabulary' of types they are going to work on, using a bunch of 
>> typedefs. Typically (but I guess not always) these types are defined 
>> in a way so that they are portable across platforms.
>>
> Even if they aren't ... an extractor knows the type sizes.  This is 
> btw specifically why I asked about platform-independent memory layouts 
> for an ABI.  At the moment a generator will have to work out which 
> 'ABI' type corresponds to a 64-bit-unsigned-integer rather than just 
> saying '64-bit-unsigned-integer' - which is the information it 
> actually knows.
>
> For an API with defined sizes this means it can't even generate 
> platform-independent bindings since it has to specify the ABI sizes 
> every time.  But it seems from jextract output this isn't even being 
> considered, since it includes SysV as of now.
We touched on this before - an extraction run will generate bindings for 
the platform on which the extraction is run. Sometimes the differences 
are just too big to be reconciled under a common class. That said, if 
you are relatively confident that the Java signatures of the functions 
involved are stable, it's not too hard to hack on the generated code and 
to make it more platform-independent, by replacing ABI-specific 
constants with dynamically chosen ones (but that trick won't work every 
time).
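A minimal sketch of that trick, assuming a 64-bit JVM and tracking only a byte size rather than a layout constant (the incubating foreign API used in this thread has changed since, so the sketch sticks to core Java). `AbiSizes` and `cLongSize` are hypothetical names, not part of any generated binding:

```java
import java.util.Locale;

// Hypothetical helper: chooses an ABI-dependent size when the class loads,
// instead of baking in the value seen on the machine that ran jextract.
// Assumes a 64-bit JVM; 32-bit platforms are not handled.
final class AbiSizes {

    // Windows x64 follows LLP64 (C long is 32 bits); 64-bit Linux and
    // macOS follow LP64 (C long is 64 bits).
    static int cLongSize(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows") ? 4 : 8;
    }

    // Evaluated once, on the platform the binding actually runs on.
    static final int C_LONG_SIZE = cLongSize(System.getProperty("os.name"));

    public static void main(String[] args) {
        System.out.println("C long: " + C_LONG_SIZE + " bytes on "
                + System.getProperty("os.name"));
    }
}
```

It also shows where the trick stops working: if C `long` shrinks to 4 bytes, the Java carrier in a binding would have to change from `long` to `int`, so the signatures themselves are no longer stable. APIs like OpenCL avoid this by pinning `cl_long` to 64 bits.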
>
>> Since jextract drops such typedefs on the floor, you get no benefit 
>> there. E.g. as the user, you have to work out that CL_INT is really 
>> C_INT (but is it, really?). Or that "cl_platform_id" is just 
>> C_POINTER. This is suboptimal IMHO. As we try to wrap struct access 
>> and function access, I think jextract should similarly auto-generate 
>> layouts and accessors for these 'basic types'. This means that users 
>> will be able to do:
>>
>> segment = MemorySegment.allocateNative(CL_INT);
>> cl_int$set(segment, 42);
>>
>> I'm not saying this will remove _all_ the boilerplate in your 
>> Native.java - but I think this will go a long way to make bindings 
>> more usable than they are right now. And, as an added bonus, I 
>> believe that a client using these bindings will be more portable as 
>> well (to work on a new platform you probably just have to tweak the 
>> set of static imports - but the bulk of the code should remain valid, 
>> since the code speaks in terms of CL_INT and not in terms of plain ABI 
>> types).
>>
>
> I agree there definitely needs to be some mechanism for relating 
> type-size information to the java environment where it isn't fixed. I 
> know it was just an example but you can't change the case because 
> CL_INT might be a constant, and being inconsistent with C adds 
> cognitive overhead.
>
> [just some opinion stuff here: I agree the idea is necessary and that 
> approach is sound, so these pet peeves are just that]
>
> I personally detest the $ syntax but what can you do.
>
>  vp = alloc(4);
>  v = getInt(vp);
>
>  vp = alloc(cl_int);
>  v = cl_int$get(vp);
>
> It doesn't look much longer but cl_int$get() is much harder to type 
> both for the _ and the $.

Uhm. As much as I understand your distaste for $ and _, I don't think 
it's even fair to make a comparison like that because:

1) in the first version, the size is explicit - so the user has to know 
how many bytes a CL_INT is
2) in the first version, how do I distinguish between multiple accessors 
which return a Java int but might work on different layouts?

Sorry, but I'd pick the second version any day of the week. 
Explicit-ness has a value.
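To illustrate point 2, here is a hand-written sketch of accessors named after the C types, assuming a direct, native-order `ByteBuffer` as a stand-in for a segment. `ClAccessors`, `alloc` and the `cl_int$get`/`cl_uint$get` methods are hypothetical, not what jextract generates. Both getters return a Java `int`; only the name tells the reader which layout is meant:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical hand-written accessors in the cl_int$get style discussed
// above; this is not jextract output. A direct, native-order ByteBuffer
// stands in for a native memory segment.
final class ClAccessors {

    // OpenCL fixes cl_int and cl_uint at 32 bits on every platform.
    static final int CL_INT_SIZE = 4;

    static ByteBuffer alloc(int bytes) {
        return ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder());
    }

    // Signed 32-bit value; the accessor name records the C type.
    static int cl_int$get(ByteBuffer vp) { return vp.getInt(0); }
    static void cl_int$set(ByteBuffer vp, int v) { vp.putInt(0, v); }

    // Also a Java int, but the C type is unsigned: a bare getInt(vp)
    // could not tell these two apart.
    static int cl_uint$get(ByteBuffer vp) { return vp.getInt(0); }
    static void cl_uint$set(ByteBuffer vp, int v) { vp.putInt(0, v); }

    public static void main(String[] args) {
        ByteBuffer vp = alloc(CL_INT_SIZE);
        cl_uint$set(vp, 0xFFFFFFFF);
        System.out.println(cl_int$get(vp));                          // -1
        System.out.println(Integer.toUnsignedLong(cl_uint$get(vp))); // 4294967295
    }
}
```

The same four bytes read back as -1 or 4294967295 depending on which C type the caller meant, which is the information a size-only `getInt(vp)` call loses.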

>
> And that has to be done every time it's used, compared to the 
> 'boilerplate' which was written once and forgotten about.  In fact I 
> would probably still create a bunch of easier-to-type methods to save 
> me typing and make the code more readable.  Java has no pre-processor 
> and it's not like the JVM will care.
>
> Would it also provide indexed accessors of at least 1 dimension?
>
>  vp = alloc(4 * length);
>  v = getInt(vp, i);
> vs
>  vp = alloc(cl_int.byteSize() * length);
>  v = cl_int$get(vp, i);
> vs
>  vp = alloc(MemoryLayout.ofSequence(length, cl_int));
>  v = cl_int$get(vp, i);
>
> Presumably the last would be the most 'recommended'?
Indexed accessors (at least for one dimension) are something we are 
also considering adding, yes.
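A one-dimensional indexed accessor could be as simple as the sketch below, again using a direct `ByteBuffer` in place of a segment; `ClIntArray`, `allocArray` and the two-argument `cl_int$get`/`cl_int$set` are hypothetical names. The element offset is derived from the element size, so callers never write the literal 4:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical 1-D indexed accessors for a cl_int array in native memory.
final class ClIntArray {

    static final int CL_INT_SIZE = 4; // cl_int is always 32 bits

    // Room for `length` cl_int elements, like alloc(cl_int.byteSize() * length).
    static ByteBuffer allocArray(int length) {
        return ByteBuffer.allocateDirect(CL_INT_SIZE * length)
                .order(ByteOrder.nativeOrder());
    }

    // Element i lives at byte offset i * sizeof(cl_int); absolute get/put
    // do not move the buffer's position.
    static int cl_int$get(ByteBuffer vp, int i) { return vp.getInt(i * CL_INT_SIZE); }
    static void cl_int$set(ByteBuffer vp, int i, int v) { vp.putInt(i * CL_INT_SIZE, v); }

    public static void main(String[] args) {
        int length = 8;
        ByteBuffer vp = allocArray(length);
        for (int i = 0; i < length; i++) cl_int$set(vp, i, i * i);
        long sum = 0;
        for (int i = 0; i < length; i++) sum += cl_int$get(vp, i);
        System.out.println(sum); // 140
    }
}
```

Because the absolute `getInt`/`putInt` calls are bounds-checked, an out-of-range index throws `IndexOutOfBoundsException` rather than reading past the allocation.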
>
> [end opinion]
>
> re: your other email about float/long indexed operations.  Sorry I 
> can't remember if I was just thinking of the float[] stuff or had 
> another test I threw away; I think it was a combination and I just 
> changed the code to use float later.
>
> Regardless, here's the same for long[]. It's not that much different 
> apart from bytebuffer over segment:
>
>   0.244465929 array
>   0.372150498 bb stream
>   1.705681232 segment
>   0.246059758 bb index
>   2.117147168 bb over segment
>
> Maybe the full-buffer processing just hits a particularly optimised 
> HotSpot path or something.  Or my microbenchmark is flawed; it would 
> be nice to know if so.
If you have the test somewhere I'd love to take a look and maybe port it 
on top of JMH. We are looking at these performance potholes now, so it 
is a great time to report such issues.
>
> And while creating that I also spotted a copy-paste bug in the float[] 
> one where I was only processing half of the MemorySegment, which 
> demonstrates the bug you mentioned:
>
>   0.681415255 array
>   0.684985190 bb stream
>   7.169770986 segment
>   0.681508681 bb index
>   1.883210464 bb over segment
>
> As I said, I don't really want to belabour this detail so early in 
> development, and I've got some jextract issues which are more 
> interesting.
>
> I would however be generally curious as to the intended and/or 
> recommended approach to processing array-oriented data that is stored 
> in a MemorySegment.  Copy it all to an array?  Copy in chunks?  What 
> about mapping to a Stream of primitives or record-types for e.g. 
> numerical processing?

I'd say the fastest option would probably be to turn the segment into 
a ByteBuffer and then take it from there. An array might be better (but 
has a higher initial cost) if you care a lot about locality, so... it depends?
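For reference, a sketch of the options raised above (index the buffer directly, copy everything to an array, or copy in chunks), assuming a direct `ByteBuffer` stands in for the segment's buffer view. `SegmentSum` and its methods are hypothetical, and no performance claim is implied; the relative costs are exactly what a JMH port would need to measure:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

// Hypothetical sketch: three ways to sum 64-bit values held in native
// memory. A direct ByteBuffer stands in for the segment's buffer view.
final class SegmentSum {

    // 1) Index the buffer directly: no copy, bounds-checked absolute reads.
    static long sumIndexed(ByteBuffer bb) {
        LongBuffer lb = bb.asLongBuffer(); // inherits bb's byte order
        long sum = 0;
        for (int i = 0; i < lb.limit(); i++) sum += lb.get(i);
        return sum;
    }

    // 2) Copy everything to a long[] once (higher upfront cost), then loop.
    static long sumCopied(ByteBuffer bb) {
        LongBuffer lb = bb.asLongBuffer();
        long[] a = new long[lb.remaining()];
        lb.get(a);
        long sum = 0;
        for (long v : a) sum += v;
        return sum;
    }

    // 3) Copy in fixed-size chunks through a reusable scratch array.
    static long sumChunked(ByteBuffer bb, int chunk) {
        LongBuffer lb = bb.asLongBuffer();
        long[] scratch = new long[chunk];
        long sum = 0;
        while (lb.hasRemaining()) {
            int n = Math.min(chunk, lb.remaining());
            lb.get(scratch, 0, n);
            for (int i = 0; i < n; i++) sum += scratch[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int length = 1000;
        ByteBuffer bb = ByteBuffer.allocateDirect(length * Long.BYTES)
                .order(ByteOrder.nativeOrder());
        for (int i = 0; i < length; i++) bb.putLong(i * Long.BYTES, i);
        System.out.println(sumIndexed(bb) + " " + sumCopied(bb) + " "
                + sumChunked(bb, 64)); // 499500 499500 499500
    }
}
```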

Maurizio

>
> Cheers,
>  Z
>