[foreign-jextract] RFR 8237577 minimal jextract tool on jextract API

Wed Jan 22 12:05:41 UTC 2020

On 22/01/2020 08:00, Ty Young wrote:
>
> On 1/22/20 12:46 AM, John Rose wrote:
>> On Jan 21, 2020, at 6:31 PM, Ty Young <youngty1997 at gmail.com> wrote:
>>>
>>> Without a higher level abstraction I don't think I'm personally 
>>> going to be using this as-is. Hopefully the jextract API really lets 
>>> me improve things otherwise I think I'm going to do everything 
>>> myself so that the bindings more closely resemble the old API.
>>
>> This is a fair observation.  It reminds me of the distinction in the
>> git tools between porcelain and plumbing[1].  Maybe we’re still working
>> on the plumbing?  But even if that’s the case it’s not too early to
>> collect observations and requests for “porcelain”.
>
>
> "plumbing" is a very apt way to describe Memory Access.
>
>
> To be clear, I've been advocating for "porcelain" for 
> awhile(relatively speaking). Like I said, I kinda knew it would turn 
> out bad and had(if it wasn't evident before) the intention to wrap it 
> in a higher level API of my own but uh... what jextract generates 
> right now is a bit of an octopus.
>
>
> It has:

>
>
> - constants(good)
>
>
> - var handles - even for struct fields(bad)

Var handles allow you to access a field with whatever atomic access 
mode you want - in case the basic getter/setter is not good enough.
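
To make the point about access modes concrete, here is a minimal 
sketch. It does not use jextract output: it assumes a ByteBuffer view 
VarHandle from the standard JDK as a stand-in for a generated field 
var handle, and the class name, the offsets and the struct layout 
(two ints, x and y) are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class FieldVarHandleSketch {
    // Views 4 bytes of a ByteBuffer as an int.
    // Coordinates are (ByteBuffer, byte offset).
    static final VarHandle INT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Hypothetical layout: struct { int x; int y; }
    static final int X_OFFSET = 0;
    static final int Y_OFFSET = 4;

    // Returns the final {x, y} values after a mix of access modes.
    public static int[] demo() {
        ByteBuffer point = ByteBuffer.allocateDirect(8); // off-heap memory

        INT.set(point, X_OFFSET, 10);          // plain write, what a basic setter does
        INT.setVolatile(point, Y_OFFSET, 11);  // volatile write

        // Atomic compare-and-set: only succeeds if x is still 10.
        boolean swapped = INT.compareAndSet(point, X_OFFSET, 10, 42);
        if (!swapped) {
            throw new IllegalStateException("unexpected concurrent update");
        }

        // Atomic read-modify-write on y.
        INT.getAndAdd(point, Y_OFFSET, 1);

        int x = (int) INT.getVolatile(point, X_OFFSET);
        int y = (int) INT.getVolatile(point, Y_OFFSET);
        return new int[] { x, y };
    }

    public static void main(String[] args) {
        int[] xy = demo();
        System.out.println("x=" + xy[0] + " y=" + xy[1]);
    }
}
```

A plain getter/setter pair can only offer the first of these modes; 
the var handle gives all of them from a single object.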

>
>
> - memory layouts - not what I want/need(bad)
Memory layouts are necessary to instantiate structs - and also to ask 
questions like 'what is the layout of field XYZ'.
>
>
> - method handles - functions themselves already exist(bad)
This can probably be hidden.
>
>
> - functions - generates(good) but also generates struct field 
> setters/getters?(bad)
So... if VarHandles for struct fields are bad, but static struct field 
getters/setters are also bad... what the heck should the tool generate? 
(I obviously know what your answer is)
>
>
> So it seems kinda broken. The structs especially should probably be 
> put into a class. That alone would clean things up a bit, I think. The 
> exposure of memory method/var handles should be made an optional, non 
> default jextract switch I think.

Yes - I was waiting for you to say that :-). "We need structs!"

You have gone through the list of generated stuff with a very specific 
question in mind - which is: is this artifact something I'd like to use 
in my code? And I understand if the answer to many of those is "no, sorry".

But there's another question to be asked - which is missing from all the 
analysis here - is the set of generated plumbing _complete_? E.g. can 
you take what is being generated and use it to interact with the native 
library? And the answer there is "yes". Is it low level? Yes, of course 
- but at least it is still Java. Being able to develop against a library 
using your IDE, without having to jump from javac to gcc and back to 
fix things up, is IMHO a big step forward.

All the high-level moves that have been listed over the last few weeks 
are relatively obvious - but they all have a cost which you are 
unwilling to see. Let's say that we create a new class for each new 
struct - and that all static function wrappers do 
marshalling/unmarshalling of the incoming structs (to convert them to 
memory segments and back). Is it more usable? Yes, of course. Is it 
free? No. And the same goes for Pointer.

To be 100% clear - here we are arguing about the difference between:

MemorySegment point = MemorySegment.allocateNative(POINT$LAYOUT);
point$x$set(point, 10);
point$y$set(point, 11);

and:

Point p = new Point();
p.x$set(10);
p.y$set(11);

Modulo cosmetic differences (e.g. in one case setters are static 
methods, in the other they are instance methods), the main difference 
here is the fact that the latter appeals to nominal types, while the 
former does not. E.g. in the second example you know that the thing you 
are working on is a point, and which accessors it has, so, from a Java 
perspective, it is harder to make silly mistakes (like instantiating a 
Point and then using the Circle accessors to get its fields).

<sidebar>
The first version also has a more subtle advantage: it makes it dead 
simple to see that you are effectively dealing with native code and 
off-heap allocation, which the second version completely hides. Given 
that we are essentially adding native programming abilities to the Java 
platform, and that we will see Java programs starting to adopt them, 
perhaps having the places where native interconnect happens look 
"different" might not be such a bad idea.
</sidebar>
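
The "silly mistake" mentioned above (using Circle accessors on a Point) 
can be sketched as follows. This is not jextract output: it assumes a 
direct ByteBuffer as a stand-in for a native memory segment, and the 
accessor names, the Circle layout and the field offsets are all 
hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class NominalStructSketch {
    // Stand-in for allocating a native segment of the given size.
    static ByteBuffer allocate(int size) {
        return ByteBuffer.allocateDirect(size).order(ByteOrder.nativeOrder());
    }

    // Style 1: static accessors over untyped memory.
    // Hypothetical layouts: point { int x; int y; }, circle { int radius; }
    static void point$x$set(ByteBuffer seg, int v) { seg.putInt(0, v); }
    static void point$y$set(ByteBuffer seg, int v) { seg.putInt(4, v); }
    static int point$x$get(ByteBuffer seg) { return seg.getInt(0); }
    static int circle$radius$get(ByteBuffer seg) { return seg.getInt(0); }

    // Style 2: a nominal wrapper. Only Point accessors exist on a Point.
    static final class Point {
        private final ByteBuffer seg = allocate(8);
        void x$set(int v) { seg.putInt(0, v); }
        void y$set(int v) { seg.putInt(4, v); }
        int x$get() { return seg.getInt(0); }
        int y$get() { return seg.getInt(4); }
    }

    // The compiler accepts this, although 'point' is not a circle:
    // the "radius" read back is really the x field.
    static int radiusOfAPoint() {
        ByteBuffer point = allocate(8);
        point$x$set(point, 10);
        point$y$set(point, 11);
        return circle$radius$get(point);
    }

    public static void main(String[] args) {
        System.out.println("bogus radius: " + radiusOfAPoint());

        Point p = new Point();
        p.x$set(10);
        p.y$set(11);
        // p.radius$get() would be rejected at compile time.
        System.out.println("x=" + p.x$get() + " y=" + p.y$get());
    }
}
```

Note that the wrapper costs one extra object per struct instance, 
which is the kind of cost discussed above.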

That said, what if somebody wants structs to be modeled in a slightly 
different way than what jextract does (I have, for example, zero 
confidence that we can find a getter/setter naming convention that will 
convince more than 50% of the users :-) )? Again, define an even 
higher-level abstraction, and marshal/unmarshal it into a jextract 
struct. Do you see the problem here?

With this, I'm not necessarily closing the door on "porcelain" - but 
I'd like to stress that following porcelain leads you onto a very 
slippery slope. Say you have structs like Point above - then it becomes 
sad that you can have Point, but as soon as C code does Point* you are 
back to a MemoryAddress... so... let's add pointers!

Adding pointers is way harder than adding structs as it comes with the 
usual caveats and limitations of Java generics - you'd like to have 
something which captures the spirit of the C type (e.g. 
Pointer<Pointer<Foo>>) - but there's no good way to do that in a fully 
type-safe way. Either you go full blown Panama/foreign style and bake 
your own type tokens (LayoutType objects), or you are left with two choices:

* have raw pointers (which JNR and Graal do, for instance) - not much 
better than MemoryAddress IMHO - the C signature is still not reflected 
in the Java binding
* have "unsafe" generic pointers (I believe JavaCPP goes down this path 
- with its PointerPointer class) - where you can fool the static type 
system into thinking that a Pointer<A> is a Pointer<B> (so that a get() 
operation can misbehave, or not do what you expect)
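
The second option can be illustrated with a small sketch. The Pointer 
class below is hypothetical (it is not the JavaCPP, JNR or Panama 
class, and it holds a Java object rather than a native address); it 
only shows how erasure lets a Pointer<A> pass for a Pointer<B> until 
get() is actually used.

```java
public class UnsafePointerSketch {
    // Hypothetical generic pointer. The type argument is erased at
    // runtime, so nothing records what the pointee really is.
    static final class Pointer<T> {
        private final Object pointee;

        Pointer(T pointee) { this.pointee = pointee; }

        @SuppressWarnings("unchecked")
        T get() { return (T) pointee; } // unchecked: no runtime test here
    }

    // An unchecked cast is all it takes to fool the static type system.
    @SuppressWarnings("unchecked")
    static <A, B> Pointer<B> reinterpret(Pointer<A> p) {
        return (Pointer<B>) (Pointer<?>) p;
    }

    // Returns true if the mistake only surfaces at the get() call site.
    static boolean misbehaves() {
        Pointer<Integer> pi = new Pointer<>(42);
        Pointer<String> ps = reinterpret(pi); // compiles, succeeds at runtime
        try {
            String s = ps.get(); // the compiler-inserted cast to String fails here
            return s.isEmpty() && false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("get() misbehaves: " + misbehaves());
    }
}
```

With real native memory there is no ClassCastException to save you: 
the bytes would simply be read with the wrong layout.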

(and then there's the whole discussion about native arrays - should we 
have Array<Foo> or just use pointers...). Stuff like this seems a lot 
less of a slam dunk than structs might seem at first. And, if a 
developer doesn't care about all these seemingly high-level abstractions 
(because they are wrapping them in a higher-level API anyway), isn't 
all this stuff just... in the way?

Maybe when value types make object creation cheap enough, adding 
things like structs and pointers will be a no-brainer - right now it's 
not, and we should at least be honest about the cost of the things that 
are being proposed as possible "extensions". And, again, as I have said 
several times before, I'd like to evaluate the need for such 
"extensions" after we take a good look at a sizeable corpus of 
extracted libraries.

>
>
> also, NVML uses a Pointer to an empty struct to represent a GPU:
>
>
> typedef struct nvmlDevice_st* nvmlDevice_t;
>
>
> which seems to be omitted from the generated class. I'm not entirely 
> sure how to handle this manually either.
AFAIK jextract never modeled these things directly (in fact with old 
jextract you got Pointer<Pointer<nvmlDevice_st>>, not 
Pointer<nvmlDevice_t>).
> I'm guessing I need to use ForeignUnsafe to get the Pointer from 
> C(hence why, in the old API it's Pointer<Pointer<nvmlDevice_st>>). I 
> guess the fact that a struct is being used is irrelevant from the eyes 
> of Memory Access, since it doesn't have field and is therefore in an 
> undefined state?
If this is, as I think, an opaque pointer, then this is just a 
MemoryAddress. You get nvmlDevice_t values back from functions as 
MemoryAddress and you can pass them on to other functions.
>
>
>>
>> — John
>>
>> [1]: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain