A few considerations (mostly) on jextract

Fri May 28 14:03:21 UTC 2021

On 17/05/2021 22:27, Maurizio Cimadamore wrote:
>
>
> On 17/05/2021 19:14, Markus Knetschke wrote:
> > On 17/5/2021 11:54, Maurizio Cimadamore wrote:
> >>> 1) the classes for structs jextract generates don't feel very oo to
> >>> me. I would like to see a class with constructor and getter/setter
> >>> methods encapsulating the MemorySegment.
> >> I hear you - this is indeed possible, at the cost of more allocation.
> >> For now we have tried to stick to a principle that jextract should not
> >> add overhead, but I tend to agree that, in the case of struct, this ends
> >> up being punitive at times (especially when you have structs nested
> >> inside structs nested inside structs).
> >>
> >> I believe at some point some other experiment will be made to make
> >> struct handling more OO (as you say), and, if we can get some help from
> >> Valhalla primitive types, maybe we won't even pay a performance price!
> > Don't get me wrong, I don't suggest removing the
> > public static  MemoryAddress xxx$get(MemorySegment seg)
> > accessors but add a MemorySegment field, a constructor, and accessors.
> > So everyone can choose if one uses the static methods or calls the constructor.
> > Bonus Points if jextract generates additional downcalls extracting the
> > MemorySegment from wrapper objects and upcall wrapper calling
> > the constructor of the wrapper objects.
> > I've tried a simple annotation processor which works well
> > when there is a solution for 2). So it's not the highest priority
> > and I guess we get at least 2 release cycles before jextract
> > reaches preview status. Enough time to experiment with multiple variants.
>
> So you are suggesting to offer both variants - both static (which is
> what we have) and object-oriented, with instance methods.
>
> Yeah - the two might definitively coexist, and you are right that, there
> are, in general, good use cases for both.
Yes. Anyone who wants maximum performance could use the static
accessors, and anyone willing to accept the "performance loss" of
object instantiation (I guess C2 could often optimize the allocation
away) could use the OO model.
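
A minimal sketch of how the two styles could coexist in one generated
class. It assumes a hypothetical struct point { int x; int y; } and uses
java.nio.ByteBuffer as a stand-in for MemorySegment so that it runs
without the incubator module. The class name, offsets and method names
are illustrative assumptions, not what jextract emits.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical binding for: struct point { int x; int y; }
final class Point {
    static final int X_OFFSET = 0;   // assumed layout
    static final int Y_OFFSET = 4;   // assumed layout
    static final int BYTE_SIZE = 8;

    // Static accessors in the style of xxx$get / xxx$set: no wrapper allocation.
    static int x$get(ByteBuffer seg) { return seg.getInt(X_OFFSET); }
    static void x$set(ByteBuffer seg, int value) { seg.putInt(X_OFFSET, value); }
    static int y$get(ByteBuffer seg) { return seg.getInt(Y_OFFSET); }
    static void y$set(ByteBuffer seg, int value) { seg.putInt(Y_OFFSET, value); }

    // Object-oriented view: one field holding the backing memory.
    private final ByteBuffer segment;

    Point(ByteBuffer segment) { this.segment = segment; }

    static Point allocate() {
        return new Point(ByteBuffer.allocateDirect(BYTE_SIZE).order(ByteOrder.nativeOrder()));
    }

    // Instance accessors simply delegate to the static ones.
    int getX() { return x$get(segment); }
    void setX(int value) { x$set(segment, value); }
    int getY() { return y$get(segment); }
    void setY(int value) { y$set(segment, value); }

    // Lets callers drop back to the static accessors or pass the memory to a downcall.
    ByteBuffer segment() { return segment; }
}
```

Because the instance methods only delegate, both views always see the
same memory, and callers can mix them freely.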
>
> >
> >>> 3) I often encounter fixed-length arrays in structs holding strings.
> >>> The best way I've found to extract them safely is to first get the
> >>> start offset of the string with MemoryLayout.byteOffset(path) then
> >>> extracting the size of the struct field with
> >>> MemoryLayout.select(path).byteSize() and reading the string with
> >>> CLinker.toJavaString(struct.asSlice(offset, size)) this is very bulky
> >>> for simple things like putting ten strings from a struct into a
> >>> record. I would like to see a simpler way for this for example a
> >>> MemorySegment.asSlice(MemoryLayout, PathElement...) method. (This
> >>> would be handy too for byte array struct fields.)
> >> In the new API, we have a new method in MemoryLayout like this:
> >>
> >> default MethodHandle sliceHandle(PathElement... elements) {
> >>
> >> Which I think does what you want?
> >>
> > It looks a bit better than my own public static asSlice method but I'm not sure
> > if a task simple as reading a fixed-length string of a struct should
> > be as long as
> > CLinker.toJavaString((MemorySegment)
> > getSubvolInfoIoctlArgs.sliceHandle(groupElement("name")).invoke(ioctlArg))
> > or reading a fixed number of bytes
> > ((MemorySegment)
> > getSubvolInfoIoctlArgs.sliceHandle(groupElement("parent_uuid")).invoke(ioctlArg)).toByteArray()
> > having a
> > MethodHandle(MemorySegment, PathElement...)MemorySegment
> > would be more useful because you could .bindTo() the struct and access multiple
> > fields through it. Bringing the above example to:
> > var getSlice = getSubvolInfoIoctlArgs.sliceHandle().bindTo(ioctlArg)
> > CLinker.toJavaString((MemorySegment) getSlice.invoke(groupElement("name")))
> > ((MemorySegment) getSlice.invoke(groupElement("parent_uuid"))).toByteArray()
> >
> > But with this, the needed cast and the try {} catch (Throwable ignored) {}
> > remain and are looking very ugly in my mind.
> > On the other hand, it's only a problem if you don't want to use jextract.
>
> I'll need to think more about this - but it seems to me that with the
> asSlice MethodHandle you want to build up as much access as you can
> before hand. For instance, you could create the sliceHandle before hand
> and stick it into a static final field (as that's the only way the MH
> will optimize properly). Then you can combine the slice method handle
> with a method handle for CLinker::toJavaString (using
> filterReturnValue). That way you can simply access the fixed length
> string by calling a single method handle, and, since the method handle
> is static, it will be optimized accordingly.
>
> In other words, path elements are static coordinates which you should
> use to build var/method handles, which you then store in static fields.
> That way access expression is optimized all the way down. Having a
> method handle whose access expression is "parametric" means that there's
> no way for the underlying access to be optimized.
This also calls for automatic generation by jextract. Sadly, it would
have to be either a jextract configuration option or a second accessor,
because not every char[xxx] field contains a string.
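
The composition described in the quoted text (a slice handle combined
with a string conversion via filterReturnValue, stored in a static final
field) could be sketched as follows. To stay runnable without the
incubator module, ByteBuffer stands in for MemorySegment, and the two
helper methods stand in for the layout's slice handle and for
CLinker::toJavaString. The class name, offset and size are assumptions
made up for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SubvolInfo {
    static final int NAME_OFFSET = 8;  // assumed offset of char name[16]
    static final int NAME_SIZE = 16;   // assumed field size

    // Stand-in for the slice handle: struct -> view of the name field.
    static ByteBuffer nameSlice(ByteBuffer struct) {
        ByteBuffer view = struct.duplicate();
        view.limit(NAME_OFFSET + NAME_SIZE);
        view.position(NAME_OFFSET);
        return view.slice();
    }

    // Stand-in for the string conversion: read up to the first NUL or the end of the field.
    static String toJavaString(ByteBuffer slice) {
        int len = 0;
        while (len < slice.limit() && slice.get(len) != 0) len++;
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) bytes[i] = slice.get(i);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Built once and kept in a static final field so the JIT can treat it as a constant.
    static final MethodHandle NAME_GETTER;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle slice = lookup.findStatic(SubvolInfo.class, "nameSlice",
                    MethodType.methodType(ByteBuffer.class, ByteBuffer.class));
            MethodHandle convert = lookup.findStatic(SubvolInfo.class, "toJavaString",
                    MethodType.methodType(String.class, ByteBuffer.class));
            // (ByteBuffer)ByteBuffer filtered by (ByteBuffer)String gives (ByteBuffer)String.
            NAME_GETTER = MethodHandles.filterReturnValue(slice, convert);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Hides the cast and the Throwable handling from callers.
    static String name(ByteBuffer struct) {
        try {
            return (String) NAME_GETTER.invokeExact(struct);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

A generated accessor of this shape would keep the cast and the
try/catch in one place, at the price of deciding per field whether the
char array is meant to be read as a string.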

I think there are two ways people use the Java Foreign Linker API.
The first is the "sqlite wrapper" way, where you want the best possible
performance and JIT compiler compatibility even if you have to write a
bit more code.

The other is the "I just want one piece of info on startup" way; this
code will almost never see the JIT compiler. For this case, I'm not sure
it's a good idea to trade readability for performance.
On the other hand, I also see the burden of adding and supporting a
second API. In the end, I can only say what I would like to see in the
API as a consumer.

Another option is to use jextract only for the "low level" stuff and
add comfort features with other tools. This doesn't work currently
because the generated code doesn't expose enough information; I've
already mentioned the FunctionDescriptor. Another piece I miss is the
type information of struct * parameters. It would be nice to have a way
to get at this information, for example through an annotation on the
parameter.
An example would be:

struct fuse *fuse_new(struct fuse_args *args, const struct fuse_operations *op,
              size_t op_size, void *private_data);

would result in:

public static @Struct(name = "fuse") MemoryAddress fuse_new(
    @Struct(name = "fuse_args", type = fuse_args.class) Addressable args,
    @Struct(name = "fuse_operations", type = fuse_operations.class) Addressable op,
    @Type(name = "size_t") long op_size,
    @TypePtr(name = "void") Addressable private_data
)
(the return value has no type because "fuse" is an opaque type)
With this, a code generator could generate a wrapper that adds type
safety, and anyone using the jextract-generated code could also see the
type names without always having to look into the source header file.
This is especially useful for headers that don't expose parameter names.
For example, the fuse read upcall
    int (*read) (const char *, char *, size_t, off_t, struct fuse_file_info *);

looks like this:

int apply(MemoryAddress x0, MemoryAddress x1, long x2, long x3, MemoryAddress x4);
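
To make the annotation idea concrete, here is a sketch of how such
annotations could be declared and read back by another tool. The
@Struct, @Type and @TypePtr annotations do not exist in jextract; they
are declared here purely as assumptions, and plain long values stand in
for MemoryAddress/Addressable so the example runs without the incubator
module.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotations carrying the C type information.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.PARAMETER})
@interface Struct { String name(); Class<?> type() default void.class; }

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.PARAMETER})
@interface Type { String name(); }

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.PARAMETER})
@interface TypePtr { String name(); }

final class FuseBindings {
    static final class fuse_args {}        // placeholder for a generated struct class
    static final class fuse_operations {}  // placeholder for a generated struct class

    // Stub with the annotated signature; a real binding would perform the downcall.
    @Struct(name = "fuse")
    static long fuse_new(
            @Struct(name = "fuse_args", type = fuse_args.class) long args,
            @Struct(name = "fuse_operations", type = fuse_operations.class) long op,
            @Type(name = "size_t") long op_size,
            @TypePtr(name = "void") long private_data) {
        return 0L;
    }

    // Recovers the C parameter types of a binding from its annotations.
    static List<String> describeParameters(String methodName) {
        List<String> result = new ArrayList<>();
        for (Method m : FuseBindings.class.getDeclaredMethods()) {
            if (!m.getName().equals(methodName)) continue;
            for (Parameter p : m.getParameters()) {
                Struct s = p.getAnnotation(Struct.class);
                Type t = p.getAnnotation(Type.class);
                TypePtr tp = p.getAnnotation(TypePtr.class);
                if (s != null) result.add("struct " + s.name() + " *");
                else if (t != null) result.add(t.name());
                else if (tp != null) result.add(tp.name() + " *");
                else result.add("?");
            }
        }
        return result;
    }
}
```

A wrapper generator or annotation processor could use the same
information at build time instead of at run time; reflection is used
here only to keep the sketch short.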

I've now also found the time to port my tests to the new preview
(after a lot of meetings about how much code needs a rewrite because of
JEP 411), and I love a lot of the changes.
Two questions:
After the removal of asSegmentRestricted(long), what is now the
recommended way to get a MemorySegment from the MemoryAddress parameters
of upcalls, whose lifetime is mostly restricted to the call?
Creating a new scope for each call sounds like a lot of overhead, so
should I inject an externally managed scope into the implementations, or
use the scope of the MemoryAddress (which for upcalls is the global
scope)? And if that is the way to go, why was the asSegment variant
without a scope removed?

The second question: is there a reason why ResourceScope.Handle
isn't AutoCloseable?
The provided example would look a lot cleaner with it:

try (ResourceScope.Handle segmentHandle = segment.scope().acquire()) {
   <critical operation on segment>
}
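
Until something like this exists, a small adapter can give the same
shape in user code. The sketch below assumes a scope with an
acquire()/release(Handle) pair; DemoScope is a made-up stand-in for
ResourceScope so that the example runs without the incubator module,
and ScopedHandle is a hypothetical helper, not part of the API.

```java
// Made-up stand-in for a scope with an acquire()/release(Handle) pair.
final class DemoScope {
    static final class Handle {
        private boolean released;
    }

    private int acquired;

    Handle acquire() {
        acquired++;
        return new Handle();
    }

    void release(Handle handle) {
        if (handle.released) throw new IllegalStateException("handle already released");
        handle.released = true;
        acquired--;
    }

    int acquiredCount() { return acquired; }
}

// Hypothetical adapter: acquires on creation, releases in close().
final class ScopedHandle implements AutoCloseable {
    private final DemoScope scope;
    private final DemoScope.Handle handle;

    private ScopedHandle(DemoScope scope) {
        this.scope = scope;
        this.handle = scope.acquire();
    }

    static ScopedHandle acquire(DemoScope scope) {
        return new ScopedHandle(scope);
    }

    // No checked exception, so callers need no catch block.
    @Override
    public void close() {
        scope.release(handle);
    }
}
```

With it, the critical section becomes
try (ScopedHandle h = ScopedHandle.acquire(scope)) { ... }, at the cost
of one extra allocation per acquire.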

Best regards,
Markus Knetschke