performance and memory optimization of layouts

Fri Aug 14 00:32:53 UTC 2020

>
> Thanks. If NetBeans' profiler really is that inaccurate then I'll give 
> that a shot.
It's not a matter of how accurate it is. In general, looking at "how many 
Optionals" there are in the heap is just not a great way to approach 
things. An application creates many objects during its lifetime - you 
can try to reduce these, of course, but first you should get some sense 
of how, in general, the GC is affecting the performance of your app (if 
at all), otherwise you risk spending time optimizing things that didn't 
need optimizing.
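
One rough way to get that sense, before touching any allocation site, is 
to compare the time the JVM reports spending in GC against wall-clock 
time for a representative piece of work. The sketch below uses the 
standard GarbageCollectorMXBean API; the class name GcOverhead, the helper 
names and the toy workload are made up for illustration, and the reported 
times are approximate, so treat the result as a first indication rather 
than a measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

public class GcOverhead {

    /** Accumulated GC time in milliseconds, summed over all collectors. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Approximate fraction of wall-clock time spent in GC while the workload ran. */
    public static double gcFraction(Runnable workload) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        workload.run();
        long elapsedMs = Math.max(1, (System.nanoTime() - start) / 1_000_000);
        return (double) (totalGcMillis() - gcBefore) / elapsedMs;
    }

    public static void main(String[] args) {
        // Toy workload (an assumption): create many short-lived Optionals.
        double fraction = gcFraction(() -> {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += Optional.of(i).get();
            }
            System.out.println("checksum: " + sum);
        });
        System.out.printf("GC fraction of wall-clock time: %.3f%n", fraction);
    }
}
```

If the fraction stays close to zero for the real update loop, the 
Optional instances are unlikely to be what is slowing things down. GC 
logging (-Xlog:gc on recent JDKs) gives a more detailed picture.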
>
>> There is a better way to do what you want to do: stop resisting and 
>> use jextract (which you categorically refuse to do, based on other 
>> unproven assumptions/claims) :-) :-) :-)
>>
>
> I'm sorry, but no. I 100% understand that the approach I take has 
> *many* flaws and that I am wrong in some cases, but I refuse to accept 
> the idea, or even the implication, that jextract/raw FMA is somehow 
> perfect and my abstraction layer is terrible, and/or that jextract 
> will fix all my problems.
>
>
> The bindings jextract creates are *unsafe*. You have *zero* checks to 
> validate that the incoming MemoryAddress is of the correct size, 
> alignment, signedness, etc. You're, as far as I can tell, instead 
> crossing your fingers and hoping that people will somehow recognize a 
> seemingly unhelpfully named file, the generated *constants* class 
> file (not *layouts*), as the provider of the appropriate ValueLayouts. 
> Here is a generated NVML function binding from jextract:
>
>
>     public static MethodHandle nvmlDeviceGetCount_v2$MH() {
>         return nvml_h$constants.nvmlDeviceGetCount_v2$MH();
>     }
>     public static int nvmlDeviceGetCount_v2(jdk.incubator.foreign.MemoryAddress deviceCount) {
>         try {
>             return (int) nvml_h$constants.nvmlDeviceGetCount_v2$MH().invokeExact(deviceCount);
>         } catch (Throwable ex) {
>             throw new AssertionError(ex);
>         }
>     }
>
>
> (The JDK version I'm using is a bit old, but AFAIK nothing that 
> matters here has changed.)
>
>
> So you have an overloaded object type (MemoryAddress) with zero checks 
> and no documentation as to where the appropriate MemoryLayout 
> (ValueLayout, actually) needed to create it is to be found. If this 
> function accepted an enum value, you'd be forced, unless something's 
> changed, to plug in numbers that aren't understandable without 
> documentation. You're already required to do that for the return 
> value.
>
>
I'm not crossing my fingers or anything. My team and I have used jextract 
quite a lot lately, and, as I said, I think it's ok.

You probably have looked at this:

https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_jextract.md

And honestly, putting together these samples, even the most complex 
ones, hasn't really been a difficult process.

Of course using jextract doesn't provide many safety guarantees - if 
something takes a MemoryAddress you can't really tell what kind of 
pointer it is you have to create (and same for MemorySegment). But, in 
reality, when you interact with a native library, you kind of know how 
to use it (the library comes with documentation, after all).

So, if you accept the fact that the bindings generated by jextract are 
no more and no less than a Java representation of a set of opaque C 
functions in a JNI file somewhere, I think the rest follows - except for 
the fact that now at least everything is in one place (in Java) and 
interacting with/building such libraries is easier and more convenient.

Of course you are free to approach this in any way you like - from where 
I look it seems to me that you never gave the new jextract a chance (for 
various reasons, some maybe good, and some maybe less good - e.g. naming 
issues).

> Sure, my bindings using my abstraction layer still have issues like 
> telling whether a NativeValue<Long> is signed, but that's a *huge* 
> reduction in possible inputs compared to a MemoryAddress. 
> Without creating a new implementation that violates the point of the 
> interface, you cannot at the very least plug in a NativeStruct or a 
> NativeArray abstraction. Some of these aren't even my fault.
>
>
> And these bindings are tied to the platform they are generated for 
> even if the library is cross-platform. Cross-platform JavaFX 
> applications using jextract, like mine, are, at the very least, more 
> work to make than should be required. There is good reason for this, I 
> know, but if the solution is to create a plugin because of a minor 
> disagreement then I'll just not use it to begin with.
>
>
> The very idea that jextract is somehow this perfect solution that'll 
> fix my problem(s) is a red herring. You want to talk about root causes? 
> Fine, let's talk about the root cause of why the abstraction exists in 
> the first place and is still being used: jextract wasn't in a state to 
> do what I wanted it to do at the time, in a way I wanted it, and is 
> still not*. Going by everything said, it will never be either, so the 
> only other alternative is to waste time creating a plugin that just 
> does what I could do by hand, even if doing it by hand is risky. If 
> jextract had the ability to spit out layout information for structs, 
> making bindings by hand would be less risky, but it looks like that 
> too must be done via a plugin.
>
>
> *referring to method naming and forcing API users to go through the 
> Stream API to access attribute names for a given layout, in addition 
> to everything else mentioned.

First, you will never hear me claiming that jextract is a perfect silver 
bullet (that doesn't exist, otherwise we wouldn't be here). That said, 
we have been able to do quite a bit of stuff on top of it - some examples:

* full libclang port (which is part of the jextract implementation itself)
* Sundar wrote an alternate jimage reader using jextract a few weeks ago
* Henry ported the JDK NIO framework to use Panama and jextract (albeit 
Henry did use a plugin to achieve better filtering)

These are all very big and realistic experiments, we're talking many 
thousands of lines of code. So obviously this stuff, even if low level, 
works. So, when you say that jextract is in a state where it's not 
usable, well, I don't think that exactly reflects reality - what you 
really want to say, maybe, is that you are unhappy because jextract 
doesn't give you exactly what you want (which is a fully type safe API). 
But then you ignore the fact that jextract doesn't give you what you 
want, in part also because we're obsessive in not introducing overhead 
at every step of the way; wrapping every struct in a class and every 
pointer into something else has a (GC) cost - I find it telling that 
some of the reasons why jextract is the way it is are connected to the 
problems you brought up when you first wrote.

>
>
> jextract couldn't even handle inner structs/unions until recently, 
> on top of the issues mentioned above, and you want me to abandon my 
> abstraction layer which did and still can do things jextract couldn't? 
> I understand that this is a process and things aren't even close to 
> being finished, but I'm also being told to use it despite those facts. 
> I'm not trying to sound overly negative or ungrateful (I very much am 
> grateful) but this is a red herring.

These are called bugs - as I'm sure you know. As people take our stuff 
out for a spin, things get noticed and reported (thanks!); we'll fix 
them (in this case after 1 day). If this issue was indeed a blocker for 
you (does NVML have nested structs with _anonymous_ struct/union fields? 
I haven't checked), why didn't you report it sooner? (I'm afraid I know 
the answer).

Now, enough discussion - I can't really be of any help if I don't see 
the code. Your old repos seem to have disappeared, so, if things stay 
that way, I'm afraid I'm not gonna be able to comment much on the issues 
you brought up in your original email.

Maurizio

>>>>>> On 13/08/2020 14:06, Ty Young wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>> I took a little time to look into optimizing the performance of 
>>>>>>> my abstraction layer as FMA hasn't changed in any radical, 
>>>>>>> breaking way and I'm happy with the overall design of my 
>>>>>>> abstraction layer.
>>>>>>>
>>>>>>>
>>>>>>> In order to look into what could be optimized, I set the number 
>>>>>>> of worker threads in my JavaFX application to 1 so that Nvidia 
>>>>>>> attribute updates are done in a linear fashion and it is easier 
>>>>>>> to reason about how much of a performance impact any given one 
>>>>>>> has and why. I then used NetBeans' built-in profiler to view 
>>>>>>> where the CPU time was being taken. Runnables to be updated are 
>>>>>>> given to the worker thread pool every 500 ms.
>>>>>>>
>>>>>>>
>>>>>>> Unsurprisingly to me, besides PCIe TX/RX attributes which 
>>>>>>> supposedly are hung up within NVML itself, the attribute that 
>>>>>>> represents GPU processes is the worst by far (see img1). This 
>>>>>>> attribute is actually multiple native function calls jammed into 
>>>>>>> one attribute which all utilize arrays of structs.
>>>>>>>
>>>>>>>
>>>>>>> Viewing the call tree (see img2) shows that a major contributor 
>>>>>>> to this is ValueLayout.equals(), but there is some self-time in 
>>>>>>> the upper NativeObject.getNativeObject() and 
>>>>>>> NativeValue.ofUnsafeValueeLayout calls as well. 
>>>>>>> ValueLayout.equals() is used in an if-else chain because you need 
>>>>>>> to know which NativeValue implementation should be returned. If 
>>>>>>> the layout is an integer then return NativeInteger, for example. 
>>>>>>> It may be possible to order this if-else chain so that common 
>>>>>>> cases return faster without hitting every else-if (e.g. bytes 
>>>>>>> first, then integers, then longs, etc.), but that's always 
>>>>>>> going to be a presumptuous, arbitrary order that may not 
>>>>>>> actually be faster in some situations.
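
An alternative to picking an order for the chain, sketched here under 
assumptions: key a HashMap on the layout, so that selecting the wrapper 
costs one hashCode() plus (typically) one equals(), regardless of which 
layout comes in. Layout, NativeValue and the registered entries below 
are hypothetical stand-ins, not the FMA types or the abstraction layer's 
real classes, and whether this is actually faster depends on how cheap 
the real layout's hashCode() and equals() are, so it would need measuring.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

public class LayoutDispatch {

    /** Hypothetical stand-in for a value layout: a kind plus a bit size. */
    public static final class Layout {
        final String kind;
        final int bits;

        public Layout(String kind, int bits) {
            this.kind = kind;
            this.bits = bits;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Layout)) return false;
            Layout other = (Layout) o;
            return bits == other.bits && kind.equals(other.kind);
        }

        @Override
        public int hashCode() {
            return 31 * kind.hashCode() + bits;
        }
    }

    /** Hypothetical wrapper, standing in for NativeInteger, NativeLong, etc. */
    public static final class NativeValue {
        public final String type;
        public final long address;

        NativeValue(String type, long address) {
            this.type = type;
            this.address = address;
        }
    }

    // One factory per supported layout, registered once at class-load time.
    private static final Map<Layout, LongFunction<NativeValue>> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put(new Layout("int", 8), addr -> new NativeValue("NativeByte", addr));
        FACTORIES.put(new Layout("int", 32), addr -> new NativeValue("NativeInteger", addr));
        FACTORIES.put(new Layout("int", 64), addr -> new NativeValue("NativeLong", addr));
        FACTORIES.put(new Layout("pointer", 64), addr -> new NativeValue("NativePointer", addr));
    }

    /** Single map lookup instead of walking an if-else chain of equals() calls. */
    public static NativeValue wrap(Layout layout, long address) {
        LongFunction<NativeValue> factory = FACTORIES.get(layout);
        if (factory == null) {
            throw new IllegalArgumentException(
                    "Unsupported layout: " + layout.kind + "/" + layout.bits);
        }
        return factory.apply(address);
    }

    public static void main(String[] args) {
        System.out.println(wrap(new Layout("int", 32), 0L).type); // prints NativeInteger
    }
}
```

The lookup cost no longer depends on where a layout would have sat in 
the chain, which sidesteps the "arbitrary order" concern. It does not 
address attributes being factored into equality.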
>>>>>>>
>>>>>>>
>>>>>>> What could be done to improve this? I can't think of any 
>>>>>>> absolute fixes but an improvement would be to extend the 
>>>>>>> ValueLayout so that you have a NumberLayout and a PointerLayout. 
>>>>>>> You could then use instanceof to presumably filter things faster 
>>>>>>> and more cheaply so that the mentioned else-if chain does not 
>>>>>>> need to check for a pointer layout. The PointerLayout specific 
>>>>>>> checks could be moved to its own static method. It's a small 
>>>>>>> change, but it's presumably an improvement even if small.
>>>>>>>
>>>>>>>
>>>>>>> Unfortunately I can't do this myself because of sealed types so 
>>>>>>> here I am.
>>>>>>>
>>>>>>>
>>>>>>> Another thing that needs optimizing is the memory allocation 
>>>>>>> waste of getting an attribute. Every call to attribute(String 
>>>>>>> name) allocates a new Optional instance, which was oftentimes 
>>>>>>> used by my abstraction for a check and then immediately 
>>>>>>> discarded. I wanted to do a bunch of layout checks to make sure 
>>>>>>> that the MemoryLayout is valid, but after seeing the amount of 
>>>>>>> garbage being generated standing out like a sore thumb, I 
>>>>>>> decided to remove those checks (they are really important too). 
>>>>>>> The amount of memory wasted wasn't worth it. The answer to this 
>>>>>>> is presumably going to be value types, but it isn't clear when 
>>>>>>> it's going to be delivered.
>>>>>>>
>>>>>>>
>>>>>>> Once again, if MemoryLayout and its extensions weren't sealed I 
>>>>>>> could do things to improve both performance and memory waste as 
>>>>>>> well as fix other issues like attributes being factored into 
>>>>>>> equality checks when it isn't wanted. Yes, I realize I'm beating 
>>>>>>> a dead horse at this point but that dead horse is still causing 
>>>>>>> issues.
>>>>>>>
>>>>>>>
>>>>>>> Could the suggested ValueLayout changes be done, at the very 
>>>>>>> least? Or maybe some kind of equals() performance optimizations 
>>>>>>> or something?