performance and memory optimization of layouts
Ty Young
youngty1997 at gmail.com
Thu Aug 13 13:06:05 UTC 2020
Hi,
I took a little time to look into optimizing the performance of my
abstraction layer as FMA hasn't changed in any radical, breaking way and
I'm happy with the overall design of my abstraction layer.
In order to look into what could be optimized, I set the number of
worker threads in my JavaFX application to 1 so that Nvidia attribute
updates are done in a linear fashion, which makes it easier to reason
about how much of a performance impact any given one has and why. I then
used NetBeans' built-in profiler to see where the CPU time was being spent.
Runnables to be updated are given to the worker thread pool every 500 ms.
Unsurprisingly to me, besides the PCIe TX/RX attributes, which supposedly
are hung up within NVML itself, the attribute that represents GPU processes
is the worst by far (see img1). This attribute is actually multiple
native function calls jammed into one attribute, all of which use arrays
of structs.
Viewing the call tree (see img2) shows that a major contributor to this
time is ValueLayout.equals(), but there is some
self-time in the upper NativeObject.getNativeObject() and
NativeValue.ofUnsafeValueeLayout calls as well. ValueLayout.equals() is
used in an if-else chain because you need to know which NativeValue
implementation should be returned. If the layout is an integer then
return NativeInteger, for example. It may be possible to order this
if-else chain so that common cases return without hitting
every else-if (e.g. bytes first, then integers, then longs, etc.), but
any such order is going to be presumptuous and arbitrary, and may not
actually be faster in some situations.
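
To make the cost of the chain concrete, here is a minimal sketch of the pattern described above. It does not use the real FMA classes: SketchLayout is a hypothetical stand-in, and every wrapper name other than NativeInteger is made up for illustration. The static counter only records how many equals() calls a lookup needs, and is not thread-safe.

```java
import java.util.Objects;

// Hypothetical stand-in for a value layout; NOT the real FMA ValueLayout class.
final class SketchLayout {
    static int equalsCalls = 0; // counts equals() invocations (demo only, not thread-safe)

    final String kind;  // e.g. "byte", "int", "long", "pointer"
    final long bitSize;
    final String name;  // optional attribute that also takes part in equality

    SketchLayout(String kind, long bitSize, String name) {
        this.kind = kind;
        this.bitSize = bitSize;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        equalsCalls++;
        if (!(o instanceof SketchLayout)) return false;
        SketchLayout other = (SketchLayout) o;
        return bitSize == other.bitSize
                && kind.equals(other.kind)
                && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(kind, bitSize, name);
    }
}

final class EqualsChainSketch {
    static final SketchLayout BYTE = new SketchLayout("byte", 8, null);
    static final SketchLayout INT = new SketchLayout("int", 32, null);
    static final SketchLayout LONG = new SketchLayout("long", 64, null);
    static final SketchLayout POINTER = new SketchLayout("pointer", 64, null);

    // Picks a (hypothetical) wrapper name by walking an equals() chain in a fixed order.
    static String wrapperFor(SketchLayout layout) {
        if (layout.equals(BYTE)) {
            return "NativeByte";
        } else if (layout.equals(INT)) {
            return "NativeInteger";
        } else if (layout.equals(LONG)) {
            return "NativeLong";
        } else if (layout.equals(POINTER)) {
            return "NativePointer";
        }
        throw new IllegalArgumentException("Unsupported layout kind: " + layout.kind);
    }

    // Returns how many equals() calls one lookup of the given layout costs.
    static int comparisonsFor(SketchLayout layout) {
        SketchLayout.equalsCalls = 0;
        wrapperFor(layout);
        return SketchLayout.equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("byte lookup:    " + comparisonsFor(BYTE) + " equals() call(s)");
        System.out.println("pointer lookup: " + comparisonsFor(POINTER) + " equals() call(s)");
    }
}
```

Whatever layout sits last in the chain pays for every comparison before it, which is why any fixed order only helps some workloads.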
What could be done to improve this? I can't think of any absolute fixes
but an improvement would be to extend the ValueLayout so that you have a
NumberLayout and a PointerLayout. You could then use instanceof to
presumably filter things faster and more cheaply so that the mentioned
else-if chain does not need to check for a pointer layout. The
PointerLayout specific checks could be moved to its own static method.
It's a small change, but presumably still an improvement.
Unfortunately I can't do this myself because of sealed types so here I am.
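
A rough sketch of what the proposed split might look like from the caller's side. None of these types exist in FMA: ValueLayoutSketch, NumberLayoutSketch and PointerLayoutSketch are hypothetical names for the suggested hierarchy, and the wrapper names other than NativeInteger are again made up.

```java
// Hypothetical hierarchy illustrating the proposal; these are NOT real FMA types.
abstract class ValueLayoutSketch {
    final int bitSize;

    ValueLayoutSketch(int bitSize) {
        this.bitSize = bitSize;
    }
}

final class NumberLayoutSketch extends ValueLayoutSketch {
    NumberLayoutSketch(int bitSize) {
        super(bitSize);
    }
}

final class PointerLayoutSketch extends ValueLayoutSketch {
    PointerLayoutSketch(int bitSize) {
        super(bitSize);
    }
}

final class InstanceofDispatchSketch {
    // A type test separates pointers from numbers before any field comparison happens.
    static String wrapperFor(ValueLayoutSketch layout) {
        if (layout instanceof PointerLayoutSketch) {
            return wrapPointer((PointerLayoutSketch) layout);
        }
        if (layout instanceof NumberLayoutSketch) {
            return wrapNumber((NumberLayoutSketch) layout);
        }
        throw new IllegalArgumentException("Unsupported layout type");
    }

    // Pointer-specific checks live in their own static method.
    static String wrapPointer(PointerLayoutSketch layout) {
        return "NativePointer";
    }

    // The number chain no longer needs a pointer branch.
    static String wrapNumber(NumberLayoutSketch layout) {
        switch (layout.bitSize) {
            case 8:
                return "NativeByte";
            case 32:
                return "NativeInteger";
            case 64:
                return "NativeLong";
            default:
                throw new IllegalArgumentException("Unsupported bit size: " + layout.bitSize);
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapperFor(new PointerLayoutSketch(64)));
        System.out.println(wrapperFor(new NumberLayoutSketch(32)));
    }
}
```

Whether instanceof is actually cheaper than the current equals() chain would need to be measured. The sketch only shows how the pointer case drops out of the number chain.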
Another thing that needs optimizing is the memory allocation waste of
getting an attribute. Every call to attribute(String name) allocates a
new Optional instance, which my abstraction often used for a
check and then immediately discarded. I wanted to do a bunch of layout
checks to make sure that the MemoryLayout is valid, but after seeing
the amount of garbage being generated stand out like a sore thumb, I
decided to remove those checks (they are really important, too). The
amount of memory wasted wasn't worth it. The answer to this is
presumably going to be value types, but it isn't clear when those are
going to be delivered.
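
For illustration, the check-and-discard pattern looks roughly like the sketch below. AttributeLookupSketch is a hypothetical stand-in backed by a plain map, not MemoryLayout, and hasAttribute() and attributeOrNull() are made-up accessors showing what an allocation-free check could look like. The JIT's escape analysis may remove some of these Optional allocations, so how much garbage a real workload produces would have to be confirmed with a profiler.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a layout carrying attributes; NOT the real MemoryLayout.
final class AttributeLookupSketch {
    private final Map<String, Object> attributes = new HashMap<>();

    AttributeLookupSketch with(String name, Object value) {
        attributes.put(name, value);
        return this;
    }

    // Optional-returning accessor: each successful lookup wraps the value in a new Optional.
    Optional<Object> attribute(String name) {
        return Optional.ofNullable(attributes.get(name));
    }

    // Made-up alternative: a presence check that needs no wrapper object.
    boolean hasAttribute(String name) {
        return attributes.containsKey(name);
    }

    // Made-up alternative: returns the raw value, or null when the attribute is absent.
    Object attributeOrNull(String name) {
        return attributes.get(name);
    }

    public static void main(String[] args) {
        AttributeLookupSketch layout = new AttributeLookupSketch().with("kind", "pointer");

        // Check-and-discard: the Optional exists only to answer a yes/no question.
        boolean viaOptional = layout.attribute("kind").isPresent();

        // The same check without the wrapper.
        boolean direct = layout.hasAttribute("kind");

        System.out.println(viaOptional + " " + direct);
    }
}
```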
Once again, if MemoryLayout and its extensions weren't sealed I could do
things to improve both performance and memory waste, as well as fix
other issues, like attributes being factored into equality checks when
that isn't wanted. Yes, I realize I'm beating a dead horse at this point,
but that dead horse is still causing issues.
Could the suggested ValueLayout changes be done, at the very least? Or
maybe some kind of equals() performance optimization?