performance and memory optimization of layouts

Ty Young youngty1997 at gmail.com
Thu Aug 13 13:06:05 UTC 2020


Hi,


I took a little time to look into optimizing the performance of my 
abstraction layer, since FMA hasn't changed in any radical, breaking way 
and I'm happy with the overall design of the layer.


In order to see what could be optimized, I set the number of worker 
threads in my JavaFX application to 1 so that Nvidia attribute updates 
run sequentially, which makes it easier to reason about how much of a 
performance impact any given one has and why. I then used NetBeans' 
built-in profiler to see where the CPU time was being spent. Runnables 
to be updated are given to the worker thread pool every 500 ms.


Unsurprisingly to me, besides the PCIe TX/RX attributes, which are 
supposedly hung up within NVML itself, the attribute that represents GPU 
processes is by far the worst (see img1). This attribute is actually 
multiple native function calls jammed into one attribute, all of which 
use arrays of structs.


Viewing the call tree (see img2) shows that a major contributor to this 
cost is ValueLayout.equals(), but there is some self-time in the upper 
NativeObject.getNativeObject() and NativeValue.ofUnsafeValueLayout calls 
as well. ValueLayout.equals() is used in an if-else chain because you 
need to know which NativeValue implementation should be returned. If the 
layout is an integer, then return NativeInteger, for example. It may be 
possible to order this if-else chain so that common cases return without 
hitting every else-if (e.g. bytes first, then integers, then longs, 
etc.), but that's always going to be a presumptuous, arbitrary order 
that may not actually be faster in some situations.
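
To make the ordering problem concrete, here is a minimal sketch of that 
kind of chain. It does not use the real FMA classes: Layout is a 
hypothetical stand-in, the wrapper names other than NativeInteger are 
made up, and the equalsCalls counter exists only to show how many 
comparisons a lookup costs.

```java
import java.util.Objects;

final class EqualsChainSketch {

    /** Counts Layout.equals() calls so the cost of chain ordering is visible. */
    static int equalsCalls = 0;

    /** Hypothetical stand-in for a value layout; not the real FMA class. */
    static final class Layout {
        final String kind;   // e.g. "int" or "pointer"
        final long bitSize;

        Layout(String kind, long bitSize) {
            this.kind = kind;
            this.bitSize = bitSize;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            if (!(o instanceof Layout)) {
                return false;
            }
            Layout other = (Layout) o;
            return bitSize == other.bitSize && kind.equals(other.kind);
        }

        @Override
        public int hashCode() {
            return Objects.hash(kind, bitSize);
        }
    }

    static final Layout BYTE = new Layout("byte", 8);
    static final Layout INT = new Layout("int", 32);
    static final Layout LONG = new Layout("long", 64);
    static final Layout POINTER = new Layout("pointer", 64);

    /** Returns the name of the wrapper type the chain would pick. */
    static String wrapperFor(Layout layout) {
        if (layout.equals(BYTE)) {
            return "NativeByte";
        } else if (layout.equals(INT)) {
            return "NativeInteger";
        } else if (layout.equals(LONG)) {
            return "NativeLong";
        } else if (layout.equals(POINTER)) {
            return "NativePointer";
        }
        throw new IllegalArgumentException("unsupported layout: " + layout.kind);
    }
}
```

In this sketch a byte layout is resolved after one equals() call, while 
a pointer layout needs four, so whichever type sits last in the chain 
always pays for every comparison before it.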


What could be done to improve this? I can't think of any absolute fixes, 
but an improvement would be to extend ValueLayout so that you have a 
NumberLayout and a PointerLayout. You could then use instanceof to 
presumably filter things faster and more cheaply, so that the mentioned 
else-if chain does not need to check for a pointer layout. The 
PointerLayout-specific checks could be moved to their own static method. 
It's a small change, but presumably still an improvement.
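
Roughly what I have in mind, as a sketch only. NumberLayout and 
PointerLayout do not exist in FMA; the ValueLayout here is a 
hypothetical stand-in, and the numeric branch just compares bit sizes to 
keep the example short (it ignores int vs. float, for instance).

```java
final class InstanceofDispatchSketch {

    /** Hypothetical stand-in for ValueLayout; the real class is sealed. */
    abstract static class ValueLayout {
        final long bitSize;

        ValueLayout(long bitSize) {
            this.bitSize = bitSize;
        }
    }

    /** Hypothetical subtype for numeric layouts. */
    static final class NumberLayout extends ValueLayout {
        NumberLayout(long bitSize) {
            super(bitSize);
        }
    }

    /** Hypothetical subtype for pointer layouts. */
    static final class PointerLayout extends ValueLayout {
        PointerLayout(long bitSize) {
            super(bitSize);
        }
    }

    /** A type check splits pointers from numbers before any chain runs. */
    static String wrapperFor(ValueLayout layout) {
        if (layout instanceof PointerLayout) {
            return pointerWrapperFor((PointerLayout) layout);
        }
        if (layout instanceof NumberLayout) {
            return numberWrapperFor((NumberLayout) layout);
        }
        throw new IllegalArgumentException("unsupported layout type");
    }

    /** Pointer-specific checks live in their own static method. */
    static String pointerWrapperFor(PointerLayout layout) {
        return "NativePointer";
    }

    /** The numeric chain no longer has to consider pointer layouts. */
    static String numberWrapperFor(NumberLayout layout) {
        if (layout.bitSize == 8) {
            return "NativeByte";
        } else if (layout.bitSize == 32) {
            return "NativeInteger";
        } else if (layout.bitSize == 64) {
            return "NativeLong";
        }
        throw new IllegalArgumentException("unsupported bit size: " + layout.bitSize);
    }
}
```

Whether the instanceof check is actually cheaper than equals() in 
practice would need to be measured; the sketch only shows the shape of 
the dispatch.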


Unfortunately, I can't do this myself because of sealed types, so here I am.


Another thing that needs optimizing is the memory allocation waste of 
getting an attribute. Every call to attribute(String name) allocates a 
new Optional instance, which was often used by my abstraction for a 
check and then immediately discarded. I wanted to do a bunch of layout 
checks to make sure that the MemoryLayout is valid, but after seeing how 
much the generated garbage stood out in the profiler, I decided to 
remove those checks (even though they are really important). The amount 
of memory wasted wasn't worth it. The answer to this is presumably going 
to be value types, but it isn't clear when they are going to be 
delivered.
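
One possible workaround on the abstraction side, sketched under some 
assumptions: if layouts are long-lived constants, the check could be run 
once per layout instance and the result remembered by identity, so the 
Optional is only created on the first check. This is not what my code 
does today. Layout here is a hypothetical stand-in, and "kind" is a 
made-up attribute name.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Optional;

final class AttributeCheckCacheSketch {

    /** Hypothetical stand-in for a layout whose attribute lookup returns an Optional. */
    static final class Layout {
        private final Map<String, String> attributes;
        int lookups = 0; // counts attribute() calls; each one may allocate an Optional

        Layout(Map<String, String> attributes) {
            this.attributes = attributes;
        }

        Optional<String> attribute(String name) {
            lookups++;
            return Optional.ofNullable(attributes.get(name));
        }
    }

    // Keyed by identity, so Layout.equals() is never called for a cache hit.
    // Entries are never evicted: this assumes layouts are long-lived constants.
    private static final Map<Layout, Boolean> VALIDATED =
            Collections.synchronizedMap(new IdentityHashMap<Layout, Boolean>());

    /** Runs the attribute check at most once per layout instance in the common case. */
    static boolean isValid(Layout layout) {
        Boolean cached = VALIDATED.get(layout);
        if (cached != null) {
            return cached;
        }
        // "kind" is a made-up attribute name used only for this sketch.
        boolean valid = layout.attribute("kind").isPresent();
        VALIDATED.put(layout, valid);
        return valid;
    }
}
```

Two threads could still validate the same layout at the same time before 
either result is stored; that only costs one extra lookup. The obvious 
downside is that the cache keeps every checked layout reachable.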


Once again, if MemoryLayout and its extensions weren't sealed, I could 
do things to improve both performance and memory waste, as well as fix 
other issues such as attributes being factored into equality checks 
when that isn't wanted. Yes, I realize I'm beating a dead horse at this 
point, but that dead horse is still causing issues.


Could the suggested ValueLayout changes be done, at the very least? Or 
maybe some kind of equals() performance optimization?











More information about the panama-dev mailing list