performance and memory optimization of layouts

Ty Young youngty1997 at gmail.com
Thu Aug 13 14:53:38 UTC 2020


On 8/13/20 8:46 AM, Maurizio Cimadamore wrote:
> I can no longer find your repository.
>
> I think I've suggested something in the past related to a similar 
> issue, not sure if you acted on it or not.
>
> Basically, the suggestion was to define a set of your own layout 
> constants, which contained a special attribute which could be used for 
> deciding whether something is a NativeInteger, or something else. This 
> is the same approach used by the ABI layer and works very well.
>
> With something like that there is no need to do an equals() - you just 
> have to get the value of a well-known attribute (e.g. lookup in a 
> HashMap).


I am doing constants for layouts already.
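
For reference, a rough sketch of how I read the suggestion above. This is
not FMA code: Layout is a stand-in class, and the Kind enum, the
"example/kind" attribute name, and every wrapper name other than
NativeInteger are made up for illustration. The only idea taken from the
thread is that each layout constant carries a tag under a well-known
attribute, and the tag is read instead of calling equals().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class LayoutTagDispatch {

    /** Hypothetical tag stored under a well-known attribute name. */
    enum Kind { BYTE, INT, LONG, POINTER }

    /** Stand-in for a value layout: a bit size plus named attributes. */
    static final class Layout {
        private final long bitSize;
        private final Map<String, Object> attributes;

        Layout(long bitSize, Map<String, Object> attributes) {
            this.bitSize = bitSize;
            this.attributes = attributes;
        }

        /** Returns a copy of this layout carrying one extra attribute. */
        Layout withAttribute(String name, Object value) {
            Map<String, Object> copy = new HashMap<>(attributes);
            copy.put(name, value);
            return new Layout(bitSize, copy);
        }

        /** Optional-returning lookup, backed by a HashMap. */
        Optional<Object> attribute(String name) {
            return Optional.ofNullable(attributes.get(name));
        }
    }

    /** Hypothetical attribute name; any unique string would do. */
    static final String KIND = "example/kind";

    // Layout constants, each tagged once with its kind.
    static final Layout BYTE_LAYOUT =
            new Layout(8, Map.of()).withAttribute(KIND, Kind.BYTE);
    static final Layout INT_LAYOUT =
            new Layout(32, Map.of()).withAttribute(KIND, Kind.INT);
    static final Layout LONG_LAYOUT =
            new Layout(64, Map.of()).withAttribute(KIND, Kind.LONG);
    static final Layout POINTER_LAYOUT =
            new Layout(64, Map.of()).withAttribute(KIND, Kind.POINTER);

    /** Picks the wrapper name from the tag; the layout itself is never compared. */
    static String wrapperFor(Layout layout) {
        Kind kind = (Kind) layout.attribute(KIND)
                .orElseThrow(() -> new IllegalArgumentException("untagged layout"));
        switch (kind) {
            case BYTE:    return "NativeByte";
            case INT:     return "NativeInteger";
            case LONG:    return "NativeLong";
            case POINTER: return "NativePointer";
            default:      throw new AssertionError(kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapperFor(INT_LAYOUT));
        System.out.println(wrapperFor(POINTER_LAYOUT));
    }
}
```

Note that wrapperFor() still does one map lookup and one Optional per
call in this sketch.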


Regardless, doing this still generates a lot of garbage and presumably 
isn't efficient CPU-wise either, since you're accessing a HashMap 
under the hood. Again, this is being done in order to make sense of 
struct fields in quick succession. Each struct field needs safety 
attribute checks, which have to check whether certain attributes of a 
ValueLayout exist (e.g. class, handle, type, etc.). Each struct is stored 
in an array, and there are multiple arrays of structs.
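
To put a rough number on the garbage, here is a small sketch under stated
assumptions. Layout is a stand-in class rather than the FMA type, the
counter only models allocations (it ignores anything the JIT's escape
analysis might remove), and hasAttribute() is a hypothetical method that
does not exist in the real API. The three attribute names are the ones
mentioned above.

```java
import java.util.Map;
import java.util.Optional;

public class AttributeCheckCost {

    /** Stand-in layout that counts how many Optionals its lookups create. */
    static final class Layout {
        long optionalsCreated = 0;
        private final Map<String, Object> attributes;

        Layout(Map<String, Object> attributes) {
            this.attributes = attributes;
        }

        /** Optional-returning lookup: one new Optional per successful call. */
        Optional<Object> attribute(String name) {
            Object value = attributes.get(name);
            if (value != null) {
                optionalsCreated++; // a non-empty Optional is a fresh object
            }
            return Optional.ofNullable(value);
        }

        /** Hypothetical allocation-free check; not part of any real API. */
        boolean hasAttribute(String name) {
            return attributes.containsKey(name);
        }
    }

    /** Attributes every struct field is expected to carry. */
    static final String[] REQUIRED = {"class", "handle", "type"};

    /** Safety check as it has to be written against an Optional-returning API. */
    static boolean isValidField(Layout field) {
        for (String name : REQUIRED) {
            if (!field.attribute(name).isPresent()) {
                return false;
            }
        }
        return true;
    }

    /** Same check against the hypothetical boolean lookup. */
    static boolean isValidFieldNoAlloc(Layout field) {
        for (String name : REQUIRED) {
            if (!field.hasAttribute(name)) {
                return false;
            }
        }
        return true;
    }

    /** Validates fieldCount fields and reports how many Optionals were created. */
    static long optionalsFor(int fieldCount) {
        Layout field = new Layout(
                Map.of("class", int.class, "handle", "h", "type", "int"));
        for (int i = 0; i < fieldCount; i++) {
            if (!isValidField(field)) {
                throw new IllegalStateException("invalid field");
            }
        }
        return field.optionalsCreated;
    }

    public static void main(String[] args) {
        // Three checks per field, so 1000 fields create 3000 Optionals.
        System.out.println(optionalsFor(1000));
    }
}
```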


I can't see a way of doing this without generating garbage and spending 
whatever CPU time the HashMap access takes, short of changes on 
FMA's end. What you're suggesting, if I'm understanding correctly, can 
only be done with minimal garbage and CPU time if 
ValueLayout were extended so that an instanceof check could be used.
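
A minimal sketch of what I mean by an instanceof check, assuming
ValueLayout had subtypes. None of this exists in FMA: ValueLayout here is
a stand-in class, NumberLayout and PointerLayout are the hypothetical
extensions, and the wrapper names other than NativeInteger are made up.
Floating-point layouts are left out for brevity.

```java
public class InstanceofDispatch {

    /** Stand-in for ValueLayout; the real type is sealed and has no such subtypes. */
    static class ValueLayout {
        final long bitSize;

        ValueLayout(long bitSize) {
            this.bitSize = bitSize;
        }
    }

    /** Hypothetical subtype for numeric values. */
    static final class NumberLayout extends ValueLayout {
        NumberLayout(long bitSize) {
            super(bitSize);
        }
    }

    /** Hypothetical subtype for addresses. */
    static final class PointerLayout extends ValueLayout {
        PointerLayout(long bitSize) {
            super(bitSize);
        }
    }

    /** Type test first: no map lookup, no Optional, no equals(). */
    static String wrapperFor(ValueLayout layout) {
        if (layout instanceof PointerLayout) {
            return "NativePointer";
        }
        if (layout instanceof NumberLayout) {
            return numberWrapperFor((NumberLayout) layout);
        }
        throw new IllegalArgumentException("unsupported layout");
    }

    /** Number-specific checks live in their own static method. */
    static String numberWrapperFor(NumberLayout layout) {
        switch ((int) layout.bitSize) {
            case 8:  return "NativeByte";
            case 32: return "NativeInteger";
            case 64: return "NativeLong";
            default: throw new IllegalArgumentException("unsupported size");
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapperFor(new NumberLayout(32)));
        System.out.println(wrapperFor(new PointerLayout(64)));
    }
}
```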






>
> Maurizio
>
> On 13/08/2020 14:06, Ty Young wrote:
>> Hi,
>>
>>
>> I took a little time to look into optimizing the performance of my 
>> abstraction layer as FMA hasn't changed in any radical, breaking way 
>> and I'm happy with the overall design of my abstraction layer.
>>
>>
>> In order to look into what could be optimized, I set the number of 
>> worker threads in my JavaFX application to 1 so that Nvidia attribute 
>> updates are done in a linear fashion and it is easier to reason about 
>> how much of a performance impact any given one has and why. I 
>> then used NetBeans' built-in profiler to view where the CPU time was 
>> being spent. Runnables to be updated are given to the worker thread 
>> pool every 500 ms.
>>
>>
>> Unsurprisingly to me, besides the PCIe TX/RX attributes, which are 
>> supposedly hung up within NVML itself, the attribute that represents 
>> GPU processes is the worst by far (see img1). This attribute is 
>> actually multiple native function calls jammed into one attribute, 
>> all of which utilize arrays of structs.
>>
>>
>> Viewing the call tree (see img2) shows that a major contributor to 
>> this is ValueLayout.equals(), but there is some 
>> self-time in the upper NativeObject.getNativeObject() and 
>> NativeValue.ofUnsafeValueeLayout calls as well. ValueLayout.equals() 
>> is used in an if-else chain because you need to know which NativeValue 
>> implementation should be returned. If the layout is an integer then 
>> return NativeInteger, for example. It may be possible to order this 
>> if-else chain so that it returns faster without hitting 
>> every else-if (e.g. bytes first, then integers, then longs, etc.), but 
>> that's always going to be a presumptuous, arbitrary order that may 
>> not actually be faster in some situations.
>>
>>
>> What could be done to improve this? I can't think of any absolute 
>> fixes but an improvement would be to extend the ValueLayout so that 
>> you have a NumberLayout and a PointerLayout. You could then use 
>> instanceof to presumably filter things faster and more cheaply so 
>> that the mentioned else-if chain does not need to check for a pointer 
>> layout. The PointerLayout specific checks could be moved to its own 
>> static method. It's a small change, but it's presumably an 
>> improvement even if small.
>>
>>
>> Unfortunately I can't do this myself because of sealed types so here 
>> I am.
>>
>>
>> Another thing that needs optimizing is the memory allocation waste of 
>> getting an attribute. Every call to attribute(String name) allocates 
>> a new Optional instance, which is often used by my abstraction 
>> for a check and then immediately discarded. I wanted to do a bunch of 
>> layout checks to make sure that the MemoryLayout is valid, but after 
>> seeing the amount of garbage being generated stand out like a 
>> sore thumb, I decided to remove those checks (even though they are 
>> really important). The amount of memory wasted wasn't worth it. The 
>> answer to this is presumably going to be value types, but it isn't 
>> clear when they are going to be delivered.
>>
>>
>> Once again, if MemoryLayout and its extensions weren't sealed I could 
>> do things to improve both performance and memory waste as well as fix 
>> other issues, like attributes being factored into equality checks 
>> when that isn't wanted. Yes, I realize I'm beating a dead horse at this 
>> point but that dead horse is still causing issues.
>>
>>
>> Could the suggested ValueLayout changes be done, at the very least? 
>> Or maybe some kind of equals() performance optimizations or something?
>>


More information about the panama-dev mailing list