Random values from NVML functions

Ty Young youngty1997 at gmail.com
Thu May 14 00:21:59 UTC 2020


On 5/13/20 7:01 PM, Maurizio Cimadamore wrote:
>
> Btw - nice-looking app! (I looked at the pic :-) )
>

Thanks!


> If I understand correctly, the place where you are getting garbage 
> values out of is this:
>
> https://github.com/BlueGoliath/java-nvidia-bindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/main/nvml_h.java#L426
>
> More specifically, after the call, the array of 
> nvmlProcessUtilizationSample_t doesn't contain what you think it 
> should contain. Am I correct?
>

Right, it's almost as if the memory isn't being sliced correctly. That said,
I'm not sure how incorrectly sliced memory, if it were zeroed, would produce
those numbers in the first place.
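To make the slicing concern concrete: when a C function fills an array of structs, element i starts at i * sizeof(struct), and each field sits at a fixed offset from that base. A wrong stride or offset therefore returns values from neighbouring fields or elements. Below is a minimal sketch that uses a plain java.nio.ByteBuffer as a stand-in for the native memory (the incubating FMA API has changed between builds, so its calls are deliberately avoided). The 32-byte stride and the field offsets are assumptions based on the nvml.h declaration of nvmlProcessUtilizationSample_t (pid, timeStamp, smUtil, memUtil, encUtil, decUtil) with natural alignment on x86-64; the sample values are arbitrary.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class SampleReader {
    // ASSUMED layout of nvmlProcessUtilizationSample_t (natural alignment, x86-64).
    static final int STRIDE = 32;                 // sizeof the whole struct, padding included
    static final int PID = 0, TIMESTAMP = 8;      // 4 bytes of padding sit between these two
    static final int SM = 16, MEM = 20, ENC = 24, DEC = 28;

    /** Reads every field of element i using absolute (position-independent) gets. */
    static long[] read(ByteBuffer buf, int i) {
        int base = i * STRIDE;                    // element i starts at i * sizeof(struct)
        return new long[] {
            Integer.toUnsignedLong(buf.getInt(base + PID)),   // unsigned int -> long
            buf.getLong(base + TIMESTAMP),                    // unsigned long long, read as signed
            Integer.toUnsignedLong(buf.getInt(base + SM)),
            Integer.toUnsignedLong(buf.getInt(base + MEM)),
            Integer.toUnsignedLong(buf.getInt(base + ENC)),
            Integer.toUnsignedLong(buf.getInt(base + DEC)),
        };
    }

    public static void main(String[] args) {
        int count = 2;
        // Stand-in for the native buffer: native byte order, zero-filled on allocation.
        ByteBuffer buf = ByteBuffer.allocate(count * STRIDE).order(ByteOrder.nativeOrder());
        // Write element 1 the way the C side would lay it out (arbitrary values);
        // element 0 is left zeroed.
        int base = STRIDE;
        buf.putInt(base + PID, 4242);
        buf.putLong(base + TIMESTAMP, 123456789L);
        buf.putInt(base + SM, 97);
        buf.putInt(base + MEM, 40);
        buf.putInt(base + DEC, 3);
        System.out.println(Arrays.toString(read(buf, 1))); // [4242, 123456789, 97, 40, 0, 3]
        System.out.println(Arrays.toString(read(buf, 0))); // [0, 0, 0, 0, 0, 0]
    }
}
```

The same base-plus-offset arithmetic would apply when reading from a native memory segment instead of a ByteBuffer.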


> Can I see the client code which calls this function, so that I can 
> take a look at all the pieces?
>

Of course:


https://github.com/BlueGoliath/GoliathEnviousNative/blob/master/modules/org.goliath.envious.nvml/src/main/java/org/goliath/envious/nvml/local/attributes/NVMLGPUProcessAttributeData.java


Be warned though, the code isn't as pretty as the GUI.


> Thanks
> Maurizio
>
> On 14/05/2020 00:55, Ty Young wrote:
>>
>> On 5/13/20 6:38 PM, Maurizio Cimadamore wrote:
>>> Hi,
>>> is this a regression? E.g. did this work before and then start
>>> behaving differently all of a sudden (e.g. after a rebuild of Panama),
>>> or is this a new function you are trying to call that is showing odd
>>> behavior?
>>
>>
>> Not sure.
>>
>>
>> After converting everything from the Pointer API to FMA, it started
>> giving me 0 for everything, whereas the Pointer API gave seemingly
>> correct non-zero values most of the time but sometimes returned random
>> garbage. Because the old Pointer API never zeroed memory, I have no idea
>> whether those values were valid, so I didn't think much of always
>> getting 0.
>>
>>
>> Yesterday I did some cleanups in the OO code (the layer under JavaFX),
>> including converting NativeValue<Integer> instances to NativeInteger
>> (same for longlong), and it started doing this, which I think is
>> partially correct: if I start a GPU benchmarking application (Unigine
>> Superposition) and view the processes in the GUI, I do see seemingly
>> correct utilization rates that match the in-app on-screen display FPS.
>>
>>
>> The issue is with Memory Utilization and Video encoder/decoder 
>> Utilization.
>>
>>
>>>
>>> Maurizio
>>>
>>> On 14/05/2020 00:00, Ty Young wrote:
>>>> Hi,
>>>>
>>>>
>>>> Currently I'm getting random values [1] from this NVML function [2].
>>>> I've spent a few hours dumping sizes and re-checking my abstraction
>>>> layer code to figure out why, but I'm not seeing anything. I'm
>>>> wondering whether any recent FMA bug fixes address something that
>>>> could cause this. If not, I'll have to try asking on the Nvidia forums.
>>>>
>>>>
>>>> For reference, the function binding can be found here:
>>>>
>>>>
>>>> https://github.com/BlueGoliath/java-nvidia-bindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/main/nvml_h.java#L426 
>>>>
>>>>
>>>>
>>>> and the abstraction layer here:
>>>>
>>>>
>>>> https://github.com/BlueGoliath/Crosspoint/tree/master/src/main/java/org/goliath/crosspoint 
>>>>
>>>>
>>>>
>>>> I'm able to read/write other structs just fine, such as:
>>>>
>>>>
>>>> https://github.com/BlueGoliath/java-nvidia-bindings/blob/master/modules/org.goliath.bindings.nvctrl/src/main/java/org/goliath/bindings/nvctrl/structs/NVCTRLAttributeValidValuesRec.java 
>>>>
>>>>
>>>>
>>>> and again, all byte sizes seem correct (48 bytes for the NVML
>>>> struct), so I'm really lost here.
>>>>
>>>>
>>>>
>>>> [1] https://imgur.com/a/wrQtOXq
>>>>
>>>> [2] https://docs.nvidia.com/deploy/nvml-api/group__nvmlGridQueries.html#group__nvmlGridQueries_1gb0ea5236f5e69e63bf53684a11c233bd
>>>>
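As a cross-check on the 48-byte figure quoted above, here is a minimal Java sketch of C's natural-alignment rules: each field is padded up to a multiple of its own size, and the total is padded up to the largest field alignment. The field sizes passed in assume the nvml.h declaration of nvmlProcessUtilizationSample_t (one unsigned int, one unsigned long long, four unsigned ints) on an x86-64 ABI. Under those assumptions the struct comes out at 32 bytes, so if a binding reports 48, comparing its per-field offsets with these would be a reasonable next step.

```java
import java.util.Arrays;

final class StructLayout {

    /**
     * Computes C struct field offsets under natural alignment (each scalar field
     * aligned to its own size). Returns offsets[0..n-1] followed by the padded
     * total size at index n.
     */
    static long[] layout(long... fieldSizes) {
        long[] result = new long[fieldSizes.length + 1];
        long offset = 0;
        long maxAlign = 1;
        for (int i = 0; i < fieldSizes.length; i++) {
            long align = fieldSizes[i];           // scalar fields: alignment == size
            offset = alignUp(offset, align);      // insert padding before the field
            result[i] = offset;
            offset += fieldSizes[i];
            maxAlign = Math.max(maxAlign, align);
        }
        // Trailing padding so that array elements stay aligned.
        result[fieldSizes.length] = alignUp(offset, maxAlign);
        return result;
    }

    /** Rounds value up to the next multiple of align. */
    static long alignUp(long value, long align) {
        return (value + align - 1) / align * align;
    }

    public static void main(String[] args) {
        // ASSUMED field sizes: pid, timeStamp, smUtil, memUtil, encUtil, decUtil.
        long[] l = layout(4, 8, 4, 4, 4, 4);
        System.out.println(Arrays.toString(l)); // [0, 8, 16, 20, 24, 28, 32]
    }
}
```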

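On the earlier point that the old Pointer API never zeroed memory, which made it impossible to tell whether the returned values were valid: one common debugging trick is to fill the buffer with a recognizable byte pattern before the native call and check afterwards which fields still hold it. The sketch below is a hypothetical helper, not part of any binding mentioned in this thread; the 0xA5 pattern and the offsets are illustrative assumptions, and a value the library genuinely wrote could in principle coincide with the pattern.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

final class SentinelCheck {
    // Hypothetical fill byte; any pattern unlikely to appear in real data works.
    static final byte SENTINEL = (byte) 0xA5;

    /** Fills the whole buffer with the sentinel byte before handing it to native code. */
    static void poison(ByteBuffer buf) {
        for (int i = 0; i < buf.capacity(); i++) {
            buf.put(i, SENTINEL);
        }
    }

    /** Returns the offsets of 4-byte fields whose bytes all still equal the sentinel. */
    static List<Integer> untouchedInts(ByteBuffer buf, int... offsets) {
        List<Integer> untouched = new ArrayList<>();
        for (int off : offsets) {
            boolean allSentinel = true;
            for (int b = 0; b < 4; b++) {
                if (buf.get(off + b) != SENTINEL) {
                    allSentinel = false;
                    break;
                }
            }
            if (allSentinel) {
                untouched.add(off);
            }
        }
        return untouched;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(32).order(ByteOrder.nativeOrder());
        poison(buf);
        // Simulate a native call that writes only the fields at offsets 0 and 16.
        buf.putInt(0, 4242);
        buf.putInt(16, 97);
        System.out.println(untouchedInts(buf, 0, 16, 20, 24, 28)); // [20, 24, 28]
    }
}
```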

More information about the panama-dev mailing list