Random values from NVML functions

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Thu May 14 00:01:10 UTC 2020


Btw - nice-looking app! (I looked at the pic :-) )

If I understand correctly, this is the call that gives you garbage 
values:

https://github.com/BlueGoliath/java-nvidia-bindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/main/nvml_h.java#L426

More specifically, after the call, the array of 
nvmlProcessUtilizationSample_t doesn't contain what you think it should 
contain. Am I correct?

Can I see the client code which calls this function, so that I can take 
a look at all the pieces?

Thanks
Maurizio

On 14/05/2020 00:55, Ty Young wrote:
>
> On 5/13/20 6:38 PM, Maurizio Cimadamore wrote:
>> Hi,
>> is this a regression? E.g. did this work before and then suddenly 
>> start behaving differently (e.g. after a rebuild of panama), or is 
>> this a new function you are trying to call that shows odd behavior?
>
>
> Not sure.
>
>
> After converting everything from the Pointer API to FMA, it started 
> giving me 0 for everything, whereas the Pointer API would give me 
> seemingly correct non-zero values the majority of the time but would 
> sometimes give random garbage. Because the old Pointer API never 
> zeroed memory, I have no idea whether those values were valid, so I 
> didn't think much of always getting 0.
>
>
> Yesterday I did some cleanups in the OO code (the layer under JavaFX), 
> including converting NativeValue<Integer> instances to 
> NativeInteger (same for longlong), and it started doing this, which I 
> think is partially correct: if I start a GPU benchmarking 
> application (Unigine Superposition) and view the process list in 
> the GUI, I do see seemingly correct utilization rates that match 
> the in-app on-screen-display FPS.
>
>
> The issue is with Memory Utilization and Video encoder/decoder 
> Utilization.
>
>
>>
>> Maurizio
>>
>> On 14/05/2020 00:00, Ty Young wrote:
>>> Hi,
>>>
>>>
>>> Currently I'm getting random values[1] from this NVML function[2]. 
>>> I've spent a few hours dumping sizes and re-checking my abstraction 
>>> layer code to figure out why it's doing this, but I'm not seeing 
>>> anything. I'm wondering whether there were any recent bugs in FMA 
>>> that could cause this and have since been fixed. If not, I'm going 
>>> to have to try asking on the Nvidia forums.
>>>
>>>
>>> For reference, the function binding can be found here:
>>>
>>>
>>> https://github.com/BlueGoliath/java-nvidia-bindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/main/nvml_h.java#L426 
>>>
>>>
>>>
>>> and the abstraction layer here:
>>>
>>>
>>> https://github.com/BlueGoliath/Crosspoint/tree/master/src/main/java/org/goliath/crosspoint 
>>>
>>>
>>>
>>> I'm able to read/write other structs just fine, such as:
>>>
>>>
>>> https://github.com/BlueGoliath/java-nvidia-bindings/blob/master/modules/org.goliath.bindings.nvctrl/src/main/java/org/goliath/bindings/nvctrl/structs/NVCTRLAttributeValidValuesRec.java 
>>>
>>>
>>>
>>> and again, all byte sizes seem correct (48 bytes for the NVML 
>>> struct), so I'm really lost here.
>>>
>>>
>>>
>>> [1] https://imgur.com/a/wrQtOXq
>>>
>>> [2] 
>>> https://docs.nvidia.com/deploy/nvml-api/group__nvmlGridQueries.html#group__nvmlGridQueries_1gb0ea5236f5e69e63bf53684a11c233bd
>>>


More information about the panama-dev mailing list