Random values from NVML functions
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu May 14 00:01:10 UTC 2020
Btw - nice-looking app! (I looked at the pic :-) )
If I understand correctly, the function you are getting garbage values
from is this one:
https://github.com/BlueGoliath/java-nvidia-bindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/main/nvml_h.java#L426
More specifically, after the call, the array of
nvmlProcessUtilizationSample_t doesn't contain what you think it should
contain. Am I correct?
Can I see the client code which calls this function, so that I can take
a look at all the pieces?
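
In the meantime, one sanity check that might be worth doing on your side
is to work out the struct layout by hand and compare it with what the
binding reports. Below is a minimal sketch in plain Java (no Panama API
involved) that computes natural-alignment offsets. The field order and
sizes are my assumption of what nvml.h declares for
nvmlProcessUtilizationSample_t on 64-bit Linux, and StructLayoutCheck,
alignUp and layout are made-up names for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StructLayoutCheck {

    /** Rounds offset up to the next multiple of alignment (a power of two). */
    static long alignUp(long offset, long alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    /**
     * Computes the offset of each field, assuming every field is a scalar
     * aligned to its own size (the usual C rule on x86-64). The extra
     * "sizeof" entry holds the total size including trailing padding.
     */
    static Map<String, Long> layout(String[] names, long[] sizes) {
        Map<String, Long> offsets = new LinkedHashMap<>();
        long offset = 0;
        long maxAlign = 1;
        for (int i = 0; i < names.length; i++) {
            long align = sizes[i];            // scalar: alignment == size
            offset = alignUp(offset, align);  // insert padding if needed
            offsets.put(names[i], offset);
            offset += sizes[i];
            maxAlign = Math.max(maxAlign, align);
        }
        // The struct size is padded to a multiple of its strictest alignment.
        offsets.put("sizeof", alignUp(offset, maxAlign));
        return offsets;
    }

    public static void main(String[] args) {
        // ASSUMPTION: field order and sizes as I recall them from nvml.h
        // (unsigned int = 4 bytes, unsigned long long = 8 bytes).
        String[] names = {"pid", "timeStamp", "smUtil", "memUtil", "encUtil", "decUtil"};
        long[] sizes = {4, 8, 4, 4, 4, 4};
        layout(names, sizes).forEach((name, off) ->
                System.out.println(name + " -> " + off));
    }
}
```

If any offset or the total it prints differs from what your layout
reports (you mentioned 48 bytes), that would be the first place I would
look.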
Thanks
Maurizio
On 14/05/2020 00:55, Ty Young wrote:
>
> On 5/13/20 6:38 PM, Maurizio Cimadamore wrote:
>> Hi,
>> is this a regression? E.g. did this work before and then suddenly
>> start behaving differently (e.g. after a rebuild on panama), or is
>> this a new function you are trying to call that is showing odd
>> behavior?
>
>
> Not sure.
>
>
> After converting everything from the Pointer API to FMA, it started
> giving me 0 for everything, whereas the Pointer API would give me
> seemingly correct non-zero values the majority of the time, but would
> sometimes give random garbage. Because the old Pointer API never
> zeroed memory, I have no idea whether those values were valid or not,
> so I didn't think much of always getting 0.
>
>
> Yesterday I did some cleanups in the OO code (the layer under JavaFX),
> including converting NativeValue<Integer> instances to
> NativeInteger (same for longlong), and it started doing this, which I
> think is partially correct: if I start a GPU benchmarking
> application (Unigine Superposition) and view the processes content in
> the GUI, I do see seemingly correct utilization rates that match the
> in-app On-Screen-Display FPS.
>
>
> The issue is with Memory Utilization and Video encoder/decoder
> Utilization.
>
>
>>
>> Maurizio
>>
>> On 14/05/2020 00:00, Ty Young wrote:
>>> Hi,
>>>
>>>
>>> Currently I'm getting random values[1] from this NVML function[2].
>>> I've spent a few hours dumping sizes and re-checking my abstraction
>>> layer code in order to figure out why it's doing this, but I'm not
>>> seeing anything. I'm wondering if there were any recent bugs in FMA
>>> that might cause this and that have since been fixed. If not, I'm
>>> going to have to try asking on the Nvidia forums.
>>>
>>>
>>> For reference, the function binding can be found here:
>>>
>>>
>>> https://github.com/BlueGoliath/java-nvidia-bindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/main/nvml_h.java#L426
>>>
>>>
>>>
>>> and the abstraction layer here:
>>>
>>>
>>> https://github.com/BlueGoliath/Crosspoint/tree/master/src/main/java/org/goliath/crosspoint
>>>
>>>
>>>
>>> I'm able to read/write other structs just fine, such as:
>>>
>>>
>>> https://github.com/BlueGoliath/java-nvidia-bindings/blob/master/modules/org.goliath.bindings.nvctrl/src/main/java/org/goliath/bindings/nvctrl/structs/NVCTRLAttributeValidValuesRec.java
>>>
>>>
>>>
>>> and again, all byte sizes seem correct (48 bytes for the NVML
>>> struct), so I'm really lost here.
>>>
>>>
>>>
>>> [1] https://imgur.com/a/wrQtOXq
>>>
>>> [2]
>>> https://docs.nvidia.com/deploy/nvml-api/group__nvmlGridQueries.html#group__nvmlGridQueries_1gb0ea5236f5e69e63bf53684a11c233bd
>>>
More information about the panama-dev
mailing list