JVM crash by creating VarHandle

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Sun Feb 2 00:59:03 UTC 2020


Without having tried it (as I don't have access to adequate HW to try it 
on right now), I was eyeballing the test code and spotted something. 
Look at these two calls:

> System.out.println(nvml_h.nvmlDeviceGetHandleByIndex(0, gpuPointer));
>
>
> // probe for the number of running GPU processes. Note: can be
> // INSUFFICIENT_SIZE_ERROR when probing.
>
> System.out.println(nvml_h.nvmlDeviceGetGraphicsRunningProcesses(gpuPointer,
>         intPointer, MemoryArray.ofNull()));
> // fill the GPU reference pointer
>
If I look at the documentation for these two functions, I see


  nvmlReturn_t nvmlDeviceGetHandleByIndex(unsigned int index, nvmlDevice_t* device)


  nvmlReturn_t nvmlDeviceGetGraphicsRunningProcesses(nvmlDevice_t device, unsigned int* infoCount, nvmlProcessInfo_t* infos)


So, the first accepts a pointer to a pointer (as I assume 
nvmlDevice_t is just an opaque pointer to a struct). The second 
takes an nvmlDevice_t directly.

But you are calling both bindings with the same "gpuPointer" parameter - 
surely one of the two calls is wrong? E.g. shouldn't the second call use 
the MemoryAddress _value_ that the first call has written into gpuPointer?

The subsequent call to "nvmlDeviceGetGraphicsRunningProcesses" seems to 
suffer from the same problem.
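
To make the distinction concrete, here is a minimal sketch in plain Java. It does not call NVML or use the Panama API. The two methods are mock stand-ins for the native functions, and FAKE_DEVICE, SLOT_ADDRESS, SUCCESS and INVALID are hypothetical values invented for the illustration (they are not real NVML constants). A ByteBuffer plays the role of the memory that "nvmlDevice_t* device" points to.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OutParamSketch {
    // Views 8 bytes of a ByteBuffer as one native-order long (a pointer-sized slot).
    static final VarHandle LONG_AT =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // Hypothetical values, made up for this sketch only.
    static final long FAKE_DEVICE = 0xCAFEL;   // opaque handle the mock library hands out
    static final long SLOT_ADDRESS = 0x1000L;  // pretend address of the out-parameter slot
    static final int SUCCESS = 0;
    static final int INVALID = -1;

    // Models "nvmlDevice_t* device": the callee WRITES the handle into the slot.
    static int getHandleByIndex(int index, ByteBuffer deviceOut) {
        LONG_AT.set(deviceOut, 0, FAKE_DEVICE);
        return SUCCESS;
    }

    // Models "nvmlDevice_t device": the callee expects the handle VALUE itself.
    static int getRunningProcesses(long device) {
        return device == FAKE_DEVICE ? SUCCESS : INVALID;
    }

    public static void main(String[] args) {
        ByteBuffer slot = ByteBuffer.allocate(Long.BYTES);
        getHandleByIndex(0, slot);

        // Right: read the value the first call stored, then pass that value on.
        long device = (long) LONG_AT.get(slot, 0);
        System.out.println(getRunningProcesses(device));

        // Wrong: pass the location of the slot instead of its contents.
        System.out.println(getRunningProcesses(SLOT_ADDRESS));
    }
}
```

In the mock, the second call only succeeds when it receives the value stored in the slot. Whether the real bindings do that dereference automatically is exactly the open question.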

Now, I don't know what your binding functions do under the hood - e.g. 
whether they apply some sort of automatic getValue() every time you pass 
a MemoryValue (they probably do, judging by this **) - but from the looks 
of it, the code in the test seems to have some issues.

(**)

I don't get how you are doing bindings - e.g. you have Java methods 
accepting the same arguments (MemoryValue<MemoryAddress>) but doing 
subtly different things under the hood (because they model native 
functions with _different_ signatures):

https://github.com/BlueGoliath/GoliathBindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/functions/nvmlDeviceGetGraphicsRunningProcesses.java

https://github.com/BlueGoliath/GoliathBindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/functions/nvmlDeviceGetHandleByIndex.java

This suggests that, in the end, the test code is _probably_ 
correct - but there is a lot of stuff here which can cause trouble.

Maurizio

On 01/02/2020 23:45, Maurizio Cimadamore wrote:
> I guess what I was trying to assess was whether a straight call to 
> MemoryHandles.varHandle would crash with an exception. It seems that's 
> not the case (in fact all our tests pass, and I tried several examples 
> just now, which work).
>
> I will take a look at the links you provided - given this involves 
> interacting with a native library, there's an actual possibility that 
> some _previous_ operation ended up corrupting the VM memory state, 
> which then shows up in some weird form.
>
> Maurizio
>
>
> On 01/02/2020 23:26, Ty Young wrote:
>>
>> On 2/1/20 5:09 PM, Maurizio Cimadamore wrote:
>>>
>>> On 01/02/2020 23:01, Ty Young wrote:
>>>>
>>>> On 2/1/20 4:49 PM, Maurizio Cimadamore wrote:
>>>>>
>>>>> On 01/02/2020 12:31, Ty Young wrote:
>>>>>> MemoryHandles.varHandle(long.class, ByteOrder.nativeOrder()); 
>>>>> Hi Ty,
>>>>> thanks for reaching out - I assume you are on the foreign-jextract 
>>>>> branch?
>>>>
>>>>
>>>> Yep. The build is right after the name() method was added to 
>>>> SystemABI.
>>>>
>>>>
>>>>> And also, is the above snippet enough to reproduce the crash for 
>>>>> you? Or does it only happen sometimes, but not others?
>>>>
>>>>
>>>> It seems to only happen when accessing the struct field from a 
>>>> struct that resides in memory. Never outside of an array.
>>>>
>>>>
>>>> Here is a Github Gist, if it helps any: 
>>>> https://gist.github.com/BlueGoliath/307f60856afee04e218b759420a53fb7
>>>
>>> The stack trace you showed seems to point at a constructor somewhere:
>>>
>>>  org.goliath.crosspoint.fields.NumberField.<init>
>>>
>>> so, the crash seems to happen outside of memory access - e.g. it 
>>> happens on actual var handle creation. So I assume that the thing 
>>> which actually fails is this?
>>>
>>> https://github.com/BlueGoliath/Crosspoint/blob/master/src/main/java/org/goliath/crosspoint/fields/NumberField.java#L29 
>>>
>>
>>
>> Yep.
>>
>>
>>>
>>> Is there a repeatable way to call the NumberField constructor which 
>>> will lead to the crash?
>>
>>
>> Got a Linux machine that has an Nvidia GPU? I've pushed local code to 
>> Github. The bindings I'm working on right now are here:
>>
>>
>> https://github.com/BlueGoliath/GoliathBindings
>>
>>
>> I have a test class for testing the bindings within it, which shows 
>> the problem:
>>
>>
>> https://github.com/BlueGoliath/GoliathBindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/main/Test.java 
>>
>>
>>
>> You can get the bindings to work on Windows (since NVML is 
>> cross-platform) but you'll have to add the library to the path and 
>> change the name string in nvml_h.java. I think it's just nvml.dll on 
>> Windows.
>>
>>
>>>
>>> Maurizio
>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Thanks
>>>>> Maurizio
>>>>>

