JVM crash by creating VarHandle
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Sun Feb 2 00:59:03 UTC 2020
Without having tried it (as I don't have access to adequate HW to try it
on right now), I was eyeballing the test code and spotted something.
Look at these two calls:
> System.out.println(nvml_h.nvmlDeviceGetHandleByIndex(0, gpuPointer));
>
>
> // probe for the number of running GPU processes. Note: can be
> // INSUFFICIENT_SIZE_ERROR when probing.
>
> System.out.println(nvml_h.nvmlDeviceGetGraphicsRunningProcesses(gpuPointer,
> intPointer, MemoryArray.ofNull()));
> // fill the GPU reference pointer
>
If I look at the documentation for these two functions, I see
nvmlReturn_t nvmlDeviceGetHandleByIndex ( unsigned int index, nvmlDevice_t* device )
nvmlReturn_t nvmlDeviceGetGraphicsRunningProcesses ( nvmlDevice_t device, unsigned int* infoCount, nvmlProcessInfo_t* infos )
So, the first accepts a pointer to a pointer (as I assume
nvmlDevice_t is just an opaque pointer to a struct). The second just
takes an nvmlDevice_t directly.
But you are calling both bindings with the same "gpuPointer" parameter -
surely one of the two calls is wrong? E.g. shouldn't the second call use
the MemoryAddress _value_ that the first call has written into gpuPointer?
The subsequent call to "nvmlDeviceGetGraphicsRunningProcesses" seems to
suffer from the same problem.
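
To make the distinction concrete, here is a rough sketch in plain Java. It does not use NVML or the Panama API on the branch. A ByteBuffer stands in for native memory, buffer offsets stand in for addresses, and the var handle comes from the standard MethodHandles.byteBufferViewVarHandle. All class, method and constant names below are made up for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class OutParamSketch {
    // Hypothetical stand-in for native memory: offsets into this buffer act as "addresses".
    static final ByteBuffer MEMORY = ByteBuffer.allocate(64);
    // Standard JDK var handle for reading/writing longs in a ByteBuffer, native byte order.
    static final VarHandle LONG =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    static final long DEVICE_HANDLE = 0xCAFEL; // made-up opaque handle value

    // Models a function shaped like f(unsigned int index, nvmlDevice_t* device):
    // it writes the handle into the slot located at offset 'deviceOut'.
    static int getHandleByIndex(int index, int deviceOut) {
        LONG.set(MEMORY, deviceOut, DEVICE_HANDLE);
        return 0; // made-up "success" code
    }

    // Models a function shaped like f(nvmlDevice_t device): it wants the handle value itself.
    static int useDevice(long device) {
        return device == DEVICE_HANDLE ? 0 : 2; // 2 is a made-up "invalid argument" code
    }

    public static void main(String[] args) {
        int slot = 8;                                 // "address" of the out-parameter slot
        getHandleByIndex(0, slot);                    // callee fills the slot
        long handle = (long) LONG.get(MEMORY, slot);  // read the value that was written
        System.out.println(useDevice(handle));        // prints 0: the value was passed
        System.out.println(useDevice(slot));          // prints 2: the slot address was passed
    }
}
```

The second println is the suspected mistake in miniature: the slot that received the handle is passed where the handle itself is expected.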
Now, I don't know what your binding functions do under the hood - e.g.
if they apply some sort of automatic getValue() every time you pass a
MemoryValue (you probably do, judging by this (**)) - but from the looks
of it, the code in the test seems to have some issues.
(**)
I don't get how you are doing bindings - e.g. you have Java methods
accepting the same arguments (MemoryValue<MemoryAddress>) but doing subtly
different things under the hood (because they model native functions with
_different_ signatures):
https://github.com/BlueGoliath/GoliathBindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/functions/nvmlDeviceGetGraphicsRunningProcesses.java
https://github.com/BlueGoliath/GoliathBindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/functions/nvmlDeviceGetHandleByIndex.java
This probably means that, in the end, the test code is _probably_
correct - but there is a lot of stuff here which can cause trouble.
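
As one possible way to reduce that kind of trouble (a sketch only, not how GoliathBindings works and not a Panama API), the two roles could be given different Java types, so that passing the out-parameter slot where the handle value is expected fails at compile time. Device, DeviceOut and the method names are hypothetical.

```java
class TypedBindingSketch {
    /** Hypothetical wrapper for an opaque handle passed by value (models nvmlDevice_t). */
    static final class Device {
        final long raw;
        Device(long raw) { this.raw = raw; }
    }

    /** Hypothetical wrapper for an out-parameter slot (models nvmlDevice_t*). */
    static final class DeviceOut {
        private Device value; // stays null until a callee fills it in
        void set(Device d) { value = d; }
        Device get() {
            if (value == null) {
                throw new IllegalStateException("out-parameter was never written");
            }
            return value;
        }
    }

    // Models a function shaped like f(unsigned int index, nvmlDevice_t* device).
    static int getHandleByIndex(int index, DeviceOut out) {
        out.set(new Device(0x1000L + index)); // made-up handle value
        return 0;                             // made-up "success" code
    }

    // Models a function shaped like f(nvmlDevice_t device, ...).
    static int useDevice(Device device) {
        return device.raw >= 0x1000L ? 0 : 2; // 2 is a made-up "invalid argument" code
    }

    public static void main(String[] args) {
        DeviceOut out = new DeviceOut();
        getHandleByIndex(0, out);
        System.out.println(useDevice(out.get()));
        // useDevice(out); // would not compile: a DeviceOut is not a Device
    }
}
```

With this shape the implicit getValue() step becomes an explicit out.get() at the call site, which makes it visible which of the two native signatures a binding models.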
Maurizio
On 01/02/2020 23:45, Maurizio Cimadamore wrote:
> I guess what I was trying to assess was whether a straight call
> to MemoryHandles.varHandle would crash. It seems that's
> not the case (in fact all our tests pass, and I tried several examples
> just now, which work).
>
> I will take a look at the links you provided - given this involves
> interacting with a native library, there's an actual possibility that
> some _previous_ operation ended up corrupting the VM memory state,
> which then shows up in some weird form.
>
> Maurizio
>
>
> On 01/02/2020 23:26, Ty Young wrote:
>>
>> On 2/1/20 5:09 PM, Maurizio Cimadamore wrote:
>>>
>>> On 01/02/2020 23:01, Ty Young wrote:
>>>>
>>>> On 2/1/20 4:49 PM, Maurizio Cimadamore wrote:
>>>>>
>>>>> On 01/02/2020 12:31, Ty Young wrote:
>>>>>> MemoryHandles.varHandle(long.class, ByteOrder.nativeOrder());
>>>>> Hi Ty,
>>>>> thanks for reaching out - I assume you are on the foreign-jextract
>>>>> branch?
>>>>
>>>>
>>>> Yep. The build is right after the name() method was added to
>>>> SystemABI.
>>>>
>>>>
>>>>> And also, is the above snippet enough to reproduce the crash for
>>>>> you? Or does it only happen sometimes, but not others?
>>>>
>>>>
>>>> It seems to only happen when accessing the struct field from a
>>>> struct that resides in memory. Never outside of an array.
>>>>
>>>>
>>>> Here is a Github Gist, if it helps any:
>>>> https://gist.github.com/BlueGoliath/307f60856afee04e218b759420a53fb7
>>>
>>> The stack trace you showed seems to point at a constructor somewhere:
>>>
>>> org.goliath.crosspoint.fields.NumberField.<init>
>>>
>>> so, the crash seems to happen outside of memory access - e.g. it
>>> happens on actual var handle creation. So I assume that the thing
>>> which actually fails is this?
>>>
>>> https://github.com/BlueGoliath/Crosspoint/blob/master/src/main/java/org/goliath/crosspoint/fields/NumberField.java#L29
>>>
>>
>>
>> Yep.
>>
>>
>>>
>>> Is there a repeatable way to call the NumberField constructor which
>>> will lead to the crash?
>>
>>
>> Got a Linux machine that has an Nvidia GPU? I've pushed local code to
>> Github. The bindings I'm working on right now are here:
>>
>>
>> https://github.com/BlueGoliath/GoliathBindings
>>
>>
>> I have a test class for testing the bindings within it, which shows
>> the problem:
>>
>>
>> https://github.com/BlueGoliath/GoliathBindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/main/Test.java
>>
>>
>>
>> You can get the bindings to work on Windows (since NVML is
>> cross-platform) but you'll have to add the library to the path and
>> change the name string in nvml_h.java. I think it's just nvml.dll on
>> Windows.
>>
>>
>>>
>>> Maurizio
>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Thanks
>>>>> Maurizio
>>>>>