JVM crash by creating VarHandle
Ty Young
youngty1997 at gmail.com
Sun Feb 2 01:51:42 UTC 2020
On 2/1/20 6:59 PM, Maurizio Cimadamore wrote:
>
> Without having tried it (as I don't have access to adequate HW to try
> it on right now), I was eyeballing at the test code and spotted
> something. Look at these two calls:
>
>> System.out.println(nvml_h.nvmlDeviceGetHandleByIndex(0, gpuPointer));
>>
>>
>> // probe for the number of running GPU processes. Note: can be
>> // INSUFFICIENT_SIZE_ERROR when probing.
>>
>> System.out.println(nvml_h.nvmlDeviceGetGraphicsRunningProcesses(gpuPointer,
>> intPointer, MemoryArray.ofNull()));
>> // fill the GPU reference pointer
>>
> If I look at the documentation for these two functions, I see
>
>
> nvmlReturn_t nvmlDeviceGetHandleByIndex ( unsigned int index,
> nvmlDevice_t* device )
>
>
>
> nvmlReturn_t nvmlDeviceGetGraphicsRunningProcesses ( nvmlDevice_t
> device, unsigned int* infoCount, nvmlProcessInfo_t* infos )
>
>
> So, the first is accepting a pointer to pointer (as I assume
> nvmlDevice_t is just an opaque pointer to struct). The second is just
> taking a nvmlDevice_t directly.
>
> But you are calling both bindings with the same "gpuPointer" parameter
> - surely one of the two calls is wrong? E.g. shouldn't the second call
> use the MemoryAddress _value_ that the first call has written into
> gpuPointer?
>
> The subsequent call to "nvmlDeviceGetGraphicsRunningProcesses" seems
> to suffer from the same problem.
>
> Now, I don't know what your binding functions do under the hood - e.g.
> whether they apply some sort of automatic getValue() every time you
> pass a MemoryValue (you probably do, judging by **) - but from the
> looks of it, the code in the test seems to have some issues.
>
Yes, the same gpuPointer is used for every function, and getValue() is
called because what actually matters is the MemoryAddress stored in the
MemoryValue (hence MemoryValue<MemoryAddress>).
In the old Pointer API this looked like:
Pointer<Pointer<nvmlDevice_t>>
where you would call get() on the instance to get Pointer<nvmlDevice_t>
which is then used to pass to functions.
Since nvmlDevice_t is opaque it only matters in a type safety sense. I
need to create a struct wrapper for it.
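To make the pointer-to-pointer versus by-value distinction concrete, here is a
minimal sketch that runs without NVML or Panama. Everything in it is an
assumption for illustration: a ByteBuffer stands in for native memory, byte
offsets stand in for addresses, and the class, method, and constant names are
hypothetical. It is not how the actual bindings are implemented.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class OutParameterSketch {
    // Stand-in for native memory; "addresses" are byte offsets into this buffer.
    static final ByteBuffer MEMORY = ByteBuffer.allocate(64).order(ByteOrder.nativeOrder());

    // Placeholder status codes and handle range, chosen only for this sketch.
    static final int SUCCESS = 0;
    static final int INVALID_ARGUMENT = 2;
    static final long HANDLE_BASE = 0x1000L;

    // Models a function shaped like
    //   nvmlReturn_t f(unsigned int index, nvmlDevice_t* device)
    // It receives the ADDRESS of a slot and writes an opaque handle into it.
    static int getHandleByIndex(int index, int deviceSlotAddress) {
        MEMORY.putLong(deviceSlotAddress, HANDLE_BASE + index);
        return SUCCESS;
    }

    // Models a function shaped like
    //   nvmlReturn_t f(nvmlDevice_t device, unsigned int* infoCount)
    // It receives the handle VALUE, not the address of the slot holding it.
    static int getRunningProcessCount(long device, int countSlotAddress) {
        if (device < HANDLE_BASE) {
            return INVALID_ARGUMENT; // a slot address was passed instead of a handle
        }
        MEMORY.putInt(countSlotAddress, 3); // made-up process count
        return SUCCESS;
    }

    public static void main(String[] args) {
        int deviceSlot = 0; // address of the slot that will hold the handle
        int countSlot = 8;  // address of the slot that will hold the count

        getHandleByIndex(0, deviceSlot);

        // The explicit dereference: this is the step getValue() performs.
        long device = MEMORY.getLong(deviceSlot);

        System.out.println("with handle value: " + getRunningProcessCount(device, countSlot));
        System.out.println("with slot address: " + getRunningProcessCount(deviceSlot, countSlot));
    }
}
```

The first call takes the slot address and the second takes the value read back
from that slot. If a binding hides that read inside the function wrapper, both
Java calls look identical even though the native signatures differ.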
Anyway, I've been testing the bindings as I've (mind-numbingly) added
the functions. They work fine and return an enum value of success with
proper pointer values. For example, when asking for the default power
limit I get "21500", just as I did with the older Pointer API.
> (**)
>
> I don't get how you are doing bindings - e.g. you have Java methods
> accepting the same arguments (MemoryValue<MemoryAddress>) doing subtly
> different things under the hood (because they model native functions
> with _different_ signatures):
>
> https://github.com/BlueGoliath/GoliathBindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/functions/nvmlDeviceGetGraphicsRunningProcesses.java
>
> https://github.com/BlueGoliath/GoliathBindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/functions/nvmlDeviceGetHandleByIndex.java
>
> This probably means that, in the end, the test code is _probably_
> correct - but there is a lot of stuff here which can cause trouble.
>
I'm not entirely sure what could be done differently, but if you have
suggestions then I'd be glad to hear them. The thing to keep in mind with
NVML is that it's backwards and cross-platform compatible, so once things
are defined there isn't anything to really worry about later.
In hindsight the NativeFunction implementations shouldn't force the use
of higher level abstractions - that should be the job of nvml_h.java as
it's what enforces type safety to begin with.
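One way that split could look is sketched below. The names (RawFunctions,
Facade, DeviceHandle) are hypothetical and the "native" functions are faked in
plain Java so the example runs on its own. The idea is only that the low-level
layer mirrors the C signatures with raw values, while the higher layer owns the
typed wrappers and the dereferencing.

```java
final class LayeredBindingSketch {
    // Typed wrapper for an opaque handle; it exists purely for type safety.
    static final class DeviceHandle {
        final long address;
        DeviceHandle(long address) { this.address = address; }
    }

    // Low-level layer: raw values only, one method per native function.
    // One-element arrays stand in for out-parameters (T* in C).
    static final class RawFunctions {
        static final long FAKE_HANDLE = 0x1000L; // made-up handle value

        static int getHandleByIndex(int index, long[] outDevice) {
            outDevice[0] = FAKE_HANDLE + index;
            return 0; // 0 is used as "success" in this sketch
        }

        static int getDefaultPowerLimit(long device, int[] outLimit) {
            if (device < FAKE_HANDLE) {
                return 2; // placeholder error code
            }
            outLimit[0] = 21500; // sample value taken from the message above
            return 0;
        }
    }

    // Higher-level layer: the only place that knows about DeviceHandle.
    static final class Facade {
        static DeviceHandle deviceByIndex(int index) {
            long[] out = new long[1];
            check(RawFunctions.getHandleByIndex(index, out));
            return new DeviceHandle(out[0]);
        }

        static int defaultPowerLimit(DeviceHandle device) {
            int[] out = new int[1];
            check(RawFunctions.getDefaultPowerLimit(device.address, out));
            return out[0];
        }

        private static void check(int status) {
            if (status != 0) {
                throw new IllegalStateException("native call failed with status " + status);
            }
        }
    }

    public static void main(String[] args) {
        DeviceHandle gpu = Facade.deviceByIndex(0);
        System.out.println(Facade.defaultPowerLimit(gpu));
    }
}
```

With this shape the raw layer cannot silently dereference anything, and a
mismatch between pointer and value shows up in the facade's signatures.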
> Maurizio
>
> On 01/02/2020 23:45, Maurizio Cimadamore wrote:
>> I guess what I was trying to assess was whether a straight call
>> to MemoryHandles.varHandle would crash with an exception. It seems
>> that's not the case (in fact all our tests pass, and I tried several
>> examples just now, which work).
>>
>> I will take a look at the links you provided - given this involves
>> interacting with a native library, there's an actual possibility that
>> some _previous_ operation ended up corrupting the VM memory state,
>> which then shows up in some weird form.
>>
>> Maurizio
>>
>>
>> On 01/02/2020 23:26, Ty Young wrote:
>>>
>>> On 2/1/20 5:09 PM, Maurizio Cimadamore wrote:
>>>>
>>>> On 01/02/2020 23:01, Ty Young wrote:
>>>>>
>>>>> On 2/1/20 4:49 PM, Maurizio Cimadamore wrote:
>>>>>>
>>>>>> On 01/02/2020 12:31, Ty Young wrote:
>>>>>>> MemoryHandles.varHandle(long.class, ByteOrder.nativeOrder());
>>>>>> Hi Ty,
>>>>>> thanks for reaching out - I assume you are on the
>>>>>> foreign-jextract branch?
>>>>>
>>>>>
>>>>> Yep. The build is right after the name() method was added to
>>>>> SystemABI.
>>>>>
>>>>>
>>>>>> And also, is the above snippet enough to reproduce the crash for
>>>>>> you? Or does it only happen sometimes, but not others?
>>>>>
>>>>>
>>>>> It seems to only happen when accessing the struct field from a
>>>>> struct that resides in memory. Never outside of an array.
>>>>>
>>>>>
>>>>> Here is a Github Gist, if it helps any:
>>>>> https://gist.github.com/BlueGoliath/307f60856afee04e218b759420a53fb7
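For readers without the Panama build, the access pattern described here (a
long field read through a VarHandle from a struct inside an array) can be
approximated with the standard java.lang.invoke API. This is only a stand-in:
it uses MethodHandles.byteBufferViewVarHandle rather than
MemoryHandles.varHandle, and the struct layout, sizes, and names are invented
for the sketch. It does not reproduce the crash.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class StructArrayFieldSketch {
    // Hypothetical layout: struct { int pid; int padding; long usedMemory; }
    static final int STRUCT_SIZE = 16;
    static final int USED_MEMORY_OFFSET = 8;

    // A VarHandle that views bytes of a ByteBuffer as a native-order long.
    // Its coordinates are (ByteBuffer buffer, int byteOffset).
    static final VarHandle LONG_HANDLE =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // Reads the long field of the struct at the given array index.
    static long readUsedMemory(ByteBuffer structs, int index) {
        return (long) LONG_HANDLE.get(structs, index * STRUCT_SIZE + USED_MEMORY_OFFSET);
    }

    // Writes the long field of the struct at the given array index.
    static void writeUsedMemory(ByteBuffer structs, int index, long value) {
        LONG_HANDLE.set(structs, index * STRUCT_SIZE + USED_MEMORY_OFFSET, value);
    }

    public static void main(String[] args) {
        // Backing storage for an array of three structs.
        ByteBuffer structs = ByteBuffer.allocate(3 * STRUCT_SIZE);

        for (int i = 0; i < 3; i++) {
            writeUsedMemory(structs, i, 1000L * (i + 1));
        }
        for (int i = 0; i < 3; i++) {
            System.out.println("struct[" + i + "].usedMemory = " + readUsedMemory(structs, i));
        }
    }
}
```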
>>>>
>>>> The stack trace you showed seems to point at a constructor somewhere:
>>>>
>>>> org.goliath.crosspoint.fields.NumberField.<init>
>>>>
>>>> so, the crash seems to happen outside of memory access - e.g. it
>>>> happens on actual var handle creation. So I assume that the thing
>>>> which actually fails is this?
>>>>
>>>> https://github.com/BlueGoliath/Crosspoint/blob/master/src/main/java/org/goliath/crosspoint/fields/NumberField.java#L29
>>>>
>>>
>>>
>>> Yep.
>>>
>>>
>>>>
>>>> Is there a repeatable way to call the NumberField constructor which
>>>> will lead to the crash?
>>>
>>>
>>> Got a Linux machine that has an Nvidia GPU? I've pushed local code
>>> to Github. The bindings I'm working on right now are here:
>>>
>>>
>>> https://github.com/BlueGoliath/GoliathBindings
>>>
>>>
>>> I have a test class for testing the bindings within it, which shows
>>> the problem:
>>>
>>>
>>> https://github.com/BlueGoliath/GoliathBindings/blob/master/modules/org.goliath.bindings.nvml/src/main/java/org/goliath/bindings/nvml/main/Test.java
>>>
>>>
>>>
>>> You can get the bindings to work on Windows (since NVML is
>>> cross-platform) but you'll have to add the library to the path and
>>> change the name string in nvml_h.java. I think it's just nvml.dll in
>>> Windows.
>>>
>>>
>>>>
>>>> Maurizio
>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Maurizio
>>>>>>
More information about the panama-dev mailing list