JVM crash by creating VarHandle

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Sun Feb 2 17:59:34 UTC 2020


I managed to test remotely on a machine with an nvidia GPU. I didn't
get any VM crash, but I did hit a misaligned access at the instruction
where you get the segfault:

SUCCESS
SUCCESS
INSUFFICIENT_SIZE_ERROR
SUCCESS
Exception in thread "main" java.lang.IllegalStateException: Misaligned access at address: 140131897994114
     at java.base/java.lang.invoke.VarHandleMemoryAddressBase.newIllegalStateExceptionForMisalignedAccess(VarHandleMemoryAddressBase.java:54)
     at java.base/java.lang.invoke.VarHandleMemoryAddressAsInts.offsetNoVMAlignCheck(VarHandleMemoryAddressAsInts.java:69)
     at java.base/java.lang.invoke.VarHandleMemoryAddressAsInts.get0(VarHandleMemoryAddressAsInts.java:79)
     at java.base/java.lang.invoke.VarHandleMemoryAddressAsInts0/0x0000000800baf440.get(Unknown Source)
     at java.base/java.lang.invoke.VarHandleGuards.guard_L_L(VarHandleGuards.java:41)
     at org.goliath.crosspoint.fields.NumberField.getValue(NumberField.java:57)
     at org.goliath.crosspoint.fields.NumberField.getValue(NumberField.java:14)
     at org.goliath.bindings.nvml.main.Test.main(Test.java:45)

Seems like the nvidia lib always puts an address in the struct pointer
that isn't 4-byte aligned (140131897994114 % 4 == 2), so reading an int
out of it fails.
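
For what it's worth, the address in the exception can be checked by
hand. A minimal Java sketch, assuming a 4-byte alignment requirement
for int reads (the class and method names are made up for illustration
and are not part of the JDK or of crosspoint):

```java
public class AlignmentCheck {
    // Returns how many bytes the address sits past the previous aligned
    // boundary (0 means aligned). alignment must be a power of two.
    public static long misalignment(long address, long alignment) {
        return address & (alignment - 1);
    }

    public static void main(String[] args) {
        long address = 140131897994114L; // address reported in the exception
        System.out.println(misalignment(address, 4)); // prints 2
    }
}
```

A non-zero result is exactly the condition the misaligned-access check
reports on.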

That said, I'm not sure I fully understand how your crosspoint machinery 
is supposed to work - I'm seeing a bunch of struct creation, even before 
the nvidia routine is called to fill in the array, which is odd, given 
that the code is not really allocating any struct. Specifically, this:

         MemoryArray<nvmlProcessInfo_t> structArray = new nvmlProcessInfo_t().toArray(intPointer.getValue());

This allocates two off-heap structs - one is allocated by
new nvmlProcessInfo_t() - then another is created in the toArray()
call. That seems odd, given that the array is supposed to be empty and
filled by the nvidia routine.
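
To illustrate what I would expect instead: a single off-heap
allocation sized for the whole array, which the native routine then
fills. This is only a sketch using plain java.nio, not the crosspoint
or jextract API; the 16-byte struct layout (4-byte pid, 4 bytes of
padding, 8-byte usedGpuMemory) is an assumption about
nvmlProcessInfo_t on a 64-bit platform, and the class and method names
are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ProcessInfoArray {
    // Assumed layout of one nvmlProcessInfo_t element (not taken from the thread).
    static final int STRUCT_SIZE = 16;
    static final int PID_OFFSET = 0;
    static final int USED_MEMORY_OFFSET = 8;

    // One direct (off-heap) buffer holding `count` zero-initialized structs.
    public static ByteBuffer allocate(int count) {
        return ByteBuffer.allocateDirect(count * STRUCT_SIZE)
                         .order(ByteOrder.nativeOrder());
    }

    // Reads the pid field of the struct at `index`.
    public static int pid(ByteBuffer array, int index) {
        return array.getInt(index * STRUCT_SIZE + PID_OFFSET);
    }

    // Reads the usedGpuMemory field of the struct at `index`.
    public static long usedGpuMemory(ByteBuffer array, int index) {
        return array.getLong(index * STRUCT_SIZE + USED_MEMORY_OFFSET);
    }

    public static void main(String[] args) {
        ByteBuffer array = allocate(2);
        // Stand-in for the native call: write the fields the nvidia
        // routine would normally fill in for element 1.
        array.putInt(1 * STRUCT_SIZE + PID_OFFSET, 4242);
        array.putLong(1 * STRUCT_SIZE + USED_MEMORY_OFFSET, 1L << 20);
        System.out.println(pid(array, 1) + " " + usedGpuMemory(array, 1));
    }
}
```

The point is only that nothing needs to be allocated per element
before the native call; the elements are views at fixed offsets into
one block.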

In any case, the crash doesn't happen on my machine - so I suspect that
we'll have to keep an eye out for that issue in case we see some other
example which ends up with the same crash. It would be useful if you
could convert your test not to use the crosspoint library and see if
it still crashes. This should not be super hard given that there
aren't many calls in there - and the static wrappers generated by
jextract should be good enough to run that test.

Maurizio

On 02/02/2020 07:59, Maurizio Cimadamore wrote:
>
> On 02/02/2020 01:51, Ty Young wrote:
>> I'm not entirely sure what could be done differently but if you have 
>> suggestions then I'd be glad to hear it. The thing to keep in mind 
>> with NVML is that it's backwards and cross-platform compatible so 
>> once things are defined there isn't anything to really worry about 
>> later.
>>
>>
>> In hindsight the NativeFunction implementations shouldn't force the 
>> use of higher level abstractions - that should be the job of 
>> nvml_h.java as it's what enforces type safety to begin with. 
>
> I wasn't suggesting you should change the API - just that there are 
> many layers between the code you see in Test and the actual method 
> handle, var handle calls - which makes it harder to diagnose.
>
> Re-reading the stack trace in the crash, it seems to be a problem 
> related to classfile parsing, potentially of one of the synthetic 
> VarHandle classes which we spin on the fly. I'll do more analysis next 
> week.
>
> In the meantime it would be helpful to understand if the crash started 
> to appear when you updated the Panama repository (which might suggest 
> some relationship with recent commits, such as the one for adding 
> VarHandle adapter support), or if it's a failure that you encountered 
> writing a new test.
>
> Maurizio
>


More information about the panama-dev mailing list