JVM crash by creating VarHandle
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Sun Feb 2 17:59:34 UTC 2020
I managed to test remotely using a machine with an nvidia GPU. I didn't
get any VM crash, but I did get an issue with a misaligned access in the
instruction where you get the segfault:
SUCCESS
SUCCESS
INSUFFICIENT_SIZE_ERROR
SUCCESS
Exception in thread "main" java.lang.IllegalStateException: Misaligned access at address: 140131897994114
    at java.base/java.lang.invoke.VarHandleMemoryAddressBase.newIllegalStateExceptionForMisalignedAccess(VarHandleMemoryAddressBase.java:54)
    at java.base/java.lang.invoke.VarHandleMemoryAddressAsInts.offsetNoVMAlignCheck(VarHandleMemoryAddressAsInts.java:69)
    at java.base/java.lang.invoke.VarHandleMemoryAddressAsInts.get0(VarHandleMemoryAddressAsInts.java:79)
    at java.base/java.lang.invoke.VarHandleMemoryAddressAsInts0/0x0000000800baf440.get(Unknown Source)
    at java.base/java.lang.invoke.VarHandleGuards.guard_L_L(VarHandleGuards.java:41)
    at org.goliath.crosspoint.fields.NumberField.getValue(NumberField.java:57)
    at org.goliath.crosspoint.fields.NumberField.getValue(NumberField.java:14)
    at org.goliath.bindings.nvml.main.Test.main(Test.java:45)
It seems that the nvidia lib always puts an address inside the struct
pointer which is not 4-byte aligned, so reading an int out of it
fails.
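As a quick sanity check on the address in the exception message, here is a
minimal sketch in plain Java. The class name AlignmentCheck and the helper
isAligned are hypothetical names made up for illustration, not part of the
Panama API or of crosspoint. It only does the arithmetic: the address is a
multiple of 2 but not of 4, which is consistent with an int read being
rejected as misaligned.

```java
// Hypothetical helper for illustration only; not part of any Panama API.
class AlignmentCheck {

    // Returns true if 'address' is a multiple of 'alignment'.
    // Assumes 'alignment' is a power of two (1, 2, 4, 8, ...).
    static boolean isAligned(long address, long alignment) {
        return (address & (alignment - 1)) == 0;
    }

    public static void main(String[] args) {
        // Address taken from the IllegalStateException message above.
        long address = 140131897994114L;

        System.out.println("address % 4     = " + (address % 4)); // prints 2
        System.out.println("2-byte aligned? " + isAligned(address, 2));
        System.out.println("4-byte aligned? " + isAligned(address, 4));
    }
}
```

Running it with "java AlignmentCheck.java" should report a remainder of 2,
so the address would be fine for a short but not for an int.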
That said, I'm not sure I fully understand how your crosspoint machinery
is supposed to work - I'm seeing a bunch of struct creation, even before
the nvidia routine is called to fill in the array, which is odd, given
that the code is not really allocating any struct. Specifically, this:
    MemoryArray<nvmlProcessInfo_t> structArray =
        new nvmlProcessInfo_t().toArray(intPointer.getValue());
This allocates two off-heap structs - one is allocated by
nvmlProcessInfo_t - then another is created in the toArray() call. That
seems odd, given that the array is supposed to be empty and
filled by the nvidia routine.
In any case, the crash doesn't happen on my machine - so I suspect that
we'll have to keep an eye out for that issue in case we see some other
example which ends up with the same crash. It would be useful if you could
convert your test not to use the crosspoint library and see if that
still has the crash. This should not be super hard given that there
aren't many calls in there - and the static wrappers generated by
jextract should be good enough to run that test?
Maurizio
On 02/02/2020 07:59, Maurizio Cimadamore wrote:
>
> On 02/02/2020 01:51, Ty Young wrote:
>> I'm not entirely sure what could be done differently but If you have
>> suggestions then I'd be glad to hear it. The thing to keep in mind
>> with NVML is that it's backwards and cross-platform compatible so
>> once things are defined there isn't anything to really worry about
>> later.
>>
>>
>> In hindsight the NativeFunction implementations shouldn't force the
>> use of higher level abstractions - that should be the job of
>> nvml_h.java as it's what enforces type safety to begin with.
>
> I wasn't suggesting you should change the API - just that there are
> many layers between the code you see in Test and the actual method
> handle, var handle calls - which makes it harder to diagnose.
>
> Re-reading the stack trace in the crash, it seems to be a problem
> related with classfile parsing, potentially of one of the synthetic
> VarHandle classes which we spin on the fly. I'll do more analysis next
> week.
>
> In the meantime it would be helpful to understand if the crash started
> to appear when you updated the Panama repository (which might suggest
> some relationship with recent commits, such as the one for adding
> VarHandle adapter support), or if it's a failure that you encountered
> writing a new test.
>
> Maurizio
>
More information about the panama-dev
mailing list