JVM crash by creating VarHandle
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Wed Feb 5 09:30:50 UTC 2020
On 04/02/2020 06:26, John Rose wrote:
> This seems to be the classic, dangerous confusion between
> struct X and struct X* (nvmlProcessInfo_t and nvmlProcessInfo_t*),
> as applied to arrays. Am I right in assuming that if this code
> were written in C there would be enough C type information to
> catch the mismatch?
What the code was doing was the equivalent of this:
    #include <stdlib.h>
    #include <stdio.h>

    typedef struct {
        int x;
    } Foo;

    /* Expects a contiguous array of Foo structs. */
    void consumeFoos(Foo *foos, int length) {
        for (int i = 0 ; i < length ; i++) {
            printf("foo::x = %d\n", foos[i].x);
        }
    }

    int main(void) {
        /* Room for 5 pointers, not 5 structs. */
        Foo **foo = malloc(sizeof(Foo *) * 5);
        Foo val = { 42 };
        for (int i = 0 ; i < 5 ; i++) {
            foo[i] = &val;
        }
        /* Mismatch: passes a Foo** where a Foo* is expected. */
        consumeFoos(foo, 5);
        return 0;
    }
i.e. it was allocating (with malloc) a slab of memory to hold N pointers
(hence Foo**), then storing N pointers to some structs in it - then
passing the thing into a function expecting Foo*. Can I do this in C?
Yes, with warnings (-Wincompatible-pointer-types) mind you, not errors
(because there are cases where a super user might deem it ok). So I can
execute it and get garbage out.
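
For contrast, a minimal sketch of the two well-typed alternatives: either
hand the function a contiguous array of Foo, or keep the array of pointers
and make the callee accept Foo**. The names sumFoos, sumFooPtrs and demo
are hypothetical (they are not in the original program), and the functions
return a sum instead of printing so the result is easy to check.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int x;
} Foo;

/* Expects a contiguous array of Foo values (Foo *). */
int sumFoos(const Foo *foos, int length) {
    int sum = 0;
    for (int i = 0; i < length; i++) {
        sum += foos[i].x;
    }
    return sum;
}

/* Expects an array of pointers to Foo (Foo **). */
int sumFooPtrs(Foo **foos, int length) {
    int sum = 0;
    for (int i = 0; i < length; i++) {
        sum += foos[i]->x;
    }
    return sum;
}

/* Builds both representations and checks that they agree. */
int demo(void) {
    enum { N = 5 };
    Foo val = { 42 };
    Foo *values = malloc(sizeof *values * N);   /* room for N structs  */
    Foo **ptrs = malloc(sizeof *ptrs * N);      /* room for N pointers */
    assert(values != NULL && ptrs != NULL);
    for (int i = 0; i < N; i++) {
        values[i] = val;   /* copy the struct into the array */
        ptrs[i] = &val;    /* store the struct's address     */
    }
    int a = sumFoos(values, N);
    int b = sumFooPtrs(ptrs, N);
    free(values);
    free(ptrs);
    return a == b ? a : -1;
}
```

Both calls compile without pointer-type warnings, because in each case the
argument's type matches what the callee dereferences.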
Should we make types tighter so that this kind of situation cannot
arise? Of course that would be desirable - but with the experience with
the past API we have learned that:
* the cost for designing an API which makes a _sound_ use of generics to
carry around type constraints on pointers is _very_ high (where by
_sound_ here I mean that the generic type actually reflects what's in
the layout, e.g. it's not a secondary side-show which can just be
cast away - in Java - as needed). The same developers who, on this
list, now complain about the lack of a higher-level API, back then
complained about the fact that the API we had was hard to use
* almost immediately we figured out that, w/o an escape hatch (cast to
Pointer<Void>) the strict API would be useless - as there are things you
can do in C (even w/o warnings) that would just be impossible to do in
Java. As a consequence, I have seen in several places a (genuine) need to
go from A* to B*, which then resulted (since the API prevented that) in
a round-trip through void* (e.g. A* -> void* -> B*). This made tasks,
such as implementing heterogeneous buffers, very hard to do.
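
In plain C, the A* -> void* -> B* round trip mentioned above is what a
heterogeneous buffer looks like. A minimal sketch, with hypothetical names
(Header, Payload, packRecord, readPayload) that are not taken from any
Panama API; it assumes a C11 compiler for the alignment check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { int kind; int count; } Header;   /* plays the role of A */
typedef struct { double value; } Payload;         /* plays the role of B */

/* The payloads start right after the header, so that offset must be aligned. */
_Static_assert(sizeof(Header) % _Alignof(Payload) == 0,
               "payload would be misaligned after the header");

/* Allocates one untyped block: a Header followed by `count` Payloads. */
void *packRecord(int kind, int count) {
    void *block = malloc(sizeof(Header) + sizeof(Payload) * (size_t)count);
    if (block == NULL) {
        return NULL;
    }
    Header *header = block;                 /* void* -> Header* */
    header->kind = kind;
    header->count = count;
    Payload *items = (void *)(header + 1);  /* Header* -> void* -> Payload* */
    for (int i = 0; i < count; i++) {
        items[i].value = i * 1.5;
    }
    return block;
}

/* Reads payload `index` back out of the untyped block. */
double readPayload(const void *block, int index) {
    const Header *header = block;
    assert(index >= 0 && index < header->count);
    const Payload *items = (const void *)(header + 1);
    return items[index].value;
}
```

The compiler accepts every conversion here without complaint, which is
exactly the kind of freedom a strictly typed pointer API has to give back
through an escape hatch.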
In other words, even if jextract had generated higher level bindings,
with tight pointer types - perhaps the user, confused by an
incompatibility between a Pointer<Foo> and Pointer<Pointer<Foo>> could
still have made (and often _did_ make) the wrong decision - e.g. decide
to Pointer::cast all the way to the function call, or take the first
element of the **foo and pass it to the function expecting a *foo.
All this to say that, while static safety is desirable, using a native
library from Java remains an "advanced" task - we made this task easier
by not requiring the user to write some JNI goop to call the desired
library. But still, most libraries out there do have some peculiar ways in
which they want to be used, and no amount of static checking can save
developers. I recall, earlier on this list [1], a port of libusb using
the old Panama (hence fully type-safe) API which was generating a
similar surprising segfault. Again, after a few hours of investigation,
the problem was narrowed down to the fact that the Panama program was
not calling the library in the "idiomatic" way, that is by calling the
libusb_init function before everything else.
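
The shape of that problem can be sketched with a toy library. Everything
below (DemoContext, demo_init, demo_device_count, demo_exit) is
hypothetical and is not libusb's API; it only illustrates a call-ordering
rule that no pointer type can express.

```c
#include <assert.h>

/* Hypothetical library state, standing in for something like a USB context. */
typedef struct {
    int initialized;
    int deviceCount;
} DemoContext;

static DemoContext demoContext;   /* zero-initialized: not yet set up */

/* Must be called before anything else; returns 0 on success. */
int demo_init(void) {
    demoContext.initialized = 1;
    demoContext.deviceCount = 2;
    return 0;
}

/* Type-correct to call at any time, but only meaningful after demo_init(). */
int demo_device_count(void) {
    if (!demoContext.initialized) {
        return -1;   /* a real library might crash here instead */
    }
    return demoContext.deviceCount;
}

/* Releases the library state; calling it without demo_init() is a bug. */
void demo_exit(void) {
    assert(demoContext.initialized);
    demoContext.initialized = 0;
}
```

A caller that skips demo_init() type-checks just as well as one that does
not; the difference only shows up when the program runs.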
So, ultimately, I consider static type-safety as one of the (many!)
dimensions of the design space we're exploring and optimizing for,
rather than an absolute constraint. I'd be far more willing to consider
an option providing some _dynamic_ safety (after all, if you put a
MemoryAddress and a layout together, you will be able to detect most of
these mismatches) - but in the past, I had the feeling that the appetite
for this kind of safety was rather low.
So, let me rephrase: let's imagine a world where the bindings we
generated were _dynamically_ safe - that is, the program Ty submitted in
this thread (or its equivalent using a hypothetical new API) would have
compiled, but would have failed at runtime with some informative
exception, rather than just randomly crashing - would that be perceived
as a viable compromise? Note this still does nothing for the libusb kind
of scenario - so _at some point_ the Java developer using a native
library will have to face the complexity of the library he/she is trying
to use. Would that be enough?
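
To make the question concrete, here is a minimal C sketch of what such a
dynamic check could look like: the address travels together with a
description of what it points to, and the consumer verifies that
description before dereferencing. Layout, CheckedAddress and
sumFoosChecked are hypothetical names chosen for illustration; they do not
correspond to any existing Panama or jextract API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct { int x; } Foo;

/* Hypothetical runtime descriptor of what an address points to. */
typedef struct {
    const char *name;   /* e.g. "Foo" or "Foo*"           */
    size_t elemSize;    /* size of one element, in bytes  */
} Layout;

static const Layout FOO_LAYOUT = { "Foo", sizeof(Foo) };
static const Layout FOO_PTR_LAYOUT = { "Foo*", sizeof(Foo *) };

/* An address that carries its layout and element count with it. */
typedef struct {
    void *base;
    const Layout *layout;
    size_t count;
} CheckedAddress;

/*
 * Sums foos[i].x, but only after checking that the address really
 * describes an array of Foo. Returns 0 on success; on mismatch returns -1
 * and writes a diagnostic into `err`.
 */
int sumFoosChecked(CheckedAddress addr, int *sum, char *err, size_t errLen) {
    assert(sum != NULL && err != NULL && addr.layout != NULL);
    if (strcmp(addr.layout->name, FOO_LAYOUT.name) != 0 ||
        addr.layout->elemSize != FOO_LAYOUT.elemSize) {
        snprintf(err, errLen, "layout mismatch: expected %s, got %s",
                 FOO_LAYOUT.name, addr.layout->name);
        return -1;
    }
    const Foo *foos = addr.base;
    *sum = 0;
    for (size_t i = 0; i < addr.count; i++) {
        *sum += foos[i].x;
    }
    return 0;
}
```

An array of Foo* tagged with FOO_PTR_LAYOUT is rejected with a message
instead of being read as garbage. The check is only as good as the tag,
though: an address labelled with the wrong layout still gets through.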
Maurizio
[1] - https://mail.openjdk.java.net/pipermail/panama-dev/2019-December/006842.html