JVM crash by creating VarHandle

Wed Feb 5 09:30:50 UTC 2020

On 04/02/2020 06:26, John Rose wrote:
> This seems to be the classic, dangerous confusion between
> struct X and struct X* (nvmlProcessInfo_t and nvmlProcessInfo_t*),
> as applied to arrays.  Am I right in assuming that if this code
> were written in C there would be enough C type information to
> catch the mismatch?

What the code was doing was the equivalent of this:

#include <stdlib.h>
#include <stdio.h>

typedef struct {
    int x;
} Foo;

void consumeFoos(Foo *foos, int length) {
    for (int i = 0 ; i < length ; i++) {
        printf("foo::x = %d\n", foos[i].x);
    }
}

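int main(void) {
    /* a slab holding 5 pointers (hence Foo**), not 5 structs */
    Foo **foo = malloc(sizeof(Foo *) * 5);
    Foo val = { 42 };
    for (int i = 0 ; i < 5 ; i++) {
        foo[i] = &val;
    }
    /* BUG (intentional, this is the point of the example): passes a
       Foo** where consumeFoos expects a Foo* */
    consumeFoos(foo, 5);
    free(foo);
    return 0;
}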

That is, it was allocating (with malloc) a slab of memory to hold N
pointers (hence Foo**), then sticking N pointers to some structs into
it, and then passing the thing into a function expecting Foo*. Can I do
this in C? Yes, with warnings (-Wincompatible-pointer-types), mind you,
not errors (because there are cases where a power user might deem it
OK). So I can execute it and get garbage out.

Should we make types tighter so that this kind of situation cannot
arise? Of course that would be desirable, but from our experience with
the past API we have learned that:

* the cost of designing an API which makes a _sound_ use of generics to
carry around type constraints on pointers is _very_ high (where by
_sound_ I mean that the generic type actually reflects what's in the
layout, i.e. it's not a secondary sideshow which can just be cast away,
in Java, as needed). The same developers who, on this list, now
complain about the lack of a higher-level API, back then complained
that the API we had was hard to use.

* almost immediately we figured out that, without an escape hatch (a
cast to Pointer<Void>), the strict API would be useless, as there are
things you can do in C (even without warnings) that would just be
impossible to do in Java. As a consequence, I have seen in several
places a (genuine) need to go from A* to B*, which then resulted (since
the API prevented that) in a round-trip through void* (e.g. A* -> void*
-> B*). This made tasks such as implementing heterogeneous buffers very
hard to do.

In other words, even if jextract had generated higher-level bindings
with tight pointer types, the user, confused by an incompatibility
between a Pointer<Foo> and a Pointer<Pointer<Foo>>, could still have
made (and often _did_ make) the wrong decision, e.g. deciding to
Pointer::cast all the way to the function call, or taking the first
element of the **foo and passing it to the function expecting a *foo.

All this to say that, while static safety is desirable, using a native
library from Java remains an "advanced" task: we made this task easier
by not requiring the user to write some JNI goop to call the desired
library. But still, most libraries out there have some peculiar ways in
which they want to be used, and no amount of static checking can save
developers from those. I recall, earlier on this list [1], a port of
libusb using the old Panama (hence fully type-safe) API which was
generating a similarly surprising segfault. Again, after a few hours of
investigation, the problem was narrowed down to the fact that the
Panama program was not calling the library in the "idiomatic" way, that
is, by calling the libusb_init function before everything else.

So, ultimately, I consider static type safety one of the (many!)
dimensions of the design space we're exploring and optimizing for,
rather than an absolute constraint. I'd be far more willing to consider
an option providing some _dynamic_ safety (after all, if you put a
MemoryAddress and a layout together, you will be able to detect most of
these mismatches), but in the past I had the feeling that the appetite
for this kind of safety was rather low.

So, let me rephrase: let's imagine a world where the bindings we
generated were _dynamically_ safe; that is, the program Ty submitted in
this thread (or its equivalent using a hypothetical new API) would have
compiled, but would have failed at runtime with some informative
exception, rather than just randomly crashing. Would that be perceived
as a viable compromise? Note this still does nothing for the libusb
kind of scenario, so _at some point_ the Java developer using a native
library will have to face the complexity of the library he/she is
trying to use. Would that be enough?

Maurizio

[1] - https://mail.openjdk.java.net/pipermail/panama-dev/2019-December/006842.html