JVM crash by creating VarHandle

Tue Feb 4 06:26:37 UTC 2020

On Feb 2, 2020, at 10:43 AM, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
> 
> But, your toArray() function is not just creating a contiguous array of nvmlProcessInfo_t - instead, it seems to create an array of pointers to nvmlProcessInfo_t structs - which is not what the function expects. In this case, at least on my machine, since num_procs is "1", allocating an array of one pointer means allocating 64 bits - but to be able to write one nvmlProcessInfo_t you need at least 128 bits. So the runtime doesn't have sufficient size to do the write.

This seems to be the classic, dangerous confusion between
struct X and struct X* (nvmlProcessInfo_t and nvmlProcessInfo_t*),
as applied to arrays.  Am I right in assuming that if this code
were written in C there would be enough C type information to
catch the mismatch?

Either way, it’s clear that an in-memory array struct X a[]
has a different layout from that of struct X* a[], and it’s the
job of a language that makes the distinction between struct X
and struct X* to keep code from mixing up those layouts.
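
To make the layout difference concrete, here is a minimal sketch. It
uses Python's ctypes rather than Panama, only because it lets the two
C layouts be measured directly. ProcessInfo is a hypothetical stand-in
for nvmlProcessInfo_t; its two fields are assumed for illustration and
may not match the real struct.

```python
import ctypes

class ProcessInfo(ctypes.Structure):
    # Hypothetical stand-in for nvmlProcessInfo_t.
    # Field names and types are assumptions, not taken from the NVML headers.
    _fields_ = [("pid", ctypes.c_uint),
                ("used_gpu_memory", ctypes.c_ulonglong)]

num_procs = 1

# struct X a[num_procs]: the structs themselves, stored contiguously.
StructArray = ProcessInfo * num_procs
# struct X* a[num_procs]: only pointers; the structs would live elsewhere.
PointerArray = ctypes.POINTER(ProcessInfo) * num_procs

struct_array_bytes = ctypes.sizeof(StructArray)
pointer_array_bytes = ctypes.sizeof(PointerArray)

print("struct X a[1]  :", struct_array_bytes, "bytes")
print("struct X* a[1] :", pointer_array_bytes, "bytes")

# A callee that writes one struct into a buffer sized for one pointer
# writes this many bytes past the end of the allocation.
overrun = struct_array_bytes - pointer_array_bytes
print("bytes written past the end:", overrun)
```

On a typical 64-bit machine this should report a one-pointer array that
is smaller than a one-struct array, which is the shortfall Maurizio
describes above.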

(The “dangerous” part of the confusion is that the problem
manifests as an undifferentiated core dump of some sort, which
requires lots of people to weigh in about whether it’s their code
that is at fault or someone else’s; it’s the so-called “finger pointing
problem”.  The importance of finger pointing is not who
gets blamed or shamed, but rather how quickly we can
drill down to a root cause and fix it.  Also how many experts
it takes to do the drilling down.)

If there’s something to the above argument, then it follows
we should regard this event, and watch for future ones like
it, as indications that the seatbelts are a little too loose on
the current model of the Panama race-car.  We can’t achieve
absolute safety in an automatically extracted API, without
human intervention, but perhaps we can achieve a better
balance between performance and type checking, by default.

My thought here is that (a) there should be enough layout
metadata floating around to issue warnings or errors when
something doesn’t quite match, where the C type system
would have caught a problem.  Also, (b) there should be
a way to select the level of checking, so that during development
more expensive dynamic checks rule out errors at the
level of C types.  Perhaps (c) there should be casting operations
to assert new layouts (like C casts) to be inserted by hand
to discharge warnings or errors; these operations could
be no-ops when the type checking mode is turned off.
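
A rough sketch of how (a), (b), and (c) might fit together follows.
Nothing here is a Panama API: Layout, Buffer, pass_as, cast, and the
CHECK_LAYOUTS switch are all hypothetical names invented for this
illustration, and the 16-byte and 8-byte sizes assume a 64-bit machine.

```python
from dataclasses import dataclass

# (b) Hypothetical switch: on during development, off for speed.
CHECK_LAYOUTS = True

@dataclass(frozen=True)
class Layout:
    """(a) Layout metadata: a C type name and its size in bytes."""
    name: str
    size: int

@dataclass
class Buffer:
    """An array of `count` elements, tagged with its element layout."""
    layout: Layout
    count: int

class LayoutMismatch(TypeError):
    pass

def pass_as(buf, expected):
    """Model handing `buf` to a callee that expects an array of `expected`."""
    if CHECK_LAYOUTS and buf.layout != expected:
        raise LayoutMismatch(
            f"expected array of {expected.name}, got array of {buf.layout.name}")
    return buf

def cast(buf, new_layout):
    """(c) Assert a new layout by hand, like a C cast."""
    total = buf.layout.size * buf.count
    if CHECK_LAYOUTS and total % new_layout.size != 0:
        raise LayoutMismatch(
            f"{total} bytes is not a whole number of {new_layout.name}")
    return Buffer(new_layout, total // new_layout.size)

# Assumed sizes for a 64-bit machine.
STRUCT = Layout("nvmlProcessInfo_t", 16)
POINTER = Layout("nvmlProcessInfo_t*", 8)

# The mistake from this thread: an array of one pointer passed as structs.
try:
    pass_as(Buffer(POINTER, 1), STRUCT)
    message = None
except LayoutMismatch as e:
    message = str(e)
print(message)

# An explicit cast discharges the error when the sizes really do line up.
recast = cast(Buffer(POINTER, 2), STRUCT)
print(pass_as(recast, STRUCT))
```

With the switch turned off, pass_as never inspects the tags, which is
the sense in which the hand-inserted casts become no-ops.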

One of the advantages of *dynamic* binding (as opposed
to static bytecode generation) is that the binder can make
environmental decisions about layout checking, without
forcing the user to re-invoke jextract in some different
mode (-O vs. -g mode).  We do *not* (IMO) want to replicate
that bit of the C programming experience.  Java’s super
power is late binding; let’s not wear kryptonite socks please.

I’m not ready to propose something specific, but I would
ask us not just to accept SEGVs that could have been better
diagnosed, under the theory that “well, Panama is unsafe
and unsafe gives you SEGVs”.  We want an experience
that is competitive with programming in C (against
C APIs) and turning off type checking is not competitive,
in the long run.  That’s “long” run; we are building up
by steps here, and just getting the plumbing working at
speed is plenty of progress.  Did I mention that we *are*
making really good progress?  :-)

— John