Need some help in debugging jvm corruption/crashes in C++/JNI code

David Holmes david.holmes at oracle.com
Tue Jan 14 04:21:33 UTC 2020


On 14/01/2020 12:35 pm, Bwmat . wrote:
> Thanks for the response, but I JUST figured it out.
> 
> It was our fault, of course. I was staring at the trace, wondering why 
> it wasn't doing an interface call, when I realized that what I had 
> described as IColumn wasn't actually an interface, it was a class! I had 
> assumed we had properly written the compatibility layer to use the 
> interface, but the *real* Interface is actually rarely used, with the 
> aforementioned class being used in the vast majority of cases... (and I 
> haven't used this code much for a couple of years). Not here though! 
> Really kicking myself for not realizing this sooner.
> 
> Kinda wish that -Xcheck:jni checked this (or the debug assertions in 
> hotspot, though maybe they do in a newer version). Has this been 
> considered, anyone know?

Funny you should mention that:

https://bugs.openjdk.java.net/browse/JDK-8229900

a change in JDK 14, should provide the receiver check you were looking for.

Cheers,
David

> Thanks again,
> Matthew w.
> 
> 
> On Mon., Jan. 13, 2020, 6:22 p.m. David Holmes, <david.holmes at oracle.com 
> <mailto:david.holmes at oracle.com>> wrote:
> 
>     Hi,
> 
>     On 14/01/2020 9:56 am, Bwmat . wrote:
>      > Hello,
>      >
>      >
>      > I just joined this mailing list to hopefully get a bit of help
>     from someone
>      > familiar with hotspot.
> 
>     These mailing lists are not really for end user application debugging
>     help. However, as you have deep dived into hotspot internals ... :)
> 
>     In a debug build you can use itable/vtable logging to see how the
>     itable/vtable is constructed and see if anything odd appears there.
> 
>     -Xlog:itables*=trace,vtables*=trace
> 
>     You can also try running with -Xcheck:jni.
> 
>     FYI most file attachments are stripped by the mailing list so your log
>     did not get included.
> 
>     Cheers,
>     David
> 
>      >
>      >
>      > We found a crash in a C++ library we have that uses JNI, and I’ve
>     been
>      > trying to debug it for a couple of days. I’m working on windows,
>     and I
>      > managed to get a TTD (time travel debugging, a feature in the preview
>      > version of WinDBG) trace of when the problem occurs, and I built
>     myself a
>      > debug JVM from source to help debug, but I’m completely
>     unfamiliar with
>      > hotspot.
>      >
>      >
>      >
>      > I’ve also attached a log file created one time that the crash
>     occurred (the
>      > symptoms aren’t always the same, sometimes it doesn’t crash at
>     all, but
>      > just return an invalid value from a method call, so my test app quits
>      > early, sometimes it crashes from an assertion from within hotpot, not
>      > always the same one).
>      >
>      >
>      >
>      > One weird thing is that I found the issue while doing some
>     testing, and a
>      > certain SQL query triggers the issue, but another, analogous
>     query does
>      > not. This is significant, since all of the SQL processing is
>     written in
>      > Java, so very little changes in the native part of the
>     application between
>      > the case that works and the case that fails, which makes me think
>     it’s a
>      > jvm issue. I’m currently debugging a java 8 openjdk (since I used an
>      > article about how to build it that used that version, and that’s
>     what we’re
>      > using internally anyways, but I also reproduced the issue on a
>     java 13
>      > oracle jvm, so if it IS a JVM bug, doesn’t seem like it's been
>     fixed yet.
>      > I’m quite aware that it’s probably still our fault somehow though.)
>      >
>      >
>      >
>      > The issue seems to somehow be caused by the wrong java method
>     being invoked
>      > by a JNI call, or maybe the right method, but on the wrong object.
>      >
>      >
>      >
>      > Early on, while debugging using Eclipse’s remote debugger, I
>     found that,
>      > right before crashing, I was able to hit a breakpoint in a method
>     on a type
>      > T, but the debugger told me that the ‘this’ reference was
>     actually of type
>      > String!
>      >
>      >
>      >
>      > Later on, while debugging the internals of hotspot in my TTD
>     trace (which
>      > allowed me to get quite far without much understanding of what’s
>     going
>      > one), I found that the crash in the trace occurs when the wrong
>     method is
>      > invoked on an object, returning a long, which is later
>     interpreted as a
>      > string reference, and the jvm notices it isn’t actually a string
>     when the
>      > native code tries to get the string length.
>      >
>      >
>      >
>      > To lay out the situation without going into too many details of
>     our code,
>      > we have an interface IColumn, that has a few methods that get
>     attributes
>      > about a SQL column. Some methods return primitives, some return
>     strings.
>      > The native code is trying to call IColumn.GetLabel(), so it passes a
>      > jmethodID that was generated via a successful call to JNI’s
>     GetMethodID,
>      > passing the class object of IColumn. For the receiver of the call, it
>      > passes an instance of a type which implements IColumn, and has no
>      > superclasses, but is a private static inner class.
>      >
>      >
>      >
>      > In jni_invoke_nonstatic(), it goes into the ‘else if
>      > (!m->has_itable_index())’ branch, which seems wrong from what
>     little I
>      > gleamed from
>     https://wiki.openjdk.java.net/display/HotSpot/InterfaceCalls
>      >
>      > It then resolves the wrong method, getDisplaySize(), which
>     returns a long
>      > (I note that getDisplaySize() is the method declared directly before
>      > getLabel() in the declaration of IColumn, so maybe an off-by-one
>     error
>      > somewhere?), and then invokes it, dooming the process to a future
>     crash.
>      >
>      >
>      >
>      > If I go backwards in time to when the _*vtable*_index field of
>     the Method
>      > (the one originally resolved using GetMethodID, and later passed into
>      > CallObjectMethodV) is set, it’s in
>     KlassVtable::initialize_vtable(), on the
>      > line ‘mh()->set_vtable_index(initialized); // set primary vtable
>     index’,
>      > and that’s the last time it’s set.
>      >
>      > Am I right in thinking that it should have instead been set to a
>     negative
>      > value (so that it was an ‘itable index’), since it’s an interface
>     method?
>      > If so, where would that happen? What else should I check?
>      >
>      >
>      >
>      > Any suggestions are welcome, thanks in advance.
>      >
> 


More information about the hotspot-dev mailing list