Need some help in debugging jvm corruption/crashes in C++/JNI code

Bwmat . bwmat.reloaded at gmail.com
Tue Jan 14 02:35:43 UTC 2020


Thanks for the response, but I JUST figured it out.

It was our fault, of course. I was staring at the trace, wondering why it
wasn't doing an interface call, when I realized that what I had described
as IColumn wasn't actually an interface, it was a class! I had assumed we
had properly written the compatibility layer to use the interface, but the
*real* Interface is actually rarely used, with the aforementioned class
being used in the vast majority of cases... (and I haven't used this code
much for a couple of years). Not here though! Really kicking myself for not
realizing this sooner.

Kinda wish that -Xcheck:jni checked this (or the debug assertions in
hotspot, though maybe they do in a newer version). Has this been
considered, anyone know?

Thanks again,
Matthew w.


On Mon., Jan. 13, 2020, 6:22 p.m. David Holmes, <david.holmes at oracle.com>
wrote:

> Hi,
>
> On 14/01/2020 9:56 am, Bwmat . wrote:
> > Hello,
> >
> >
> > I just joined this mailing list to hopefully get a bit of help from
> someone
> > familiar with hotspot.
>
> These mailing lists are not really for end user application debugging
> help. However, as you have deep dived into hotspot internals ... :)
>
> In a debug build you can use itable/vtable logging to see how the
> itable/vtable is constructed and see if anything odd appears there.
>
> -Xlog:itables*=trace,vtables*=trace
>
> You can also try running with -Xcheck:jni.
>
> FYI most file attachments are stripped by the mailing list so your log
> did not get included.
>
> Cheers,
> David
>
> >
> >
> > We found a crash in a C++ library we have that uses JNI, and I’ve been
> > trying to debug it for a couple of days. I’m working on windows, and I
> > managed to get a TTD (time travel debugging, a feature in the preview
> > version of WinDBG) trace of when the problem occurs, and I built myself a
> > debug JVM from source to help debug, but I’m completely unfamiliar with
> > hotspot.
> >
> >
> >
> > I’ve also attached a log file created one time that the crash occurred
> (the
> > symptoms aren’t always the same, sometimes it doesn’t crash at all, but
> > just return an invalid value from a method call, so my test app quits
> > early, sometimes it crashes from an assertion from within hotpot, not
> > always the same one).
> >
> >
> >
> > One weird thing is that I found the issue while doing some testing, and a
> > certain SQL query triggers the issue, but another, analogous query does
> > not. This is significant, since all of the SQL processing is written in
> > Java, so very little changes in the native part of the application
> between
> > the case that works and the case that fails, which makes me think it’s a
> > jvm issue. I’m currently debugging a java 8 openjdk (since I used an
> > article about how to build it that used that version, and that’s what
> we’re
> > using internally anyways, but I also reproduced the issue on a java 13
> > oracle jvm, so if it IS a JVM bug, doesn’t seem like it's been fixed yet.
> > I’m quite aware that it’s probably still our fault somehow though.)
> >
> >
> >
> > The issue seems to somehow be caused by the wrong java method being
> invoked
> > by a JNI call, or maybe the right method, but on the wrong object.
> >
> >
> >
> > Early on, while debugging using Eclipse’s remote debugger, I found that,
> > right before crashing, I was able to hit a breakpoint in a method on a
> type
> > T, but the debugger told me that the ‘this’ reference was actually of
> type
> > String!
> >
> >
> >
> > Later on, while debugging the internals of hotspot in my TTD trace (which
> > allowed me to get quite far without much understanding of what’s going
> > one), I found that the crash in the trace occurs when the wrong method is
> > invoked on an object, returning a long, which is later interpreted as a
> > string reference, and the jvm notices it isn’t actually a string when the
> > native code tries to get the string length.
> >
> >
> >
> > To lay out the situation without going into too many details of our code,
> > we have an interface IColumn, that has a few methods that get attributes
> > about a SQL column. Some methods return primitives, some return strings.
> > The native code is trying to call IColumn.GetLabel(), so it passes a
> > jmethodID that was generated via a successful call to JNI’s GetMethodID,
> > passing the class object of IColumn. For the receiver of the call, it
> > passes an instance of a type which implements IColumn, and has no
> > superclasses, but is a private static inner class.
> >
> >
> >
> > In jni_invoke_nonstatic(), it goes into the ‘else if
> > (!m->has_itable_index())’ branch, which seems wrong from what little I
> > gleamed from
> https://wiki.openjdk.java.net/display/HotSpot/InterfaceCalls
> >
> > It then resolves the wrong method, getDisplaySize(), which returns a long
> > (I note that getDisplaySize() is the method declared directly before
> > getLabel() in the declaration of IColumn, so maybe an off-by-one error
> > somewhere?), and then invokes it, dooming the process to a future crash.
> >
> >
> >
> > If I go backwards in time to when the _*vtable*_index field of the Method
> > (the one originally resolved using GetMethodID, and later passed into
> > CallObjectMethodV) is set, it’s in KlassVtable::initialize_vtable(), on
> the
> > line ‘mh()->set_vtable_index(initialized); // set primary vtable index’,
> > and that’s the last time it’s set.
> >
> > Am I right in thinking that it should have instead been set to a negative
> > value (so that it was an ‘itable index’), since it’s an interface method?
> > If so, where would that happen? What else should I check?
> >
> >
> >
> > Any suggestions are welcome, thanks in advance.
> >
>


More information about the hotspot-dev mailing list