OpenJDK 18 Linux Bug when Wrapping Clang, Crashes Outside Native Frames
Joshua Suskalo
joshua+panama at suskalo.org
Mon Jul 18 16:16:25 UTC 2022
This is the first time I've posted to the panama-dev mailing list, so if
this isn't the right place for this, please forgive me.
I've been working fairly happily with Panama since JDK 17 (aside from
disliking that Addressable was made sealed in JDK 18), building a
wrapper for it in Clojure called coffi[1]. I've recently run into a bug
that's beyond what I think I can solve on my own. I'm fairly sure it's
not an error in how I'm using Panama (although tracking it down has
helped me find and fix some bugs in coffi), and I've been able to
reproduce it in a short plain Java file.
I have a full listing for a Clojure reproduction case using coffi, a
Java reproduction case using Panama directly, and a C version that
appears to work just fine, as well as some test cpp files to use as
inputs, all available in a paste on sourcehut[2]. Each program takes the
filename to parse as its first argument. To run the Clojure version you
only need to have the Clojure CLI installed (available from most package
managers), mark the file executable, and run it as a script.
The problem is that when calling functions from clang's C API[3], the
JVM appears to enter a corrupted state that eventually crashes with a
SIGSEGV. In my experience the crash usually happens outside of native
stack frames. When working locally in a Clojure REPL, the point at which
the crash occurred seemed non-deterministic, though that was likely just
due to slightly different inputs to the system, as the reproduction
case I've included[2] appears to behave deterministically on my
machine.
Notably, I have not observed similar behavior with other C libraries,
including ones which use upcalls. This *may* therefore simply be clang
corrupting memory during normal use in a way that causes problems for
the JVM but does not affect the C version. Unfortunately I don't know
how to test that theory. I also believe I have determined that this is
not caused by clang's native threads: disabling threading by passing
the arguments `-mthread-model single` to the parse call does not appear
to prevent the crash, although in my limited testing it *did* appear to
increase the delay between running the native code and the JVM
crashing.
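For anyone repeating the threading experiment, here is a minimal sketch
of how the extra compiler arguments might be assembled before they are
handed to the parse call. The class and method names are hypothetical
and are not taken from the reproduction files[2]. The sketch assumes
each token is passed as a separate argument string, since clang takes
the value of `-mthread-model` as a separate argument.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the reproduction case in [2].
class ParseArgs {
    // Builds the compiler-argument list for a parse call.
    // Assumption: the flag and its value are separate argv-style
    // entries, not one string containing a space.
    static List<String> compilerArgs(boolean singleThreaded) {
        List<String> args = new ArrayList<>();
        if (singleThreaded) {
            args.add("-mthread-model");
            args.add("single");
        }
        return args;
    }
}
```

Calling `ParseArgs.compilerArgs(true)` yields a two-element list, which
would then need to be copied into native memory as an array of C strings
before being passed to clang.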
Thanks for your time and for reading,
Joshua Suskalo
[1]: https://github.com/IGJoshua/coffi
[2]: https://paste.sr.ht/~srasu/80750a5513bb5e175169465875a155136aad44d7
[3]: https://clang.llvm.org/doxygen/index.html