OpenJDK 18 Linux Bug when Wrapping Clang, Crashes Outside Native Frames

Joshua Suskalo joshua+panama at suskalo.org
Mon Jul 18 16:16:25 UTC 2022


This is the first time I've posted to the panama-dev mailing list, so if 
this isn't the right place for this, please forgive me.

I've been working fairly happily with Panama since JDK 17 (aside from
not liking that Addressable was made sealed in JDK 18), building a
wrapper for it in Clojure called coffi[1]. I've recently run into a bug,
though, that's a bit outside of what I think I can solve. I'm fairly
sure it's not an error in how I'm using Panama (although tracking it
down has helped me find and fix some bugs in coffi), and I've been able
to reproduce it in a short plain Java file.

I have a full listing for a Clojure reproduction case using coffi, a
Java reproduction case using Panama directly, and a C version that
appears to work just fine, as well as some test cpp files to use as
inputs, all available in a paste on sourcehut[2]. Each program takes the
filename to parse as its first argument. To run the Clojure version you
only need to have the Clojure CLI installed (available from most package
managers), mark the file executable, and run it as a script.

The problem is that when calling functions from clang's C API[3], the
JVM appears to enter a corrupted state that will eventually crash with a
SIGSEGV. In my experience the crash usually happens outside of native
stack frames. When I was working locally in a Clojure REPL, the actual
crash that occurred seemed non-deterministic, though that likely just
had to do with slightly different inputs to the system, as the
reproduction case I've included[2] appears to behave deterministically
on my machine.

Notably, I have not observed similar behavior with other C libraries,
including ones which use upcalls. That means this *may* simply be an
issue of clang corrupting memory through its normal use in a way that
causes problems with the JVM but does not affect the C runtime.
Unfortunately I don't know how to test that theory. I also believe I
have determined that this is not caused by the native threads in clang:
disabling threading by passing the arguments `-mthread-model single`
when parsing does not appear to prevent a crash, although in my limited
testing it *did* appear to increase the delay between running the native
code and the JVM crashing.

Thanks for your time and for reading,
Joshua Suskalo

[1]: https://github.com/IGJoshua/coffi
[2]: https://paste.sr.ht/~srasu/80750a5513bb5e175169465875a155136aad44d7
[3]: https://clang.llvm.org/doxygen/index.html

