OpenJDK 18 Linux Bug when Wrapping Clang, Crashes Outside Native Frames

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Mon Jul 18 16:43:59 UTC 2022


Hi Joshua,
I know about coffi, and I'm very excited to see stuff like this 
happening - kudos!

As for clang, my hunch is that you are hitting the dreaded "libclang 
crash recovery" issue - e.g.

https://reviews.llvm.org/D23662

Basically, clang installs its own signal handlers, which end up 
overriding (at least on Linux) the signal handlers installed by the JVM.

Our jextract implementation (which is based on a Panama port of 
libclang) also has to work around this:

https://github.com/openjdk/jextract/blob/master/src/main/java/org/openjdk/jextract/clang/LibClang.java#L54

(On Windows, the recovery logic seems to work OK, but on Linux it causes 
spurious crashes pretty much all over the place.)

In principle, setting the LIBCLANG_DISABLE_CRASH_RECOVERY environment 
variable before running your reproduction should be a quick way for you 
to check if that's indeed the issue.
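
A minimal sketch of that check from Java, assuming the reproduction is 
launched as a child process (a running JVM cannot change its own 
environment, so the variable has to be in place before the process that 
loads libclang starts). The class name RunWithoutCrashRecovery and the 
helper withCrashRecoveryDisabled are made up for illustration, and the 
value "1" is an assumption: as far as I know libclang only checks whether 
the variable is set at all.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class RunWithoutCrashRecovery {
    // Variable named in this thread. Assumption: libclang only checks that it is set.
    static final String VAR = "LIBCLANG_DISABLE_CRASH_RECOVERY";

    // Hypothetical helper: prepares `command` to run with the variable set.
    static ProcessBuilder withCrashRecoveryDisabled(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(new ArrayList<>(command));
        pb.environment().put(VAR, "1"); // child inherits our environment plus this entry
        pb.inheritIO();                 // keep any crash output from the child visible
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.err.println("usage: java RunWithoutCrashRecovery.java <command> [args...]");
            return;
        }
        int exit = withCrashRecoveryDisabled(List.of(args)).start().waitFor();
        System.out.println("child exited with status " + exit);
    }
}
```

For example, `java RunWithoutCrashRecovery.java java Repro.java test.cpp` 
would run a reproduction with the variable set (Repro.java and test.cpp 
are placeholder names). If the SIGSEGV goes away, that points at the 
crash recovery handlers.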

Cheers
Maurizio

On 18/07/2022 17:16, Joshua Suskalo wrote:
> This is the first time I've posted to the panama-dev mailing list, so 
> if this isn't the right place for this, please forgive me.
>
> I've been working fairly happily with Panama (apart from not liking that 
> Addressable was made sealed in JDK 18) since JDK 17, building a wrapper 
> for it in Clojure called coffi[1], but I've recently run into a bug 
> that's a bit outside of what I think I can solve. I'm fairly sure it's 
> not an error in how I'm using Panama (although tracking it down has 
> helped me find and fix some bugs in coffi), and I've been able to 
> reproduce it in a short plain Java file.
>
> I have a full listing for a Clojure reproduction case using coffi, a 
> Java reproduction case using Panama directly, and a C version that 
> appears to work just fine, as well as some test cpp files to use as 
> inputs, all available in a paste on sourcehut[2]. The first argument 
> to running them is the filename to parse. To run the Clojure version 
> you only need to have the Clojure CLI installed (available from most 
> package managers), mark the file executable, and run it as a script.
>
> The problem is that when calling functions from clang's C API[3], the 
> JVM appears to enter a corrupted state that will eventually crash with 
> a SIGSEGV. Usually in my experience this happens outside of native 
> stack frames, and when working locally in a REPL in Clojure the actual 
> crash that occurred seemed non-deterministic, though that likely just 
> had to do with slightly different inputs to the system, as the 
> reproduction case I've included[2] appears to have deterministic 
> behavior on my machine.
>
> Notably I have not observed similar behavior with other C libraries, 
> including ones which use upcalls, which means that this *may* simply 
> be an issue of clang corrupting memory through its normal use that 
> causes problems with the JVM but which does not affect the C runtime. 
> Unfortunately I don't know how to test that theory. I also believe I 
> have determined that this is not caused by the native threads in 
> clang, as disabling threading by passing the arguments `-mthread-model 
> single` to the parsing does not appear to prevent a crash, although in 
> my limited testing it *did* appear to increase the delay between 
> running the native code and the JVM crashing.
>
> Thanks for your time and for reading,
> Joshua Suskalo
>
> [1]: https://github.com/IGJoshua/coffi
> [2]: https://paste.sr.ht/~srasu/80750a5513bb5e175169465875a155136aad44d7
> [3]: https://clang.llvm.org/doxygen/index.html