[foreign-jextract] Segmentation fault from generated code
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri Dec 18 15:00:09 UTC 2020
Jorn came up with a suggestion (thanks):
-XX:+UnlockDiagnosticVMOptions -XX:-ImplicitNullChecks
Not elegant, but it works without disabling the JIT. I've verified that it
works on my setup.
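
For context, here is a minimal sketch of the kind of Java code this flag affects. It is an illustration only, not the UCX reproducer, and the names (ImplicitNullCheckDemo, read, countNullPointerExceptions) are made up for the example. With implicit null checks enabled (the default), JIT-compiled code may omit the explicit null test in read() and rely on the JVM's SIGSEGV handler to raise the NullPointerException. With -XX:-ImplicitNullChecks the compiler emits explicit tests instead. Whether a given dereference actually takes the signal path depends on the JIT, so treat that part as an assumption; the program's result is the same either way.

```java
public class ImplicitNullCheckDemo {

    // Hypothetical holder type, used only to have a field to dereference.
    static final class Box {
        int value = 1;
    }

    // Dereferences 'box'. Once hot, the JIT may compile this without an
    // explicit null test (assumption: depends on the compiler's decisions).
    static int read(Box box) {
        return box.value;
    }

    // Calls read() 'iterations' times, passing null on every 1000th call,
    // and returns how many NullPointerExceptions were caught.
    public static int countNullPointerExceptions(int iterations) {
        Box present = new Box();
        int caught = 0;
        for (int i = 0; i < iterations; i++) {
            Box box = (i % 1000 == 999) ? null : present;
            try {
                read(box);
            } catch (NullPointerException e) {
                caught++;
            }
        }
        return caught;
    }

    public static void main(String[] args) {
        // 1,000,000 iterations with one null per 1000 calls: prints 1000
        System.out.println(countNullPointerExceptions(1_000_000));
    }
}
```

Running it as "java -XX:+UnlockDiagnosticVMOptions -XX:-ImplicitNullChecks ImplicitNullCheckDemo.java" should give the same output; only the mechanism behind the NPE changes, which is why a foreign SIGSEGV handler no longer gets in the way.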
Cheers
Maurizio
On 18/12/2020 14:51, Maurizio Cimadamore wrote:
> I managed to reproduce.
>
> It seems to me that libucs is installing signal handlers - and that
> these handlers interfere with the JVM's own signal handlers. The fact
> that I get this on the command line:
>
> ```
> Caught signal 11 (Segmentation fault: address not mapped to object at address 0x14)
> ==== backtrace (tid: 558482) ====
>  0 /home/maurizio/Desktop/panama-test/ucx/build/lib/libucs.so.0(ucs_handle_error+0x2a4) [0x7fe23e0b4c74]
>  1 /home/maurizio/Desktop/panama-test/ucx/build/lib/libucs.so.0(+0x27e4f) [0x7fe23e0b4e4f]
>  2 /home/maurizio/Desktop/panama-test/ucx/build/lib/libucs.so.0(+0x28184) [0x7fe23e0b5184]
>  3 [0x7fe27130735b]
> =================================
> ```
>
> Seems very suspicious. This is in the standard output and NOT in the
> hotspot trace.
>
> This seems to suggest that libucs is installing a handler for signal
> 11, and that this handler is accidentally triggered by Hotspot C1/C2
> signals (Hotspot uses signals to handle certain events, such as NPEs)
> - see this:
>
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2011-March/003981.html
>
> It is likely that the UCX library installs a handler to detect issues
> in memory access - but that doesn't go well with hotspot.
>
> There's an option in Hotspot to minimize signal usage (-Xrs) but that
> will still install a handler for SIGSEGV, so it's useless in this
> circumstance. Only running with "-Xint" (interpreted mode) allowed me
> to run successfully.
>
> Maybe there's some configuration option that can be passed to UCX when
> building to prevent this signal handler from being installed?
>
>
> Cheers
> Maurizio
>
>
> On 18/12/2020 14:10, Maurizio Cimadamore wrote:
>>
>> Thanks - unfortunately libucx seems to be unavailable for Ubuntu
>> 20.04/18.04 - I'll try to build the library from scratch and
>> reproduce your issue.
>>
>> Cheers
>> Maurizio
>>
>> On 18/12/2020 12:56, Filip Krakowski wrote:
>>> Hi,
>>>
>>> the same code is running well inside a Docker container (Debian 11).
>>> I use Debian as the base image because it provides packages for the
>>> ucx development headers. For reference, here is my Dockerfile.
>>>
>>> FROM debian:bullseye
>>>
>>> # Use bash shell
>>> SHELL ["/bin/bash", "-c"]
>>>
>>> # Install packages
>>> RUN apt update && apt install -y libucx0 libucx-dev curl unzip zip wget llvm-9
>>>
>>> # Install SDKMAN!
>>> RUN curl -s "https://get.sdkman.io" | bash
>>>
>>> # Initialize SDKMAN!
>>> RUN source "$HOME/.sdkman/bin/sdkman-init.sh"
>>>
>>> # Install latest OpenJDK Panama nightly
>>> RUN curl -s "https://coconucos.cs.hhu.de/forschung/jdk/install" | bash
>>>
>>>
>>> Inside the built container I switch to the Panama JDK using "sdk
>>> default java panama", jextract the ucp headers using "jextract -l
>>> ucp -d . -t org.openucx /usr/include/ucp/api/ucp.h" and run a simple
>>> program using "java -Dforeign.restricted=permit --add-modules
>>> jdk.incubator.foreign Main.java".
>>>
>>> import org.openucx.ucx_h.ucp_params_t;
>>>
>>> public class Main {
>>>
>>>     public static void main(String[] args) {
>>>         var layout = ucp_params_t.$LAYOUT();
>>>         System.out.println(layout);
>>>     }
>>> }
>>>
>>>
>>> The only difference from the other method (the one leading to a
>>> segfault) is that there I run jextract locally on my machine (Arch
>>> Linux), compile the code afterwards and upload it to our cluster
>>> (CentOS). Both machines have the same version (1.9) of ucx installed.
>>>
>>> Best regards
>>> Filip
>>>
>>> On 12/18/20 12:12 PM, Filip Krakowski wrote:
>>>> Hi,
>>>>
>>>> I ran the code on Linux (CentOS Linux release 8.1.1911) after
>>>> installing the "ucx" package (version 1.9). I will create a Docker
>>>> container with the environment to reproduce this issue for easier
>>>> debugging.
>>>>
>>>> Best regards
>>>> Filip
>>>>
>>>> On 12/17/20 10:49 PM, Maurizio Cimadamore wrote:
>>>>> Hi,
>>>>> I haven't seen this particular one.
>>>>>
>>>>> What platform are you on? What is needed to reproduce it?
>>>>>
>>>>> Thanks
>>>>> Maurizio
>>>>>
>>>>> On 17/12/2020 18:43, Filip Krakowski wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I work on a simple wrapper for ucx
>>>>>> (https://github.com/openucx/ucx) and am experiencing a
>>>>>> segmentation fault when calling any generated method. The strange
>>>>>> thing is that the segmentation fault disappears as soon as I
>>>>>> attach a (remote) debugger and manually step through the code.
>>>>>>
>>>>>> * Screenshot - https://i.imgur.com/okl3epv.png
>>>>>>
>>>>>> My code only accesses a struct's layout. I don't create any
>>>>>> additional threads.
>>>>>>
>>>>>> log.info("Starting");
>>>>>> var layout = ucp_params_t.$LAYOUT();
>>>>>> log.info("{}", layout);
>>>>>>
>>>>>> The generated layout looks like this.
>>>>>>
>>>>>> static final MemoryLayout ucp_params$struct$LAYOUT_ = MemoryLayout.ofStruct(
>>>>>>     C_LONG.withName("field_mask"),
>>>>>>     C_LONG.withName("features"),
>>>>>>     C_LONG.withName("request_size"),
>>>>>>     C_POINTER.withName("request_init"),
>>>>>>     C_POINTER.withName("request_cleanup"),
>>>>>>     C_LONG.withName("tag_sender_mask"),
>>>>>>     C_INT.withName("mt_workers_shared"),
>>>>>>     MemoryLayout.ofPaddingBits(32),
>>>>>>     C_LONG.withName("estimated_num_eps"),
>>>>>>     C_LONG.withName("estimated_num_ppn")
>>>>>> ).withName("ucp_params");
>>>>>>
>>>>>>
>>>>>> Is this a known issue? I use the latest build from last night.
>>>>>>
>>>>>> Best regards
>>>>>> Filip
>>>>
>>>