OpenSSL and panama-foreign

Rémy Maucherat remm at apache.org
Wed Nov 10 11:20:36 UTC 2021


On Wed, Nov 10, 2021 at 10:34 AM Maurizio Cimadamore
<maurizio.cimadamore at oracle.com> wrote:
>
> Hi,
> thanks for sharing the feedback! I'm glad you got something working.
> Hopefully we can fix the rest :-)
>
> On 10/11/2021 09:20, Rémy Maucherat wrote:
> > However, I am running into cores (apparently caused by memory
> > corruption) with the panama-foreign branch, while the Java 17 version
> > seems solid.
>
> That is interesting information. So, you say that the Java 17 version
> works fine, but when building (latest) panama-foreign, you get some VM
> crash.

That's correct. I am not ready to call Java 17 totally stable yet, but
it makes it through a reasonable amount of testing [h2, Tomcat
testsuite, ab] and checks for memory leaks. This is getting released
now so maybe this will get further testing.

Only heavy memory alloc/free activity seems to cause the core dumps. For
example, ab -k (which does keepalive over existing connections and
TLS engines) is almost stable. Without -k, crashes always occur
after only a few seconds of ab.

Since this round of Tomcat releases is now done, I will see if I can
produce a real test case for the problem.

> Now, looking at the crash, it seems like it occurs in the middle of a
> native call:
>
> ```
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> v  ~RuntimeStub::nep_invoker_blob <------------
> J 5738 c1 org.apache.tomcat.util.openssl.openssl_h.SSL_shutdown(Ljdk/incubator/foreign/Addressable;)I (31 bytes) @ 0x00007faf82340d2c [0x00007faf8233cc20+0x000000000000410c]
> J 5482 c1 org.apache.tomcat.util.net.openssl.panama.OpenSSLEngine.closeOutbound()V (76 bytes) @ 0x00007faf81a934e4 [0x00007faf81a93280+0x0000000000000264]
> ```
>
> This area changed quite a bit recently, as we are refactoring and
> consolidating the linking functionalities to make it simpler for
> developers to port the CLinker implementation to other platforms. I
> wonder if a regression sneaked in (possible, the refactoring was quite
> big).

Crashes happen in multiple locations (internal JVM, SSL_read,
SSL_shutdown), but the easiest to investigate was the SSL_shutdown
one.

In gdb, the relevant frame shows:

#10 ssl3_shutdown (s=0x0) at ssl/s3_lib.c:4420
4420            if (s->s3->alert_dispatch)
(gdb) print s
$1 = (SSL *) 0x0

With the corresponding source:
4400    int ssl3_shutdown(SSL *s)
4401    {
4402        int ret;
4403
4404        /*
4405         * Don't do anything much if we have not done the handshake or we don't
4406         * want to send messages :-)
4407         */
4408        if (s->quiet_shutdown || SSL_in_before(s)) {
4409            s->shutdown = (SSL_SENT_SHUTDOWN | SSL_RECEIVED_SHUTDOWN);
4410            return 1;
4411        }
4412
4413        if (!(s->shutdown & SSL_SENT_SHUTDOWN)) {
4414            s->shutdown |= SSL_SENT_SHUTDOWN;
4415            ssl3_send_alert(s, SSL3_AL_WARNING, SSL_AD_CLOSE_NOTIFY);
4416            /*
4417             * our shutdown alert has been sent now, and if it still needs to be
4418             * written, s->s3->alert_dispatch will be true
4419             */
4420            if (s->s3->alert_dispatch)
4421                return -1;        /* return WANT_WRITE */
4422        } else if (s->s3->alert_dispatch) {
4423            /* resend it if not sent */
4424            ret = s->method->ssl_dispatch_alert(s);
4425            if (ret == -1) {
4426                /*
4427                 * we only get to return -1 here the 2nd/Nth invocation, we must
4428                 * have already signalled return 0 upon a previous invocation,
4429                 * return WANT_WRITE
4430                 */
4431                return ret;
4432            }

OpenSSL does its own malloc/free for most of its structures; for
example, an SSL structure is allocated with SSL_new and freed with SSL_free.
There are hooks to provide your own malloc/free for integration and
leak debugging. I actually tried panama-izing these to see what would
happen, but this only caused cores (after a few hundred successful
allocations, the main init call OPENSSL_init_ssl would crash).
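
For readers unfamiliar with these hooks, here is a minimal sketch of
what leak-counting replacements can look like on the C side. It assumes
the OpenSSL 1.1.0+ hook signatures (size, file, line) as taken by
CRYPTO_set_mem_functions. The names counting_malloc, counting_realloc,
counting_free and live_allocations are made up for illustration, and the
registration call is only shown in a comment so that the sketch compiles
without OpenSSL. It is single-threaded and is not the code used in the
Tomcat module.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of blocks currently allocated through the hooks (hypothetical
 * name; a plain long is only adequate for a single-threaded sketch). */
static long live_allocations = 0;

/* malloc hook: same shape as OpenSSL 1.1.0+ expects (size, file, line). */
static void *counting_malloc(size_t num, const char *file, int line)
{
    (void)file;
    (void)line;
    void *p = malloc(num);
    if (p != NULL)
        live_allocations++;
    return p;
}

/* free hook: ignores NULL, otherwise drops the counter and frees. */
static void counting_free(void *addr, const char *file, int line)
{
    (void)file;
    (void)line;
    if (addr == NULL)
        return;
    /* A free without a matching allocation means the bookkeeping is off. */
    assert(live_allocations > 0);
    live_allocations--;
    free(addr);
}

/* realloc hook: NULL behaves like malloc, size 0 behaves like free. */
static void *counting_realloc(void *addr, size_t num, const char *file, int line)
{
    if (addr == NULL)
        return counting_malloc(num, file, line);
    if (num == 0) {
        counting_free(addr, file, line);
        return NULL;
    }
    /* Resizing an existing block does not change the live count. */
    return realloc(addr, num);
}

/*
 * With OpenSSL available, the hooks would be installed before any other
 * OpenSSL call, along the lines of:
 *
 *   CRYPTO_set_mem_functions(counting_malloc, counting_realloc, counting_free);
 */
```

If live_allocations is not back to its starting value after all SSL and
SSL_CTX objects have been freed, something is still being held.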

> In terms of support for shared scopes, the one thing that has changed
> from 16 to 17 is that now if you pass arguments by reference to
> functions, those references' scopes are "acquired" on function enter and
> "released" on function exit - which means it is not possible, even in a
> shared context, for e.g. the target address of a native function to be
> unloaded in the middle of a native call. But this should add _more_
> safety, not less.

Ok, that sounds like a good strategy.
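
To make the acquire/release idea concrete, here is a rough sketch in C
of how a scope can refuse to close while a call is in flight. All names
(struct scope, scope_acquire, scope_release, scope_close,
call_with_scope) are hypothetical and the sketch is single-threaded. It
only illustrates the pattern described above and is not how the JDK
implements it.

```c
#include <stdbool.h>

/* Hypothetical scope: counts the calls that currently depend on it. */
struct scope {
    int  acquired; /* number of in-flight calls holding the scope */
    bool closed;   /* true once the scope has been closed */
};

/* Take a hold on the scope; fails if it is already closed. */
static bool scope_acquire(struct scope *s)
{
    if (s->closed)
        return false;
    s->acquired++;
    return true;
}

/* Drop a hold taken by scope_acquire. */
static void scope_release(struct scope *s)
{
    s->acquired--;
}

/* Close the scope; refused while any call still holds it. */
static bool scope_close(struct scope *s)
{
    if (s->acquired > 0)
        return false;
    s->closed = true;
    return true;
}

/* Acquire on function enter, release on function exit.
 * Returns -1 if the scope was already closed, else fn's result. */
static int call_with_scope(struct scope *s, int (*fn)(struct scope *))
{
    if (!scope_acquire(s))
        return -1;
    int result = fn(s);
    scope_release(s);
    return result;
}

/* Example callee that tries to close the scope in the middle of the
 * call; returns 1 if the close succeeded, 0 if it was refused. */
static int try_close_during_call(struct scope *s)
{
    return scope_close(s) ? 1 : 0;
}
```

Calling call_with_scope(&s, try_close_during_call) on an open scope
returns 0, because the close is refused while the call holds the scope.
Once the call has returned, scope_close(&s) succeeds.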

> I suppose that you tested Java 17 by using the jextract available in the
> Panama EA binaries; and that you tested the bits in panama-foreign by
> manually building them, and using the jextract you obtained as a result
> (e.g. using the EA jextract against the latest panama-foreign API will
> NOT work).

Yes, for panama-foreign I am using the current commit with a jextract
built from there.
For Java 17, it took me a couple of hours to figure out that the prebuilt
jextract binary at http://jdk.java.net/panama/ was the best option.

The original plan for this OpenSSL module was to support
panama-foreign and track the API changes (plus bugfixes and
improvements). I expected Java 17 would need fixes [which would then
only be available in the dev branch].

Thanks for the quick response!

Rémy

> Thanks
> Maurizio
>

