OpenSSL and panama-foreign
Rémy Maucherat
remm at apache.org
Wed Nov 10 11:20:36 UTC 2021
On Wed, Nov 10, 2021 at 10:34 AM Maurizio Cimadamore
<maurizio.cimadamore at oracle.com> wrote:
>
> Hi,
> thanks for sharing the feedback! I'm glad you got something working.
> Hopefully we can fix the rest :-)
>
> On 10/11/2021 09:20, Rémy Maucherat wrote:
> > However, I am running into cores (apparently caused by memory
> > corruption) with the panama-foreign branch, while the Java 17 version
> > seems solid.
>
> That is interesting information. So, you say that the Java 17 version
> works fine, but when building (latest) panama-foreign, you get some VM
> crash.
That's correct. I am not ready to call the Java 17 version totally
stable yet, but it makes it through a reasonable amount of testing [h2,
Tomcat testsuite, ab] and checks for memory leaks. It is being released
now, so maybe it will get further testing.
Only heavy memory alloc/free activity seems to cause the cores. For
example, ab -k (which uses keepalive and so reuses existing connections
and TLS engines) is almost stable, while without -k crashes always
occur after only a few seconds of ab.
Since this round of Tomcat releases is now done, I will see if I can
produce a real test case for the problem.
> Now, looking at the crash, it seems like it occurs in the middle of a
> native call:
>
> ```
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> v ~RuntimeStub::nep_invoker_blob <------------
> J 5738 c1
> org.apache.tomcat.util.openssl.openssl_h.SSL_shutdown(Ljdk/incubator/foreign/Addressable;)I
> (31 bytes) @ 0x00007faf82340d2c [0x00007faf8233cc20+0x000000000000410c]
> J 5482 c1
> org.apache.tomcat.util.net.openssl.panama.OpenSSLEngine.closeOutbound()V
> (76 bytes) @ 0x00007faf81a934e4 [0x00007faf81a93280+0x0000000000000264]
> ```
>
> This area changed quite a bit recently, as we are refactoring and
> consolidating the linking functionalities to make it simpler for
> developers to port the CLinker implementation to other platforms. I
> wonder if a regression sneaked in (possible, the refactoring was quite
> big).
Crashes happen in multiple locations (internal JVM, SSL_read,
SSL_shutdown), but the easiest to investigate was the SSL_shutdown
one.
In gdb, the relevant frame is:
#10 ssl3_shutdown (s=0x0) at ssl/s3_lib.c:4420
4420 if (s->s3->alert_dispatch)
(gdb) print s
$1 = (SSL *) 0x0
With the corresponding source:
4400 int ssl3_shutdown(SSL *s)
4401 {
4402     int ret;
4403
4404     /*
4405      * Don't do anything much if we have not done the handshake or we don't
4406      * want to send messages :-)
4407      */
4408     if (s->quiet_shutdown || SSL_in_before(s)) {
4409         s->shutdown = (SSL_SENT_SHUTDOWN | SSL_RECEIVED_SHUTDOWN);
4410         return 1;
4411     }
4412
4413     if (!(s->shutdown & SSL_SENT_SHUTDOWN)) {
4414         s->shutdown |= SSL_SENT_SHUTDOWN;
4415         ssl3_send_alert(s, SSL3_AL_WARNING, SSL_AD_CLOSE_NOTIFY);
4416         /*
4417          * our shutdown alert has been sent now, and if it still needs to be
4418          * written, s->s3->alert_dispatch will be true
4419          */
4420         if (s->s3->alert_dispatch)
4421             return -1;        /* return WANT_WRITE */
4422     } else if (s->s3->alert_dispatch) {
4423         /* resend it if not sent */
4424         ret = s->method->ssl_dispatch_alert(s);
4425         if (ret == -1) {
4426             /*
4427              * we only get to return -1 here the 2nd/Nth invocation, we must
4428              * have already signalled return 0 upon a previous invocation,
4429              * return WANT_WRITE
4430              */
4431             return ret;
4432         }
OpenSSL does its own malloc/free for most of its structures; for
example, the SSL is allocated with SSL_new and freed with SSL_free.
There are hooks to provide your own malloc/free for integration and
leak debugging. I actually tried panama-izing these hooks to see what
would happen, but this only caused cores (after a few hundred successful
allocations, the main init OPENSSL_init_ssl would cause a crash).
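For reference, the hooks mentioned above are installed with
CRYPTO_set_mem_functions. Here is a minimal sketch in plain C (not the
Panama upcall version I tried), assuming the OpenSSL 1.1.0+ hook
signatures. The counter and the counting_* names are made up for
illustration, the counter is not thread safe, and the USE_OPENSSL macro
is only my own switch so the file also compiles without the OpenSSL
headers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical leak counter, for illustration only (not thread safe). */
static size_t live_blocks = 0;

/* Same signature as the malloc hook expected by OpenSSL 1.1.0+. */
static void *counting_malloc(size_t num, const char *file, int line)
{
    (void)file; (void)line;
    void *p = malloc(num);
    if (p != NULL)
        live_blocks++;
    return p;
}

/* Same signature as the free hook. */
static void counting_free(void *ptr, const char *file, int line)
{
    (void)file; (void)line;
    if (ptr != NULL) {
        assert(live_blocks > 0);
        live_blocks--;
        free(ptr);
    }
}

/* Same signature as the realloc hook; the live block count is unchanged
 * when an existing block is simply resized. */
static void *counting_realloc(void *ptr, size_t num, const char *file, int line)
{
    if (ptr == NULL)
        return counting_malloc(num, file, line);
    if (num == 0) {
        counting_free(ptr, file, line);
        return NULL;
    }
    return realloc(ptr, num);
}

#ifdef USE_OPENSSL
#include <openssl/crypto.h>

/* Must run before OpenSSL allocates anything; returns 1 on success and
 * 0 if OpenSSL refuses because allocations have already happened. */
static int install_counting_hooks(void)
{
    return CRYPTO_set_mem_functions(counting_malloc, counting_realloc,
                                    counting_free);
}
#endif
```

With hooks like these, a non-zero live_blocks after all engines are
closed would point at a leak rather than at corruption.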
> In terms of support for shared scopes, the one thing that has changed
> from 16 to 17 is that now if you pass arguments by reference to
> functions, those references' scopes are "acquired" on function enter and
> "released" on function exit - which means it is not possible, even in a
> shared context, for e.g. the target address of a native function to be
> unloaded in the middle of a native call. But this should add _more_
> safety, not less.
Ok, that sounds like a good strategy.
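As I understand it, this amounts to reference counting around the
native call. A rough C sketch of the idea follows; scope_t and the
scope_* functions are entirely made-up names, and this is only my
mental model, not how the JDK actually implements it:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SCOPE_CLOSED (-1)

/* Hypothetical model of a shared scope: state >= 0 is the number of
 * in-flight native calls, SCOPE_CLOSED means the scope was closed. */
typedef struct {
    atomic_int state;
} scope_t;

static void scope_init(scope_t *sc)
{
    atomic_init(&sc->state, 0);
}

/* "Acquire" on function enter: fails if the scope is already closed. */
static bool scope_acquire(scope_t *sc)
{
    int cur = atomic_load(&sc->state);
    while (cur != SCOPE_CLOSED) {
        /* On failure, cur is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&sc->state, &cur, cur + 1))
            return true;
    }
    return false;
}

/* "Release" on function exit. */
static void scope_release(scope_t *sc)
{
    int prev = atomic_fetch_sub(&sc->state, 1);
    assert(prev > 0);
    (void)prev;
}

/* Closing only succeeds while no call holds the scope, so memory cannot
 * be freed in the middle of a native call. */
static bool scope_close(scope_t *sc)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&sc->state, &expected,
                                          SCOPE_CLOSED);
}
```

In this model a close attempted during a call simply fails, and an
acquire after a successful close fails as well.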
> I suppose that you tested Java 17 by using the jextract available in the
> Panama EA binaries; and that you tested the bits in panama-foreign by
> manually building them, and using the jextract you obtained as a result
> (e.g. using the EA jextract against the latest panama-foreign API will
> NOT work).
Yes, for panama-foreign I am using the current commit with a jextract
built from there.
For Java 17, it took me a couple of hours to figure out that the prebuilt
jextract binary at http://jdk.java.net/panama/ was the best option.
The original plan for this OpenSSL module was to support
panama-foreign and track the API changes (plus bugfixes and
improvements). I expected Java 17 would need fixes [which would then
only be available in the dev branch].
Thanks for the quick response!
Rémy
> Thanks
> Maurizio
>