OpenSSL and panama-foreign

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Wed Nov 10 16:39:58 UTC 2021


For the records, I'm on Linux, so if you can put together a test case 
maybe with just SSL engine (e.g. no tomcat) which is enough to reproduce 
the problem, I'm happy to give it a try and do some debugging.

Maurizio

On 10/11/2021 15:48, Maurizio Cimadamore wrote:
>
> I see what you mean - e.g. this code is just a straight port of the 
> JNI code.
>
> That said, it's possible that the JNI code kept alive some Java 
> objects via JNI handles - which with Panama would no longer be true.
>
> I'd recommend to double check if, when you get a core dump, your 
> cleanup actions are run prematurely. Right now it's a bit hard to tell 
> given that some address are raw native addresses (w/o a lifecycle) and 
> some do have a lifecycle (scope).
>
> But looking at your debug for the crash you got on SSL_Shutdown, I 
> still don't get why SSL* is set to zeros (NULL). I don't think you are 
> assigning that pointer anywhere else (OpenSSLState.ssl is final).
>
> Which seems to suggest that, somehow, that ssl pointer is set to NULL 
> from the start, and then you get a crash when you try to do anything 
> with it.
>
> But I agree that it looks suspicious that the behavior changed from 17.
>
> I'd suggest starting from making sure that there's just one cleanup 
> mechanism (the scope). E.g.
>
> 1. create an implicit scope, as suggested by Jorn instead of a shared one.
> 2. do not register anything with a cleaner. Instead add a cleanup 
> action, so that when the scope is closed (implicitly, by a cleaner), 
> you call BIO_free and SSL_free on the pointers you have.
> 3. I think you might need to add reachability fences - because there's 
> no guarantee that the scope would be kept reachable while you call 
> methods on SSLEngine - the VM might detect that `this` is no longer 
> used and then reclaim it (as Jorn mentioned, I'm not sure if relying 
> on synchronized blocks is enough for this)
>
> In other words, let's simplify the lifecycle management a bit, so that 
> we can rule out e.g. that the scope is close _while_ one of the native 
> call is taking place. If we do that, and we still get a crash, that 
> might point to a real regression - as things are, it's hard to tell - 
> it might just have been a latent bug which now became more manifest.
>
> Maurizio
>
>
> On 10/11/2021 14:54, Rémy Maucherat wrote:
>> Thanks a lot for the great design ideas. Keeping around MemorySegment
>> instead of MemoryAddress is a much better design. I'll also use the
>> scope as the controller for closing everything.
>>
>> Notes: The cleaner comes from a Java 11 finalize refactoring. The sync
>> model was also there before with the JNI code, it worked. Overall this
>> is a translation of the JNI calling code, the ssl there was simply a
>> long (so not very safe ...).


More information about the panama-dev mailing list