[RFR] 8223727: com/sun/jndi/ldap/privconn/RunTest.java failed due to hang in LdapRequest.getReplyBer

Pavel Rappo pavel.rappo at oracle.com
Thu Jun 13 12:14:34 UTC 2019


Hi Rob,

As per our offline conversation, there's indeed a race condition between the cleanup
actions performed by LDAP connection (com.sun.jndi.ldap.Connection.cleanup)
and reading from LdapRequest (com.sun.jndi.ldap.LdapRequest.getReplyBer).

There are 3 activities of interest to us. Adding a request (1), reading that
request (2) and cleaning up the connection (3). (3) can happen before (1),
between (1) and (2), and after (2). If it so happens that the request is added
(2) after the connection has been cleaned up (3), then that request will never
be cancelled or closed. Whoever is reading that request will have to wait
indefinitely. Which is what we (luckily) saw in that test run.

Your patch fixes that issue by checking if the connection has been cleaned up
before the request is read. In the case where the sequence of events is (1),
(3), (2) the code will perform a redundant, yet harmless check.

I've run the attached test both with and without the fix. I must say that it's
not very "specific". If there's no fix the test will pass on our testing system
almost always. Which is not the case locally. Locally it fails all the time for
me.

    For the record, this it the test that I gave Rob so we could reproduce the
    issue. 
    
I will continue to think about how to increase the test's specificity. One
option would be to increase the number of iterations. But it might bring the
problems of its own. Each LdapContext creates a connection. Each connection
creates a daemon thread. So there's a risk of exhausting system resources.
Another option would be to return -1 from InputStream.read rather than throw an
Exception, so the daemon thread could finish even quicker.

Please add the required copyright header to the test file, otherwise looks good.

Thanks.

P.S. The multithreading design of that code is somewhat disturbing and the
structure of it seems brittle. For instance, multilayered synchronizations (they
might be perfectly appropriate, but the sheer amount of them is concerning). Or
the fact that requests can be added to the connection after it has been closed.
I also actively dislike that the connection doesn't seem to be asynchronously
closable and is serviced by a daemon thread, etc.

> On 12 Jun 2019, at 21:56, Rob McKenna <rob.mckenna at oracle.com> wrote:
> 
> Hi folks,
> 
> Looking for a review for this test failure. (caused by an oversight in
> 8139965)
> 
> http://cr.openjdk.java.net/~robm/8223727/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8223727
> 
> I've subjected this change to a 100 run repeat of the original failure
> across all platforms, and another 100 run of the new test. No failures
> were seen.
> 
>    -Rob
> 



More information about the core-libs-dev mailing list