RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection

Daniel Fuchs daniel.fuchs at oracle.com
Fri Sep 7 16:47:35 UTC 2018


Hi Vyom,

Please see inline... I've elided some parts...

On 07/09/2018 10:11, vyom tewari wrote:
> On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote:
>> So what we see here is actually a race between an open
>> connection being retrieved from the pool, and the server
>> deciding to close that connection.
>> The connection/client is retrieved from the pool, then
>> unfortunately closed by the server after having been selected
>> for the next operation.
>>
> I checked  the code and i did not found any problem. Both "get" and 
> "removePooledConnection" are "synchronized" so there should not be any 
> problem(concurrency issue) here.

I'm referring to this pattern (in forceClose):

             conn.cleanup(null, false);
             conn = null;

             if (cleanPool) {
                 pcb.removePooledConnection(this);
             }

The connection is cleaned up, then nulled, then only removed
from the pool. I'm questioning whether the first thing to do would
be to remove the connection from the pool, to reduce the time window
in which the client could be retrieved from the pool by another
thread while forceClose is in process.

I am not sure what side effect this could have - as I haven't
studied the code that retrieves the client from the pool,
but I'm wondering whether that's worth exploring.

> As you said server is closing the same connection which is retrieved 
> from pool which  is correct but this should not throw unintentional NPE.

Agreed.

> In LdapClient, we set the "Ldapclient.conn" to explicitly null 
> asynchronously after we remove the connection from pool and which is 
> creating the NPE.
> 
> LdapClient does not set `Ldapclient.conn` to null if connection is not 
> pooled.

Really? That's not what I see... Am I missing something?

 >> [...]
>> If so will leaving the 'conn' field assigned ensure that
>> the retry happens, or will new InitialDirContext() fail
>> with some other exception because the connection has
>> been closed? Or does the code simply assume that asynchronous
>> closing of the connection by the server can only happen while
>> it's sitting idle in the pool?
>>
>   I don't know if retrying is feasible in 'IntialDirContext' but if we 
> leave 'LdapClient.conn' assigned then we will get 
> "javax.naming.CommunicationException" which tells that, underline 
> connection is closed. I am not 100% sure if this is right approach but 
> if we set 'LdapClient.conn' to null asynchronously then we will hit NPE.

OK - that's good to know. I agree that CommunicationException is
better than NPE. If so then not nulling the connection
is at least an improvement.
But this would deserve a comment in the code, I think.
More below...

>> I wonder if some other mechanism could be used to reduce
>> and hopefully eliminate the time window in which the race
>> could appear. For instance taking the connection out of
>> the pool before closing it, instead of closing it before
>> taking it out of the pool, etc...
>>
> Changing order will not help, i think closing the  same connection is 
> not a problem  but setting `LdapClient.conn'  to null asynchronously is 
> causing NPE.

Yes - I agree that setting conn to null is problematic.
I wonder why it's set to null. Maybe in an attempt to
make it eligible for GC earlier?

If that's not the issue, then I would suggest making it
final (AFAICS it's only set to non-null once) and cleaning
up the code that tests for conn == null.

If GC is the issue, then the best you can do is probably
only set it to null when the client is not pooled.

And if we could reduce the time window in which a thread
could get a stale client from the pool that might be good
too?

Is there anyway we could somehow add a test that reproduce
the NPE?

best regards,

-- daniel

[...]

>>>> On 24/08/18 11:35, vyom tewari wrote:
>>>>> Hi All,
>>>>>
>>>>> Please review this simple fix below
>>>>>
>>>>> webrev: 
>>>>> http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html
>>>>>
>>>>> bugid: https://bugs.openjdk.java.net/browse/JDK-8205330
>>>>>
>>>>> This fix will resolve the race in LdapClient  where we are 
>>>>> explicitly making "null" to LdapClient.conn.
[...]


More information about the core-libs-dev mailing list