RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
vyom tewari
vyom.tewari at oracle.com
Mon Sep 10 14:53:10 UTC 2018
On Monday 10 September 2018 05:30 PM, Chris Hegarty wrote:
> Vyom,
>
> The NPE is originating from the finally block of the LdapClient
> `authenticate` method. In the finally block there is an attempt
> to restore the "read" timeout, since it was changed earlier in
> `authenticate`, to reflect the connect timeout value, since the
> subsequent operation(s) are considered part of the connect phase.
>
> An unsolicited message may close the connection during
> authenticate, which is does and LdapClient.ldapBind throws
> "javax.naming.CommunicationException: Request: 1 cancelled". The
> finally block is then executed and the previous exception is
> effectively throw away when the NPE is encountered.
>
> If we agree that such behaviour is reasonable ( and I think it
> most likely is ), then the finally block needs to be more
> defensive - check that conn is not null. It makes no sense to
> attempt to reset the read timeout if conn is null.
>
> With the following change, then the CommunicationException will
> be thrown from `authenticate`. I think this is the behaviour we
> want, no?
>
Yes , this will definitely solve this NPE issue but we may hit NPE other
places as well because we are setting 'LdapClient.com' to null
asynchronously.
Vyom
>
> --- a/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java
> +++ b/src/java.naming/share/classes/com/sun/jndi/ldap/LdapClient.java
> @@ -297,7 +297,8 @@
> conn.setV3(isLdapv3);
> return res;
> } finally {
> - conn.readTimeout = readTimeout;
> + if (conn != null)
> + conn.readTimeout = readTimeout;
> }
> }
>
>
> -Chris.
>
>
> On 07/09/18 17:47, Daniel Fuchs wrote:
>> Hi Vyom,
>>
>> Please see inline... I've elided some parts...
>>
>> On 07/09/2018 10:11, vyom tewari wrote:
>>> On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote:
>>>> So what we see here is actually a race between an open
>>>> connection being retrieved from the pool, and the server
>>>> deciding to close that connection.
>>>> The connection/client is retrieved from the pool, then
>>>> unfortunately closed by the server after having been selected
>>>> for the next operation.
>>>>
>>> I checked the code and i did not found any problem. Both "get" and
>>> "removePooledConnection" are "synchronized" so there should not be
>>> any problem(concurrency issue) here.
>>
>> I'm referring to this pattern (in forceClose):
>>
>> conn.cleanup(null, false);
>> conn = null;
>>
>> if (cleanPool) {
>> pcb.removePooledConnection(this);
>> }
>>
>> The connection is cleaned up, then nulled, then only removed
>> from the pool. I'm questioning whether the first thing to do would
>> be to remove the connection from the pool, to reduce the time window
>> in which the client could be retrieved from the pool by another
>> thread while forceClose is in process.
>>
>> I am not sure what side effect this could have - as I haven't
>> studied the code that retrieves the client from the pool,
>> but I'm wondering whether that's worth exploring.
>>
>>> As you said server is closing the same connection which is retrieved
>>> from pool which is correct but this should not throw unintentional
>>> NPE.
>>
>> Agreed.
>>
>>> In LdapClient, we set the "Ldapclient.conn" to explicitly null
>>> asynchronously after we remove the connection from pool and which is
>>> creating the NPE.
>>>
>>> LdapClient does not set `Ldapclient.conn` to null if connection is
>>> not pooled.
>>
>> Really? That's not what I see... Am I missing something?
>>
>> >> [...]
>>>> If so will leaving the 'conn' field assigned ensure that
>>>> the retry happens, or will new InitialDirContext() fail
>>>> with some other exception because the connection has
>>>> been closed? Or does the code simply assume that asynchronous
>>>> closing of the connection by the server can only happen while
>>>> it's sitting idle in the pool?
>>>>
>>> I don't know if retrying is feasible in 'IntialDirContext' but if
>>> we leave 'LdapClient.conn' assigned then we will get
>>> "javax.naming.CommunicationException" which tells that, underline
>>> connection is closed. I am not 100% sure if this is right approach
>>> but if we set 'LdapClient.conn' to null asynchronously then we will
>>> hit NPE.
>>
>> OK - that's good to know. I agree that CommunicationException is
>> better than NPE. If so then not nulling the connection
>> is at least an improvement.
>> But this would deserve a comment in the code, I think.
>> More below...
>>
>>>> I wonder if some other mechanism could be used to reduce
>>>> and hopefully eliminate the time window in which the race
>>>> could appear. For instance taking the connection out of
>>>> the pool before closing it, instead of closing it before
>>>> taking it out of the pool, etc...
>>>>
>>> Changing order will not help, i think closing the same connection
>>> is not a problem but setting `LdapClient.conn' to null
>>> asynchronously is causing NPE.
>>
>> Yes - I agree that setting conn to null is problematic.
>> I wonder why it's set to null. Maybe in an attempt to
>> make it eligible for GC earlier?
>>
>> If that's not the issue, then I would suggest making it
>> final (AFAICS it's only set to non-null once) and cleaning
>> up the code that tests for conn == null.
>>
>> If GC is the issue, then the best you can do is probably
>> only set it to null when the client is not pooled.
>>
>> And if we could reduce the time window in which a thread
>> could get a stale client from the pool that might be good
>> too?
>>
>> Is there anyway we could somehow add a test that reproduce
>> the NPE?
>>
>> best regards,
>>
>> -- daniel
>>
>> [...]
>>
>>>>>> On 24/08/18 11:35, vyom tewari wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Please review this simple fix below
>>>>>>>
>>>>>>> webrev:
>>>>>>> http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html
>>>>>>>
>>>>>>> bugid: https://bugs.openjdk.java.net/browse/JDK-8205330
>>>>>>>
>>>>>>> This fix will resolve the race in LdapClient where we are
>>>>>>> explicitly making "null" to LdapClient.conn.
>> [...]
More information about the core-libs-dev
mailing list