RFR:8205330 InitialDirContext ctor sometimes throws NPE if the server has sent a disconnection
vyom.tewari at oracle.com
Fri Sep 7 09:11:19 UTC 2018
Thanks for detailed review and comment. Please find my answers inline.
On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote:
> Hi Vyom,
> IIUC, the issue only happens when connections (clients?) to the
> server are pooled. I assume the server may decide to
> close what it thinks is an idle connection at any time.
> So what we see here is actually a race between an open
> connection being retrieved from the pool, and the server
> deciding to close that connection.
> The connection/client is retrieved from the pool, then
> unfortunately closed by the server after having been selected
> for the next operation.
I checked the code and i did not found any problem. Both "get" and
"removePooledConnection" are "synchronized" so there should not be any
problem(concurrency issue) here.
As you said server is closing the same connection which is retrieved
from pool which is correct but this should not throw unintentional NPE.
In LdapClient, we set the "Ldapclient.conn" to explicitly null
asynchronously after we remove the connection from pool and which is
creating the NPE.
LdapClient does not set `Ldapclient.conn` to null if connection is not
> The question for me would be "what is the expected behavior
> if the server decides to close an idle connection?"
> I would expect that new InitialDirContext() should have some
> way of dealing with that with retrying?
> If so will leaving the 'conn' field assigned ensure that
> the retry happens, or will new InitialDirContext() fail
> with some other exception because the connection has
> been closed? Or does the code simply assume that asynchronous
> closing of the connection by the server can only happen while
> it's sitting idle in the pool?
I don't know if retrying is feasible in 'IntialDirContext' but if we
leave 'LdapClient.conn' assigned then we will get
"javax.naming.CommunicationException" which tells that, underline
connection is closed. I am not 100% sure if this is right approach but
if we set 'LdapClient.conn' to null asynchronously then we will hit NPE.
> I wonder if some other mechanism could be used to reduce
> and hopefully eliminate the time window in which the race
> could appear. For instance taking the connection out of
> the pool before closing it, instead of closing it before
> taking it out of the pool, etc...
Changing order will not help, i think closing the same connection is
not a problem but setting `LdapClient.conn' to null asynchronously is
Please do let me know if i am missing something.
> best regards,
> -- daniel
> On 04/09/2018 15:04, vyom tewari wrote:
>> On Friday 24 August 2018 08:52 PM, Chris Hegarty wrote:
>>> Hi Vyom,
>>> On 24/08/18 11:35, vyom tewari wrote:
>>>> Hi All,
>>>> Please review this simple fix below
>>>> bugid: https://bugs.openjdk.java.net/browse/JDK-8205330
>>>> This fix will resolve the race in LdapClient where we are
>>>> explicitly making "null" to LdapClient.conn.
>>> Sorry, I don't know this code all that well, but I think
>>> that more explanation will be needed before this code
>>> can be reviewed.
>> Hi Chris, let me try to explain issue little bit.
>> The issue is a if ldap connection has already been established and
>> then the LDAP directory server sends an (unsolicited) "Notice of
>> Disconnection", the client's processing of this LDAP message can race
>> with an application thread calling new InitialDirContext() to
>> authenticate user credentials (i.e.bind) and throw NPE.
>> I did some analysis and found out that when server send an
>> (unsolicited) "Notice of Disconnection", LdapClient.forceClose() will
>> be called in LdapClient.processUnsolicited() which is called
>> asynchronously by the reader thread in Connection, this means
>> 'LdapClient.conn' may set to null anytime after it received "Notice
>> of Disconnection".
>>> The LdapClient and the Connection seem to be loosely
>>> coupled. I think part of this is to allow the LdapClient
>>> to be GC'ed and finalized separately to the Connection
>>> ( that can be reused ). Not setting `conn` to null could
>>> have a negative impact on this loose coupling. I also
>>> note that the synchronization is implemented poorly in
>>> the LdapClient, `conn` is operated on both from within
>>> synchronized blocks and from unsynchronized blocks (
>>> which I think is the reason you see the unexpected
>>> null ).
>> I agree, not setting 'conn' to null will definitely have some
>> impact. I check the LdapClient and it looks to me it is intentional(i
>> may be wrong) that 'conn' is operated on both from within synchronize
>> blocks and from unsynchronize block.
>> LdapClient, java doc says that connection(conn) take care of it's own
>> access from outside LdapClient must sync;
>> * conn - no sync; Connection takes care of its own sync
>> * unsolicited - sync Vector; multiple operations sync'ed
>> Please have a look and do let me know your thought on the above.
More information about the core-libs-dev