RFR: 8139965 - Hang seen when using com.sun.jndi.ldap.search.replyQueueSize - core-libs-dev

newer
Re: JMathConstant submission for...

RFR: 8139965 - Hang seen when using com.sun.jndi.ldap.search.replyQueueSize

Rob McKenna

3 Sep 2018 3 Sep '18

9:48 p.m.

Hi folks, I'd like to get a re-review of this change: https://bugs.openjdk.java.net/browse/JDK-8139965 http://cr.openjdk.java.net/~robm/8139965/webrev/ In case the original has gone stale: http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-August/042767.html -Rob

Show replies by date

vyom tewari

4 Sep 4 Sep

7:39 a.m.

Hi Rob, Code change looks good to me. Thanks, Vyom On Tuesday 04 September 2018 03:18 AM, Rob McKenna wrote:

...

Hi folks,

I'd like to get a re-review of this change:

https://bugs.openjdk.java.net/browse/JDK-8139965 http://cr.openjdk.java.net/~robm/8139965/webrev/

In case the original has gone stale:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-August/042767.html

-Rob

Chris Hegarty

9 a.m.

Rob,

...

On 3 Sep 2018, at 22:48, Rob McKenna <rob.mckenna@oracle.com> wrote:

Hi folks,

I'd like to get a re-review of this change:

https://bugs.openjdk.java.net/browse/JDK-8139965 <https://bugs.openjdk.java.net/browse/JDK-8139965>

This issue is closed as `will not fix`. I presume you will re-open it before pushing.

...

http://cr.openjdk.java.net/~robm/8139965/webrev/ <http://cr.openjdk.java.net/~robm/8139965/webrev/>

43 private boolean completed; Won’t `completed` need to be volatile now? ( since the removal of synchronized from hasSearchCompleted ) LdapRequest.close puts EOF into the queue, but that is a potentially blocking operation ( while holding the lock ). If the queue is at capacity, then it will block forever. This model will only work if `getReplyBer` is always guaranteed to be running concurrently. Is it? Please check the indentation of LdapRequest.java L103 ( in the new file ). It appears, in the webrev, that the trailing `}` is not lined up. The indentation doesn’t look right here either. 624 if (nparent) { 625 LdapRequest ldr = pendingRequests; 626 while (ldr != null) { 627 ldr.close(); 628 ldr = ldr.next; 629 } 630 } 631 } -Chris

Daniel Fuchs

9:22 a.m.

Hi Rob, I concur with Chris. completed needs to be volatile and close() needs to set a flag and use offer like cancel(). The condition for testing for closed then becomes that the flag is set and the queue is empty or has EOF as its head. Is there any way this could be tested by a regression test? best regards, -- daniel On 04/09/2018 10:00, Chris Hegarty wrote:

...

Rob,

...
On 3 Sep 2018, at 22:48, Rob McKenna <rob.mckenna@oracle.com> wrote:

Hi folks,

I'd like to get a re-review of this change:

https://bugs.openjdk.java.net/browse/JDK-8139965 <https://bugs.openjdk.java.net/browse/JDK-8139965>

This issue is closed as `will not fix`. I presume you will re-open it before pushing.

...
http://cr.openjdk.java.net/~robm/8139965/webrev/ <http://cr.openjdk.java.net/~robm/8139965/webrev/>

43 private boolean completed; Won’t `completed` need to be volatile now? ( since the removal of synchronized from hasSearchCompleted )

LdapRequest.close puts EOF into the queue, but that is a potentially blocking operation ( while holding the lock ). If the queue is at capacity, then it will block forever. This model will only work if `getReplyBer` is always guaranteed to be running concurrently. Is it?

Please check the indentation of LdapRequest.java L103 ( in the new file ). It appears, in the webrev, that the trailing `}` is not lined up.

The indentation doesn’t look right here either. 624 if (nparent) { 625 LdapRequest ldr = pendingRequests; 626 while (ldr != null) { 627 ldr.close(); 628 ldr = ldr.next; 629 } 630 } 631 }

-Chris

Rob McKenna

5 Sep 5 Sep

1:05 a.m.

Thanks for the reviews folks. I believe the following captures your recommended changes: http://cr.openjdk.java.net/~robm/8139965/webrev.02/ W.r.t. testing I think this area has been difficult to test traditionally. I'll have a dig through the existing testbase (and I'll get back to you) to see if there's anything similar but afaik most tests simply mimic a bindable dummy ldap server. Vyom, are you aware of any more rigorous tests / ldap test frameworks? -Rob On 04/09/18 10:22, Daniel Fuchs wrote:

...

Hi Rob,

I concur with Chris. completed needs to be volatile and close() needs to set a flag and use offer like cancel().

The condition for testing for closed then becomes that the flag is set and the queue is empty or has EOF as its head.

Is there any way this could be tested by a regression test?

best regards,

-- daniel

On 04/09/2018 10:00, Chris Hegarty wrote:

...
Rob,

...
On 3 Sep 2018, at 22:48, Rob McKenna <rob.mckenna@oracle.com> wrote:

Hi folks,

I'd like to get a re-review of this change:

https://bugs.openjdk.java.net/browse/JDK-8139965 <https://bugs.openjdk.java.net/browse/JDK-8139965>

This issue is closed as `will not fix`. I presume you will re-open it before pushing.

...
http://cr.openjdk.java.net/~robm/8139965/webrev/ <http://cr.openjdk.java.net/~robm/8139965/webrev/>

43 private boolean completed; Won’t `completed` need to be volatile now? ( since the removal of synchronized from hasSearchCompleted )

LdapRequest.close puts EOF into the queue, but that is a potentially blocking operation ( while holding the lock ). If the queue is at capacity, then it will block forever. This model will only work if `getReplyBer` is always guaranteed to be running concurrently. Is it?

Please check the indentation of LdapRequest.java L103 ( in the new file ). It appears, in the webrev, that the trailing `}` is not lined up.

The indentation doesn’t look right here either. 624 if (nparent) { 625 LdapRequest ldr = pendingRequests; 626 while (ldr != null) { 627 ldr.close(); 628 ldr = ldr.next; 629 } 630 } 631 }

-Chris

Daniel Fuchs

8:53 a.m.

Hi Rob, That looks better but I believe that: 1. closed should be volatile since it's read from outside synchronized block 2. it seems there might be a race where the last response received could be dropped, if the connection is closed just after it's been pulled from the queue. So I would suggest exchanging: 115 if (isClosed()) { 116 return null; 117 } 118 119 return result; with: return result == EOF ? null : result; best regards, -- daniel On 05/09/2018 02:05, Rob McKenna wrote:

...

Thanks for the reviews folks.

I believe the following captures your recommended changes:

http://cr.openjdk.java.net/~robm/8139965/webrev.02/

W.r.t. testing I think this area has been difficult to test traditionally. I'll have a dig through the existing testbase (and I'll get back to you) to see if there's anything similar but afaik most tests simply mimic a bindable dummy ldap server.

Vyom, are you aware of any more rigorous tests / ldap test frameworks?

-Rob

On 04/09/18 10:22, Daniel Fuchs wrote:

...
Hi Rob,

I concur with Chris. completed needs to be volatile and close() needs to set a flag and use offer like cancel().

The condition for testing for closed then becomes that the flag is set and the queue is empty or has EOF as its head.

Is there any way this could be tested by a regression test?

best regards,

-- daniel

On 04/09/2018 10:00, Chris Hegarty wrote:

...
Rob,

...
On 3 Sep 2018, at 22:48, Rob McKenna <rob.mckenna@oracle.com> wrote:

Hi folks,

I'd like to get a re-review of this change:

https://bugs.openjdk.java.net/browse/JDK-8139965 <https://bugs.openjdk.java.net/browse/JDK-8139965>

This issue is closed as `will not fix`. I presume you will re-open it before pushing.

...
http://cr.openjdk.java.net/~robm/8139965/webrev/ <http://cr.openjdk.java.net/~robm/8139965/webrev/>

43 private boolean completed; Won’t `completed` need to be volatile now? ( since the removal of synchronized from hasSearchCompleted )

LdapRequest.close puts EOF into the queue, but that is a potentially blocking operation ( while holding the lock ). If the queue is at capacity, then it will block forever. This model will only work if `getReplyBer` is always guaranteed to be running concurrently. Is it?

Please check the indentation of LdapRequest.java L103 ( in the new file ). It appears, in the webrev, that the trailing `}` is not lined up.

The indentation doesn’t look right here either. 624 if (nparent) { 625 LdapRequest ldr = pendingRequests; 626 while (ldr != null) { 627 ldr.close(); 628 ldr = ldr.next; 629 } 630 } 631 }

-Chris

Rob McKenna

25 Oct 25 Oct

8:53 p.m.

Thanks Daniel: http://cr.openjdk.java.net/~robm/8139965/webrev.03/ I'm planning to follow up on the test side of things with a separate bug. I think the technique used in some of the recent SQE LDAP tests might be applicable. -Rob On 05/09/18 09:53, Daniel Fuchs wrote:

...

Hi Rob,

That looks better but I believe that:

1. closed should be volatile since it's read from outside synchronized block

2. it seems there might be a race where the last response received could be dropped, if the connection is closed just after it's been pulled from the queue.

So I would suggest exchanging:

115 if (isClosed()) { 116 return null; 117 } 118 119 return result;

with:

return result == EOF ? null : result;

best regards,

-- daniel

On 05/09/2018 02:05, Rob McKenna wrote:

...
Thanks for the reviews folks.

I believe the following captures your recommended changes:

http://cr.openjdk.java.net/~robm/8139965/webrev.02/

W.r.t. testing I think this area has been difficult to test traditionally. I'll have a dig through the existing testbase (and I'll get back to you) to see if there's anything similar but afaik most tests simply mimic a bindable dummy ldap server.

Vyom, are you aware of any more rigorous tests / ldap test frameworks?

-Rob

On 04/09/18 10:22, Daniel Fuchs wrote:

...
Hi Rob,

I concur with Chris. completed needs to be volatile and close() needs to set a flag and use offer like cancel().

The condition for testing for closed then becomes that the flag is set and the queue is empty or has EOF as its head.

Is there any way this could be tested by a regression test?

best regards,

-- daniel

On 04/09/2018 10:00, Chris Hegarty wrote:

...
Rob,

...
On 3 Sep 2018, at 22:48, Rob McKenna <rob.mckenna@oracle.com> wrote:

Hi folks,

I'd like to get a re-review of this change:

https://bugs.openjdk.java.net/browse/JDK-8139965 <https://bugs.openjdk.java.net/browse/JDK-8139965>

This issue is closed as `will not fix`. I presume you will re-open it before pushing.

...
http://cr.openjdk.java.net/~robm/8139965/webrev/ <http://cr.openjdk.java.net/~robm/8139965/webrev/>

43 private boolean completed; Won’t `completed` need to be volatile now? ( since the removal of synchronized from hasSearchCompleted )

LdapRequest.close puts EOF into the queue, but that is a potentially blocking operation ( while holding the lock ). If the queue is at capacity, then it will block forever. This model will only work if `getReplyBer` is always guaranteed to be running concurrently. Is it?

Please check the indentation of LdapRequest.java L103 ( in the new file ). It appears, in the webrev, that the trailing `}` is not lined up.

The indentation doesn’t look right here either. 624 if (nparent) { 625 LdapRequest ldr = pendingRequests; 626 while (ldr != null) { 627 ldr.close(); 628 ldr = ldr.next; 629 } 630 } 631 }

-Chris

Daniel Fuchs

26 Oct 26 Oct

8:54 a.m.

Hi Rob, Now I'm concerned about this method: 71 synchronized boolean addReplyBer(BerDecoder ber) { 72 // check the closed boolean value here as we don't want anything 73 // to be added to the queue after close() has been called. 74 if (cancelled || closed) { 75 return false; 76 } 77 78 // Add a new reply to the queue of unprocessed replies. 79 try { 80 replies.put(ber); 81 } catch (InterruptedException e) { 82 // ignore 83 } 84 85 // peek at the BER buffer to check if it is a SearchResultDone PDU 86 try { 87 ber.parseSeq(null); 88 ber.parseInt(); 89 completed = (ber.peekByte() == LdapClient.LDAP_REP_RESULT); 90 } catch (IOException e) { 91 // ignore 92 } 93 ber.reset(); 94 return pauseAfterReceipt; 95 } getReplyBer() is not synchronized, so AFAICS there is a chance that line 93 executes after another thread as got hold of the `ber` object. Is that an issue? Should the `ber` object be added to the reply queue only after it's been reset? best regards, -- daniel On 25/10/2018 21:53, Rob McKenna wrote:

...

Thanks Daniel:

http://cr.openjdk.java.net/~robm/8139965/webrev.03/

I'm planning to follow up on the test side of things with a separate bug. I think the technique used in some of the recent SQE LDAP tests might be applicable.

-Rob

On 05/09/18 09:53, Daniel Fuchs wrote:

...
Hi Rob,

That looks better but I believe that:

1. closed should be volatile since it's read from outside synchronized block

2. it seems there might be a race where the last response received could be dropped, if the connection is closed just after it's been pulled from the queue.

So I would suggest exchanging:

115 if (isClosed()) { 116 return null; 117 } 118 119 return result;

with:

return result == EOF ? null : result;

best regards,

-- daniel

On 05/09/2018 02:05, Rob McKenna wrote:

...
Thanks for the reviews folks.

I believe the following captures your recommended changes:

http://cr.openjdk.java.net/~robm/8139965/webrev.02/

W.r.t. testing I think this area has been difficult to test traditionally. I'll have a dig through the existing testbase (and I'll get back to you) to see if there's anything similar but afaik most tests simply mimic a bindable dummy ldap server.

Vyom, are you aware of any more rigorous tests / ldap test frameworks?

-Rob

On 04/09/18 10:22, Daniel Fuchs wrote:

...
Hi Rob,

I concur with Chris. completed needs to be volatile and close() needs to set a flag and use offer like cancel().

The condition for testing for closed then becomes that the flag is set and the queue is empty or has EOF as its head.

Is there any way this could be tested by a regression test?

best regards,

-- daniel

On 04/09/2018 10:00, Chris Hegarty wrote:

...
Rob,

...
On 3 Sep 2018, at 22:48, Rob McKenna <rob.mckenna@oracle.com> wrote:

Hi folks,

I'd like to get a re-review of this change:

https://bugs.openjdk.java.net/browse/JDK-8139965 <https://bugs.openjdk.java.net/browse/JDK-8139965>

This issue is closed as `will not fix`. I presume you will re-open it before pushing.

...
http://cr.openjdk.java.net/~robm/8139965/webrev/ <http://cr.openjdk.java.net/~robm/8139965/webrev/>

43 private boolean completed; Won’t `completed` need to be volatile now? ( since the removal of synchronized from hasSearchCompleted )

LdapRequest.close puts EOF into the queue, but that is a potentially blocking operation ( while holding the lock ). If the queue is at capacity, then it will block forever. This model will only work if `getReplyBer` is always guaranteed to be running concurrently. Is it?

Please check the indentation of LdapRequest.java L103 ( in the new file ). It appears, in the webrev, that the trailing `}` is not lined up.

The indentation doesn’t look right here either. 624 if (nparent) { 625 LdapRequest ldr = pendingRequests; 626 while (ldr != null) { 627 ldr.close(); 628 ldr = ldr.next; 629 } 630 } 631 }

-Chris

Rob McKenna

2:55 p.m.

Thanks again Daniel, http://cr.openjdk.java.net/~robm/8139965/webrev.04/ -Rob On 26/10/18 09:54, Daniel Fuchs wrote:

...

Hi Rob,

Now I'm concerned about this method:

71 synchronized boolean addReplyBer(BerDecoder ber) { 72 // check the closed boolean value here as we don't want anything 73 // to be added to the queue after close() has been called. 74 if (cancelled || closed) { 75 return false; 76 } 77 78 // Add a new reply to the queue of unprocessed replies. 79 try { 80 replies.put(ber); 81 } catch (InterruptedException e) { 82 // ignore 83 } 84 85 // peek at the BER buffer to check if it is a SearchResultDone PDU 86 try { 87 ber.parseSeq(null); 88 ber.parseInt(); 89 completed = (ber.peekByte() == LdapClient.LDAP_REP_RESULT); 90 } catch (IOException e) { 91 // ignore 92 } 93 ber.reset(); 94 return pauseAfterReceipt; 95 }

getReplyBer() is not synchronized, so AFAICS there is a chance that line 93 executes after another thread as got hold of the `ber` object.

Is that an issue? Should the `ber` object be added to the reply queue only after it's been reset?

best regards,

-- daniel

On 25/10/2018 21:53, Rob McKenna wrote:

...
Thanks Daniel:

http://cr.openjdk.java.net/~robm/8139965/webrev.03/

I'm planning to follow up on the test side of things with a separate bug. I think the technique used in some of the recent SQE LDAP tests might be applicable.

-Rob

On 05/09/18 09:53, Daniel Fuchs wrote:

...
Hi Rob,

That looks better but I believe that:

1. closed should be volatile since it's read from outside synchronized block

2. it seems there might be a race where the last response received could be dropped, if the connection is closed just after it's been pulled from the queue.

So I would suggest exchanging:

115 if (isClosed()) { 116 return null; 117 } 118 119 return result;

with:

return result == EOF ? null : result;

best regards,

-- daniel

On 05/09/2018 02:05, Rob McKenna wrote:

...
Thanks for the reviews folks.

I believe the following captures your recommended changes:

http://cr.openjdk.java.net/~robm/8139965/webrev.02/

W.r.t. testing I think this area has been difficult to test traditionally. I'll have a dig through the existing testbase (and I'll get back to you) to see if there's anything similar but afaik most tests simply mimic a bindable dummy ldap server.

Vyom, are you aware of any more rigorous tests / ldap test frameworks?

-Rob

On 04/09/18 10:22, Daniel Fuchs wrote:

...
Hi Rob,

I concur with Chris. completed needs to be volatile and close() needs to set a flag and use offer like cancel().

The condition for testing for closed then becomes that the flag is set and the queue is empty or has EOF as its head.

Is there any way this could be tested by a regression test?

best regards,

-- daniel

On 04/09/2018 10:00, Chris Hegarty wrote:

...
Rob,

> On 3 Sep 2018, at 22:48, Rob McKenna <rob.mckenna@oracle.com> wrote: > > Hi folks, > > I'd like to get a re-review of this change: > > https://bugs.openjdk.java.net/browse/JDK-8139965 <https://bugs.openjdk.java.net/browse/JDK-8139965>

This issue is closed as `will not fix`. I presume you will re-open it before pushing.

> http://cr.openjdk.java.net/~robm/8139965/webrev/ <http://cr.openjdk.java.net/~robm/8139965/webrev/>

43 private boolean completed; Won’t `completed` need to be volatile now? ( since the removal of synchronized from hasSearchCompleted )

LdapRequest.close puts EOF into the queue, but that is a potentially blocking operation ( while holding the lock ). If the queue is at capacity, then it will block forever. This model will only work if `getReplyBer` is always guaranteed to be running concurrently. Is it?

Please check the indentation of LdapRequest.java L103 ( in the new file ). It appears, in the webrev, that the trailing `}` is not lined up.

The indentation doesn’t look right here either. 624 if (nparent) { 625 LdapRequest ldr = pendingRequests; 626 while (ldr != null) { 627 ldr.close(); 628 ldr = ldr.next; 629 } 630 } 631 }

-Chris

Daniel Fuchs

3:14 p.m.

Hi Rob, Looks better to me know. Though I admit that: 53 this.replies = new LinkedBlockingQueue<>(8 * replyQueueCapacity / 10); is still a bit mystifying... Why not use the full replyQueueCapacity provided? That doesn't look strictly equivalent to the highWatermark logic that you have removed. On 25/10/2018 21:53, Rob McKenna wrote:

...

I'm planning to follow up on the test side of things with a separate bug. I think the technique used in some of the recent SQE LDAP tests might be applicable.

It will be good to have a test and try to shake the implementation a bit with some repeating jobs in our test system to get some confidence that we've not harmed anything else. I admit that my only acquaintance to the JNDI/LDAP code of the JDK has been through reviews, so I'd probably only spot the obvious. best regards, -- daniel On 26/10/2018 15:55, Rob McKenna wrote:

...

Thanks again Daniel,

http://cr.openjdk.java.net/~robm/8139965/webrev.04/

-Rob

Rob McKenna

3:59 p.m.

On 26/10/18 16:14, Daniel Fuchs wrote:

...

Hi Rob,

Looks better to me know. Though I admit that:

53 this.replies = new LinkedBlockingQueue<>(8 * replyQueueCapacity / 10);

is still a bit mystifying... Why not use the full replyQueueCapacity provided? That doesn't look strictly equivalent to the highWatermark logic that you have removed.

Pavel may be able to shed more light on this part. (as it was his suggested fix originally I believe) It looks to me like the intention is to mimic the pre-fix behaviour, which only allowed the queue to fill to 80% of its capacity before pausing. I agree though, it doesn't really make sense to keep that. I'll remove it.

...

On 25/10/2018 21:53, Rob McKenna wrote:

...
I'm planning to follow up on the test side of things with a separate bug. I think the technique used in some of the recent SQE LDAP tests might be applicable.

It will be good to have a test and try to shake the implementation a bit with some repeating jobs in our test system to get some confidence that we've not harmed anything else.

You've convinced me to wait until we can get one together. I'll hold off on the push until that point. jdk-tier1-4 pass, but that may not be saying much unfortunately. -Rob

...

I admit that my only acquaintance to the JNDI/LDAP code of the JDK has been through reviews, so I'd probably only spot the obvious.

best regards,

-- daniel

On 26/10/2018 15:55, Rob McKenna wrote:

...
Thanks again Daniel,

http://cr.openjdk.java.net/~robm/8139965/webrev.04/

-Rob

2691

Age (days ago)

2744

Last active (days ago)

List overview

Download

10 comments

4 participants

participants (4)

Chris Hegarty
Daniel Fuchs
Rob McKenna
vyom tewari