adding rsockets support into JDK

Lu, Yingqi yingqi.lu at intel.com
Thu Dec 13 17:59:23 UTC 2018


Hi Chris/Alan,

Here is version 25 of the patch: http://cr.openjdk.java.net/~ylu/8195160.25/

In this version, I have made the following changes:

1. I applied Brian's suggested Javadoc wording changes: http://cr.openjdk.java.net/~bpb/8195160/webrev-22-delta/

2. I applied Chris's suggested changes:
http://cr.openjdk.java.net/~chegar/rsocket/webrev.23.1/
https://cr.openjdk.java.net/~chegar/rsocket/webrev.23.2/
http://cr.openjdk.java.net/~chegar/rsocket/webrev.23.3/
https://cr.openjdk.java.net/~chegar/rsocket/webrev.24.1/

3. I made changes to address issue #1 (non-blocking connect/blocking accept). Instead of changing RdmaSocketChannelImpl.java, I think it may be easier to change the native implementation directly, in src/jdk.net/linux/native/libextnet/LinuxRdmaSocketDispatcherImpl.c.

Please let me know your feedback. In the meantime, I am working on issue #2.

Thanks,
Lucy

>-----Original Message-----
>From: nio-dev [mailto:nio-dev-bounces at openjdk.java.net] On Behalf Of Lu, Yingqi
>Sent: Tuesday, December 11, 2018 12:05 PM
>To: Chris Hegarty <chris.hegarty at oracle.com>
>Cc: Aundhe, Shirish <shirish.aundhe at intel.com>; nio-dev at openjdk.java.net;
>Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Kaczmarek, Eric
><eric.kaczmarek at intel.com>
>Subject: RE: adding rsockets support into JDK
>
>Hi Chris,
>
>On issue #1:
>Q: can rwrite return EAGAIN? I have not checked yet.
>I checked with the test app. Yes, it can return EAGAIN when resources are not
>available. In that case, we need to rpoll on POLLOUT before retrying. I can
>quickly try a patch to address it.
>
>On issue #2:
>Yesterday I discussed with a kernel RDMA developer whether we need to poll on
>unfinished accepts. The answer was that we do not really need to (although we
>can). The reason is that the connecting side initiates the handshake. To start
>a connection (when rconnect is called), it sends a "connection request"
>message to the accepting side. If that message has not arrived yet, raccept
>simply returns. If it has arrived, rdma_accept is called and a "connection
>response" message is sent to the connecting side. The connecting side reads
>that message and then sends a "connection ready" message to the accepting
>side. rconnect only sends the first message, which is why we need a thread on
>the connecting side to make sure the remaining steps are taken care of.
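To make the argument above concrete, here is a toy, in-memory C model of the three-message handshake as described. It is a sketch under assumptions, not rsocket code: the message names and the functions connect_begin, connect_progress and accept_progress are hypothetical, and nothing here calls librdmacm. It shows that after the rconnect-like step the accepting side can answer the request, but it never sees "connection ready" until something on the connecting side runs the progress step, which is the job of the extra thread.

```c
#include <stdbool.h>

/* Hypothetical message names for the three handshake messages. */
typedef enum { MSG_NONE, MSG_REQUEST, MSG_RESPONSE, MSG_READY } msg_t;

typedef struct {
    msg_t to_acceptor;    /* message in flight toward the accepting side */
    msg_t to_connector;   /* message in flight toward the connecting side */
    bool  accepted;       /* accepting side has sent its response */
    bool  established;    /* accepting side has seen "connection ready" */
} handshake_t;

/* Connecting side, rconnect-like: sends only the first message. */
static void connect_begin(handshake_t *h) {
    h->to_acceptor = MSG_REQUEST;
}

/* Connecting side, later progress step (what a helper thread would drive):
 * consume the response and answer with "connection ready". */
static bool connect_progress(handshake_t *h) {
    if (h->to_connector != MSG_RESPONSE)
        return false;                  /* nothing to do yet */
    h->to_connector = MSG_NONE;
    h->to_acceptor = MSG_READY;
    return true;
}

/* Accepting side, raccept-like: returns false when there is nothing to act on. */
static bool accept_progress(handshake_t *h) {
    switch (h->to_acceptor) {
    case MSG_REQUEST:                  /* request arrived: send the response */
        h->to_acceptor = MSG_NONE;
        h->to_connector = MSG_RESPONSE;
        h->accepted = true;
        return true;
    case MSG_READY:                    /* final message: connection is usable */
        h->to_acceptor = MSG_NONE;
        h->established = true;
        return true;
    default:                           /* no request yet: simply return */
        return false;
    }
}
```

In this model, accept_progress returning false corresponds to raccept simply returning while no request has arrived, and established stays false for as long as nobody calls connect_progress.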
>
>Thanks,
>Lucy
>
>>-----Original Message-----
>>From: Chris Hegarty [mailto:chris.hegarty at oracle.com]
>>Sent: Tuesday, December 11, 2018 11:31 AM
>>To: Lu, Yingqi <yingqi.lu at intel.com>
>>Cc: Alan Bateman <Alan.Bateman at oracle.com>; nio-dev at openjdk.java.net;
>>Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Aundhe, Shirish
>><shirish.aundhe at intel.com>; Kaczmarek, Eric <eric.kaczmarek at intel.com>
>>Subject: Re: adding rsockets support into JDK
>>
>>Lucy,
>>
>>Sure, the small test scenarios can be modified to make them "work". The
>>bigger question is how the proposed JDK-RDMA implementation code can be
>>modified to provide the semantics of the `SocketChannel` API.
>>
>>Issue #1: rread returns EAGAIN. One possible solution could be for the
>>blocking code path in RdmaSocketChannelImpl to fall back into a
>>blocking rpoll POLLOUT if the IOUtil.read method invocation returns
>>IOStatus.UNAVAILABLE. I think this should work, without too much
>>of a negative impact, since the fallback will only occur infrequently.
>>  Q: can rwrite return EAGAIN? I have not checked yet.
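The fallback discussed above (retry a blocking-mode read or write after waiting for readiness, rather than surfacing EAGAIN) can be sketched in C. This is a minimal sketch under assumptions: it calls plain POSIX read/write/poll so it runs on an ordinary descriptor, on the assumption that rsocket's rread/rwrite/rpoll, which mirror those calls, would be swapped in; the helper names wait_ready, write_blocking and read_blocking are hypothetical. Note that a read waits for POLLIN and a write for POLLOUT.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Block until fd reports one of the requested events (POLLIN or POLLOUT).
 * Assumption: under rsocket this would call rpoll() instead of poll(). */
static int wait_ready(int fd, short events) {
    struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };
    for (;;) {
        int n = poll(&pfd, 1, -1);       /* -1: wait indefinitely */
        if (n > 0)
            return 0;
        if (n < 0 && errno != EINTR)
            return -1;
    }
}

/* Blocking-mode write on a descriptor that may report EAGAIN:
 * wait for POLLOUT and retry instead of failing. */
static ssize_t write_blocking(int fd, const void *buf, size_t len) {
    for (;;) {
        ssize_t n = write(fd, buf, len); /* assumption: rwrite() under rsocket */
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;
        if (wait_ready(fd, POLLOUT) < 0)
            return -1;
    }
}

/* Same fallback for reads, which wait for POLLIN. */
static ssize_t read_blocking(int fd, void *buf, size_t len) {
    for (;;) {
        ssize_t n = read(fd, buf, len);  /* assumption: rread() under rsocket */
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;
        if (wait_ready(fd, POLLIN) < 0)
            return -1;
    }
}
```

Because the poll only happens after an EAGAIN, the common path costs nothing extra, which matches the expectation that the fallback occurs infrequently.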
>>
>>Issue #2: This issue is likely to be encountered mainly during testing,
>>since a non-blocking connect followed by an accept, on the same thread,
>>is not all that common in non-test code. That said, the semantics of the
>>SocketChannel API would lead one to expect it to work. (I get that
>>rsocket is not asynchronous, but the semantics of non-blocking channels
>>imply some asynchrony.) I wonder if the JDK-RDMA
>>implementation should have a dedicated thread that "polls" on
>>unfinished non-blocking connects that are not subsequently registered with
>>a Selector? Maybe accepts too? I'm not sure yet.
>>
>>-Chris.
>>
>>> On 11 Dec 2018, at 00:36, Lu, Yingqi <yingqi.lu at intel.com> wrote:
>>>
>>> Hi Alan/Chris,
>>>
>>> I was able to confirm that connecting on a non-blocking socket causes issues.
>>> It happens when connect/accept occur in the same thread or in different
>>> threads of the same process.
>>>
>>> Then I made a small tweak to Chris's sample application, spawning a thread
>>> that calls rpoll on connection_fd. Now connect/accept works in both of the
>>> cases above. Please let me know if this is a valid workaround for the issue.
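The workaround described above (a helper thread that polls the connecting fd while the original thread blocks in accept) can be sketched with ordinary sockets. This is a sketch under assumptions: it uses POSIX poll/getsockopt over loopback TCP, where the kernel completes the handshake anyway, so it shows the structure of the workaround rather than reproducing the rsocket hang. With rsocket the helper would presumably call rpoll and rgetsockopt instead, and connect_waiter_t, finish_connect and connect_then_accept are hypothetical names.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

/* Shared with the helper thread: the connecting fd and the final connect
 * status (0 on success, otherwise an errno value). */
typedef struct {
    int fd;
    int status;
} connect_waiter_t;

/* Helper thread: wait until the pending connect is writable, then read
 * SO_ERROR. Assumption: under rsocket this would be rpoll()/rgetsockopt(). */
static void *finish_connect(void *arg) {
    connect_waiter_t *w = arg;
    struct pollfd pfd = { .fd = w->fd, .events = POLLOUT, .revents = 0 };
    while (poll(&pfd, 1, -1) < 0) {
        if (errno != EINTR) {
            w->status = errno;
            return NULL;
        }
    }
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(w->fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        err = errno;
    w->status = err;
    return NULL;
}

/* Non-blocking connect to a loopback listener followed by a blocking accept
 * on the calling thread, with the helper thread finishing the connect.
 * Returns the accepted fd and stores the client fd, or returns -1. */
static int connect_then_accept(int *client_out) {
    int lfd = -1, cfd = -1, afd = -1;
    connect_waiter_t w = { .fd = -1, .status = -1 };
    pthread_t tid;
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = 0 };
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    socklen_t alen = sizeof addr;

    /* Listener on an ephemeral loopback port. */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || bind(lfd, (struct sockaddr *)&addr, alen) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto fail;

    /* Fresh socket, so setting O_NONBLOCK alone loses no other flags. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || fcntl(cfd, F_SETFL, O_NONBLOCK) < 0)
        goto fail;
    if (connect(cfd, (struct sockaddr *)&addr, alen) < 0 && errno != EINPROGRESS)
        goto fail;

    w.fd = cfd;
    if (pthread_create(&tid, NULL, finish_connect, &w) != 0)
        goto fail;
    afd = accept(lfd, NULL, NULL);       /* blocking accept, same thread */
    pthread_join(tid, NULL);
    if (afd < 0 || w.status != 0)
        goto fail;

    close(lfd);
    *client_out = cfd;
    return afd;

fail:
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return -1;
}
```

The helper thread touches only the connecting fd and exits once the connect is resolved, so, as noted in the message, it adds work only to connection setup and not to send/receive.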
>>>
>>> Performance-wise, this workaround should not affect send/receive at all. It
>>> might add a small overhead to the connection setup phase, and only for
>>> non-blocking RDMA sockets.
>>>
>>> The modified app code is available at the following URLs:
>>>
>>> For connect/accept in the same thread:
>>> https://cr.openjdk.java.net/~ylu/testNonBlocking_raccept_modified.c
>>>
>>> For connect/accept in two different threads:
>>> https://cr.openjdk.java.net/~ylu/testNonBlocking_raccept_modified_2threads.c
>>>
>>> Thanks,
>>> Lucy
>>>
>>>> -----Original Message-----
>>>> From: Alan Bateman [mailto:Alan.Bateman at oracle.com]
>>>> Sent: Saturday, December 8, 2018 8:10 AM
>>>> To: Chris Hegarty <chris.hegarty at oracle.com>; Lu, Yingqi
>>>> <yingqi.lu at intel.com>; nio-dev at openjdk.java.net
>>>> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Aundhe,
>>>> Shirish <shirish.aundhe at intel.com>; Kaczmarek, Eric
>>>> <eric.kaczmarek at intel.com>
>>>> Subject: Re: adding rsockets support into JDK
>>>>
>>>> On 08/12/2018 09:39, Chris Hegarty wrote:
>>>>> :
>>>>>
>>>>> - It has become apparent that mixing blocking and non-blocking
>>>>>   connect/accept operations, in the same thread, may cause issues. For
>>>>>   example, attempting to set up a connected socket on the same host by
>>>>>   issuing a non-blocking connect followed by a blocking accept will
>>>>>   just hang and not make progress [3]. Upon further enquiries, it appears
>>>>>   that the programming model for rsocket is subtly different from that
>>>>>   of regular Berkeley sockets (at least for the connection
>>>>>   handshake). It is not immediately clear how to reasonably work around
>>>>>   this issue (it's not a bug in rdma-core, but more a fundamental part
>>>>>   of its thread-less operation).
>>>>>
>>>> Would it be possible to expand on this to say whether the same
>>>> issue arises when the non-blocking connect is initiated on a
>>>> different thread, in a different process, or even on a different machine
>>>> on the fabric? That is, if the socket is non-blocking and I do an rconnect
>>>> and then delay before doing anything else on the socket, will the peer
>>>> doing accept be blocked/hung in the meantime?
>>>>
>>>> -Alan


