adding rsockets support into JDK
Chris Hegarty
chris.hegarty at oracle.com
Thu Dec 13 19:56:52 UTC 2018
Lucy,
I will take a look at this latest version.
In the meantime, I’ve put together a test that covers a number of
connection scenarios, exercising both blocking and non-blocking accept,
connect, and I/O operations. Twenty-four scenarios in total.
The test can be configured to use either RDMA sockets or regular
TCP sockets ( there is a static final field, useRDMA ). The test passes
reliably using TCP sockets, but not with RDMA ( I have yet to test webrev
25, but I will do so ).
http://cr.openjdk.java.net/~chegar/rsocket/IOExchanges/
The test uses testng as it provides a number of useful mechanisms that
allow the writing of reasonably self-contained and compact code. The
test scenarios are described within, and while there is a small amount
of duplication, that is a reasonable price for readability.
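
To give a flavour of what one scenario looks like, here is a minimal sketch of a non-blocking connect followed by a blocking accept on the same thread. It is not taken from the test itself: it assumes regular TCP sockets over loopback, and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical scenario class; uses plain TCP channels, not RDMA channels.
final class NonBlockingConnectBlockingAccept {

    /** Sends message from the connecting side and returns what the accepted side read. */
    static String exchange(String message) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             SocketChannel client = SocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            // Non-blocking connect: may return false, meaning "in progress".
            client.configureBlocking(false);
            client.connect(server.getLocalAddress());

            // Blocking accept on the same thread.
            try (SocketChannel accepted = server.accept()) {
                while (!client.finishConnect()) {
                    Thread.onSpinWait();
                }
                client.configureBlocking(true);

                byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
                ByteBuffer out = ByteBuffer.wrap(bytes);
                while (out.hasRemaining()) {
                    client.write(out);
                }

                // Read until the expected number of bytes arrived or EOF.
                ByteBuffer in = ByteBuffer.allocate(bytes.length);
                while (in.hasRemaining() && accepted.read(in) != -1) {
                    // keep reading
                }
                return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(exchange("hello"));
    }
}
```

With TCP the kernel completes the handshake in the background, so the blocking accept returns even though the connecting side has not yet called finishConnect.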
-Chris.
> On 13 Dec 2018, at 17:59, Lu, Yingqi <yingqi.lu at intel.com> wrote:
>
> Hi Chris/Alan,
>
> Here is version 25 of the patch: http://cr.openjdk.java.net/~ylu/8195160.25/
>
> In this version, I have modified the following items:
>
> 1. I applied Javadoc wording change suggestions from Brian http://cr.openjdk.java.net/~bpb/8195160/webrev-22-delta/
>
> 2. I applied the suggested changes from Chris
> http://cr.openjdk.java.net/~chegar/rsocket/webrev.23.1/
> https://cr.openjdk.java.net/~chegar/rsocket/webrev.23.2/
> http://cr.openjdk.java.net/~chegar/rsocket/webrev.23.3/
> https://cr.openjdk.java.net/~chegar/rsocket/webrev.24.1/
>
> 3. I made changes to address issue #1 (non-blocking connect/blocking accept). Instead of making changes inside RdmaSocketChannelImpl.java, I think it might be easier to change the native implementation directly, at src/jdk.net/linux/native/libextnet/LinuxRdmaSocketDispatcherImpl.c
>
> Please let me know your feedback. At the same time, I am working on addressing issue #2.
>
> Thanks,
> Lucy
>
>> -----Original Message-----
>> From: nio-dev [mailto:nio-dev-bounces at openjdk.java.net] On Behalf Of Lu,
>> Yingqi
>> Sent: Tuesday, December 11, 2018 12:05 PM
>> To: Chris Hegarty <chris.hegarty at oracle.com>
>> Cc: Aundhe, Shirish <shirish.aundhe at intel.com>; nio-dev at openjdk.java.net;
>> Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Kaczmarek, Eric
>> <eric.kaczmarek at intel.com>
>> Subject: RE: adding rsockets support into JDK
>>
>> Hi Chris,
>>
>> On issue #1:
>> Q: can rwrite return EAGAIN? I have not checked yet.
>> I checked with the testing app. Yes, it can return EAGAIN when the resource is
>> not available. In this case, we need to do an rpoll on POLLOUT. I can quickly try
>> a patch to address it.
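
The idea of waiting for writability when a write cannot make progress can be sketched with the public NIO API. This is only an analogue under stated assumptions: a regular non-blocking SocketChannel stands in for the rsocket, a write that returns 0 stands in for EAGAIN, and a Selector wait on OP_WRITE stands in for rpoll on POLLOUT. The class and method names are hypothetical and this is not the code in the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical helper: blocking-style write on top of a non-blocking channel.
final class WriteWithPollFallback {

    /** Writes all of src; when write() returns 0, waits until the channel is writable. */
    static void writeFully(SocketChannel ch, ByteBuffer src) throws IOException {
        Selector selector = null;
        try {
            while (src.hasRemaining()) {
                if (ch.write(src) == 0) {
                    if (selector == null) {          // opened lazily: fallback is rare
                        selector = Selector.open();
                        ch.register(selector, SelectionKey.OP_WRITE);
                    }
                    selector.select();               // analogue of a blocking poll for POLLOUT
                    selector.selectedKeys().clear();
                }
            }
        } finally {
            if (selector != null) {
                selector.close();                    // also deregisters the channel
            }
        }
    }

    /** Pushes size bytes through a loopback connection and returns the count received. */
    static long demo(int size) throws IOException, InterruptedException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept()) {
                long[] received = new long[1];
                Thread reader = new Thread(() -> {
                    ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
                    try {
                        int n;
                        while ((n = peer.read(buf)) != -1) {
                            received[0] += n;
                            buf.clear();
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                reader.start();
                client.configureBlocking(false);
                writeFully(client, ByteBuffer.allocate(size));
                client.shutdownOutput();             // lets the reader see end-of-stream
                reader.join();
                return received[0];
            }
        }
    }
}
```

Opening the selector only on the first zero-length write keeps the common path free of extra work, in the same spirit as a fallback that is expected to be taken infrequently.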
>>
>> On issue #2:
>> Yesterday I discussed with a kernel RDMA developer whether we need to poll
>> on the unfinished accepts. The answer I got was that we do not really need
>> to (though we can, of course). The reason is that the connect side is the
>> initiator of the handshake. To start a connection (rconnect being called),
>> it sends a "connection request message" to the accept side. If the message
>> has not yet arrived, raccept will simply return. If the message has
>> arrived, rdma_accept will be called and a "connection response message"
>> will be sent to the connect side. The connect side will read the message
>> and then send a "connection ready message" to the accept side. rconnect
>> only sends the first message, which is why we need a thread on the connect
>> side to make sure the rest of the steps are taken care of.
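
The message sequence described above can be modelled as a toy state machine. This is purely an in-memory illustration of the three messages as described in this mail; it does not call rsockets or rdma-core, and every name in it is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy in-memory model of the three-message handshake; all names are hypothetical.
final class HandshakeModel {
    enum Msg { REQUEST, RESPONSE, READY }

    private final Deque<Msg> toAcceptSide = new ArrayDeque<>();
    private final Deque<Msg> toConnectSide = new ArrayDeque<>();
    private boolean established;

    /** Models rconnect on a non-blocking socket: only the first message is sent. */
    void connect() {
        toAcceptSide.add(Msg.REQUEST);
    }

    /** Models one accept attempt: returns false at once if nothing has arrived. */
    boolean acceptStep() {
        Msg m = toAcceptSide.poll();
        if (m == null) {
            return false;
        }
        if (m == Msg.REQUEST) {
            toConnectSide.add(Msg.RESPONSE);
        } else if (m == Msg.READY) {
            established = true;
        }
        return true;
    }

    /** Models the connect side being driven, for example by a polling thread. */
    boolean driveConnectSide() {
        if (toConnectSide.poll() == Msg.RESPONSE) {
            toAcceptSide.add(Msg.READY);
            return true;
        }
        return false;
    }

    boolean isEstablished() {
        return established;
    }

    /** Runs connect, accept, optional connect-side progress, accept. */
    static boolean run(boolean driveConnector) {
        HandshakeModel h = new HandshakeModel();
        h.connect();
        h.acceptStep();
        if (driveConnector) {
            h.driveConnectSide();
        }
        h.acceptStep();
        return h.isEstablished();
    }
}
```

In this model the connection is only established when something drives the connect side between the two accept steps, which is the role the extra thread plays.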
>>
>> Thanks,
>> Lucy
>>
>>> -----Original Message-----
>>> From: Chris Hegarty [mailto:chris.hegarty at oracle.com]
>>> Sent: Tuesday, December 11, 2018 11:31 AM
>>> To: Lu, Yingqi <yingqi.lu at intel.com>
>>> Cc: Alan Bateman <Alan.Bateman at oracle.com>; nio-dev at openjdk.java.net;
>>> Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Aundhe, Shirish
>>> <shirish.aundhe at intel.com>; Kaczmarek, Eric <eric.kaczmarek at intel.com>
>>> Subject: Re: adding rsockets support into JDK
>>>
>>> Lucy,
>>>
>>> Sure, the small test scenarios can be modified to make them "work". The
>>> bigger question is how the proposed JDK-RDMA implementation code can be
>>> modified to provide the semantics of the `SocketChannel` API.
>>>
>>> Issue #1: rread returns EAGAIN. One possible solution could be that the
>>> blocking code path in RdmaSocketChannelImpl could fall back to a
>>> blocking rpoll POLLOUT if the IOUtil.read method invocation returns
>>> IOStatus.UNAVAILABLE. I think this should work, and not have too much
>>> of a negative impact, since the fallback will only occur infrequently.
>>> Q: can rwrite return EAGAIN? I have not checked yet.
>>>
>>> Issue #2: This issue is likely to be encountered mainly during testing,
>>> since a non-blocking connect followed by an accept, on the same thread,
>>> is not all that common in non-test code. That said, the semantics of the
>>> SocketChannel API would lead one to expect it to work. ( I get that
>>> rsocket is not asynchronous, but the semantics of non-blocking channels
>>> imply some asynchronicity ). I wonder if the JDK-RDMA
>>> implementation should have a dedicated thread that "pulls" on
>>> unfinished non-blocking connects that are not subsequently registered
>>> with a Selector? Maybe accepts too? I'm not sure yet.
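
The dedicated-thread idea can be sketched as follows. This is only an illustration under stated assumptions: regular TCP SocketChannels stand in for RDMA channels, calling finishConnect stands in for whatever progress call the RDMA implementation would need, and the class name and its methods are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical helper: a daemon thread that drives pending non-blocking connects.
final class PendingConnectDriver {
    private final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<>();

    PendingConnectDriver() {
        Thread t = new Thread(this::run, "pending-connect-driver");
        t.setDaemon(true);
        t.start();
    }

    /** Call after a non-blocking connect() has been initiated on ch. */
    void track(SocketChannel ch) {
        pending.add(ch);
    }

    private void run() {
        while (true) {
            for (Iterator<SocketChannel> it = pending.iterator(); it.hasNext(); ) {
                SocketChannel ch = it.next();
                try {
                    if (!ch.isOpen() || ch.finishConnect()) {
                        it.remove();     // closed or connected: nothing left to drive
                    }
                } catch (IOException e) {
                    it.remove();         // connect failed: stop tracking
                }
            }
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    /** The caller never calls finishConnect itself; the driver thread does. */
    static boolean demo() throws IOException, InterruptedException {
        PendingConnectDriver driver = new PendingConnectDriver();
        try (ServerSocketChannel server = ServerSocketChannel.open();
             SocketChannel client = SocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            client.configureBlocking(false);
            client.connect(server.getLocalAddress());
            driver.track(client);
            try (SocketChannel accepted = server.accept()) {
                long deadline = System.nanoTime() + 5_000_000_000L;
                while (!client.isConnected() && System.nanoTime() < deadline) {
                    Thread.sleep(1);
                }
                return client.isConnected();
            }
        }
    }
}
```

A real implementation would presumably block in a poll rather than sleep, and would stop tracking a channel once it is registered with a Selector.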
>>>
>>> -Chris.
>>>
>>>> On 11 Dec 2018, at 00:36, Lu, Yingqi <yingqi.lu at intel.com> wrote:
>>>>
>>>> Hi Alan/Chris,
>>>>
>>>> I was able to confirm that connecting on a non-blocking socket causes
>>>> issues. It happens when connect/accept occurs in the same thread or in
>>>> different threads in the same process.
>>>>
>>>> Then, I made a small tweak to Chris's sample application by spawning a
>>>> thread doing rpoll on the connection_fd. Now, the connect/accept works in
>>>> both of the cases above. Please let me know if this is a valid workaround
>>>> for the issue.
>>>>
>>>> Performance-wise, this workaround should not impact send/receive at
>>>> all. It might add a small overhead to the connection setup phase, and
>>>> only with non-blocking RDMA sockets.
>>>>
>>>> The modified app code is available at
>>>>
>>>> For connect/accept occurring in the same thread:
>>>> https://cr.openjdk.java.net/~ylu/testNonBlocking_raccept_modified.c
>>>>
>>>> For connect/accept occurring in two different threads:
>>>> https://cr.openjdk.java.net/~ylu/testNonBlocking_raccept_modified_2threads.c
>>>>
>>>> Thanks,
>>>> Lucy
>>>>
>>>>> -----Original Message-----
>>>>> From: Alan Bateman [mailto:Alan.Bateman at oracle.com]
>>>>> Sent: Saturday, December 8, 2018 8:10 AM
>>>>> To: Chris Hegarty <chris.hegarty at oracle.com>; Lu, Yingqi
>>>>> <yingqi.lu at intel.com>; nio-dev at openjdk.java.net
>>>>> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Aundhe,
>>>>> Shirish <shirish.aundhe at intel.com>; Kaczmarek, Eric
>>>>> <eric.kaczmarek at intel.com>
>>>>> Subject: Re: adding rsockets support into JDK
>>>>>
>>>>> On 08/12/2018 09:39, Chris Hegarty wrote:
>>>>>> :
>>>>>>
>>>>>> - It has become apparent that mixing blocking and non-blocking
>>>>>> connect/accept operations, in the same thread, may cause issues. For
>>>>>> example, attempting to set up a connected socket on the same host by
>>>>>> issuing a non-blocking connect followed by a blocking accept will
>>>>>> just hang and not make progress [3]. Upon further enquiries it
>>>>>> appears that the programming model for rsocket is subtly different
>>>>>> from that of regular Berkeley sockets ( at least for the connection
>>>>>> handshake ). It is not immediately clear how to reasonably work
>>>>>> around this issue ( it's not a bug in rdma-core, but more a
>>>>>> fundamental part of its thread-less operation ).
>>>>>>
>>>>> Would it be possible to expand on this to say whether the same
>>>>> issue arises when the non-blocking connect is initiated on a
>>>>> different thread, or in a different process, or even on a different
>>>>> machine on the fabric?
>>>>> That is, if the socket is non-blocking and I do an rconnect and then
>>>>> delay before doing anything else on the socket, will the peer
>>>>> doing accept be blocked/hung in the meantime?
>>>>>
>>>>> -Alan
>