adding rsockets support into JDK
Lu, Yingqi
yingqi.lu at intel.com
Tue Dec 11 20:05:13 UTC 2018
Hi Chris,
On issue #1:
Q: can rwrite return EAGAIN? I have not checked yet.
I checked with the test app: yes, rwrite can return EAGAIN when the resource is not available. In that case, we need to do an rpoll on POLLOUT. I can quickly try a patch to address it.
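
For illustration only, here is a minimal sketch of that retry loop; it is not the proposed patch. The helper name write_all_blocking and the function-pointer parameters are hypothetical. The pointer types are modelled on write(2) and poll(2), on the assumption that rwrite and rpoll from <rdma/rsocket.h> have matching signatures, so the sketch can be built without librdmacm.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Shapes of write(2) and poll(2); rwrite()/rpoll() are assumed to match. */
typedef ssize_t (*write_fn)(int fd, const void *buf, size_t count);
typedef int (*poll_fn)(struct pollfd *fds, nfds_t nfds, int timeout);

/*
 * Hypothetical helper: write all of buf to fd. When the write call fails
 * with EAGAIN, block in poll for POLLOUT and then retry.
 * Returns the number of bytes written, or -1 with errno set.
 */
ssize_t write_all_blocking(int fd, const void *buf, size_t len,
                           write_fn do_write, poll_fn do_poll)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL && do_write != NULL && do_poll != NULL);

    while (done < len) {
        ssize_t n = do_write(fd, p + done, len - done);
        if (n >= 0) {
            done += (size_t)n;
            continue;
        }
        if (errno == EINTR)
            continue;                       /* interrupted: just retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                      /* real error */

        /* Resource unavailable: wait until the fd is writable again. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int rc;
        do {
            rc = do_poll(&pfd, 1, -1);
        } while (rc < 0 && errno == EINTR);
        if (rc < 0)
            return -1;
    }
    return (ssize_t)done;
}
```

With rsockets the call would presumably look like write_all_blocking(fd, buf, len, rwrite, rpoll); with ordinary file descriptors, passing write and poll exercises the same loop.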
On issue #2:
Yesterday I discussed with a kernel RDMA developer whether we need to poll on unfinished accepts. The answer was that we do not really need to (although we can). The reason is that the connect side initiates the handshake. To start a connection (when rconnect is called), it sends a "connection request message" to the accept side. If that message has not yet arrived, raccept simply returns. If it has arrived, rdma_accept is called and a "connection response message" is sent to the connect side. The connect side then reads that message and sends a "connection ready message" to the accept side. rconnect sends only the first message, which is why we need a thread on the connect side to make sure the remaining steps are taken care of.
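
The message sequence above can be modelled as a tiny state machine. This is only a toy model of the description in this mail, not rdma-core code: the enum and the three hs_* functions are hypothetical names, and no real rsockets calls are made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which handshake message was sent last (hypothetical model). */
enum hs_state {
    HS_IDLE,            /* nothing sent yet */
    HS_REQUEST_SENT,    /* connect side sent "connection request" */
    HS_RESPONSE_SENT,   /* accept side sent "connection response" */
    HS_READY_SENT       /* connect side sent "connection ready" */
};

/* Connect side: the only step that rconnect performs by itself. */
bool hs_connect_start(enum hs_state *s)
{
    assert(s != NULL);
    if (*s != HS_IDLE)
        return false;
    *s = HS_REQUEST_SENT;
    return true;
}

/* Accept side: returns false (nothing to accept) until the request is in. */
bool hs_accept_step(enum hs_state *s)
{
    assert(s != NULL);
    if (*s != HS_REQUEST_SENT)
        return false;
    *s = HS_RESPONSE_SENT;
    return true;
}

/*
 * Connect side again: something must come back to read the response and
 * send "ready", e.g. a thread sitting in rpoll on the connecting fd.
 */
bool hs_connect_progress(enum hs_state *s)
{
    assert(s != NULL);
    if (*s != HS_RESPONSE_SENT)
        return false;
    *s = HS_READY_SENT;
    return true;
}
```

In this model the handshake stalls at HS_RESPONSE_SENT unless hs_connect_progress is called, which mirrors why the connect side needs its own poller while the accept side does not.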
Thanks,
Lucy
>-----Original Message-----
>From: Chris Hegarty [mailto:chris.hegarty at oracle.com]
>Sent: Tuesday, December 11, 2018 11:31 AM
>To: Lu, Yingqi <yingqi.lu at intel.com>
>Cc: Alan Bateman <Alan.Bateman at oracle.com>; nio-dev at openjdk.java.net;
>Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Aundhe, Shirish
><shirish.aundhe at intel.com>; Kaczmarek, Eric <eric.kaczmarek at intel.com>
>Subject: Re: adding rsockets support into JDK
>
>Lucy,
>
>Sure, the small test scenarios can be modified to make them "work". The
>bigger question is how the proposed JDK-RDMA implementation code can be
>modified to provide the semantics of the `SocketChannel` API.
>
>Issue #1: rread returns EAGAIN. One possible solution could be that the
>blocking code path in RdmaSocketChannelImpl could fall back to a
>blocking rpoll POLLOUT if the IOUtil.read method invocation returns
>IOStatus.UNAVAILABLE. I think this should work, and not have too much of a
>negative impact since the fallback will only occur infrequently.
> Q: can rwrite return EAGAIN? I have not checked yet.
>
>Issue #2: This issue is likely to be encountered mainly during testing, since a
>non-blocking connect followed by an accept, on the same thread, is not all that
>common in non-test code. That said, the semantics of the SocketChannel API
>would lead one to expect it to work. ( I get that rsocket is not asynchronous,
>but the semantics of non-blocking channels imply some asynchronicity ). I
>wonder if the JDK-RDMA implementation should have a dedicated thread that
>"pulls" on unfinished non-blocking connects that are not subsequently
>registered with a Selector? Maybe accepts too? I'm not sure yet.
>
>-Chris.
>
>> On 11 Dec 2018, at 00:36, Lu, Yingqi <yingqi.lu at intel.com> wrote:
>>
>> Hi Alan/Chris,
>>
>> I was able to confirm that connecting on a non-blocking socket causes issues.
>It happens when connect/accept occurs in the same thread or in different
>threads in the same process.
>>
>> Then, I did a small tweak in Chris's sample application by spawning a thread
>doing rpoll on the connection_fd. Now, the connect/accept works in both of
>the cases above. Please let me know if this is a valid workaround for the issue.
>>
>> Performance-wise, this workaround should not impact send/receive at all. It
>might add a small overhead, but only to the connection setup phase of
>non-blocking RDMA sockets.
>>
>> The modified app code is available at
>>
>> For connect/accept occurring in the same thread:
>> https://cr.openjdk.java.net/~ylu/testNonBlocking_raccept_modified.c
>>
>> For connect/accept occurring in two different threads:
>> https://cr.openjdk.java.net/~ylu/testNonBlocking_raccept_modified_2threads.c
>>
>> Thanks,
>> Lucy
>>
>>> -----Original Message-----
>>> From: Alan Bateman [mailto:Alan.Bateman at oracle.com]
>>> Sent: Saturday, December 8, 2018 8:10 AM
>>> To: Chris Hegarty <chris.hegarty at oracle.com>; Lu, Yingqi
>>> <yingqi.lu at intel.com>; nio-dev at openjdk.java.net
>>> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Aundhe,
>>> Shirish <shirish.aundhe at intel.com>; Kaczmarek, Eric
>>> <eric.kaczmarek at intel.com>
>>> Subject: Re: adding rsockets support into JDK
>>>
>>> On 08/12/2018 09:39, Chris Hegarty wrote:
>>>> :
>>>>
>>>> - It has become apparent that mixing blocking and non-blocking
>>>> connect/accept operations, in the same thread, may cause issues. For
>>>> example, attempting to set up a connected socket on the same host by
>>>> issuing a non-blocking connect followed by a blocking accept will
>>>> just hang and not make progress [3]. Upon further enquiries it
>>>> appears that the programming model for rsocket is subtly different
>>>> from that of the regular Berkeley sockets ( at least for the
>>>> connection handshake ). It is not immediately clear how to
>>>> reasonably work around this issue ( it's not a bug in rdma-core,
>>>> but more a fundamental part of its thread-less operation ).
>>>>
>>> Would it be possible to expand on this to say whether the same issue
>>> arises when the non-blocking connect is initiated on a different
>>> thread, or in a different process, or even a different machine on the fabric?
>>> That is, if the socket is non-blocking and I do a rconnect and then
>>> delay before doing anything else on the socket then will the peer
>>> doing accept be blocked/hung in the meantime?
>>>
>>> -Alan