adding rsockets support into JDK

Lu, Yingqi yingqi.lu at intel.com
Thu Dec 13 20:02:47 UTC 2018


Chris,

I will take a look at the new test and also try it with version 25.

Thanks,
Lucy

From: Chris Hegarty [mailto:chris.hegarty at oracle.com]
Sent: Thursday, December 13, 2018 11:57 AM
To: Lu, Yingqi <yingqi.lu at intel.com>
Cc: Aundhe, Shirish <shirish.aundhe at intel.com>; nio-dev at openjdk.java.net; Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Kaczmarek, Eric <eric.kaczmarek at intel.com>
Subject: Re: adding rsockets support into JDK

Lucy,

I will take a look at this latest version.

In the meantime, I’ve put together a test that exercises a number of
connection scenarios, covering both blocking and non-blocking accept,
connect, and I/O operations. Twenty-four scenarios in total.

The test can be configured to use either RDMA sockets or regular
TCP sockets ( there is a static final field useRDMA ). The test passes
reliably using TCP sockets, not so with RDMA ( I have yet to test webrev
25, but I will do it ).

http://cr.openjdk.java.net/~chegar/rsocket/IOExchanges/

The test uses testng as it provides a number of useful mechanisms that
allow the writing of reasonably self-contained and compact code. The
test scenarios are described within, and while there is a small amount
of duplication, that is a reasonable trade-off for readability.
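
As an illustration of the kind of scenario described above, here is a minimal
sketch of one combination (a non-blocking connect followed by a blocking accept
on the same thread) using regular TCP channels only. The class and method names
are hypothetical and this is not the IOExchanges test itself; the RDMA variant
is deliberately left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical stand-in for a single scenario, TCP only.
final class NonBlockingConnectBlockingAccept {

    // Runs the scenario on the calling thread and returns the byte
    // received by the accepted channel.
    static int exchangeOneByte() {
        try (ServerSocketChannel listener = ServerSocketChannel.open()) {
            listener.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            try (SocketChannel client = SocketChannel.open()) {
                client.configureBlocking(false);
                // May return false, meaning the connect is still in progress.
                client.connect(listener.getLocalAddress());

                // Blocking accept on the same thread as the pending connect.
                try (SocketChannel server = listener.accept()) {
                    while (!client.finishConnect()) {
                        Thread.yield();
                    }
                    client.configureBlocking(true);
                    client.write(ByteBuffer.wrap(new byte[] {0x2A}));

                    ByteBuffer in = ByteBuffer.allocate(1);
                    while (in.hasRemaining()) {
                        if (server.read(in) < 0) {
                            throw new IOException("unexpected end of stream");
                        }
                    }
                    return in.get(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("received " + exchangeOneByte());
    }
}
```

With TCP the kernel completes the handshake on its own, so the blocking accept
returns even though nothing else has touched the connecting channel yet.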

-Chris.


On 13 Dec 2018, at 17:59, Lu, Yingqi <yingqi.lu at intel.com> wrote:

Hi Chris/Alan,

Here is version 25 of the patch: http://cr.openjdk.java.net/~ylu/8195160.25/

In this version, I have made the following changes:

1. I applied Javadoc wording change suggestions from Brian http://cr.openjdk.java.net/~bpb/8195160/webrev-22-delta/

2. I applied the suggested changes from Chris
http://cr.openjdk.java.net/~chegar/rsocket/webrev.23.1/
https://cr.openjdk.java.net/~chegar/rsocket/webrev.23.2/
http://cr.openjdk.java.net/~chegar/rsocket/webrev.23.3/
https://cr.openjdk.java.net/~chegar/rsocket/webrev.24.1/

3. I made changes to address issue #1 (non-blocking connect/blocking accept). Instead of making changes inside RdmaSocketChannelImpl.java, I think it might be easier to directly change the native implementation at src/jdk.net/linux/native/libextnet/LinuxRdmaSocketDispatcherImpl.c

Please let me know your feedback. At the same time, I am working on addressing issue #2.

Thanks,
Lucy


-----Original Message-----
From: nio-dev [mailto:nio-dev-bounces at openjdk.java.net] On Behalf Of Lu, Yingqi
Sent: Tuesday, December 11, 2018 12:05 PM
To: Chris Hegarty <chris.hegarty at oracle.com>
Cc: Aundhe, Shirish <shirish.aundhe at intel.com>; nio-dev at openjdk.java.net; Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Kaczmarek, Eric <eric.kaczmarek at intel.com>
Subject: RE: adding rsockets support into JDK

Hi Chris,

On issue #1:
Q: can rwrite return EAGAIN? I have not checked yet.
I checked with the testing app. Yes, it can return EAGAIN when the resource is
not available. In this case, we need to do an rpoll on POLLOUT. I can quickly try
a patch to address it.
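
The fallback described above (retry after polling for writability when a write
reports that the resource is unavailable) might look roughly like this. It is a
sketch over regular TCP channels, where a non-blocking write returning 0 stands
in for rwrite returning EAGAIN and Selector.select stands in for rpoll on
POLLOUT. The class and method names are hypothetical, and this is not the
RdmaSocketChannelImpl code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical TCP analogue of "rwrite returned EAGAIN, so rpoll on POLLOUT".
final class WriteWithPollFallback {

    // Writes all of src on a non-blocking channel. When a write makes no
    // progress, waits until the channel is writable and then retries.
    static void writeFully(SocketChannel ch, ByteBuffer src) throws IOException {
        try (Selector sel = Selector.open()) {
            ch.register(sel, SelectionKey.OP_WRITE);
            while (src.hasRemaining()) {
                if (ch.write(src) == 0) {
                    sel.select();               // block until writable
                    sel.selectedKeys().clear(); // reset for the next wait
                }
            }
        }
    }

    // Sends `size` bytes over loopback while a second thread drains them,
    // and returns the number of bytes the draining side received.
    static long transfer(int size) {
        try (ServerSocketChannel listener = ServerSocketChannel.open()) {
            listener.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel writer = SocketChannel.open(listener.getLocalAddress());
                 SocketChannel reader = listener.accept()) {
                long[] received = {0};
                Thread drain = new Thread(() -> {
                    ByteBuffer buf = ByteBuffer.allocate(8192);
                    try {
                        int n;
                        while ((n = reader.read(buf)) >= 0) {
                            received[0] += n;
                            buf.clear();
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                drain.start();

                writer.configureBlocking(false);
                writeFully(writer, ByteBuffer.allocate(size));
                writer.shutdownOutput(); // lets the draining side see end of stream
                drain.join();
                return received[0];
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("received " + transfer(4 * 1024 * 1024) + " bytes");
    }
}
```

The wait is only entered when a write makes no progress, so the common path
pays nothing extra for the fallback.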

On issue #2:
I discussed with a kernel RDMA developer yesterday whether we need to poll on
the unfinished accepts. The answer I got was that we do not really need to
(though we can, of course). The reason is that the connect side is the initiator
of the handshake. To start a connection (rconnect being called), it sends a
"connection request message" to the accept side. If the message has not yet
arrived, raccept will simply return. If the message has arrived, rdma_accept
will be called and a "connection response message" will be sent to the connect
side. The connect side will read the message and then send a "connection ready
message" to the accept side. rconnect only sends the first message, which is why
we need a thread on the connect side to make sure the remaining steps are taken
care of.
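
The three-message exchange described above can be pictured with a toy model.
This is only an in-memory sketch: the class, the method names, and the message
queues are hypothetical, no rsocket call is made, and the final step (the accept
side consuming the ready message) is an assumption about how the exchange ends.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the handshake; not rdma-core code.
final class HandshakeModel {
    enum Msg { CONNECT_REQUEST, CONNECT_RESPONSE, CONNECT_READY }

    final Deque<Msg> toAcceptor = new ArrayDeque<>();
    final Deque<Msg> toConnector = new ArrayDeque<>();
    boolean established;

    // rconnect analogue: only the first message is sent.
    void connect() {
        toAcceptor.add(Msg.CONNECT_REQUEST);
    }

    // raccept analogue: returns false if no request has arrived,
    // otherwise consumes it and sends the response.
    boolean accept() {
        if (toAcceptor.peek() != Msg.CONNECT_REQUEST) {
            return false;
        }
        toAcceptor.remove();
        toConnector.add(Msg.CONNECT_RESPONSE);
        return true;
    }

    // The work that something on the connect side has to perform after
    // rconnect: read the response and send the ready message.
    boolean driveConnector() {
        if (toConnector.peek() != Msg.CONNECT_RESPONSE) {
            return false;
        }
        toConnector.remove();
        toAcceptor.add(Msg.CONNECT_READY);
        return true;
    }

    // Assumed final step: the accept side sees the ready message.
    boolean acceptorSeesReady() {
        if (toAcceptor.peek() != Msg.CONNECT_READY) {
            return false;
        }
        toAcceptor.remove();
        established = true;
        return true;
    }
}
```

In this model the connection is never established unless driveConnector() is
called by someone, which is the role the extra thread on the connect side plays.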

Thanks,
Lucy


-----Original Message-----
From: Chris Hegarty [mailto:chris.hegarty at oracle.com]
Sent: Tuesday, December 11, 2018 11:31 AM
To: Lu, Yingqi <yingqi.lu at intel.com>
Cc: Alan Bateman <Alan.Bateman at oracle.com>; nio-dev at openjdk.java.net; Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Aundhe, Shirish <shirish.aundhe at intel.com>; Kaczmarek, Eric <eric.kaczmarek at intel.com>
Subject: Re: adding rsockets support into JDK

Lucy,

Sure, the small test scenarios can be modified to make them "work". The
bigger question is how the proposed JDK-RDMA implementation code can be
modified to provide the semantics of the `SocketChannel` API.

Issue #1: rread returns EAGAIN. One possible solution could be that the
blocking code path in RdmaSocketChannelImpl could fall back to a
blocking rpoll POLLOUT if the IOUtil.read method invocation returns
IOStatus.UNAVAILABLE. I think this should work, and not have too much
of a negative impact since the fallback will only occur infrequently.
Q: can rwrite return EAGAIN? I have not checked yet.

Issue #2: This issue is likely to be encountered mainly during testing,
since a non-blocking connect followed by an accept, on the same thread,
is not all that common in non-test code. That said, the semantics of the
SocketChannel API would lead one to expect it to work. ( I get that
rsocket is not asynchronous, but the semantics of non-blocking channels
imply some asynchronicity ).  I wonder if the JDK-RDMA
implementation should have a dedicated thread that "pulls" on
unfinished non-blocking connects that are not subsequently registered with
a Selector? Maybe accepts too? I'm not sure yet.


-Chris.


On 11 Dec 2018, at 00:36, Lu, Yingqi <yingqi.lu at intel.com> wrote:

Hi Alan/Chris,

I was able to confirm that connecting on a non-blocking socket causes issues.
It happens when connect/accept occur in the same thread or in different
threads in the same process.


Then, I made a small tweak to Chris's sample application by spawning a
thread that does rpoll on the connection_fd. Now, connect/accept works in
both of the cases above. Please let me know if this is a valid workaround for
the issue.
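
At the Java level, the shape of this workaround (a helper thread that keeps
driving a pending non-blocking connect) could be sketched as follows. This is a
TCP-only analogue with hypothetical class and method names: finishConnect()
stands in for the rpoll on connection_fd, and with TCP the kernel would complete
the handshake even without the helper.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; not part of the proposed JDK-RDMA patch.
final class PendingConnectDriver {

    // Starts a daemon thread that polls finishConnect() until the pending
    // connect completes or fails, and reports the outcome via the future.
    static CompletableFuture<SocketChannel> drive(SocketChannel ch) {
        CompletableFuture<SocketChannel> done = new CompletableFuture<>();
        Thread t = new Thread(() -> {
            try {
                while (!ch.finishConnect()) {
                    Thread.sleep(1);
                }
                done.complete(ch);
            } catch (IOException | InterruptedException e) {
                done.completeExceptionally(e);
            }
        }, "connect-driver");
        t.setDaemon(true);
        t.start();
        return done;
    }

    // Non-blocking connect, helper thread, then blocking accept on the
    // calling thread. Returns true if both ends report being connected.
    static boolean demo() {
        try (ServerSocketChannel listener = ServerSocketChannel.open()) {
            listener.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open()) {
                client.configureBlocking(false);
                client.connect(listener.getLocalAddress());
                CompletableFuture<SocketChannel> connected = drive(client);
                try (SocketChannel server = listener.accept()) {
                    return connected.get(10, TimeUnit.SECONDS).isConnected()
                            && server.isConnected();
                }
            }
        } catch (IOException | InterruptedException
                 | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("connected: " + demo());
    }
}
```

One thread per pending connect keeps the sketch short; a single shared thread
servicing all pending connects would be the other obvious design.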


Performance-wise, this workaround should not impact send/receive at all. It
might add a small overhead to the connection setup phase, and only with
non-blocking RDMA sockets.


The modified app code is available at:

For connect/accept occurring in the same thread:
https://cr.openjdk.java.net/~ylu/testNonBlocking_raccept_modified.c

For connect/accept occurring in two different threads:
https://cr.openjdk.java.net/~ylu/testNonBlocking_raccept_modified_2threads.c

Thanks,
Lucy


-----Original Message-----
From: Alan Bateman [mailto:Alan.Bateman at oracle.com]
Sent: Saturday, December 8, 2018 8:10 AM
To: Chris Hegarty <chris.hegarty at oracle.com>; Lu, Yingqi <yingqi.lu at intel.com>; nio-dev at openjdk.java.net
Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Aundhe, Shirish <shirish.aundhe at intel.com>; Kaczmarek, Eric <eric.kaczmarek at intel.com>
Subject: Re: adding rsockets support into JDK

On 08/12/2018 09:39, Chris Hegarty wrote:

:

- It has become apparent that mixing blocking and non-blocking
 connect/accept operations, in the same thread, may cause issues. For
 example, attempting to set up a connected socket on the same host by
 issuing a non-blocking connect followed by a blocking accept will
 just hang and not make progress [3]. Upon further enquiries it appears
 that the programming model for rsocket is subtly different from that
 of the regular Berkeley sockets ( at least for the connection
 handshake ). It is not immediately clear how to reasonably work around
 this issue ( it's not a bug in rdma-core, but more a fundamental part
 of its thread-less operation ).
Would it be possible to expand on this to say whether the same
issue arises when the non-blocking connect is initiated on a
different thread, or in a different process, or even on a different
machine on the fabric?

That is, if the socket is non-blocking and I do an rconnect and then
delay before doing anything else on the socket, will the peer
doing accept be blocked/hung in the meantime?

-Alan

