RFR 8075484:SocketInputStream.socketRead0 can hang even with soTimeout set

Langer, Christoph christoph.langer at sap.com
Wed Sep 7 10:57:51 UTC 2016


Hi Vyom,

I ran an AIX build and I had to change the flag for the recv call in NET_NonBlockingRead from MSG_DONTWAIT to MSG_NONBLOCK to get it compiled. Other than that it's fine in principal.

However, I had another look about where to place which code. I suggest to put the common NET_* functions which don't have specific implementations on the various operating systems in file net_util_md.c. That way we'll avoid some code deduplication and code is located at more logical places, at least from my perspective.

Please check my webrev which is an update to your latest suggestion:
http://cr.openjdk.java.net/~clanger/webrevs/8075484-clanger/

Maybe that's a better distribution of code?

It builds on AIX - other platforms I didn't test so far.

Best regards
Christoph

> -----Original Message-----
> From: Langer, Christoph
> Sent: Mittwoch, 7. September 2016 11:20
> To: 'Vyom Tewari' <vyom.tewari at oracle.com>
> Cc: net-dev at openjdk.java.net; Chris Hegarty <chris.hegarty at oracle.com>
> Subject: RE: RFR 8075484:SocketInputStream.socketRead0 can hang even with
> soTimeout set
> 
> Hi Vyom,
> 
> I'll do an AIX build with your patch and let you know later today.
> 
> Best regards
> Christoph
> 
> > -----Original Message-----
> > From: net-dev [mailto:net-dev-bounces at openjdk.java.net] On Behalf Of
> Vyom
> > Tewari
> > Sent: Mittwoch, 7. September 2016 08:46
> > To: Chris Hegarty <chris.hegarty at oracle.com>
> > Cc: net-dev at openjdk.java.net
> > Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even with
> > soTimeout set
> >
> > Hi Chris,
> >
> > Please find the latest
> > webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.3/index.html
> > <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.3/index.html>). I
> > did some code refactoring, JPRT is in progress.
> >
> > I need help from some one to build and test latest changes on AIX, may
> > be Christoph can help.
> >
> > Thanks,
> >
> > Vyom
> >
> >
> > On Tuesday 06 September 2016 06:25 PM, Chris Hegarty wrote:
> > > Vyom,
> > >
> > >> On 6 Sep 2016, at 10:20, Vyom Tewari <vyom.tewari at oracle.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> Please find the latest
> > webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.2/index.html
> > <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.2/index.html>). I
> > have incorporated the review comments.
> > > Your changes completely avoid NET_Timeout, which is quite complex on
> > > Linux, Mac, and AIX, for various reasons. ( NET_Timeout will use the
> > > async close mechanism to signal/interrupt a thread blocked in a poll /
> > > select ). Also, select is used on Mac, which will use poll after your
> > > changes ( I do remember that there was a bug in poll on Mac around
> > > the time of the original Mac port ).
> > >
> > > Would is be better, rather than replicating the logic in  NET_Timeout,
> > > to have a version that supports passing a function pointer, to run if
> > > the poll/select returns before the timeout? Just an idea.
> > >
> > > -Chris.
> > >
> > >> Thanks,
> > >>
> > >> Vyom
> > >>
> > >>
> > >> On Monday 05 September 2016 08:37 PM, Chris Hegarty wrote:
> > >>> On 05/09/16 15:37, Mark Sheppard wrote:
> > >>>> if the desire is to avoid making double calls on gettimeofday in the
> > >>>> NET_ReadWithTimeout's while loop for its main call flow,
> > >>> Yes, the desire is to make no more calls to gettimeofday, than are
> > >>> already done.
> > >>>
> > >>>> then consider a modification to NET_Timeout and create a version
> > >>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct timeval *
> > >>>> current_time)
> > >>>>
> > >>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct timeval *
> > >>>> current_time) {
> > >>>>     int result;
> > >>>>     struct timeval t;
> > >>>>     long prevtime, newtime;
> > >>>>     struct pollfd pfd;
> > >>>>     pfd.fd = s;
> > >>>>     pfd.events = POLLIN;
> > >>>>
> > >>>>     if (timeout > 0) {
> > >>>>         prevtime = (current_time->tv_sec * 1000)  +
> > >>>> current_time->tv_usec / 1000;
> > >>>>     }
> > >>>>
> > >>>>     for(;;) {
> > >>>>         result = poll(&pfd, 1, timeout);
> > >>>>         if (result < 0 && errno == EINTR) {
> > >>>>             if (timeout > 0) {
> > >>>>                 gettimeofday(&t, NULL);
> > >>>>                 newtime = (t.tv_sec * 1000)  +  t.tv_usec /1000;
> > >>>>                 timeout -= newtime - prevtime;
> > >>>>                 if (timeout <= 0)
> > >>>>                     return 0;
> > >>>>                 prevtime = newtime;
> > >>>>             }
> > >>>>         } else {
> > >>>>             return result;
> > >>>>         }
> > >>>>     }
> > >>>> }
> > >>>>
> > >>>> the NET_ReadWithTimeout is modified with.
> > >>>>
> > >>>>    while (timeout > 0) {
> > >>>>         result = NET_TimeoutWithCurrentTime(fd, timeout, &t);
> > >>>>
> > >>>> in any case there is the potential for multiple invocation of
> > >>>> gettimeofday due to a need to make
> > >>>> poll restartable,
> > >>> Yes, and that is fine. Just no more than is already there.
> > >>>
> > >>> and adjust the originally specified timeout
> > >>>> accordingly, and for the edge case
> > >>>> after the non blocking read.
> > >>> After an initial skim, your suggestion seems reasonable.
> > >>>
> > >>> -Chris.
> > >>>
> > >>>> regards
> > >>>> Mark
> > >>>>
> > >>>>
> > >>>>
> > >>>> On 05/09/2016 12:02, Chris Hegarty wrote:
> > >>>>> Vyom,
> > >>>>>
> > >>>>>
> > webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
> > >>>>>
> > >>>>> There is some concern about the potential performance effect, etc,
> > >>>>> of gettimeofday, maybe there is a way out of this. The reuse of
> > >>>>> NET_Timeout is good, but it also calls gettimeofday. It seems that
> > >>>>> a specific NET_ReadWithTimeout could be written to NOT reuse
> > >>>>> NET_Timeout, but effectively inline its interruptible operation.
> > >>>>> Or write a variant of NET_Timeout that takes a function to
> > >>>>> execute. Rather than effectively two loops conditioned on the
> > >>>>> timeout.  Disclaimer: I have not actually tried to do this, but
> > >>>>> it seems worth considering / evaluating.
> > >>>>>
> > >>>>> -Chris.
> > >>>>>
> > >>>>> On 02/09/16 04:39, Vyom Tewari wrote:
> > >>>>>> hi Dimitry,
> > >>>>>>
> > >>>>>> thanks for review, I did consider to use  a monotonically increasing
> > >>>>>> clock like "clock_gettime" but  existing nearby code("NET_Timeout")
> > uses
> > >>>>>> "gettimeofday"  so i choose to be consistent with the existing code.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>>
> > >>>>>> Vyom
> > >>>>>>
> > >>>>>>
> > >>>>>> On Friday 02 September 2016 01:38 AM, Dmitry Samersoff wrote:
> > >>>>>>> Vyom,
> > >>>>>>>
> > >>>>>>> Did you consider to use select() to calculate timeout instead of
> > >>>>>>> gettimeofday ?
> > >>>>>>>
> > >>>>>>> gettimeofday is affected by system time changes, so running ntpd
> can
> > >>>>>>> cause unpredictable behavior of this code. Also it's rather expensive
> > >>>>>>> syscall.
> > >>>>>>>
> > >>>>>>> -Dmitry
> > >>>>>>>
> > >>>>>>> On 2016-09-01 19:03, Vyom Tewari wrote:
> > >>>>>>>> please find the updated
> > >>>>>>>>
> > webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
> > >>>>>>>>
> > >>>>>>>>
> > <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.1/index.html>).
> > >>>>>>>> I
> > >>>>>>>> incorporated the review comments.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>>
> > >>>>>>>> Vyom
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Tuesday 30 August 2016 04:11 PM, Mark Sheppard wrote:
> > >>>>>>>>> Hi
> > >>>>>>>>>    perhaps there is an opportunity to do  some refactoring here (...
> > >>>>>>>>> for me a "goto " carries a code smell! )
> > >>>>>>>>>
> > >>>>>>>>> along the lines
> > >>>>>>>>>
> > >>>>>>>>> if (timeout) {
> > >>>>>>>>>      nread =  NET_ReadWithTimeout(...);
> > >>>>>>>>> } else {
> > >>>>>>>>>       nread = NET_Read(...);
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> the NET_ReadWithTimeout (...) function will contain a
> > >>>>>>>>> restructuring of
> > >>>>>>>>> your goto loop
> > >>>>>>>>> while (_timeout > 0) { nread = NET_Timeout(fd, _timeout);
> > >>>>>>>>>            if (nread <= 0) {
> > >>>>>>>>>                if (nread == 0) {
> > >>>>>>>>>                    JNU_ThrowByName(env, JNU_JAVANETPKG
> > >>>>>>>>> "SocketTimeoutException",
> > >>>>>>>>>                                "Read timed out");
> > >>>>>>>>>                } else if (nread == -1) {
> > >>>>>>>>>                    if (errno == EBADF) {
> > >>>>>>>>>                         JNU_ThrowByName(env, JNU_JAVANETPKG
> > >>>>>>>>> "SocketException", "Socket closed");
> > >>>>>>>>>                    } else if (errno == ENOMEM) {
> > >>>>>>>>>                        JNU_ThrowOutOfMemoryError(env, "NET_Timeout
> > >>>>>>>>> native heap allocation failed");
> > >>>>>>>>>                    } else {
> > >>>>>>>>> JNU_ThrowByNameWithMessageAndLastError
> > >>>>>>>>>                            (env, JNU_JAVANETPKG "SocketException",
> > >>>>>>>>> "select/poll failed");
> > >>>>>>>>>                    }
> > >>>>>>>>>                }
> > >>>>>>>>>                  // release buffer in main call flow
> > >>>>>>>>> //              if (bufP != BUF) {
> > >>>>>>>>> //                  free(bufP);
> > >>>>>>>>> //             }
> > >>>>>>>>>               nread = -1;
> > >>>>>>>>>               break;
> > >>>>>>>>>            } else {
> > >>>>>>>>> nread = NET_NonBlockingRead(fd, bufP, len);
> > >>>>>>>>> if (nread == -1 && ((errno == EAGAIN) || (errno ==
> > EWOULDBLOCK))) {
> > >>>>>>>>>                gettimeofday(&t, NULL);
> > >>>>>>>>> newtime = t.tv_sec * 1000 + t.tv_usec / 1000;
> > >>>>>>>>> _timeout -= newtime - prevtime;
> > >>>>>>>>> if(_timeout > 0){
> > >>>>>>>>> prevtime = newtime;
> > >>>>>>>>>                    }
> > >>>>>>>>> } else { break; } } } return nread;
> > >>>>>>>>>
> > >>>>>>>>> e&oe
> > >>>>>>>>>
> > >>>>>>>>> regards
> > >>>>>>>>> Mark
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On 29/08/2016 10:58, Vyom Tewari wrote:
> > >>>>>>>>>> gentle reminder, please review the below code change.
> > >>>>>>>>>>
> > >>>>>>>>>> Vyom
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Monday 22 August 2016 05:12 PM, Vyom Tewari wrote:
> > >>>>>>>>>>> Hi All,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Please review the code changes for below issue.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Bug         : https://bugs.openjdk.java.net/browse/JDK-8075484
> > >>>>>>>>>>>
> > >>>>>>>>>>> webrev    :
> > >>>>>>>>>>>
> > http://cr.openjdk.java.net/~vtewari/8075484/webrev0.0/index.html
> > >>>>>>>>>>>
> > <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.0/index.html>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> This issue is SocketInputStream.socketread0() hangs even with
> > >>>>>>>>>>> "soTimeout" set.the implementation of
> > >>>>>>>>>>> Java_java_net_SocketInputStream_socketRead0 assumes that
> > read()
> > >>>>>>>>>>> won't block after poll() reports that a read is possible.
> > >>>>>>>>>>>
> > >>>>>>>>>>> This assumption does not hold, as noted on the man page for
> > select
> > >>>>>>>>>>> (referenced by the man page for poll): Under Linux, select()
> may
> > >>>>>>>>>>> report a socket file descriptor as "ready for reading", while
> > >>>>>>>>>>> nevertheless a subsequent read blocks. This could for example
> > >>>>>>>>>>> happen
> > >>>>>>>>>>> when data has arrived but upon examination has wrong
> > checksum
> > >>>>>>>>>>> and is
> > >>>>>>>>>>> discarded.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Vyom
> > >>>>>>>>>>>
> > >>>>>>>>>>>



More information about the net-dev mailing list