RFR 8075484:SocketInputStream.socketRead0 can hang even with soTimeout set

Langer, Christoph christoph.langer at sap.com
Thu Sep 8 09:20:07 UTC 2016


Hi Vyom,

this looks good.

Is there any particular reason why NET_ReadWithTimeout should remain in SocketInputStream.c and not also move to net_util_md.c? That way you could also have a "static long getCurrentTime()" inside net_util_md.c, instead of an exported NET_GetCurrentTime().

And no "#include <sys/time.h>" would be needed in SocketInputStream.c - maybe not even now as it is.

Best regards
Christoph

> -----Original Message-----
> From: Vyom Tewari [mailto:vyom.tewari at oracle.com]
> Sent: Donnerstag, 8. September 2016 05:20
> To: Langer, Christoph <christoph.langer at sap.com>
> Cc: net-dev at openjdk.java.net
> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even with
> soTimeout set
> 
> Hi All,
> 
> Please find the latest
> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.5/index.html
> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.5/index.html>).
> 
> I fixed the AIX flag issue and move some come common code to
> "net_util_md.c" file.
> 
> Thanks,
> Vyom
> 
> On 9/8/2016 12:32 PM, Langer, Christoph wrote:
> > Hi Vyom,
> >
> > ok, thanks. I have one addition to my proposal: I don't think there's a need for
> a global NET_GetCurrentTime function. This one should probably be a static
> function inside net_util_md.c (static long getCurrentTime()).
> >
> > Best
> > Christoph
> >
> >> -----Original Message-----
> >> From: Vyom Tewari [mailto:vyom.tewari at oracle.com]
> >> Sent: Mittwoch, 7. September 2016 18:31
> >> To: Langer, Christoph <christoph.langer at sap.com>
> >> Cc: net-dev at openjdk.java.net; Chris Hegarty <chris.hegarty at oracle.com>
> >> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even
> with
> >> soTimeout set
> >>
> >> Hi Chris,
> >>
> >> Sorry I was mostly focusing on our test  failure first, i will check
> >> your suggestions.
> >>
> >> Thanks,
> >>
> >> Vyom
> >>
> >>
> >> On Wednesday 07 September 2016 08:44 PM, Langer, Christoph wrote:
> >>> Hi Vyom,
> >>>
> >>> did you have a look at my suggestions regarding AIX and refactoring? I
> can't
> >> find none of it in the new webrev nor did you comment on it.
> >>> Best regards
> >>> Christoph
> >>>
> >>>> -----Original Message-----
> >>>> From: net-dev [mailto:net-dev-bounces at openjdk.java.net] On Behalf Of
> >> Vyom
> >>>> Tewari
> >>>> Sent: Mittwoch, 7. September 2016 17:10
> >>>> To: Chris Hegarty <chris.hegarty at oracle.com>
> >>>> Cc: net-dev at openjdk.java.net
> >>>> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even
> >> with
> >>>> soTimeout set
> >>>>
> >>>> Hi All,
> >>>>
> >>>> Please find the latest
> >>>>
> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.4/index.html
> >>>>
> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.4/index.html>).
> >>>> there were some test failures in JPRT and Chris also pointed the same
> >>>> issue in modified code. Now with latest code JPRT is clean.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Vyom
> >>>>
> >>>>
> >>>> On Wednesday 07 September 2016 03:18 PM, Chris Hegarty wrote:
> >>>>> On 07/09/16 07:45, Vyom Tewari wrote:
> >>>>>> Hi Chris,
> >>>>>>
> >>>>>> Please find the latest
> >>>>>>
> >> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.3/index.html
> >> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.3/index.html>). I
> >>>>>> did some code refactoring, JPRT is in progress.
> >>>>> In terms of NET_Timeout**, I'm happy with this version, but I
> >>>>> have an additional comment.
> >>>>>
> >>>>> If NET_ReadWithTimeout returns -1 because
> >> NET_TimeoutWithCurrentTime
> >>>>> returns <= 0, then a JNI pending exception will be set. So the code
> >>>>> calling NET_ReadWithTimeout should, a) free bufP, and b) return -1,
> >>>>> immediately rather than continuing and possibly trying to set another
> >>>>> JNI pending exception.
> >>>>>
> >>>>> One option is to check the return value from NET_ReadWithTimeout and
> >>>>> also (*env)->ExceptionCheck(env). Another is to possibly consolidate
> >>>>> the setting of JNI pending exceptions in one place, towards the end
> >>>>> of Java_java_net_SocketInputStream_socketRead0, for example EBADF
> is
> >>>>> handled in two places.
> >>>>>
> >>>>> -Chris.
> >>>>>
> >>>>>> I need help from some one to build and test latest changes on AIX, may
> >>>>>> be Christoph can help.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Vyom
> >>>>>>
> >>>>>>
> >>>>>> On Tuesday 06 September 2016 06:25 PM, Chris Hegarty wrote:
> >>>>>>> Vyom,
> >>>>>>>
> >>>>>>>> On 6 Sep 2016, at 10:20, Vyom Tewari <vyom.tewari at oracle.com>
> >> wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Please find the latest
> >>>>>>>>
> >>>>
> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.2/index.html
> >>>>
> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.2/index.html>).
> >>>>>>>> I
> >>>>>>>> have incorporated the review comments.
> >>>>>>> Your changes completely avoid NET_Timeout, which is quite complex
> on
> >>>>>>> Linux, Mac, and AIX, for various reasons. ( NET_Timeout will use the
> >>>>>>> async close mechanism to signal/interrupt a thread blocked in a poll /
> >>>>>>> select ). Also, select is used on Mac, which will use poll after your
> >>>>>>> changes ( I do remember that there was a bug in poll on Mac around
> >>>>>>> the time of the original Mac port ).
> >>>>>>>
> >>>>>>> Would is be better, rather than replicating the logic in NET_Timeout,
> >>>>>>> to have a version that supports passing a function pointer, to run if
> >>>>>>> the poll/select returns before the timeout? Just an idea.
> >>>>>>>
> >>>>>>> -Chris.
> >>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> Vyom
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Monday 05 September 2016 08:37 PM, Chris Hegarty wrote:
> >>>>>>>>> On 05/09/16 15:37, Mark Sheppard wrote:
> >>>>>>>>>> if the desire is to avoid making double calls on gettimeofday in the
> >>>>>>>>>> NET_ReadWithTimeout's while loop for its main call flow,
> >>>>>>>>> Yes, the desire is to make no more calls to gettimeofday, than are
> >>>>>>>>> already done.
> >>>>>>>>>
> >>>>>>>>>> then consider a modification to NET_Timeout and create a version
> >>>>>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct
> timeval *
> >>>>>>>>>> current_time)
> >>>>>>>>>>
> >>>>>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct
> timeval *
> >>>>>>>>>> current_time) {
> >>>>>>>>>>       int result;
> >>>>>>>>>>       struct timeval t;
> >>>>>>>>>>       long prevtime, newtime;
> >>>>>>>>>>       struct pollfd pfd;
> >>>>>>>>>>       pfd.fd = s;
> >>>>>>>>>>       pfd.events = POLLIN;
> >>>>>>>>>>
> >>>>>>>>>>       if (timeout > 0) {
> >>>>>>>>>>           prevtime = (current_time->tv_sec * 1000)  +
> >>>>>>>>>> current_time->tv_usec / 1000;
> >>>>>>>>>>       }
> >>>>>>>>>>
> >>>>>>>>>>       for(;;) {
> >>>>>>>>>>           result = poll(&pfd, 1, timeout);
> >>>>>>>>>>           if (result < 0 && errno == EINTR) {
> >>>>>>>>>>               if (timeout > 0) {
> >>>>>>>>>>                   gettimeofday(&t, NULL);
> >>>>>>>>>>                   newtime = (t.tv_sec * 1000)  + t.tv_usec /1000;
> >>>>>>>>>>                   timeout -= newtime - prevtime;
> >>>>>>>>>>                   if (timeout <= 0)
> >>>>>>>>>>                       return 0;
> >>>>>>>>>>                   prevtime = newtime;
> >>>>>>>>>>               }
> >>>>>>>>>>           } else {
> >>>>>>>>>>               return result;
> >>>>>>>>>>           }
> >>>>>>>>>>       }
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> the NET_ReadWithTimeout is modified with.
> >>>>>>>>>>
> >>>>>>>>>>      while (timeout > 0) {
> >>>>>>>>>>           result = NET_TimeoutWithCurrentTime(fd, timeout, &t);
> >>>>>>>>>>
> >>>>>>>>>> in any case there is the potential for multiple invocation of
> >>>>>>>>>> gettimeofday due to a need to make
> >>>>>>>>>> poll restartable,
> >>>>>>>>> Yes, and that is fine. Just no more than is already there.
> >>>>>>>>>
> >>>>>>>>> and adjust the originally specified timeout
> >>>>>>>>>> accordingly, and for the edge case
> >>>>>>>>>> after the non blocking read.
> >>>>>>>>> After an initial skim, your suggestion seems reasonable.
> >>>>>>>>>
> >>>>>>>>> -Chris.
> >>>>>>>>>
> >>>>>>>>>> regards
> >>>>>>>>>> Mark
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 05/09/2016 12:02, Chris Hegarty wrote:
> >>>>>>>>>>> Vyom,
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>
> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
> >>>>>>>>>>>
> >>>>>>>>>>> There is some concern about the potential performance effect,
> etc,
> >>>>>>>>>>> of gettimeofday, maybe there is a way out of this. The reuse of
> >>>>>>>>>>> NET_Timeout is good, but it also calls gettimeofday. It seems that
> >>>>>>>>>>> a specific NET_ReadWithTimeout could be written to NOT reuse
> >>>>>>>>>>> NET_Timeout, but effectively inline its interruptible operation.
> >>>>>>>>>>> Or write a variant of NET_Timeout that takes a function to
> >>>>>>>>>>> execute. Rather than effectively two loops conditioned on the
> >>>>>>>>>>> timeout.  Disclaimer: I have not actually tried to do this, but
> >>>>>>>>>>> it seems worth considering / evaluating.
> >>>>>>>>>>>
> >>>>>>>>>>> -Chris.
> >>>>>>>>>>>
> >>>>>>>>>>> On 02/09/16 04:39, Vyom Tewari wrote:
> >>>>>>>>>>>> hi Dimitry,
> >>>>>>>>>>>>
> >>>>>>>>>>>> thanks for review, I did consider to use  a monotonically
> >>>>>>>>>>>> increasing
> >>>>>>>>>>>> clock like "clock_gettime" but  existing nearby
> >>>>>>>>>>>> code("NET_Timeout") uses
> >>>>>>>>>>>> "gettimeofday"  so i choose to be consistent with the existing
> >>>>>>>>>>>> code.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Vyom
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Friday 02 September 2016 01:38 AM, Dmitry Samersoff
> wrote:
> >>>>>>>>>>>>> Vyom,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Did you consider to use select() to calculate timeout instead of
> >>>>>>>>>>>>> gettimeofday ?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> gettimeofday is affected by system time changes, so running
> ntpd
> >>>>>>>>>>>>> can
> >>>>>>>>>>>>> cause unpredictable behavior of this code. Also it's rather
> >>>>>>>>>>>>> expensive
> >>>>>>>>>>>>> syscall.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Dmitry
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 2016-09-01 19:03, Vyom Tewari wrote:
> >>>>>>>>>>>>>> please find the updated
> >>>>>>>>>>>>>>
> >>>>
> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>
> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.1/index.html>).
> >>>>>>>>>>>>>> I
> >>>>>>>>>>>>>> incorporated the review comments.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Vyom
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tuesday 30 August 2016 04:11 PM, Mark Sheppard wrote:
> >>>>>>>>>>>>>>> Hi
> >>>>>>>>>>>>>>>      perhaps there is an opportunity to do  some refactoring
> >>>>>>>>>>>>>>> here (...
> >>>>>>>>>>>>>>> for me a "goto " carries a code smell! )
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> along the lines
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> if (timeout) {
> >>>>>>>>>>>>>>>        nread =  NET_ReadWithTimeout(...);
> >>>>>>>>>>>>>>> } else {
> >>>>>>>>>>>>>>>         nread = NET_Read(...);
> >>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> the NET_ReadWithTimeout (...) function will contain a
> >>>>>>>>>>>>>>> restructuring of
> >>>>>>>>>>>>>>> your goto loop
> >>>>>>>>>>>>>>> while (_timeout > 0) { nread = NET_Timeout(fd, _timeout);
> >>>>>>>>>>>>>>>              if (nread <= 0) {
> >>>>>>>>>>>>>>>                  if (nread == 0) {
> >>>>>>>>>>>>>>>                      JNU_ThrowByName(env, JNU_JAVANETPKG
> >>>>>>>>>>>>>>> "SocketTimeoutException",
> >>>>>>>>>>>>>>>                                  "Read timed out");
> >>>>>>>>>>>>>>>                  } else if (nread == -1) {
> >>>>>>>>>>>>>>>                      if (errno == EBADF) {
> >>>>>>>>>>>>>>>                           JNU_ThrowByName(env, JNU_JAVANETPKG
> >>>>>>>>>>>>>>> "SocketException", "Socket closed");
> >>>>>>>>>>>>>>>                      } else if (errno == ENOMEM) {
> >>>>>>>>>>>>>>> JNU_ThrowOutOfMemoryError(env,
> >>>>>>>>>>>>>>> "NET_Timeout
> >>>>>>>>>>>>>>> native heap allocation failed");
> >>>>>>>>>>>>>>>                      } else {
> >>>>>>>>>>>>>>> JNU_ThrowByNameWithMessageAndLastError
> >>>>>>>>>>>>>>>                              (env, JNU_JAVANETPKG
> >>>>>>>>>>>>>>> "SocketException",
> >>>>>>>>>>>>>>> "select/poll failed");
> >>>>>>>>>>>>>>>                      }
> >>>>>>>>>>>>>>>                  }
> >>>>>>>>>>>>>>>                    // release buffer in main call flow
> >>>>>>>>>>>>>>> //              if (bufP != BUF) {
> >>>>>>>>>>>>>>> //                  free(bufP);
> >>>>>>>>>>>>>>> //             }
> >>>>>>>>>>>>>>>                 nread = -1;
> >>>>>>>>>>>>>>>                 break;
> >>>>>>>>>>>>>>>              } else {
> >>>>>>>>>>>>>>> nread = NET_NonBlockingRead(fd, bufP, len);
> >>>>>>>>>>>>>>> if (nread == -1 && ((errno == EAGAIN) || (errno ==
> >>>>>>>>>>>>>>> EWOULDBLOCK))) {
> >>>>>>>>>>>>>>>                  gettimeofday(&t, NULL);
> >>>>>>>>>>>>>>> newtime = t.tv_sec * 1000 + t.tv_usec / 1000;
> >>>>>>>>>>>>>>> _timeout -= newtime - prevtime;
> >>>>>>>>>>>>>>> if(_timeout > 0){
> >>>>>>>>>>>>>>> prevtime = newtime;
> >>>>>>>>>>>>>>>                      }
> >>>>>>>>>>>>>>> } else { break; } } } return nread;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> e&oe
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> regards
> >>>>>>>>>>>>>>> Mark
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 29/08/2016 10:58, Vyom Tewari wrote:
> >>>>>>>>>>>>>>>> gentle reminder, please review the below code change.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Vyom
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Monday 22 August 2016 05:12 PM, Vyom Tewari wrote:
> >>>>>>>>>>>>>>>>> Hi All,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Please review the code changes for below issue.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Bug         :
> >>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8075484
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> webrev    :
> >>>>>>>>>>>>>>>>>
> >>>> http://cr.openjdk.java.net/~vtewari/8075484/webrev0.0/index.html
> >>>>>>>>>>>>>>>>>
> >>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.0/index.html>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> This issue is SocketInputStream.socketread0() hangs even
> >> with
> >>>>>>>>>>>>>>>>> "soTimeout" set.the implementation of
> >>>>>>>>>>>>>>>>> Java_java_net_SocketInputStream_socketRead0 assumes
> >> that
> >>>>>>>>>>>>>>>>> read()
> >>>>>>>>>>>>>>>>> won't block after poll() reports that a read is possible.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> This assumption does not hold, as noted on the man page
> for
> >>>>>>>>>>>>>>>>> select
> >>>>>>>>>>>>>>>>> (referenced by the man page for poll): Under Linux,
> select()
> >>>>>>>>>>>>>>>>> may
> >>>>>>>>>>>>>>>>> report a socket file descriptor as "ready for reading",
> while
> >>>>>>>>>>>>>>>>> nevertheless a subsequent read blocks. This could for
> >> example
> >>>>>>>>>>>>>>>>> happen
> >>>>>>>>>>>>>>>>> when data has arrived but upon examination has wrong
> >>>> checksum
> >>>>>>>>>>>>>>>>> and is
> >>>>>>>>>>>>>>>>> discarded.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Vyom
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>



More information about the net-dev mailing list