RFR 8075484:SocketInputStream.socketRead0 can hang even with soTimeout set

Vyom Tewari vyom.tewari at oracle.com
Thu Sep 8 03:19:49 UTC 2016


Hi All,

Please find the latest 
webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.5/index.html 
<http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.5/index.html>).

I fixed the AIX flag issue and move some come common code to 
"net_util_md.c" file.

Thanks,
Vyom

On 9/8/2016 12:32 PM, Langer, Christoph wrote:
> Hi Vyom,
>
> ok, thanks. I have one addition to my proposal: I don't think there's a need for a global NET_GetCurrentTime function. This one should probably be a static function inside net_util_md.c (static long getCurrentTime()).
>
> Best
> Christoph
>
>> -----Original Message-----
>> From: Vyom Tewari [mailto:vyom.tewari at oracle.com]
>> Sent: Mittwoch, 7. September 2016 18:31
>> To: Langer, Christoph <christoph.langer at sap.com>
>> Cc: net-dev at openjdk.java.net; Chris Hegarty <chris.hegarty at oracle.com>
>> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even with
>> soTimeout set
>>
>> Hi Chris,
>>
>> Sorry I was mostly focusing on our test  failure first, i will check
>> your suggestions.
>>
>> Thanks,
>>
>> Vyom
>>
>>
>> On Wednesday 07 September 2016 08:44 PM, Langer, Christoph wrote:
>>> Hi Vyom,
>>>
>>> did you have a look at my suggestions regarding AIX and refactoring? I can't
>> find none of it in the new webrev nor did you comment on it.
>>> Best regards
>>> Christoph
>>>
>>>> -----Original Message-----
>>>> From: net-dev [mailto:net-dev-bounces at openjdk.java.net] On Behalf Of
>> Vyom
>>>> Tewari
>>>> Sent: Mittwoch, 7. September 2016 17:10
>>>> To: Chris Hegarty <chris.hegarty at oracle.com>
>>>> Cc: net-dev at openjdk.java.net
>>>> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even
>> with
>>>> soTimeout set
>>>>
>>>> Hi All,
>>>>
>>>> Please find the latest
>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.4/index.html
>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.4/index.html>).
>>>> there were some test failures in JPRT and Chris also pointed the same
>>>> issue in modified code. Now with latest code JPRT is clean.
>>>>
>>>> Thanks,
>>>>
>>>> Vyom
>>>>
>>>>
>>>> On Wednesday 07 September 2016 03:18 PM, Chris Hegarty wrote:
>>>>> On 07/09/16 07:45, Vyom Tewari wrote:
>>>>>> Hi Chris,
>>>>>>
>>>>>> Please find the latest
>>>>>>
>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.3/index.html
>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.3/index.html>). I
>>>>>> did some code refactoring, JPRT is in progress.
>>>>> In terms of NET_Timeout**, I'm happy with this version, but I
>>>>> have an additional comment.
>>>>>
>>>>> If NET_ReadWithTimeout returns -1 because
>> NET_TimeoutWithCurrentTime
>>>>> returns <= 0, then a JNI pending exception will be set. So the code
>>>>> calling NET_ReadWithTimeout should, a) free bufP, and b) return -1,
>>>>> immediately rather than continuing and possibly trying to set another
>>>>> JNI pending exception.
>>>>>
>>>>> One option is to check the return value from NET_ReadWithTimeout and
>>>>> also (*env)->ExceptionCheck(env). Another is to possibly consolidate
>>>>> the setting of JNI pending exceptions in one place, towards the end
>>>>> of Java_java_net_SocketInputStream_socketRead0, for example EBADF is
>>>>> handled in two places.
>>>>>
>>>>> -Chris.
>>>>>
>>>>>> I need help from some one to build and test latest changes on AIX, may
>>>>>> be Christoph can help.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Vyom
>>>>>>
>>>>>>
>>>>>> On Tuesday 06 September 2016 06:25 PM, Chris Hegarty wrote:
>>>>>>> Vyom,
>>>>>>>
>>>>>>>> On 6 Sep 2016, at 10:20, Vyom Tewari <vyom.tewari at oracle.com>
>> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Please find the latest
>>>>>>>>
>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.2/index.html
>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.2/index.html>).
>>>>>>>> I
>>>>>>>> have incorporated the review comments.
>>>>>>> Your changes completely avoid NET_Timeout, which is quite complex on
>>>>>>> Linux, Mac, and AIX, for various reasons. ( NET_Timeout will use the
>>>>>>> async close mechanism to signal/interrupt a thread blocked in a poll /
>>>>>>> select ). Also, select is used on Mac, which will use poll after your
>>>>>>> changes ( I do remember that there was a bug in poll on Mac around
>>>>>>> the time of the original Mac port ).
>>>>>>>
>>>>>>> Would is be better, rather than replicating the logic in NET_Timeout,
>>>>>>> to have a version that supports passing a function pointer, to run if
>>>>>>> the poll/select returns before the timeout? Just an idea.
>>>>>>>
>>>>>>> -Chris.
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Vyom
>>>>>>>>
>>>>>>>>
>>>>>>>> On Monday 05 September 2016 08:37 PM, Chris Hegarty wrote:
>>>>>>>>> On 05/09/16 15:37, Mark Sheppard wrote:
>>>>>>>>>> if the desire is to avoid making double calls on gettimeofday in the
>>>>>>>>>> NET_ReadWithTimeout's while loop for its main call flow,
>>>>>>>>> Yes, the desire is to make no more calls to gettimeofday, than are
>>>>>>>>> already done.
>>>>>>>>>
>>>>>>>>>> then consider a modification to NET_Timeout and create a version
>>>>>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct timeval *
>>>>>>>>>> current_time)
>>>>>>>>>>
>>>>>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct timeval *
>>>>>>>>>> current_time) {
>>>>>>>>>>       int result;
>>>>>>>>>>       struct timeval t;
>>>>>>>>>>       long prevtime, newtime;
>>>>>>>>>>       struct pollfd pfd;
>>>>>>>>>>       pfd.fd = s;
>>>>>>>>>>       pfd.events = POLLIN;
>>>>>>>>>>
>>>>>>>>>>       if (timeout > 0) {
>>>>>>>>>>           prevtime = (current_time->tv_sec * 1000)  +
>>>>>>>>>> current_time->tv_usec / 1000;
>>>>>>>>>>       }
>>>>>>>>>>
>>>>>>>>>>       for(;;) {
>>>>>>>>>>           result = poll(&pfd, 1, timeout);
>>>>>>>>>>           if (result < 0 && errno == EINTR) {
>>>>>>>>>>               if (timeout > 0) {
>>>>>>>>>>                   gettimeofday(&t, NULL);
>>>>>>>>>>                   newtime = (t.tv_sec * 1000)  + t.tv_usec /1000;
>>>>>>>>>>                   timeout -= newtime - prevtime;
>>>>>>>>>>                   if (timeout <= 0)
>>>>>>>>>>                       return 0;
>>>>>>>>>>                   prevtime = newtime;
>>>>>>>>>>               }
>>>>>>>>>>           } else {
>>>>>>>>>>               return result;
>>>>>>>>>>           }
>>>>>>>>>>       }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> the NET_ReadWithTimeout is modified with.
>>>>>>>>>>
>>>>>>>>>>      while (timeout > 0) {
>>>>>>>>>>           result = NET_TimeoutWithCurrentTime(fd, timeout, &t);
>>>>>>>>>>
>>>>>>>>>> in any case there is the potential for multiple invocation of
>>>>>>>>>> gettimeofday due to a need to make
>>>>>>>>>> poll restartable,
>>>>>>>>> Yes, and that is fine. Just no more than is already there.
>>>>>>>>>
>>>>>>>>> and adjust the originally specified timeout
>>>>>>>>>> accordingly, and for the edge case
>>>>>>>>>> after the non blocking read.
>>>>>>>>> After an initial skim, your suggestion seems reasonable.
>>>>>>>>>
>>>>>>>>> -Chris.
>>>>>>>>>
>>>>>>>>>> regards
>>>>>>>>>> Mark
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 05/09/2016 12:02, Chris Hegarty wrote:
>>>>>>>>>>> Vyom,
>>>>>>>>>>>
>>>>>>>>>>>
>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
>>>>>>>>>>>
>>>>>>>>>>> There is some concern about the potential performance effect, etc,
>>>>>>>>>>> of gettimeofday, maybe there is a way out of this. The reuse of
>>>>>>>>>>> NET_Timeout is good, but it also calls gettimeofday. It seems that
>>>>>>>>>>> a specific NET_ReadWithTimeout could be written to NOT reuse
>>>>>>>>>>> NET_Timeout, but effectively inline its interruptible operation.
>>>>>>>>>>> Or write a variant of NET_Timeout that takes a function to
>>>>>>>>>>> execute. Rather than effectively two loops conditioned on the
>>>>>>>>>>> timeout.  Disclaimer: I have not actually tried to do this, but
>>>>>>>>>>> it seems worth considering / evaluating.
>>>>>>>>>>>
>>>>>>>>>>> -Chris.
>>>>>>>>>>>
>>>>>>>>>>> On 02/09/16 04:39, Vyom Tewari wrote:
>>>>>>>>>>>> hi Dimitry,
>>>>>>>>>>>>
>>>>>>>>>>>> thanks for review, I did consider to use  a monotonically
>>>>>>>>>>>> increasing
>>>>>>>>>>>> clock like "clock_gettime" but  existing nearby
>>>>>>>>>>>> code("NET_Timeout") uses
>>>>>>>>>>>> "gettimeofday"  so i choose to be consistent with the existing
>>>>>>>>>>>> code.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Vyom
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Friday 02 September 2016 01:38 AM, Dmitry Samersoff wrote:
>>>>>>>>>>>>> Vyom,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Did you consider to use select() to calculate timeout instead of
>>>>>>>>>>>>> gettimeofday ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> gettimeofday is affected by system time changes, so running ntpd
>>>>>>>>>>>>> can
>>>>>>>>>>>>> cause unpredictable behavior of this code. Also it's rather
>>>>>>>>>>>>> expensive
>>>>>>>>>>>>> syscall.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Dmitry
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2016-09-01 19:03, Vyom Tewari wrote:
>>>>>>>>>>>>>> please find the updated
>>>>>>>>>>>>>>
>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.1/index.html>).
>>>>>>>>>>>>>> I
>>>>>>>>>>>>>> incorporated the review comments.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tuesday 30 August 2016 04:11 PM, Mark Sheppard wrote:
>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>      perhaps there is an opportunity to do  some refactoring
>>>>>>>>>>>>>>> here (...
>>>>>>>>>>>>>>> for me a "goto " carries a code smell! )
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> along the lines
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> if (timeout) {
>>>>>>>>>>>>>>>        nread =  NET_ReadWithTimeout(...);
>>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>>         nread = NET_Read(...);
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> the NET_ReadWithTimeout (...) function will contain a
>>>>>>>>>>>>>>> restructuring of
>>>>>>>>>>>>>>> your goto loop
>>>>>>>>>>>>>>> while (_timeout > 0) { nread = NET_Timeout(fd, _timeout);
>>>>>>>>>>>>>>>              if (nread <= 0) {
>>>>>>>>>>>>>>>                  if (nread == 0) {
>>>>>>>>>>>>>>>                      JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>>>>>>>> "SocketTimeoutException",
>>>>>>>>>>>>>>>                                  "Read timed out");
>>>>>>>>>>>>>>>                  } else if (nread == -1) {
>>>>>>>>>>>>>>>                      if (errno == EBADF) {
>>>>>>>>>>>>>>>                           JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>>>>>>>> "SocketException", "Socket closed");
>>>>>>>>>>>>>>>                      } else if (errno == ENOMEM) {
>>>>>>>>>>>>>>> JNU_ThrowOutOfMemoryError(env,
>>>>>>>>>>>>>>> "NET_Timeout
>>>>>>>>>>>>>>> native heap allocation failed");
>>>>>>>>>>>>>>>                      } else {
>>>>>>>>>>>>>>> JNU_ThrowByNameWithMessageAndLastError
>>>>>>>>>>>>>>>                              (env, JNU_JAVANETPKG
>>>>>>>>>>>>>>> "SocketException",
>>>>>>>>>>>>>>> "select/poll failed");
>>>>>>>>>>>>>>>                      }
>>>>>>>>>>>>>>>                  }
>>>>>>>>>>>>>>>                    // release buffer in main call flow
>>>>>>>>>>>>>>> //              if (bufP != BUF) {
>>>>>>>>>>>>>>> //                  free(bufP);
>>>>>>>>>>>>>>> //             }
>>>>>>>>>>>>>>>                 nread = -1;
>>>>>>>>>>>>>>>                 break;
>>>>>>>>>>>>>>>              } else {
>>>>>>>>>>>>>>> nread = NET_NonBlockingRead(fd, bufP, len);
>>>>>>>>>>>>>>> if (nread == -1 && ((errno == EAGAIN) || (errno ==
>>>>>>>>>>>>>>> EWOULDBLOCK))) {
>>>>>>>>>>>>>>>                  gettimeofday(&t, NULL);
>>>>>>>>>>>>>>> newtime = t.tv_sec * 1000 + t.tv_usec / 1000;
>>>>>>>>>>>>>>> _timeout -= newtime - prevtime;
>>>>>>>>>>>>>>> if(_timeout > 0){
>>>>>>>>>>>>>>> prevtime = newtime;
>>>>>>>>>>>>>>>                      }
>>>>>>>>>>>>>>> } else { break; } } } return nread;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> e&oe
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> regards
>>>>>>>>>>>>>>> Mark
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 29/08/2016 10:58, Vyom Tewari wrote:
>>>>>>>>>>>>>>>> gentle reminder, please review the below code change.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Monday 22 August 2016 05:12 PM, Vyom Tewari wrote:
>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please review the code changes for below issue.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Bug         :
>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8075484
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> webrev    :
>>>>>>>>>>>>>>>>>
>>>> http://cr.openjdk.java.net/~vtewari/8075484/webrev0.0/index.html
>>>>>>>>>>>>>>>>>
>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.0/index.html>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This issue is SocketInputStream.socketread0() hangs even
>> with
>>>>>>>>>>>>>>>>> "soTimeout" set.the implementation of
>>>>>>>>>>>>>>>>> Java_java_net_SocketInputStream_socketRead0 assumes
>> that
>>>>>>>>>>>>>>>>> read()
>>>>>>>>>>>>>>>>> won't block after poll() reports that a read is possible.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This assumption does not hold, as noted on the man page for
>>>>>>>>>>>>>>>>> select
>>>>>>>>>>>>>>>>> (referenced by the man page for poll): Under Linux, select()
>>>>>>>>>>>>>>>>> may
>>>>>>>>>>>>>>>>> report a socket file descriptor as "ready for reading", while
>>>>>>>>>>>>>>>>> nevertheless a subsequent read blocks. This could for
>> example
>>>>>>>>>>>>>>>>> happen
>>>>>>>>>>>>>>>>> when data has arrived but upon examination has wrong
>>>> checksum
>>>>>>>>>>>>>>>>> and is
>>>>>>>>>>>>>>>>> discarded.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>



More information about the net-dev mailing list