RFR 8075484:SocketInputStream.socketRead0 can hang even with soTimeout set

Vyom Tewari vyom.tewari at oracle.com
Tue Sep 13 03:00:45 UTC 2016


any comment on latest 
webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.5/index.html 
<http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.5/index.html>) ?

Vyom


On Thursday 08 September 2016 03:13 PM, Vyom Tewari wrote:
>
> On Thursday 08 September 2016 02:50 PM, Langer, Christoph wrote:
>> Hi Vyom,
>>
>> this looks good.
>>
>> Is there any particular reason why NET_ReadWithTimeout should remain in SocketInputStream.c and not also move to net_util_md.c? That way you could also have a "static long getCurrentTime()" inside net_util_md.c, instead of an exported NET_GetCurrentTime().
> I thought this "NET_ReadWithTimeou" is specific to SocketInputStream 
> so i will be good if we put this method to socketinputstream.c.
>
> In future we will in using the monotonic increasing clock so 
> "NET_GetCurrentTime()" will help , Just change the implementation you 
> are done.
>
> Vyom
>> And no "#include <sys/time.h>" would be needed in SocketInputStream.c - maybe not even now as it is.
>>
>> Best regards
>> Christoph
>>
>>> -----Original Message-----
>>> From: Vyom Tewari [mailto:vyom.tewari at oracle.com]
>>> Sent: Donnerstag, 8. September 2016 05:20
>>> To: Langer, Christoph<christoph.langer at sap.com>
>>> Cc:net-dev at openjdk.java.net
>>> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even with
>>> soTimeout set
>>>
>>> Hi All,
>>>
>>> Please find the latest
>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.5/index.html
>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.5/index.html>).
>>>
>>> I fixed the AIX flag issue and move some come common code to
>>> "net_util_md.c" file.
>>>
>>> Thanks,
>>> Vyom
>>>
>>> On 9/8/2016 12:32 PM, Langer, Christoph wrote:
>>>> Hi Vyom,
>>>>
>>>> ok, thanks. I have one addition to my proposal: I don't think there's a need for
>>> a global NET_GetCurrentTime function. This one should probably be a static
>>> function inside net_util_md.c (static long getCurrentTime()).
>>>> Best
>>>> Christoph
>>>>
>>>>> -----Original Message-----
>>>>> From: Vyom Tewari [mailto:vyom.tewari at oracle.com]
>>>>> Sent: Mittwoch, 7. September 2016 18:31
>>>>> To: Langer, Christoph<christoph.langer at sap.com>
>>>>> Cc:net-dev at openjdk.java.net; Chris Hegarty<chris.hegarty at oracle.com>
>>>>> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even
>>> with
>>>>> soTimeout set
>>>>>
>>>>> Hi Chris,
>>>>>
>>>>> Sorry I was mostly focusing on our test  failure first, i will check
>>>>> your suggestions.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Vyom
>>>>>
>>>>>
>>>>> On Wednesday 07 September 2016 08:44 PM, Langer, Christoph wrote:
>>>>>> Hi Vyom,
>>>>>>
>>>>>> did you have a look at my suggestions regarding AIX and refactoring? I
>>> can't
>>>>> find none of it in the new webrev nor did you comment on it.
>>>>>> Best regards
>>>>>> Christoph
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: net-dev [mailto:net-dev-bounces at openjdk.java.net] On Behalf Of
>>>>> Vyom
>>>>>>> Tewari
>>>>>>> Sent: Mittwoch, 7. September 2016 17:10
>>>>>>> To: Chris Hegarty<chris.hegarty at oracle.com>
>>>>>>> Cc:net-dev at openjdk.java.net
>>>>>>> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even
>>>>> with
>>>>>>> soTimeout set
>>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Please find the latest
>>>>>>>
>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.4/index.html
>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.4/index.html>).
>>>>>>> there were some test failures in JPRT and Chris also pointed the same
>>>>>>> issue in modified code. Now with latest code JPRT is clean.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Vyom
>>>>>>>
>>>>>>>
>>>>>>> On Wednesday 07 September 2016 03:18 PM, Chris Hegarty wrote:
>>>>>>>> On 07/09/16 07:45, Vyom Tewari wrote:
>>>>>>>>> Hi Chris,
>>>>>>>>>
>>>>>>>>> Please find the latest
>>>>>>>>>
>>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.3/index.html
>>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.3/index.html>). I
>>>>>>>>> did some code refactoring, JPRT is in progress.
>>>>>>>> In terms of NET_Timeout**, I'm happy with this version, but I
>>>>>>>> have an additional comment.
>>>>>>>>
>>>>>>>> If NET_ReadWithTimeout returns -1 because
>>>>> NET_TimeoutWithCurrentTime
>>>>>>>> returns <= 0, then a JNI pending exception will be set. So the code
>>>>>>>> calling NET_ReadWithTimeout should, a) free bufP, and b) return -1,
>>>>>>>> immediately rather than continuing and possibly trying to set another
>>>>>>>> JNI pending exception.
>>>>>>>>
>>>>>>>> One option is to check the return value from NET_ReadWithTimeout and
>>>>>>>> also (*env)->ExceptionCheck(env). Another is to possibly consolidate
>>>>>>>> the setting of JNI pending exceptions in one place, towards the end
>>>>>>>> of Java_java_net_SocketInputStream_socketRead0, for example EBADF
>>> is
>>>>>>>> handled in two places.
>>>>>>>>
>>>>>>>> -Chris.
>>>>>>>>
>>>>>>>>> I need help from some one to build and test latest changes on AIX, may
>>>>>>>>> be Christoph can help.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Vyom
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tuesday 06 September 2016 06:25 PM, Chris Hegarty wrote:
>>>>>>>>>> Vyom,
>>>>>>>>>>
>>>>>>>>>>> On 6 Sep 2016, at 10:20, Vyom Tewari<vyom.tewari at oracle.com>
>>>>> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Please find the latest
>>>>>>>>>>>
>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.2/index.html
>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.2/index.html>).
>>>>>>>>>>> I
>>>>>>>>>>> have incorporated the review comments.
>>>>>>>>>> Your changes completely avoid NET_Timeout, which is quite complex
>>> on
>>>>>>>>>> Linux, Mac, and AIX, for various reasons. ( NET_Timeout will use the
>>>>>>>>>> async close mechanism to signal/interrupt a thread blocked in a poll /
>>>>>>>>>> select ). Also, select is used on Mac, which will use poll after your
>>>>>>>>>> changes ( I do remember that there was a bug in poll on Mac around
>>>>>>>>>> the time of the original Mac port ).
>>>>>>>>>>
>>>>>>>>>> Would is be better, rather than replicating the logic in NET_Timeout,
>>>>>>>>>> to have a version that supports passing a function pointer, to run if
>>>>>>>>>> the poll/select returns before the timeout? Just an idea.
>>>>>>>>>>
>>>>>>>>>> -Chris.
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Vyom
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Monday 05 September 2016 08:37 PM, Chris Hegarty wrote:
>>>>>>>>>>>> On 05/09/16 15:37, Mark Sheppard wrote:
>>>>>>>>>>>>> if the desire is to avoid making double calls on gettimeofday in the
>>>>>>>>>>>>> NET_ReadWithTimeout's while loop for its main call flow,
>>>>>>>>>>>> Yes, the desire is to make no more calls to gettimeofday, than are
>>>>>>>>>>>> already done.
>>>>>>>>>>>>
>>>>>>>>>>>>> then consider a modification to NET_Timeout and create a version
>>>>>>>>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct
>>> timeval *
>>>>>>>>>>>>> current_time)
>>>>>>>>>>>>>
>>>>>>>>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct
>>> timeval *
>>>>>>>>>>>>> current_time) {
>>>>>>>>>>>>>        int result;
>>>>>>>>>>>>>        struct timeval t;
>>>>>>>>>>>>>        long prevtime, newtime;
>>>>>>>>>>>>>        struct pollfd pfd;
>>>>>>>>>>>>>        pfd.fd = s;
>>>>>>>>>>>>>        pfd.events = POLLIN;
>>>>>>>>>>>>>
>>>>>>>>>>>>>        if (timeout > 0) {
>>>>>>>>>>>>>            prevtime = (current_time->tv_sec * 1000)  +
>>>>>>>>>>>>> current_time->tv_usec / 1000;
>>>>>>>>>>>>>        }
>>>>>>>>>>>>>
>>>>>>>>>>>>>        for(;;) {
>>>>>>>>>>>>>            result = poll(&pfd, 1, timeout);
>>>>>>>>>>>>>            if (result < 0 && errno == EINTR) {
>>>>>>>>>>>>>                if (timeout > 0) {
>>>>>>>>>>>>>                    gettimeofday(&t, NULL);
>>>>>>>>>>>>>                    newtime = (t.tv_sec * 1000)  + t.tv_usec /1000;
>>>>>>>>>>>>>                    timeout -= newtime - prevtime;
>>>>>>>>>>>>>                    if (timeout <= 0)
>>>>>>>>>>>>>                        return 0;
>>>>>>>>>>>>>                    prevtime = newtime;
>>>>>>>>>>>>>                }
>>>>>>>>>>>>>            } else {
>>>>>>>>>>>>>                return result;
>>>>>>>>>>>>>            }
>>>>>>>>>>>>>        }
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> the NET_ReadWithTimeout is modified with.
>>>>>>>>>>>>>
>>>>>>>>>>>>>       while (timeout > 0) {
>>>>>>>>>>>>>            result = NET_TimeoutWithCurrentTime(fd, timeout, &t);
>>>>>>>>>>>>>
>>>>>>>>>>>>> in any case there is the potential for multiple invocation of
>>>>>>>>>>>>> gettimeofday due to a need to make
>>>>>>>>>>>>> poll restartable,
>>>>>>>>>>>> Yes, and that is fine. Just no more than is already there.
>>>>>>>>>>>>
>>>>>>>>>>>> and adjust the originally specified timeout
>>>>>>>>>>>>> accordingly, and for the edge case
>>>>>>>>>>>>> after the non blocking read.
>>>>>>>>>>>> After an initial skim, your suggestion seems reasonable.
>>>>>>>>>>>>
>>>>>>>>>>>> -Chris.
>>>>>>>>>>>>
>>>>>>>>>>>>> regards
>>>>>>>>>>>>> Mark
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 05/09/2016 12:02, Chris Hegarty wrote:
>>>>>>>>>>>>>> Vyom,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
>>>>>>>>>>>>>> There is some concern about the potential performance effect,
>>> etc,
>>>>>>>>>>>>>> of gettimeofday, maybe there is a way out of this. The reuse of
>>>>>>>>>>>>>> NET_Timeout is good, but it also calls gettimeofday. It seems that
>>>>>>>>>>>>>> a specific NET_ReadWithTimeout could be written to NOT reuse
>>>>>>>>>>>>>> NET_Timeout, but effectively inline its interruptible operation.
>>>>>>>>>>>>>> Or write a variant of NET_Timeout that takes a function to
>>>>>>>>>>>>>> execute. Rather than effectively two loops conditioned on the
>>>>>>>>>>>>>> timeout.  Disclaimer: I have not actually tried to do this, but
>>>>>>>>>>>>>> it seems worth considering / evaluating.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Chris.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 02/09/16 04:39, Vyom Tewari wrote:
>>>>>>>>>>>>>>> hi Dimitry,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> thanks for review, I did consider to use  a monotonically
>>>>>>>>>>>>>>> increasing
>>>>>>>>>>>>>>> clock like "clock_gettime" but  existing nearby
>>>>>>>>>>>>>>> code("NET_Timeout") uses
>>>>>>>>>>>>>>> "gettimeofday"  so i choose to be consistent with the existing
>>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Friday 02 September 2016 01:38 AM, Dmitry Samersoff
>>> wrote:
>>>>>>>>>>>>>>>> Vyom,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Did you consider to use select() to calculate timeout instead of
>>>>>>>>>>>>>>>> gettimeofday ?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> gettimeofday is affected by system time changes, so running
>>> ntpd
>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>> cause unpredictable behavior of this code. Also it's rather
>>>>>>>>>>>>>>>> expensive
>>>>>>>>>>>>>>>> syscall.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Dmitry
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2016-09-01 19:03, Vyom Tewari wrote:
>>>>>>>>>>>>>>>>> please find the updated
>>>>>>>>>>>>>>>>>
>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.1/index.html>).
>>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>> incorporated the review comments.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tuesday 30 August 2016 04:11 PM, Mark Sheppard wrote:
>>>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>>>>       perhaps there is an opportunity to do  some refactoring
>>>>>>>>>>>>>>>>>> here (...
>>>>>>>>>>>>>>>>>> for me a "goto " carries a code smell! )
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> along the lines
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> if (timeout) {
>>>>>>>>>>>>>>>>>>         nread =  NET_ReadWithTimeout(...);
>>>>>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>>>>>          nread = NET_Read(...);
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> the NET_ReadWithTimeout (...) function will contain a
>>>>>>>>>>>>>>>>>> restructuring of
>>>>>>>>>>>>>>>>>> your goto loop
>>>>>>>>>>>>>>>>>> while (_timeout > 0) { nread = NET_Timeout(fd, _timeout);
>>>>>>>>>>>>>>>>>>               if (nread <= 0) {
>>>>>>>>>>>>>>>>>>                   if (nread == 0) {
>>>>>>>>>>>>>>>>>>                       JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>>>>>>>>>>> "SocketTimeoutException",
>>>>>>>>>>>>>>>>>>                                   "Read timed out");
>>>>>>>>>>>>>>>>>>                   } else if (nread == -1) {
>>>>>>>>>>>>>>>>>>                       if (errno == EBADF) {
>>>>>>>>>>>>>>>>>>                            JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>>>>>>>>>>> "SocketException", "Socket closed");
>>>>>>>>>>>>>>>>>>                       } else if (errno == ENOMEM) {
>>>>>>>>>>>>>>>>>> JNU_ThrowOutOfMemoryError(env,
>>>>>>>>>>>>>>>>>> "NET_Timeout
>>>>>>>>>>>>>>>>>> native heap allocation failed");
>>>>>>>>>>>>>>>>>>                       } else {
>>>>>>>>>>>>>>>>>> JNU_ThrowByNameWithMessageAndLastError
>>>>>>>>>>>>>>>>>>                               (env, JNU_JAVANETPKG
>>>>>>>>>>>>>>>>>> "SocketException",
>>>>>>>>>>>>>>>>>> "select/poll failed");
>>>>>>>>>>>>>>>>>>                       }
>>>>>>>>>>>>>>>>>>                   }
>>>>>>>>>>>>>>>>>>                     // release buffer in main call flow
>>>>>>>>>>>>>>>>>> //              if (bufP != BUF) {
>>>>>>>>>>>>>>>>>> //                  free(bufP);
>>>>>>>>>>>>>>>>>> //             }
>>>>>>>>>>>>>>>>>>                  nread = -1;
>>>>>>>>>>>>>>>>>>                  break;
>>>>>>>>>>>>>>>>>>               } else {
>>>>>>>>>>>>>>>>>> nread = NET_NonBlockingRead(fd, bufP, len);
>>>>>>>>>>>>>>>>>> if (nread == -1 && ((errno == EAGAIN) || (errno ==
>>>>>>>>>>>>>>>>>> EWOULDBLOCK))) {
>>>>>>>>>>>>>>>>>>                   gettimeofday(&t, NULL);
>>>>>>>>>>>>>>>>>> newtime = t.tv_sec * 1000 + t.tv_usec / 1000;
>>>>>>>>>>>>>>>>>> _timeout -= newtime - prevtime;
>>>>>>>>>>>>>>>>>> if(_timeout > 0){
>>>>>>>>>>>>>>>>>> prevtime = newtime;
>>>>>>>>>>>>>>>>>>                       }
>>>>>>>>>>>>>>>>>> } else { break; } } } return nread;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> e&oe
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> regards
>>>>>>>>>>>>>>>>>> Mark
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 29/08/2016 10:58, Vyom Tewari wrote:
>>>>>>>>>>>>>>>>>>> gentle reminder, please review the below code change.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Monday 22 August 2016 05:12 PM, Vyom Tewari wrote:
>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Please review the code changes for below issue.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Bug         :
>>>>>>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8075484
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> webrev    :
>>>>>>>>>>>>>>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~vtewari/8075484/webrev0.0/index.html
>>>>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.0/index.html>
>>>>>>>>>>>>>>>>>>>> This issue is SocketInputStream.socketread0() hangs even
>>>>> with
>>>>>>>>>>>>>>>>>>>> "soTimeout" set.the implementation of
>>>>>>>>>>>>>>>>>>>> Java_java_net_SocketInputStream_socketRead0 assumes
>>>>> that
>>>>>>>>>>>>>>>>>>>> read()
>>>>>>>>>>>>>>>>>>>> won't block after poll() reports that a read is possible.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This assumption does not hold, as noted on the man page
>>> for
>>>>>>>>>>>>>>>>>>>> select
>>>>>>>>>>>>>>>>>>>> (referenced by the man page for poll): Under Linux,
>>> select()
>>>>>>>>>>>>>>>>>>>> may
>>>>>>>>>>>>>>>>>>>> report a socket file descriptor as "ready for reading",
>>> while
>>>>>>>>>>>>>>>>>>>> nevertheless a subsequent read blocks. This could for
>>>>> example
>>>>>>>>>>>>>>>>>>>> happen
>>>>>>>>>>>>>>>>>>>> when data has arrived but upon examination has wrong
>>>>>>> checksum
>>>>>>>>>>>>>>>>>>>> and is
>>>>>>>>>>>>>>>>>>>> discarded.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/net-dev/attachments/20160913/20121f5d/attachment-0001.html>


More information about the net-dev mailing list