RFR 8075484:SocketInputStream.socketRead0 can hang even with soTimeout set
Vyom Tewari
vyom.tewari at oracle.com
Wed Sep 7 16:31:10 UTC 2016
Hi Chris,
Sorry I was mostly focusing on our test failure first, i will check
your suggestions.
Thanks,
Vyom
On Wednesday 07 September 2016 08:44 PM, Langer, Christoph wrote:
> Hi Vyom,
>
> did you have a look at my suggestions regarding AIX and refactoring? I can't find none of it in the new webrev nor did you comment on it.
>
> Best regards
> Christoph
>
>> -----Original Message-----
>> From: net-dev [mailto:net-dev-bounces at openjdk.java.net] On Behalf Of Vyom
>> Tewari
>> Sent: Mittwoch, 7. September 2016 17:10
>> To: Chris Hegarty <chris.hegarty at oracle.com>
>> Cc: net-dev at openjdk.java.net
>> Subject: Re: RFR 8075484:SocketInputStream.socketRead0 can hang even with
>> soTimeout set
>>
>> Hi All,
>>
>> Please find the latest
>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.4/index.html
>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.4/index.html>).
>> there were some test failures in JPRT and Chris also pointed the same
>> issue in modified code. Now with latest code JPRT is clean.
>>
>> Thanks,
>>
>> Vyom
>>
>>
>> On Wednesday 07 September 2016 03:18 PM, Chris Hegarty wrote:
>>>
>>> On 07/09/16 07:45, Vyom Tewari wrote:
>>>> Hi Chris,
>>>>
>>>> Please find the latest
>>>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.3/index.html
>>>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.3/index.html>). I
>>>> did some code refactoring, JPRT is in progress.
>>> In terms of NET_Timeout**, I'm happy with this version, but I
>>> have an additional comment.
>>>
>>> If NET_ReadWithTimeout returns -1 because NET_TimeoutWithCurrentTime
>>> returns <= 0, then a JNI pending exception will be set. So the code
>>> calling NET_ReadWithTimeout should, a) free bufP, and b) return -1,
>>> immediately rather than continuing and possibly trying to set another
>>> JNI pending exception.
>>>
>>> One option is to check the return value from NET_ReadWithTimeout and
>>> also (*env)->ExceptionCheck(env). Another is to possibly consolidate
>>> the setting of JNI pending exceptions in one place, towards the end
>>> of Java_java_net_SocketInputStream_socketRead0, for example EBADF is
>>> handled in two places.
>>>
>>> -Chris.
>>>
>>>> I need help from some one to build and test latest changes on AIX, may
>>>> be Christoph can help.
>>>>
>>>> Thanks,
>>>>
>>>> Vyom
>>>>
>>>>
>>>> On Tuesday 06 September 2016 06:25 PM, Chris Hegarty wrote:
>>>>> Vyom,
>>>>>
>>>>>> On 6 Sep 2016, at 10:20, Vyom Tewari <vyom.tewari at oracle.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Please find the latest
>>>>>>
>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.2/index.html
>>>>>>
>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.2/index.html>).
>>>>>> I
>>>>>> have incorporated the review comments.
>>>>> Your changes completely avoid NET_Timeout, which is quite complex on
>>>>> Linux, Mac, and AIX, for various reasons. ( NET_Timeout will use the
>>>>> async close mechanism to signal/interrupt a thread blocked in a poll /
>>>>> select ). Also, select is used on Mac, which will use poll after your
>>>>> changes ( I do remember that there was a bug in poll on Mac around
>>>>> the time of the original Mac port ).
>>>>>
>>>>> Would is be better, rather than replicating the logic in NET_Timeout,
>>>>> to have a version that supports passing a function pointer, to run if
>>>>> the poll/select returns before the timeout? Just an idea.
>>>>>
>>>>> -Chris.
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Vyom
>>>>>>
>>>>>>
>>>>>> On Monday 05 September 2016 08:37 PM, Chris Hegarty wrote:
>>>>>>> On 05/09/16 15:37, Mark Sheppard wrote:
>>>>>>>> if the desire is to avoid making double calls on gettimeofday in the
>>>>>>>> NET_ReadWithTimeout's while loop for its main call flow,
>>>>>>> Yes, the desire is to make no more calls to gettimeofday, than are
>>>>>>> already done.
>>>>>>>
>>>>>>>> then consider a modification to NET_Timeout and create a version
>>>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct timeval *
>>>>>>>> current_time)
>>>>>>>>
>>>>>>>> int NET_TimeoutWithCurrentTime(int s, long timeout, stuct timeval *
>>>>>>>> current_time) {
>>>>>>>> int result;
>>>>>>>> struct timeval t;
>>>>>>>> long prevtime, newtime;
>>>>>>>> struct pollfd pfd;
>>>>>>>> pfd.fd = s;
>>>>>>>> pfd.events = POLLIN;
>>>>>>>>
>>>>>>>> if (timeout > 0) {
>>>>>>>> prevtime = (current_time->tv_sec * 1000) +
>>>>>>>> current_time->tv_usec / 1000;
>>>>>>>> }
>>>>>>>>
>>>>>>>> for(;;) {
>>>>>>>> result = poll(&pfd, 1, timeout);
>>>>>>>> if (result < 0 && errno == EINTR) {
>>>>>>>> if (timeout > 0) {
>>>>>>>> gettimeofday(&t, NULL);
>>>>>>>> newtime = (t.tv_sec * 1000) + t.tv_usec /1000;
>>>>>>>> timeout -= newtime - prevtime;
>>>>>>>> if (timeout <= 0)
>>>>>>>> return 0;
>>>>>>>> prevtime = newtime;
>>>>>>>> }
>>>>>>>> } else {
>>>>>>>> return result;
>>>>>>>> }
>>>>>>>> }
>>>>>>>> }
>>>>>>>>
>>>>>>>> the NET_ReadWithTimeout is modified with.
>>>>>>>>
>>>>>>>> while (timeout > 0) {
>>>>>>>> result = NET_TimeoutWithCurrentTime(fd, timeout, &t);
>>>>>>>>
>>>>>>>> in any case there is the potential for multiple invocation of
>>>>>>>> gettimeofday due to a need to make
>>>>>>>> poll restartable,
>>>>>>> Yes, and that is fine. Just no more than is already there.
>>>>>>>
>>>>>>> and adjust the originally specified timeout
>>>>>>>> accordingly, and for the edge case
>>>>>>>> after the non blocking read.
>>>>>>> After an initial skim, your suggestion seems reasonable.
>>>>>>>
>>>>>>> -Chris.
>>>>>>>
>>>>>>>> regards
>>>>>>>> Mark
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/09/2016 12:02, Chris Hegarty wrote:
>>>>>>>>> Vyom,
>>>>>>>>>
>>>>>>>>>
>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There is some concern about the potential performance effect, etc,
>>>>>>>>> of gettimeofday, maybe there is a way out of this. The reuse of
>>>>>>>>> NET_Timeout is good, but it also calls gettimeofday. It seems that
>>>>>>>>> a specific NET_ReadWithTimeout could be written to NOT reuse
>>>>>>>>> NET_Timeout, but effectively inline its interruptible operation.
>>>>>>>>> Or write a variant of NET_Timeout that takes a function to
>>>>>>>>> execute. Rather than effectively two loops conditioned on the
>>>>>>>>> timeout. Disclaimer: I have not actually tried to do this, but
>>>>>>>>> it seems worth considering / evaluating.
>>>>>>>>>
>>>>>>>>> -Chris.
>>>>>>>>>
>>>>>>>>> On 02/09/16 04:39, Vyom Tewari wrote:
>>>>>>>>>> hi Dimitry,
>>>>>>>>>>
>>>>>>>>>> thanks for review, I did consider to use a monotonically
>>>>>>>>>> increasing
>>>>>>>>>> clock like "clock_gettime" but existing nearby
>>>>>>>>>> code("NET_Timeout") uses
>>>>>>>>>> "gettimeofday" so i choose to be consistent with the existing
>>>>>>>>>> code.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Vyom
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Friday 02 September 2016 01:38 AM, Dmitry Samersoff wrote:
>>>>>>>>>>> Vyom,
>>>>>>>>>>>
>>>>>>>>>>> Did you consider to use select() to calculate timeout instead of
>>>>>>>>>>> gettimeofday ?
>>>>>>>>>>>
>>>>>>>>>>> gettimeofday is affected by system time changes, so running ntpd
>>>>>>>>>>> can
>>>>>>>>>>> cause unpredictable behavior of this code. Also it's rather
>>>>>>>>>>> expensive
>>>>>>>>>>> syscall.
>>>>>>>>>>>
>>>>>>>>>>> -Dmitry
>>>>>>>>>>>
>>>>>>>>>>> On 2016-09-01 19:03, Vyom Tewari wrote:
>>>>>>>>>>>> please find the updated
>>>>>>>>>>>>
>> webrev(http://cr.openjdk.java.net/~vtewari/8075484/webrev0.1/index.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.1/index.html>).
>>>>>>>>>>>>
>>>>>>>>>>>> I
>>>>>>>>>>>> incorporated the review comments.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Vyom
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tuesday 30 August 2016 04:11 PM, Mark Sheppard wrote:
>>>>>>>>>>>>> Hi
>>>>>>>>>>>>> perhaps there is an opportunity to do some refactoring
>>>>>>>>>>>>> here (...
>>>>>>>>>>>>> for me a "goto " carries a code smell! )
>>>>>>>>>>>>>
>>>>>>>>>>>>> along the lines
>>>>>>>>>>>>>
>>>>>>>>>>>>> if (timeout) {
>>>>>>>>>>>>> nread = NET_ReadWithTimeout(...);
>>>>>>>>>>>>> } else {
>>>>>>>>>>>>> nread = NET_Read(...);
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> the NET_ReadWithTimeout (...) function will contain a
>>>>>>>>>>>>> restructuring of
>>>>>>>>>>>>> your goto loop
>>>>>>>>>>>>> while (_timeout > 0) { nread = NET_Timeout(fd, _timeout);
>>>>>>>>>>>>> if (nread <= 0) {
>>>>>>>>>>>>> if (nread == 0) {
>>>>>>>>>>>>> JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>>>>>> "SocketTimeoutException",
>>>>>>>>>>>>> "Read timed out");
>>>>>>>>>>>>> } else if (nread == -1) {
>>>>>>>>>>>>> if (errno == EBADF) {
>>>>>>>>>>>>> JNU_ThrowByName(env, JNU_JAVANETPKG
>>>>>>>>>>>>> "SocketException", "Socket closed");
>>>>>>>>>>>>> } else if (errno == ENOMEM) {
>>>>>>>>>>>>> JNU_ThrowOutOfMemoryError(env,
>>>>>>>>>>>>> "NET_Timeout
>>>>>>>>>>>>> native heap allocation failed");
>>>>>>>>>>>>> } else {
>>>>>>>>>>>>> JNU_ThrowByNameWithMessageAndLastError
>>>>>>>>>>>>> (env, JNU_JAVANETPKG
>>>>>>>>>>>>> "SocketException",
>>>>>>>>>>>>> "select/poll failed");
>>>>>>>>>>>>> }
>>>>>>>>>>>>> }
>>>>>>>>>>>>> // release buffer in main call flow
>>>>>>>>>>>>> // if (bufP != BUF) {
>>>>>>>>>>>>> // free(bufP);
>>>>>>>>>>>>> // }
>>>>>>>>>>>>> nread = -1;
>>>>>>>>>>>>> break;
>>>>>>>>>>>>> } else {
>>>>>>>>>>>>> nread = NET_NonBlockingRead(fd, bufP, len);
>>>>>>>>>>>>> if (nread == -1 && ((errno == EAGAIN) || (errno ==
>>>>>>>>>>>>> EWOULDBLOCK))) {
>>>>>>>>>>>>> gettimeofday(&t, NULL);
>>>>>>>>>>>>> newtime = t.tv_sec * 1000 + t.tv_usec / 1000;
>>>>>>>>>>>>> _timeout -= newtime - prevtime;
>>>>>>>>>>>>> if(_timeout > 0){
>>>>>>>>>>>>> prevtime = newtime;
>>>>>>>>>>>>> }
>>>>>>>>>>>>> } else { break; } } } return nread;
>>>>>>>>>>>>>
>>>>>>>>>>>>> e&oe
>>>>>>>>>>>>>
>>>>>>>>>>>>> regards
>>>>>>>>>>>>> Mark
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 29/08/2016 10:58, Vyom Tewari wrote:
>>>>>>>>>>>>>> gentle reminder, please review the below code change.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Monday 22 August 2016 05:12 PM, Vyom Tewari wrote:
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please review the code changes for below issue.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Bug :
>>>>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8075484
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> webrev :
>>>>>>>>>>>>>>>
>> http://cr.openjdk.java.net/~vtewari/8075484/webrev0.0/index.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>> <http://cr.openjdk.java.net/%7Evtewari/8075484/webrev0.0/index.html>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This issue is SocketInputStream.socketread0() hangs even with
>>>>>>>>>>>>>>> "soTimeout" set.the implementation of
>>>>>>>>>>>>>>> Java_java_net_SocketInputStream_socketRead0 assumes that
>>>>>>>>>>>>>>> read()
>>>>>>>>>>>>>>> won't block after poll() reports that a read is possible.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This assumption does not hold, as noted on the man page for
>>>>>>>>>>>>>>> select
>>>>>>>>>>>>>>> (referenced by the man page for poll): Under Linux, select()
>>>>>>>>>>>>>>> may
>>>>>>>>>>>>>>> report a socket file descriptor as "ready for reading", while
>>>>>>>>>>>>>>> nevertheless a subsequent read blocks. This could for example
>>>>>>>>>>>>>>> happen
>>>>>>>>>>>>>>> when data has arrived but upon examination has wrong
>> checksum
>>>>>>>>>>>>>>> and is
>>>>>>>>>>>>>>> discarded.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Vyom
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
More information about the net-dev
mailing list