Discussion on root cause analysis of JDK-7052625 : com/sun/net/httpserver/bugs/6725892/Test.java fails intermittently
Chris Hegarty
chris.hegarty at oracle.com
Thu Feb 20 03:24:35 PST 2014
Michael,
I’m ok with your analysis, and suggested fix.
From the original test output in the bug description, I can see that there are 58 println’s with "Request from:" for test3, and two "Worker: Error writing to server". This tends to support your analysis that the server, in some cases, is not accepting the barrage of requests.
Please provide a webrev/changeset and I will sponsor the change for you.
-Chris.
On 20 Feb 2014, at 08:25, michael cui <michael.cui at oracle.com> wrote:
> Hi,
>
> I would like to discuss my current root cause analysis of JDK-7052625 : com/sun/net/httpserver/bugs/6725892/Test.java fails intermittently
>
> As JDK-6725892 <https://bugs.openjdk.java.net/browse/JDK-6725892> states, the purpose of this regression test is to check that bad HTTP connections are handled correctly, which includes clients that:
> + send no request
> + send an incomplete request
> + fail to read the response completely.
>
> The test3() method starts 20 threads for each type listed above, all at the same time, so 60 threads are started in total. Each thread opens a connection to the HTTP server and simulates a normal or bad HTTP request to see whether the server handles it correctly (20 threads for incomplete reads, 20 threads for incomplete writes, and 20 threads for the normal read/write case).
>
> These threads are all started at the same time. Among them, 40 threads use sleep to simulate a bad request.
>
> The HTTP server is created by the following API call:
> s1 = HttpServer.create (addr, 0);
>
> According to the API doc <http://docs.oracle.com/javase/7/docs/api/java/net/ServerSocket.html#ServerSocket%28int%29> and the ServerSocket.java source code, the second parameter is the socket backlog, which is the maximum number of queued incoming connections to allow on the listening socket. Queued TCP connections exceeding this limit may be rejected by the TCP implementation. The default value of 50 is used if the backlog is set to zero (see the API doc <http://docs.oracle.com/javase/7/docs/api/java/net/ServerSocket.html#ServerSocket%28int%29> and ServerSocket.java).
>
> Since in test3() 40 of the 60 threads simulate a bad HTTP request by sleeping while either reading or writing, there is a small possibility that the HTTP server's socket connection queue reaches its limit (50 by default), in which case some TCP connections will be reset.
>
> This could be the root cause of this intermittent failure.
>
> Test results of the original version:
> 0 failures on Linux for 10000 runs
> 0 failures on Solaris for 10000 runs
> 6 failures on Windows for 10000 runs
> 28 failures on Mac for 10000 runs
>
> By increasing the number of bad-request threads, we can observe that the frequency of failure increases.
>
> Test results of the fixed version, in which the backlog of the HTTP server was changed from 0 to 100:
> 0 failures on Linux for 10000 runs
> 0 failures on Solaris for 10000 runs
> 0 failures on Windows for 10000 runs
> 0 failures on Mac for 10000 runs
>
> It seems to me that using the default of 0 for the backlog of the HTTP server could be the root cause of this intermittent failure.
> Are we comfortable with this analysis? If it is the root cause, could setting the backlog to 100 be a suggested fix?
>
> Thanks,
> Michael Cui
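
For reference, the change discussed in the quoted message amounts to passing an explicit backlog as the second argument of HttpServer.create instead of 0. The following is only a minimal sketch of that idea, not the actual changeset: the class name BacklogSketch, the helper methods, and the choice of a loopback address with port 0 are assumptions made for illustration, since the test's own addr is not shown in the message.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical class name; not part of the regression test itself.
class BacklogSketch {
    // 100 is the value used in the fixed runs reported in the message;
    // it exceeds the 60 client threads that test3() starts.
    static final int BACKLOG = 100;

    // Creates (but does not start) a server with an explicit accept backlog.
    // Binding to loopback on port 0 (any free port) is an assumption here.
    static HttpServer createServer(int backlog) throws IOException {
        InetSocketAddress addr =
            new InetSocketAddress(InetAddress.getLoopbackAddress(), 0);
        return HttpServer.create(addr, backlog);
    }

    // Starts a server, records the port it was bound to, and stops it again.
    static int startAndStop(int backlog) {
        try {
            HttpServer s1 = createServer(backlog);
            s1.start();
            int port = s1.getAddress().getPort();
            s1.stop(0);
            return port;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bound to port " + startAndStop(BACKLOG));
    }
}
```

Whether 100 is the right value is a judgment call; the only property relied on here is that it is larger than the number of connections test3() can have pending at once.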