Discussion on root cause analysis of JDK-7052625 : com/sun/net/httpserver/bugs/6725892/Test.java fails intermittently
michael cui
michael.cui at oracle.com
Thu Feb 20 06:11:03 UTC 2014
On 02/18/2014 12:51 AM, michael cui wrote:
> Hi,
>
> I would like to discuss my current root cause analysis of JDK-7052625
> : com/sun/net/httpserver/bugs/6725892/Test.java fails intermittently
>
> As JDK-6725892 <https://bugs.openjdk.java.net/browse/JDK-6725892>
> states, the purpose of this regression test is to verify that bad HTTP
> connections are handled correctly, including clients that:
> + send no request
> + send an incomplete request
> + fail to read the response completely.
>
> The test3() method starts 20 threads for each type listed above at the
> same time, so 60 threads are started in total. Each thread opens a
> connection to the HTTP server and simulates a normal or bad HTTP
> request to check whether the server handles it correctly (20 threads
> for incomplete read, 20 threads for incomplete write, 20 threads for
> the normal read/write case).
>
> Of these 60 threads, 40 use sleep to simulate a bad request.
>
> The HTTP server is created by the following API call:
> s1 = HttpServer.create (addr, 0);
>
> According to the API doc
> <http://docs.oracle.com/javase/7/docs/api/java/net/ServerSocket.html#ServerSocket%28int%29>
> and the ServerSocket.java source code, the second parameter is the
> socket backlog, which is the maximum number of queued incoming
> connections to allow on the listening socket. Queued TCP connections
> exceeding this limit may be rejected by the TCP implementation. The
> default value of 50 is used if the backlog is set to zero (see the API doc
> <http://docs.oracle.com/javase/7/docs/api/java/net/ServerSocket.html#ServerSocket%28int%29>
> and ServerSocket.java).
>
> Since 40 of the 60 threads in test3() simulate a bad HTTP request by
> sleeping during either the read or the write, there is a small
> possibility that the HTTP server's socket connection queue reaches
> its limit (50 by default) and some TCP connections are reset in that
> situation.
>
> This could be the root cause of the intermittent failure.
>
> Test results of the original version:
> 0 failures on Linux in 10000 runs
> 0 failures on Solaris in 10000 runs
> 6 failures on Windows in 10000 runs
> 28 failures on Mac in 10000 runs
>
> When the number of bad-request threads is increased, the failure
> frequency increases as well.
>
> Test results of the fixed version, in which the backlog of the HTTP
> server was changed from 0 to 100:
> 0 failures on Linux in 10000 runs
> 0 failures on Solaris in 10000 runs
> 0 failures on Windows in 10000 runs
> 0 failures on Mac in 10000 runs
>
> It seems to me that passing 0 (the default) as the backlog of the HTTP
> server could be the root cause of this intermittent failure.
> Are we comfortable with this analysis? If it is the root cause, could
> setting the backlog to 100 be a suggested fix?
>
> Thanks,
> Michael Cui
>
Could anyone provide some insight on this analysis?
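
For reference, the proposed change can be sketched as below. This is only an
illustration under stated assumptions, not the actual test code: the class
BacklogSketch and its helper methods are hypothetical, and binding to the
loopback address on an ephemeral port is my assumption. Only the
HttpServer.create(addr, backlog) call, the backlog values 0 and 100, and the
20 threads per client kind come from the analysis above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical illustration class; not part of the actual regression test.
class BacklogSketch {
    // From the analysis: 3 kinds of client, 20 threads of each kind.
    static final int CLIENT_KINDS = 3;
    static final int THREADS_PER_KIND = 20;
    // Backlog used in the fixed version (the original test passed 0).
    static final int PROPOSED_BACKLOG = 100;

    // Number of client threads that connect to the server at about the same time.
    static int totalClientThreads() {
        return CLIENT_KINDS * THREADS_PER_KIND;
    }

    // Creates a server on an ephemeral loopback port (assumption) with the
    // given backlog, starts it, stops it, and returns the port it was bound to.
    static int bindWithBacklog(int backlog) {
        try {
            InetSocketAddress addr =
                new InetSocketAddress(InetAddress.getLoopbackAddress(), 0);
            // Original test: HttpServer.create(addr, 0), i.e. the default backlog.
            HttpServer s1 = HttpServer.create(addr, backlog);
            s1.start();
            int port = s1.getAddress().getPort();
            s1.stop(0);
            return port;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("client threads: " + totalClientThreads());
        System.out.println("bound with backlog " + PROPOSED_BACKLOG
                + " on port " + bindWithBacklog(PROPOSED_BACKLOG));
    }
}
```

Note that the backlog is only a request to the operating system and its exact
semantics are implementation specific, so this sketch shows the shape of the
change rather than confirming the root cause.
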
Michael Cui