Discussion on root cause analysis of JDK-7052625 : com/sun/net/httpserver/bugs/6725892/Test.java fails intermittently

michael cui michael.cui at oracle.com
Thu Feb 20 06:11:03 UTC 2014


On 02/18/2014 12:51 AM, michael cui wrote:
> Hi,
>
> I would like to discuss my current root cause analysis of JDK-7052625:
> com/sun/net/httpserver/bugs/6725892/Test.java fails intermittently.
>
> As JDK-6725892 <https://bugs.openjdk.java.net/browse/JDK-6725892>
> states, the purpose of this regression test is to check that bad HTTP
> connections are handled correctly, including clients that
> + send no request
> + send an incomplete request
> + fail to read the response completely.
>
> The test3() method starts 20 threads for each of three cases at the
> same time, so 60 threads in total. Each thread opens a connection to
> the HTTP server and simulates a normal or bad HTTP request to see
> whether the server handles it correctly (20 threads for incomplete
> read, 20 threads for incomplete write, 20 threads for the normal
> read/write case).
>
> Those threads are all started at the same time. Among them, 40 threads
> use sleep to simulate a bad request.
>
> The HTTP server is created by the following API call:
> s1 = HttpServer.create (addr, 0);
>
> According to the API doc
> <http://docs.oracle.com/javase/7/docs/api/java/net/ServerSocket.html#ServerSocket%28int%29>
> and the ServerSocket.java source code, the second parameter is the
> socket backlog, which is the maximum number of queued incoming
> connections to allow on the listening socket. Queued TCP connections
> exceeding this limit may be rejected by the TCP implementation. The
> default value of 50 is used if the backlog is set to zero.
>
> Since in test3() 40 of the 60 threads simulate a bad HTTP request by
> sleeping while either reading or writing, there is a small
> possibility that the HTTP server's connection queue reaches its
> limit (50 by default) and some TCP connections are reset in that
> situation.
>
> This could be the root cause of the intermittent failure.
>
> Test results for the original version:
> 0 failures on Linux in 10000 runs
> 0 failures on Solaris in 10000 runs
> 6 failures on Windows in 10000 runs
> 28 failures on Mac in 10000 runs
>
> By increasing the number of bad-request threads, we can observe that
> the failure frequency increases.
>
> Test results for the fixed version, in which the backlog of the HTTP
> server was changed from 0 to 100:
> 0 failures on Linux in 10000 runs
> 0 failures on Solaris in 10000 runs
> 0 failures on Windows in 10000 runs
> 0 failures on Mac in 10000 runs
>
> It seems to me that using the default backlog (0) for the HTTP server
> could be the root cause of this intermittent failure.
> Are we comfortable with this analysis? If it is the root cause, could
> setting the backlog to 100 be a suggested fix?
>
> Thanks,
> Michael Cui
>
Could anyone provide some insight on this analysis?
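
For reference, here is a minimal sketch of the change under discussion. It
is not the code from Test.java: the class name BacklogSketch, the helper
bindWithBacklog, and the choice of the loopback address with an ephemeral
port are assumptions made for illustration. The only point it shows is the
second argument to HttpServer.create, where 0 requests the system default
backlog and 100 is the value proposed above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;

class BacklogSketch {
    // Value proposed in the analysis; the original test passes 0,
    // which selects the system default backlog.
    static final int PROPOSED_BACKLOG = 100;

    // Hypothetical helper: binds a server with the given backlog,
    // starts it, stops it again, and returns the port it was bound to.
    static int bindWithBacklog(int backlog) throws IOException {
        // Port 0 asks the OS for any free port (an assumption of this
        // sketch, not necessarily what the test does).
        InetSocketAddress addr =
            new InetSocketAddress(InetAddress.getLoopbackAddress(), 0);
        HttpServer s1 = HttpServer.create(addr, backlog);
        s1.start();
        int port = s1.getAddress().getPort();
        s1.stop(0);
        return port;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("default backlog, port "
            + bindWithBacklog(0));
        System.out.println("backlog " + PROPOSED_BACKLOG + ", port "
            + bindWithBacklog(PROPOSED_BACKLOG));
    }
}
```

How many pending connections the OS actually queues for a given backlog
value is platform dependent, which may be relevant to the per-platform
differences in the failure counts.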

Michael Cui




More information about the core-libs-dev mailing list