Discussion on root cause analysis of JDK-7052625 : com/sun/net/httpserver/bugs/6725892/Test.java fails intermittently
michael cui
michael.cui at oracle.com
Thu Feb 20 00:25:37 PST 2014
Hi,
I would like to discuss my current root cause analysis of JDK-7052625:
com/sun/net/httpserver/bugs/6725892/Test.java fails intermittently.
As JDK-6725892 <https://bugs.openjdk.java.net/browse/JDK-6725892>
states, the purpose of this regression test is to check that bad HTTP
connections are handled correctly, including connections that:
+ send no request
+ send an incomplete request
+ fail to read the response completely.
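The three kinds of bad connection listed above can be sketched with raw sockets against a local com.sun.net.httpserver.HttpServer. This is a minimal, hypothetical sketch, not the code of the actual Test.java: the class name BadClients, the context path /test/foo, the 64 KB response body and the sleep durations are all assumptions chosen for illustration. The point it shows is the one the regression test checks: after each kind of bad client, a normal request should still get a 200 response.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the code of the actual Test.java.
public class BadClients {
    static final InetAddress LOOPBACK = InetAddress.getLoopbackAddress();
    static final String PATH = "/test/foo"; // hypothetical context path
    static final String REQUEST =
        "GET " + PATH + " HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n";

    // Bad case 1: connect, send no request at all, then close.
    static void sendNoRequest(int port, long sleepMs) throws Exception {
        try (Socket s = new Socket(LOOPBACK, port)) {
            Thread.sleep(sleepMs);
        }
    }

    // Bad case 2: send only the request line (no headers, no blank line), then close.
    static void sendIncompleteRequest(int port, long sleepMs) throws Exception {
        try (Socket s = new Socket(LOOPBACK, port)) {
            OutputStream out = s.getOutputStream();
            out.write(("GET " + PATH + " HTTP/1.1\r\n").getBytes(StandardCharsets.US_ASCII));
            out.flush();
            Thread.sleep(sleepMs);
        }
    }

    // Bad case 3: send a full request but read only the first few bytes of the response.
    static void readResponsePartially(int port) throws IOException {
        try (Socket s = new Socket(LOOPBACK, port)) {
            s.getOutputStream().write(REQUEST.getBytes(StandardCharsets.US_ASCII));
            s.getInputStream().read(new byte[16]);
        }
    }

    // Normal case: returns the response status line, e.g. "HTTP/1.1 200 OK".
    static String normalRequest(int port) throws IOException {
        try (Socket s = new Socket(LOOPBACK, port)) {
            s.getOutputStream().write(REQUEST.getBytes(StandardCharsets.US_ASCII));
            InputStream in = s.getInputStream();
            StringBuilder line = new StringBuilder();
            int c;
            while ((c = in.read()) != -1 && c != '\r') {
                line.append((char) c);
            }
            return line.toString();
        }
    }

    // Server with a 64 KB response, large enough that a partial read leaves data unread.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(LOOPBACK, 0), 0);
        server.createContext(PATH, exchange -> {
            byte[] body = new byte[64 * 1024];
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            } catch (IOException e) {
                // Expected when the client closes early (bad case 3); the server must keep going.
            }
        });
        server.start();
        return server;
    }
}
```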
The test3() method starts 60 threads in total, 20 for each of three
cases: incomplete read, incomplete write, and a normal read/write. Each
thread opens a connection to the HTTP server and simulates either a
normal or a bad HTTP request to check that the server handles it
correctly.
All 60 threads are started at the same time, and the 40 bad-request
threads use sleep to simulate a stalled read or write.
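The message does not show how test3() makes its threads start together. The following is a minimal sketch, assuming a CountDownLatch "starting gate", of how a burst of simultaneous clients (for example 3 cases × 20 threads = 60) can be released at nearly the same moment; the class ConcurrentStart and method runTogether are hypothetical names, not taken from the test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: releases n threads at (nearly) the same moment.
public class ConcurrentStart {
    /** Runs task on n threads released together; returns how many runs completed without throwing. */
    static int runTogether(int n, Runnable task) throws InterruptedException {
        CountDownLatch ready = new CountDownLatch(n); // counted down by each worker once it is waiting
        CountDownLatch go = new CountDownLatch(1);    // opened once to release every worker
        AtomicInteger completed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                ready.countDown();
                try {
                    go.await();
                    task.run();
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (RuntimeException e) {
                    // A failed run is simply not counted.
                }
            });
            threads.add(t);
            t.start();
        }
        ready.await();  // wait until all workers are parked at the gate
        go.countDown(); // release them together
        for (Thread t : threads) {
            t.join();
        }
        return completed.get();
    }
}
```

Releasing all clients from one gate is what makes the connection attempts arrive as a burst, which is the situation the backlog analysis below is concerned with.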
The HTTP server is created by the following API call:
s1 = HttpServer.create (addr, 0);
According to the API doc
<http://docs.oracle.com/javase/7/docs/api/java/net/ServerSocket.html#ServerSocket%28int%29>
and the ServerSocket.java source code, the second parameter is the socket
backlog, i.e. the maximum number of queued incoming connections allowed
on the listening socket. Queued TCP connections exceeding this limit may
be rejected by the TCP implementation. If the backlog is set to zero, the
default value of 50 is used (see the same API doc and ServerSocket.java).
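A minimal sketch of the two ways com.sun.net.httpserver.HttpServer accepts a backlog: HttpServer.create(addr, backlog), or HttpServer.create() followed by bind(addr, backlog). The class BacklogDemo, its method names, and the loopback address with port 0 (an ephemeral port) are assumptions for illustration. Note that the backlog is only a request; per the ServerSocket docs its exact semantics are implementation specific, and the OS may cap or ignore it.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical helper showing how the backlog is passed to HttpServer.
public class BacklogDemo {
    static InetSocketAddress anyLoopbackPort() {
        return new InetSocketAddress(InetAddress.getLoopbackAddress(), 0);
    }

    // backlog <= 0: a system default is used (50, per the analysis above).
    static HttpServer createWithDefaultBacklog() throws IOException {
        return HttpServer.create(anyLoopbackPort(), 0);
    }

    // Explicit backlog, e.g. 100 as in the proposed fix.
    static HttpServer createWithBacklog(int backlog) throws IOException {
        return HttpServer.create(anyLoopbackPort(), backlog);
    }

    // Alternative: create an unbound server, then bind it with a backlog.
    static HttpServer createThenBind(int backlog) throws IOException {
        HttpServer server = HttpServer.create();
        server.bind(anyLoopbackPort(), backlog);
        return server;
    }
}
```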
Since 40 of the 60 threads in test3() simulate a bad HTTP request by
sleeping while reading or writing, there is a small possibility that the
HTTP server's connection queue reaches its limit (50 by default), in
which case some TCP connections will be reset.
This could be the root cause of the intermittent failure.
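To see what a full backlog looks like from the client side, here is a minimal sketch (class BacklogOverflow and method fill are hypothetical) that opens a plain ServerSocket with a tiny backlog, never calls accept(), and counts how many client connects succeed. It uses ServerSocket directly rather than HttpServer, because HttpServer accepts connections on its own dispatcher thread. Whether excess connects are refused, reset, or simply time out depends on the OS, so the sketch only counts successes and failures and makes no claim about which platform behaves which way.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: clients connecting to a listener whose backlog is never drained.
public class BacklogOverflow {
    static final class Result {
        final int connected;
        final int failed;
        Result(int connected, int failed) {
            this.connected = connected;
            this.failed = failed;
        }
    }

    static Result fill(int backlog, int clients, int timeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        List<Socket> open = new ArrayList<>();
        int connected = 0;
        int failed = 0;
        try (ServerSocket server = new ServerSocket(0, backlog, loopback)) {
            // accept() is never called, so completed connections pile up in the backlog queue.
            InetSocketAddress target = new InetSocketAddress(loopback, server.getLocalPort());
            for (int i = 0; i < clients; i++) {
                Socket s = new Socket();
                try {
                    s.connect(target, timeoutMs);
                    open.add(s);
                    connected++;
                } catch (IOException e) {
                    // Refused, reset, or timed out: which one depends on the OS.
                    s.close();
                    failed++;
                }
            }
        } finally {
            for (Socket s : open) {
                s.close();
            }
        }
        return new Result(connected, failed);
    }
}
```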
Test results for the original version (10000 runs per platform):
0 failures on Linux
0 failures on Solaris
6 failures on Windows
28 failures on Mac
Increasing the number of bad-request threads increases the observed
failure frequency.
Test results for the fixed version, in which the HttpServer backlog was
changed from 0 to 100 (10000 runs per platform):
0 failures on Linux
0 failures on Solaris
0 failures on Windows
0 failures on Mac
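As a rough sanity check on these numbers, assume (as a simplification) that each run fails independently with the per-run rate p observed for the original version. The chance of seeing 0 failures in n = 10000 runs of the fixed version purely by luck is then:

```latex
P(\text{0 failures in } n \text{ runs}) = (1-p)^{n} \approx e^{-np}

\text{Mac: } p = 28/10000,\ n = 10000:\quad (1-0.0028)^{10000} \approx e^{-28} \approx 7 \times 10^{-13}

\text{Windows: } p = 6/10000,\ n = 10000:\quad (1-0.0006)^{10000} \approx e^{-6} \approx 2.5 \times 10^{-3}
```

Under these assumptions the Mac result is very unlikely to be luck, while the Windows result on its own is suggestive (roughly a 1-in-400 chance) rather than conclusive.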
It seems to me that passing 0 (i.e. the default backlog) to
HttpServer.create could be the root cause of this intermittent failure.
Are we comfortable with this analysis? If it is the root cause, would
setting the backlog to 100 be an acceptable fix?
Thanks,
Michael Cui
More information about the net-dev mailing list