RFR [8024521] (process) Async close issues with Process InputStream
Ivan Gerasimov
ivan.gerasimov at oracle.com
Tue Oct 15 15:31:18 UTC 2013
Here's the updated webrev with the solution by Paul:
http://cr.openjdk.java.net/~igerasim/8024521/1/webrev/
I've also excluded the test from it.
Instead, I'm going to write a manual testing instruction for SQE.
I've tested the fix and it works as expected.
Sincerely yours,
Ivan
On 15.10.2013 10:29, Martin Buchholz wrote:
> Hi
>
> I'm the author of this code.
> I'm perfectly willing to believe the close/drain code is buggy.
> It is difficult to get right.
> I'm looking forward to the updated webrev.
>
> Often the best way to handle a stress test is to make it run quickly
> by default, but provide a system property to make it run as long as
> desired, for manual ad-hoc testing.
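>
> A minimal sketch of that pattern (the property name and default below are
> illustrative assumptions, not taken from any actual test):
>
>     // Runs for ~10 seconds by default; pass e.g. -Dstress.millis=600000
>     // on the command line for a long manual stress run.
>     public class StressDuration {
>         public static void main(String[] args) throws Exception {
>             long millis = Long.getLong("stress.millis", 10_000L);
>             long deadline = System.currentTimeMillis() + millis;
>             while (System.currentTimeMillis() < deadline) {
>                 // one iteration of the actual stress scenario goes here
>             }
>         }
>     }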
>
>
> On Mon, Oct 14, 2013 at 9:04 AM, Ivan Gerasimov
> <ivan.gerasimov at oracle.com> wrote:
>
> Thanks Paul!
>
> Yes, I like your solution better.
> I'll test it and send the updated webrev shortly.
>
> Sincerely yours,
> Ivan
>
>
>
> On 14.10.2013 15:33, Paul Sandoz wrote:
>
> On Oct 14, 2013, at 11:36 AM, Ivan Gerasimov
> <ivan.gerasimov at oracle.com> wrote:
>
> Hi Paul,
>
> Thank you for looking at the webrev!
>
> On 14.10.2013 12:50, Paul Sandoz wrote:
>
> Hi Ivan,
>
> Why do you require closeLock? Is it not sufficient to
> leave processExited as is and just add:
>
> public synchronized void close() throws IOException {
>     super.close();
> }
>
> I want to limit the impact on the code; that's why I
> synchronize close() only with processExited().
> If I made close() synchronized with all of the stream's
> methods, then I think it could introduce some
> unpredictable drawback.
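>
> A minimal sketch of the separate-lock arrangement being described (the field
> and method shapes here are assumptions based on the discussion, not the
> actual webrev):
>
>     // Sketch: close() and processExited() exclude each other via closeLock,
>     // without making every read path synchronize on the stream itself.
>     static class PipeInputStreamSketch extends java.io.BufferedInputStream {
>         private final Object closeLock = new Object();
>
>         PipeInputStreamSketch(java.io.InputStream in) { super(in); }
>
>         @Override
>         public void close() throws java.io.IOException {
>             synchronized (closeLock) {
>                 super.close();
>             }
>         }
>
>         void processExited() {
>             synchronized (closeLock) {
>                 // drain and swap the underlying stream here, as in the real code
>             }
>         }
>     }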
>
> Yeah, the close will block until the drain is complete, which
> might also have some unpredictable drawback.
>
> A concern is that there was already some code in place to check for
> an async close, which, if detected, throws away all the data and
> sets the underlying filtered input stream to null. So
> all that effort spent draining is wasted.
>
> Another solution is to keep checking in the loop of
> drainInputStream for an async close and fail-fast:
>
> private InputStream drainInputStream(InputStream in)
>         throws IOException {
>     int n = 0;
>     int j;
>     byte[] a = null;
>     while ((j = in.available()) > 0) {
>         if (buf == null) // asynchronous close()?
>             return null; // stop reading and discard
>         a = (a == null) ? new byte[j] : Arrays.copyOf(a, n + j);
>         n += in.read(a, n, j);
>     }
>
>     if (buf == null) // asynchronous close()?
>         return null; // discard
>     else if (a == null)
>         return ProcessBuilder.NullInputStream.INSTANCE;
>     else
>         return new ByteArrayInputStream(n == a.length ? a
>                                                       : Arrays.copyOf(a, n));
> }
>
> /** Called by the process reaper thread when the process exits. */
> synchronized void processExited() {
>     // Most BufferedInputStream methods are synchronized, but close()
>     // is not, and so we have to handle concurrent racing close().
>     try {
>         InputStream in = this.in;
>         if (in != null) {
>             InputStream stragglers = drainInputStream(in);
>             in.close();
>             this.in = stragglers;
>         }
>     } catch (IOException ignored) {
>         // probably an asynchronous close().
>     }
> }
>
> This still has the potential to check the availability of
> the current input before an async close, and to read from a
> different input after the async close, but it will break out on
> the next loop check if there are more bytes available to read;
> otherwise it will be caught again on the final check, so no
> munged data will be exposed.
>
>
>
> Plus there was already some logic checking if close
> was asynchronously called (see BufferedInputStream.close):
>
> 372                 if (buf == null) // asynchronous close()?
> 373                     this.in = null;
>
> and this would no longer be required.
>
> Yes, you're right and these two lines can be removed.
> I wouldn't remove the surrounding try {} catch
> (IOException) though, as the stream can still be closed
> asynchronously.
>
> I am somewhat struggling to understand the test (not
> too familiar with this area). It requires a more expert
> eye in this area than mine.
>
> IIUC you are trying to induce the output process
> fd of ExecLoop to "flip" to another fd opened by OpenLoop.
>
> Yes.
> I actually adopted a testcase provided by the reporter of
> the problem.
> I only changed the test in such a way that it can be run
> in an automated mode.
>
> Below is the exact comment from the original testcase.
> It may be a good idea to include it in the regtest.
>
> /* The following code is a test case that demonstrates a close
>  * race in ProcessPipeInputStream.
>  *
>  * A snippet of the offending code within ProcessPipeInputStream:
>  *
>  *     // Called by the process reaper thread when the process exits.
>  *     synchronized void processExited() {
>  *         // Most BufferedInputStream methods are synchronized, but close()
>  *         // is not, and so we have to handle concurrent racing close().
>  *         try {
>  *             InputStream in = this.in;
>  *             if (in != null) {
>  *                 byte[] stragglers = drainInputStream(in);
>  *                 in.close();
>  *
>  * The comment even talks about the race. However, the code is not adequately
>  * protecting against it. drainInputStream() (available() in particular) is not
>  * threadsafe when a close happens in parallel. What's worse is that a subsequent
>  * open() is likely to reassign the closed fd, so drainInputStream ends up draining
>  * the wrong fd! This can cause OOMs, data corruption on other fds, and other very
>  * bad things.
>  *
>  * Notes:
>  * - Test creates a large sparse file in /tmp/bigsparsefile
>  * - Test uses "/proc/meminfo" for simple input.
>  * - Invoke using "-Xmx128m" to ensure the file size of 1G is greater than the heap size
>  * - Test will run forever; if successful there will not be any OOM exceptions
>  *   (on my test system they occur about 1 per minute)
>  * - You need multiple cores (I use 2 x Xeon E5620 = 8 cores, 16 threads).
>  *
>  * The basic premise of the test is to create a 1GB sparse file. Then have lots
>  * of threads opening and closing this file. In a loop within a separate thread,
>  * use UNIXProcess to run a simple command. We read the output of the command,
>  * make sure the process has exited and then close the input stream. When the bug
>  * happens, processExited() will use FIONREAD to figure out how much data to drain
>  * BUT the fd it is checking will be closed and reopened (now referring to
>  * /tmp/bigsparsefile), so it will see the size as 1GB and then attempt to allocate
>  * a buffer of that size and then OOM.
>  */
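>
> A condensed sketch of the reproducer structure that comment describes (class
> name, thread count and the command are illustrative assumptions; the real
> testcase differs in detail):
>
>     import java.io.IOException;
>     import java.io.InputStream;
>     import java.io.RandomAccessFile;
>     import java.nio.file.Files;
>     import java.nio.file.Path;
>     import java.nio.file.Paths;
>
>     public class CloseRaceSketch {
>         public static void main(String[] args) throws Exception {
>             Path big = Paths.get("/tmp/bigsparsefile");
>
>             // 1GB sparse file; its size is what a misdirected FIONREAD would see
>             try (RandomAccessFile raf = new RandomAccessFile(big.toFile(), "rw")) {
>                 raf.setLength(1L << 30);
>             }
>
>             // "OpenLoop": keep opening and closing the big file so that a freshly
>             // closed pipe fd is likely to be reassigned to it
>             for (int i = 0; i < 8; i++) {
>                 Thread t = new Thread(() -> {
>                     while (true) {
>                         try (InputStream in = Files.newInputStream(big)) {
>                             // open and immediately close
>                         } catch (IOException ignore) {
>                         }
>                     }
>                 });
>                 t.setDaemon(true);
>                 t.start();
>             }
>
>             // "ExecLoop": run a simple command, read its output, wait for exit,
>             // then close the input stream, racing with the reaper's drain
>             byte[] buf = new byte[8192];
>             while (true) {
>                 Process p = new ProcessBuilder("cat", "/proc/meminfo").start();
>                 try (InputStream in = p.getInputStream()) {
>                     while (in.read(buf) >= 0) { /* discard */ }
>                     p.waitFor();
>                 }
>             }
>         }
>     }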
>
>
>
> The test concerns me since it is gonna use up loads
> of resources and proceed for 10 minutes until deciding
> beyond reasonable doubt that there is no issue.
>
> When I ran the test across different machines (with the
> unpatched JDK) it reported from one to five races during
> 10 minutes.
> We can surely decrease the timeout at the price of a higher
> probability of a false positive result.
>
> Unit-like tests should really run in under a minute, and
> should ideally be frugal in their resource consumption;
> otherwise they could starve other tests running in parallel. If
> we cannot create a more deterministic test (not sure it is
> possible), then perhaps this test belongs to SQE and not in the
> JDK tests?
>
> Paul.
>
>
>