RFR [8024521] (process) Async close issues with Process InputStream
Ivan Gerasimov
ivan.gerasimov at oracle.com
Tue Oct 15 15:31:18 UTC 2013
Here's the updated webrev with the solution by Paul:
http://cr.openjdk.java.net/~igerasim/8024521/1/webrev/
I've also excluded the test from it.
Instead, I'm going to write a manual testing instruction for SQE.
I've tested the fix and it works as expected.
Sincerely yours,
Ivan
On 15.10.2013 10:29, Martin Buchholz wrote:
> Hi
>
> I'm the author of this code.
> I'm perfectly willing to believe the close/drain code is buggy.
> It is difficult to get right.
> I'm looking forward to the updated webrev.
>
> Often the best way to handle a stress test is to make it run quickly
> by default, but provide a system property to make it run as long as
> desired, for manual ad-hoc testing.
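>
> A minimal sketch of that pattern (the property name and default below are
> illustrative assumptions, not taken from any actual test):
>
>     // Runs for ~10 seconds by default; pass e.g. -Dstress.millis=600000
>     // on the command line for a long manual stress run.
>     public class StressDuration {
>         public static void main(String[] args) throws Exception {
>             long millis = Long.getLong("stress.millis", 10_000L);
>             long deadline = System.currentTimeMillis() + millis;
>             while (System.currentTimeMillis() < deadline) {
>                 // one iteration of the actual stress scenario goes here
>             }
>         }
>     }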
>
>
> On Mon, Oct 14, 2013 at 9:04 AM, Ivan Gerasimov
> <ivan.gerasimov at oracle.com> wrote:
>
> Thanks Paul!
>
> Yes, I like your solution better.
> I'll test it and send the updated webrev shortly.
>
> Sincerely yours,
> Ivan
>
>
>
> On 14.10.2013 15:33, Paul Sandoz wrote:
>
> On Oct 14, 2013, at 11:36 AM, Ivan Gerasimov
> <ivan.gerasimov at oracle.com> wrote:
>
> Hi Paul,
>
> Thank you for looking at the webrev!
>
> On 14.10.2013 12:50, Paul Sandoz wrote:
>
> Hi Ivan,
>
> Why do you require closeLock? Is it not sufficient to
> leave processExited as is and just add:
>
> public synchronized void close() throws IOException {
>     super.close();
> }
>
> I want to limit the impact on the code; that's why I
> synchronize close() only with processExited().
> If I made close() synchronized with all of the stream's
> methods, then I think it could introduce some
> unpredictable drawback.
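>
> A minimal sketch of the separate-lock arrangement being described (the field
> and method shapes here are assumptions based on the discussion, not the
> actual webrev):
>
>     // Sketch: close() and processExited() exclude each other via closeLock,
>     // without making every read path synchronize on the stream itself.
>     static class PipeInputStreamSketch extends java.io.BufferedInputStream {
>         private final Object closeLock = new Object();
>
>         PipeInputStreamSketch(java.io.InputStream in) { super(in); }
>
>         @Override
>         public void close() throws java.io.IOException {
>             synchronized (closeLock) {
>                 super.close();
>             }
>         }
>
>         void processExited() {
>             synchronized (closeLock) {
>                 // drain and swap the underlying stream here, as in the real code
>             }
>         }
>     }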
>
> Yeah, the close will block until the drain is complete, which
> might also have some unpredictable drawback.
>
> A concern is that there was already some code in place to check for
> an async close, which, if detected, throws away all the data and
> sets the underlying filtered input stream to null. So
> all that effort spent draining is wasted.
>
> Another solution is to keep checking in the loop of
> drainInputStream for an async close and fail-fast:
>
> private InputStream drainInputStream(InputStream in)
>         throws IOException {
>     int n = 0;
>     int j;
>     byte[] a = null;
>     while ((j = in.available()) > 0) {
>         if (buf == null) // asynchronous close()?
>             return null; // stop reading and discard
>         a = (a == null) ? new byte[j] : Arrays.copyOf(a, n + j);
>         n += in.read(a, n, j);
>     }
>
>     if (buf == null) // asynchronous close()?
>         return null; // discard
>     else if (a == null)
>         return ProcessBuilder.NullInputStream.INSTANCE;
>     else
>         return new ByteArrayInputStream(n == a.length ? a
>                                                       : Arrays.copyOf(a, n));
> }
>
> /** Called by the process reaper thread when the process exits. */
> synchronized void processExited() {
>     // Most BufferedInputStream methods are synchronized, but close()
>     // is not, and so we have to handle concurrent racing close().
>     try {
>         InputStream in = this.in;
>         if (in != null) {
>             InputStream stragglers = drainInputStream(in);
>             in.close();
>             this.in = stragglers;
>         }
>     } catch (IOException ignored) {
>         // probably an asynchronous close().
>     }
> }
>
> This still has the potential to check the availability of
> the current input before an async close, and to read from a
> different input after the async close, but it will break out on
> the next loop check if there are more bytes available to read;
> otherwise it will be caught again on the final check, so no
> munged data will be exposed.
>
>
>
> Plus there was already some logic checking if close
> was asynchronously called (see BufferedInputStream.close):
>
> 372                 if (buf == null) // asynchronous close()?
> 373                     this.in = null;
>
> and this would no longer be required.
>
> Yes, you're right and these two lines can be removed.
> I wouldn't remove the surrounding try {} catch
> (IOException) though, as the stream can still be closed
> asynchronously.
>
> I am somewhat struggling to understand the test (not
> too familiar with this area). It requires a more expert
> eye in this area than mine.
>
> IIUC you are trying to induce the output process
> fd of ExecLoop to "flip" to another fd opened by OpenLoop.
>
> Yes.
> I actually adopted a testcase provided by the reporter of
> the problem.
> I only changed the test in such a way that it can be run
> in an automated mode.
>
> Below is the exact comment from the original testcase.
> It may be a good idea to include it in the regtest.
>
> /* The following code is a test case that demonstrates a close
>  * race in ProcessPipeInputStream.
>  *
>  * A snippet of the offending code within ProcessPipeInputStream:
>  *
>  *     // Called by the process reaper thread when the process exits.
>  *     synchronized void processExited() {
>  *         // Most BufferedInputStream methods are synchronized, but close()
>  *         // is not, and so we have to handle concurrent racing close().
>  *         try {
>  *             InputStream in = this.in;
>  *             if (in != null) {
>  *                 byte[] stragglers = drainInputStream(in);
>  *                 in.close();
>  *
>  * The comment even talks about the race. However, the code is not adequately
>  * protecting against it. drainInputStream() (available() in particular) is not
>  * threadsafe when a close happens in parallel. What's worse is that a subsequent
>  * open() is likely to reassign the closed fd, so drainInputStream ends up draining
>  * the wrong fd! This can cause OOMs, data corruption on other fds, and other very
>  * bad things.
>  *
>  * Notes:
>  * - Test creates a large sparse file in /tmp/bigsparsefile
>  * - Test uses "/proc/meminfo" for simple input.
>  * - Invoke using "-Xmx128m" to ensure the file size of 1G is greater than the heap size
>  * - Test will run forever; if successful there will not be any OOM exceptions
>  *   (on my test system they occur about 1 per minute)
>  * - You need multiple cores (I use 2 x Xeon E5620 = 8 cores, 16 threads).
>  *
>  * The basic premise of the test is to create a 1GB sparse file. Then have lots
>  * of threads opening and closing this file. In a loop within a separate thread,
>  * use UNIXProcess to run a simple command. We read the output of the command,
>  * make sure the process has exited and then close the input stream. When the bug
>  * happens, processExited() will use FIONREAD to figure out how much data to drain
>  * BUT the fd it is checking will be closed and reopened (now referring to
>  * /tmp/bigsparsefile), so it will see the size as 1GB and then attempt to allocate
>  * a buffer of that size and then OOM.
>  */
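>
> A condensed sketch of the reproducer structure that comment describes (class
> name, thread count and the command are illustrative assumptions; the real
> testcase differs in detail):
>
>     import java.io.IOException;
>     import java.io.InputStream;
>     import java.io.RandomAccessFile;
>     import java.nio.file.Files;
>     import java.nio.file.Path;
>     import java.nio.file.Paths;
>
>     public class CloseRaceSketch {
>         public static void main(String[] args) throws Exception {
>             Path big = Paths.get("/tmp/bigsparsefile");
>
>             // 1GB sparse file; its size is what a misdirected FIONREAD would see
>             try (RandomAccessFile raf = new RandomAccessFile(big.toFile(), "rw")) {
>                 raf.setLength(1L << 30);
>             }
>
>             // "OpenLoop": keep opening and closing the big file so that a freshly
>             // closed pipe fd is likely to be reassigned to it
>             for (int i = 0; i < 8; i++) {
>                 Thread t = new Thread(() -> {
>                     while (true) {
>                         try (InputStream in = Files.newInputStream(big)) {
>                             // open and immediately close
>                         } catch (IOException ignore) {
>                         }
>                     }
>                 });
>                 t.setDaemon(true);
>                 t.start();
>             }
>
>             // "ExecLoop": run a simple command, read its output, wait for exit,
>             // then close the input stream, racing with the reaper's drain
>             byte[] buf = new byte[8192];
>             while (true) {
>                 Process p = new ProcessBuilder("cat", "/proc/meminfo").start();
>                 try (InputStream in = p.getInputStream()) {
>                     while (in.read(buf) >= 0) { /* discard */ }
>                     p.waitFor();
>                 }
>             }
>         }
>     }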
>
>
>
> The test concerns me since it is gonna use up loads
> of resources and proceed for 10 minutes until deciding
> beyond reasonable doubt that there is no issue.
>
> When I ran the test across different machines (with the
> unpatched JDK) it reported from one to five races during
> 10 minutes.
> We can surely decrease the timeout at the price of a higher
> probability of a false positive result.
>
> Unit-like tests should really run in under a minute, and
> should ideally be frugal in their resource consumption;
> otherwise they could starve other tests running in parallel. If
> we cannot create a more deterministic test (not sure it is
> possible), then perhaps this test belongs to SQE and not in the
> JDK tests?
>
> Paul.
>
>
>