Thread hangs reading from process output streams, even though process has terminated. (possible JDK bug?)

Martin Buchholz martinrb at google.com
Wed Apr 23 23:16:38 UTC 2014


Very high level:

Ensuring that streams get EOF when subprocesses terminate is very tricky,
and we did a bunch of work on Linux and Solaris to try to make it reliable.
And even now we're not quite there: if grandchildren linger, they might
keep the pipe file descriptors open.  I'm not aware of similar problems on Windows,
but it wouldn't be at all surprising to see the same kinds of effects there.
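Here is a minimal sketch of what that implies for client code, assuming Java 7 or later. A reader sees EOF (`read()` returning -1) only after every process holding the pipe's write end has closed it. Waiting for the child to exit therefore does not by itself guarantee that the reader threads finish. The hypothetical helper below waits for the child, then gives each reader a bounded grace period instead of joining it indefinitely. The names `BoundedDrain`, `Drainer` and `runAndDrain` are illustrative; they are not part of any JDK API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: drains a child's output without assuming exit implies EOF. */
class BoundedDrain {

    /** Copies one stream to an in-memory buffer until EOF, on a daemon thread. */
    static final class Drainer extends Thread {
        private final InputStream in;
        // ByteArrayOutputStream's methods are synchronized, so the buffer can be
        // inspected while this thread may still be writing to it.
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        Drainer(String name, InputStream in) {
            super(name);
            this.in = in;
            setDaemon(true); // a reader stuck in read() must not keep the JVM alive
        }

        @Override
        public void run() {
            byte[] chunk = new byte[8192];
            try {
                int n;
                // read() returns -1 only once every holder of the pipe's write end has closed it
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
            } catch (IOException e) {
                // stream closed or broken: keep whatever was read so far
            }
        }
    }

    /**
     * Starts the command, waits for it to exit, then waits at most graceMillis
     * for each reader to reach EOF. Returns true if both readers finished.
     */
    static boolean runAndDrain(ProcessBuilder pb, long graceMillis)
            throws IOException, InterruptedException {
        if (graceMillis <= 0) {
            // join(0) would wait forever, which is exactly what we want to avoid
            throw new IllegalArgumentException("graceMillis must be positive");
        }
        Process p = pb.start();
        p.getOutputStream().close(); // this sketch sends nothing to the child's stdin
        Drainer out = new Drainer("stdout-drainer", p.getInputStream());
        Drainer err = new Drainer("stderr-drainer", p.getErrorStream());
        out.start();
        err.start();

        int exitCode = p.waitFor(); // the child itself has terminated here...
        out.join(graceMillis);      // ...but its pipes may not have reached EOF yet
        err.join(graceMillis);

        boolean finished = !out.isAlive() && !err.isAlive();
        if (!finished) {
            System.err.println("child exited with " + exitCode
                    + " but a reader is still blocked; another process may hold the pipe open");
        }
        return finished;
    }
}
```

A `false` result corresponds to the symptom in this thread (child exited, readers still blocked). The helper reports it instead of hanging, and the blocked daemon thread is simply abandoned.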


On Wed, Apr 23, 2014 at 2:31 PM, Bruno Medeiros <bruno.do.medeiros at gmail.com> wrote:

> After exploring this bug when running my full application, I have a lead on
> what seems to be a necessary condition/cause for it, and possibly a way to
> create a short reproducible case. The isolated code I posted originally is
> not enough.
>
> Here is what I found out. First, let's call the process that my Java
> application starts, the one that terminates while its stream reader threads
> still hang, process A.
> A necessary condition for the bug to happen is that *another process* is
> started by the Java application, with worker threads similarly spawned to
> read that process's streams. Let's call this process B.
> Process B does not terminate, because it is a server program.
> Here's an interesting bit: if process B is forcibly killed, the reader
> threads of process A become unstuck! (To be clear, the processes are not
> related; they are not even the same program.)
> I should be able to reduce this to a short reproducible example as soon as
> I have more time.
>
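The observation that killing the unrelated server process B unblocks A's readers is consistent with B holding a copy of the write end of A's stdout/stderr pipes. One way that could happen on Windows is a hypothesis, not something established in this thread. `CreateProcess` with handle inheritance enabled gives the new child every inheritable handle in the parent. So if B is launched while A's child-side pipe handles are still open and inheritable in the JVM, B inherits them too. A cheap way to test that hypothesis is to make sure no two launches in the JVM overlap. The hypothetical `SerializedLauncher` below is a sketch of that experiment, not a known fix.

```java
import java.io.IOException;

/**
 * Hypothetical helper: funnels every launch through one lock so that no two
 * ProcessBuilder.start() calls made through it run concurrently in this JVM.
 */
final class SerializedLauncher {
    private static final Object LAUNCH_LOCK = new Object();

    private SerializedLauncher() {}

    static Process start(ProcessBuilder pb) throws IOException {
        // While one start() is in progress, no other thread can begin a launch
        // through this helper, so (under the inheritance hypothesis) a second
        // child has no window in which to pick up this child's pipe handles.
        synchronized (LAUNCH_LOCK) {
            return pb.start();
        }
    }
}
```

This only covers launches routed through the helper. Other code in the same JVM (for example, another OSGi bundle or Eclipse itself) that calls `ProcessBuilder.start()` or `Runtime.exec()` directly can still race with it.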
> Also, I tried JDK 8, but was not able to reproduce the issue. But given the
> fickle nature of this bug, that's no guarantee the bug is absent from JDK 8,
> so I still want to find its cause and see it resolved.
>
>
>
> On Wed, Apr 16, 2014 at 5:23 PM, Bruno Medeiros <bruno.do.medeiros at gmail.com> wrote:
>
> > I have some code where I start an external process (ProcessBuilder.start(),
> > etc.) and then I spawn two worker threads to read the stdout and stderr of
> > the external process. I directly read the streams provided by
> > process.getInputStream() and process.getErrorStream(); I'm not wrapping
> > them with my own streams or anything. Rather, the worker threads are
> > calling java.io.InputStream.read(byte[]) in a loop.
> >
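This is a possible alternative to the two-reader-thread setup described above. Since Java 7, `ProcessBuilder` can send a child's stdout and stderr straight to files (`redirectOutput(File)` / `redirectError(File)`). That leaves no pipe for a reader thread to block on, and the output is read back after `waitFor()`. The sketch sidesteps the hang rather than fixing the underlying issue. It only fits when the output isn't needed incrementally. The class name `FileRedirectRun` and the UTF-8 decoding are assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical helper: runs a command with stdout/stderr redirected to temp files. */
final class FileRedirectRun {
    private FileRedirectRun() {}

    /** Returns {stdout, stderr} as text, read only after the child has exited. */
    static String[] run(ProcessBuilder pb) throws IOException, InterruptedException {
        File out = File.createTempFile("proc-out", ".txt");
        File err = File.createTempFile("proc-err", ".txt");
        try {
            pb.redirectOutput(out).redirectError(err); // no output pipes, so no reader threads
            Process p = pb.start();
            p.getOutputStream().close(); // stdin is still a pipe; close it since it is unused
            p.waitFor();
            return new String[] {
                new String(Files.readAllBytes(out.toPath()), StandardCharsets.UTF_8),
                new String(Files.readAllBytes(err.toPath()), StandardCharsets.UTF_8)
            };
        } finally {
            // delete() just returns false if another process still has a file open
            out.delete();
            err.delete();
        }
    }
}
```

Reading a file never waits for other writers to close it; it returns whatever has been written so far. A lingering process holding the file open therefore cannot cause the hang described here.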
> > I've encountered a situation, where the worker threads hang despite the
> > process having been terminated already!
> > ( Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode) ,
> > Windows 7)
> >
> > I was able to catch this while running the Java program under the
> > debugger. I invoked process.exitValue() in the debugger to see whether the
> > JVM had indeed registered that the process terminated. It returned 0, so it
> > seems the JVM knows the process has terminated. Yet the stream reads are
> > still blocked in a native method:
> >
> > The stdout worker thread is stuck here:
> > Daemon Thread [ExternalProcessEclipseHelper.MainWorker] (Suspended)
> >     owns: BufferedInputStream  (id=145)
> >     FileInputStream.readBytes(byte[], int, int) line: not available [native method]
> >     FileInputStream.read(byte[], int, int) line: 272
> >     BufferedInputStream.fill() line: 235
> >     BufferedInputStream.read1(byte[], int, int) line: 275
> >     BufferedInputStream.read(byte[], int, int) line: 334
> >     BufferedInputStream(FilterInputStream).read(byte[]) line: 107
> >     ExternalProcessNotifyingHelper$1(ExternalProcessHelper$ReadAllBytesTask).doRun() line: 73
> >
> > The stderr worker thread is similarly stuck:
> > Daemon Thread [ExternalProcessEclipseHelper.StdErrWorker] (Suspended)
> >     FileInputStream.readBytes(byte[], int, int) line: not available [native method]
> >     FileInputStream.read(byte[]) line: 243
> >     ExternalProcessNotifyingHelper$2(ExternalProcessHelper$ReadAllBytesTask).doRun() line: 73
> >
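The debugger check above can also be done in code. This is a minimal sketch, assuming the Java 7 API in use here (`Process.isAlive()` only arrived in Java 8). `exitValue()` throws `IllegalThreadStateException` while the child is still running, so it can serve as a non-blocking probe. Note its limit: it reports that the child process terminated, not that its pipes reached EOF. EOF is what the blocked `read()` calls are waiting for. The names `ExitCheck`, `hasExited` and `exitedButReaderBlocked` are hypothetical.

```java
/** Hypothetical diagnostic helpers for the "exited, but readers still running" state. */
final class ExitCheck {
    private ExitCheck() {}

    /** Non-blocking: true once the child has terminated (works on Java 7). */
    static boolean hasExited(Process p) {
        try {
            p.exitValue(); // throws IllegalThreadStateException while the child runs
            return true;
        } catch (IllegalThreadStateException stillRunning) {
            return false;
        }
    }

    /** True in the state shown in the traces: child gone, reader thread still running. */
    static boolean exitedButReaderBlocked(Process p, Thread reader) {
        return hasExited(p) && reader.isAlive();
    }
}
```

Polled periodically, `exitedButReaderBlocked` staying true for more than a moment would flag the situation described here without needing a debugger attached.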
> > Could this be a JVM bug? I don't see how this scenario could ever
> > happen, unless some other part of my code somehow violated a contract and
> > corrupted the JVM state.
> >
> > I've added a sample of the relevant code I'm using here:
> > https://github.com/bruno-medeiros/Scratchpad/tree/jvm-processio-issue
> > However, I haven't yet been able to replicate this bug using the isolated
> > code from there. At the moment, I can only replicate it when I run my full
> > application. The sample code could be simplified further, but I haven't
> > done that yet since I couldn't replicate the bug with it.
> >
> > One interesting bit is that I can only replicate it the first time I run
> > the application in each computer session. That is, apparently I need to
> > reboot my computer for the bug to manifest again!
> >
> > I'd like to narrow this down, but I would appreciate some help or
> > suggestions for that. What could affect the JVM, such that subsequent
> > invocations apparently don't cause the bug? Some code cache issue? I also
> > wonder if the OSGi runtime could be a factor here.
> >
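One way to narrow this down without having a debugger attached at the right moment is to log the reader threads' stacks from code. This is a suggestion, not something tried in this thread. The logging could run, for instance, from a watchdog that fires when the child has exited but a reader thread is still running. The sketch below uses `Thread.getAllStackTraces()`; the class name `ReaderThreadDump` and the name-fragment filter are assumptions. When reading the output, a thread blocked in a native read is reported as `RUNNABLE`, not `BLOCKED`. `BLOCKED` only means waiting for a monitor lock.

```java
import java.util.Map;

/** Hypothetical helper: dumps stacks of live threads whose name contains a fragment. */
final class ReaderThreadDump {
    private ReaderThreadDump() {}

    static String dump(String nameFragment) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            if (!t.getName().contains(nameFragment)) {
                continue; // only report the threads we care about
            }
            sb.append('"').append(t.getName()).append("\" daemon=").append(t.isDaemon())
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

For example, `ReaderThreadDump.dump("ExternalProcessEclipseHelper")` would match the worker thread names in the traces above.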
> > --
> > Bruno Medeiros
> > https://twitter.com/brunodomedeiros
> >
>
>
>
> --
> Bruno Medeiros
> https://twitter.com/brunodomedeiros
>



More information about the core-libs-dev mailing list