SelectableChannels and Process API

Wed Apr 15 18:11:45 UTC 2015

Hi Peter,

I don't know the history behind the stream draining in ProcessImpl.
I understood it to be a performance/scalability issue.
Maybe Martin, Alan, or someone else can fill in the history.

Roger

On 4/15/2015 2:31 AM, Peter Levart wrote:
> Hi Roger,
>
> So I started a new thread...
>
>
> On 04/14/2015 11:33 PM, Roger Riggs wrote:
>>
>> On 4/14/2015 11:47 AM, Peter Levart wrote:
>>> I have been thinking of another small Process API update. Some 
>>> people find it odd how redirected in/out/err streams are exposed:
>>>
>>> http://blog.headius.com/2013/06/the-pain-of-broken-subprocess.html
>> yep, I've read that several times.
>
> To be fair, it's mostly, but not entirely, correct. The part that says:
>
> " So when the child process exits, the any data waiting to be read 
> from its output stream is drained into a buffer. All of it. In memory.
>
> Did you launch a process that writes a gigabyte of data to its output 
> stream and then terminates? Well, friend, I sure hope you have a 
> gigabyte of memory, because the JDK is going to read that sucker in 
> and there's nothing you can do about it. And let's hope there's not 
> more than 2GB of data, since this code basically just grows a byte[], 
> which in Java can only grow to 2GB. If there's more than 2GB of data 
> on that stream, this logic errors out and the data is lost forever."
>
> ...is an exaggeration. This does not happen because the pipe has a bounded
> buffer. When the subprocess exits, at most that much data is left in
> the buffer (typically 64k); only that much is sucked into the Java
> process, and then the underlying handle is closed.
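The bounded-buffer point can be checked from Java itself. This minimal sketch fills a `java.nio.channels.Pipe` in non-blocking mode until a write returns 0, which is where a blocking writer (such as a child process writing to its stdout) would stall. On Linux, where the JDK backs `Pipe` with an OS pipe, the total is typically 64 KiB; on Windows the JDK's `Pipe` used a socket pair at the time, so the figure reflects socket buffers instead. The class name `PipeCapacity` is illustrative.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class PipeCapacity {
    // Writes into the sink side in non-blocking mode until the OS accepts no
    // more data, and returns how many bytes the pipe buffered in total.
    static long measureCapacity() throws IOException {
        Pipe pipe = Pipe.open();
        try (Pipe.SinkChannel sink = pipe.sink();
             Pipe.SourceChannel source = pipe.source()) { // never read: buffer fills up
            sink.configureBlocking(false);
            ByteBuffer chunk = ByteBuffer.allocate(4096);
            long total = 0;
            while (true) {
                chunk.clear();
                int n = sink.write(chunk);
                if (n == 0) {
                    break; // buffer full: a blocking writer would wait here
                }
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("pipe buffered " + measureCapacity() + " bytes before blocking");
    }
}
```

The exact number is platform-dependent; the point is only that it is bounded, so at most one buffer's worth of data remains when the child exits.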
>
>>>
>>> They basically don't like:
>>>
>>> - that the exposed Input/Output streams are buffered
>>> - that the underlying streams are File(Input/Output)Streams which,
>>> although the backing OS implementations are not files but pipes,
>>> don't expose selectable channels on which non-blocking, event-based
>>> IO could be performed.
>>> - that the exposed IO streams are automatically "managed" in the UNIX
>>> variants of ProcessImpl, which needs subtle "hacks" to do it in a
>>> seemingly transparent way (delayed close, draining input on exit
>>> and making it available after the underlying handle is already
>>> closed, ...)
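For context on what "selectable" buys: with a `java.nio.channels.Pipe.SourceChannel`, one thread can wait on a `Selector` and read only when data is ready, instead of blocking in `InputStream.read()`. In this minimal sketch a writer thread stands in for the child process, since the standard `Process` API does not hand out such channels; `PipeSelectDemo` and `readAllWithSelector` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class PipeSelectDemo {
    // Collects everything written to the pipe, reading the source side only when
    // the Selector reports it readable (event-driven, no blocking read() calls).
    static String readAllWithSelector(String message) throws IOException, InterruptedException {
        Pipe pipe = Pipe.open();

        // Stand-in for a child process writing to its stdout, then exiting.
        Thread writer = new Thread(() -> {
            try (Pipe.SinkChannel sink = pipe.sink()) {
                ByteBuffer data = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
                while (data.hasRemaining()) {
                    sink.write(data); // blocking mode: waits while the pipe is full
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();

        ByteArrayOutputStream received = new ByteArrayOutputStream();
        try (Selector selector = Selector.open();
             Pipe.SourceChannel source = pipe.source()) {
            source.configureBlocking(false); // required before register()
            source.register(selector, SelectionKey.OP_READ);
            ByteBuffer buf = ByteBuffer.allocate(256);
            boolean eof = false;
            while (!eof) {
                selector.select(); // sleeps until the source has data or EOF
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (!key.isReadable()) {
                        continue;
                    }
                    int n;
                    while ((n = source.read(buf)) > 0) { // take whatever is ready now
                        received.write(buf.array(), 0, n);
                        buf.clear();
                    }
                    if (n < 0) {
                        eof = true; // writer closed its end of the pipe
                    }
                }
            }
        }
        writer.join();
        return received.toString(StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readAllWithSelector("hello through a selectable pipe"));
    }
}
```

The same selector could watch many such channels at once, which is the scalability argument for exposing them.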
>>>
>>> So I've been playing with the idea of exposing the "real" pipe
>>> channels in the last couple of days. Here's the prototype I came up with:
>>>
>>> http://cr.openjdk.java.net/~plevart/jdk9-sandbox/JDK-8046092-branch/Process.PipeChannel/webrev.01/ 
>>>
>>>
>>> This adds a new Redirect type to the API and 3 new methods to Process
>>> that return Pipe channels when this new Redirect type is used.
>>> Interestingly, no native code changes were necessary. The behavior
>>> of pipes on Windows is a little different (perhaps because the Pipe
>>> NIO API uses sockets under the hood on Windows; why is that?
>>> Windows does have a pipe equivalent). What bothers me is this: file
>>> handles opened on files (when redirecting to/from File) can be
>>> closed as soon as the subprocess is started, and the subprocess is
>>> still able to read/write the files (as on UNIX). It's not the same
>>> with pipe (i.e. socket) handles on Windows. They must be closed
>>> only after the subprocess exits.
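For comparison, here is a minimal sketch of an existing file redirect, the kind the prototype's new Redirect type would sit beside, using only the standard `ProcessBuilder.Redirect.to(File)`. It assumes a POSIX `sh` on the PATH; `RedirectToFileDemo` and `runAndCollect` are illustrative names.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class RedirectToFileDemo {
    // Runs a child whose stdout goes straight to a file. No pipe is created for
    // stdout, so Process.getInputStream() returns a null stream (immediate EOF)
    // and the parent reads the result from the file after the child exits.
    static String runAndCollect(String text) throws IOException, InterruptedException {
        File out = File.createTempFile("child-stdout", ".txt");
        out.deleteOnExit();
        // Assumption: a POSIX 'sh' is on the PATH. The text is passed as "$1",
        // not spliced into the script, so no shell quoting issues arise.
        Process p = new ProcessBuilder("sh", "-c", "printf '%s' \"$1\"", "sh", text)
                .redirectOutput(ProcessBuilder.Redirect.to(out))
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("child exited with status " + exit);
        }
        return new String(Files.readAllBytes(out.toPath()), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAndCollect("hello via Redirect.to(File)"));
    }
}
```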
>>>
>>> If this subtle difference between file handles and socket handles on 
>>> Windows could be dealt with (perhaps some options exist that affect 
>>> subprocess spawning), then the extra waiting thread would not be 
>>> needed on Windows.
>>>
>>> So what do you think of this API update?
>> Definitely worthy of a separate thread. It looks promising and
>> addresses some of the issues raised, while moving other problems,
>> such as closing of the channels and cleanup, from the implementation
>> to the application. I worry about how the resources are freed if the
>> code spawning the app doesn't do the cleanup. Will it require hooks
>> (like a finalizer) to do the cleanup?
>> Also, it doesn't help with Martin's goal of being able to implement
>> emacs in Java, since it doesn't provide pty control.
>> As you are aware, the complexity in Process exists to ensure timely
>> cleanup and to allow the Process to terminate and release the process
>> resources when it is done, without having to wait for the
>> stdout/stderr consumer.
>
> I wonder how much this automatic stream cleanup really helps in real-world
> programs. It doesn't help the Process terminate and release the
> process resources any sooner, as the process terminates on its own
> (unless killed) and the OS releases its resources without outside
> help anyway. Draining and closing the stream after the process has
> already exited just releases one file handle (the consuming side of
> the pipe) promptly. This could be left to the user and/or a
> finalizer. Draining after the process has already exited does not help
> the process exit any sooner, as it happens after the fact. A program
> that doesn't consume the stream can cause the process to hang forever,
> as the pipe's buffer is bounded (typically 64k). So draining and
> closing after the process has exited only potentially helps with the
> last 64k of the stream, and only to release one file handle in a
> potentially more timely manner.
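The hang described here is avoided today by consuming the stream concurrently with `waitFor()`. A minimal sketch, assuming a POSIX `sh` and `head` and a readable `/dev/zero`: a dedicated thread counts stdout bytes through a fixed 8 KiB buffer, so memory stays bounded however much the child writes. `DrainWhileWaiting` and `runAndCount` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class DrainWhileWaiting {
    // Counts the child's stdout bytes on a dedicated thread while the caller
    // blocks in waitFor(). Without such a consumer, a child writing more than
    // the pipe's buffer would block in write() and never exit.
    static long runAndCount(long bytesToWrite) throws IOException, InterruptedException {
        // Assumption: POSIX 'sh' and 'head' are available and /dev/zero exists.
        Process p = new ProcessBuilder("sh", "-c", "head -c " + bytesToWrite + " /dev/zero")
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        p.getOutputStream().close(); // nothing to send on the child's stdin

        Callable<Long> drain = () -> {
            long total = 0;
            byte[] buf = new byte[8192]; // fixed buffer: memory use stays bounded
            try (InputStream in = p.getInputStream()) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    total += n;
                }
            }
            return total;
        };
        FutureTask<Long> drainer = new FutureTask<>(drain);
        new Thread(drainer, "stdout-drainer").start();

        int exit = p.waitFor();
        try {
            long count = drainer.get(); // returns once the drainer reaches EOF
            if (exit != 0) {
                throw new IOException("child exited with status " + exit);
            }
            return count;
        } catch (ExecutionException e) {
            throw new IOException("draining stdout failed", e.getCause());
        }
    }

    public static void main(String[] args) throws Exception {
        // 1,000,000 bytes is far more than a typical 64k pipe buffer.
        System.out.println(runAndCount(1_000_000));
    }
}
```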
>
> OTOH, since ProcessImpl for UNIX does that (and why doesn't the Windows
> implementation?), sloppy programs might exist that could break if the
> status quo is not maintained.
I think Windows' use of handles ensures they stay open for as long as
any process holds a handle, so they don't get closed prematurely.
>
> But new functionality need not be so permissive. I'll take a look at
> whether and how Channels do any kind of automatic cleanup based on
> reachability, and whether this can be bolted on for Process use. I
> doubt it is possible to drain and close a Channel without disturbing
> the ongoing Selector IO processing...
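One way to get reachability-based cleanup without a finalizer is `java.lang.ref.Cleaner` (JDK 9 and later, so newer than this discussion). This minimal sketch registers an action that closes a pipe channel either on an explicit `close()` or after the hypothetical `PipeHolder`, standing in for a `Process`, becomes unreachable. It only closes; it makes no attempt to drain, which could interfere with a Selector still using the channel.

```java
import java.io.IOException;
import java.lang.ref.Cleaner;
import java.nio.channels.Pipe;

public class CleanerDemo {
    private static final Cleaner CLEANER = Cleaner.create();

    // Hypothetical holder standing in for a Process that hands out a channel.
    static final class PipeHolder implements AutoCloseable {
        final Pipe.SourceChannel source;
        private final Cleaner.Cleanable cleanable;

        PipeHolder(Pipe.SourceChannel source) {
            this.source = source;
            // The action must not reference this holder, or it never becomes unreachable.
            this.cleanable = CLEANER.register(this, new CloseAction(source));
        }

        @Override
        public void close() {
            cleanable.clean(); // explicit cleanup; the action runs at most once
        }
    }

    // Static nested class, so it holds no implicit reference to a PipeHolder.
    private static final class CloseAction implements Runnable {
        private final Pipe.SourceChannel channel;

        CloseAction(Pipe.SourceChannel channel) {
            this.channel = channel;
        }

        @Override
        public void run() {
            try {
                channel.close();
            } catch (IOException ignored) {
                // best effort: nothing useful to do from the cleaner thread
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        Pipe.SourceChannel src = pipe.source();
        try (PipeHolder holder = new PipeHolder(src)) {
            System.out.println("open while held: " + holder.source.isOpen());
        }
        System.out.println("open after close: " + src.isOpen());
        pipe.sink().close();
    }
}
```

If the user forgets `close()`, the channel is closed some time after the holder is collected, which is no more timely than a finalizer but avoids its pitfalls.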
>
> Regards, Peter
>
>>
>> Thanks, Roger