SelectableChannels and Process API

Wed Apr 15 19:54:34 UTC 2015

I have internalized the idea that finalization is a method of last resort.
If there's a chance to free some OS resource by doing some work NOW, do it,
and don't leave it for a finalizer to do later.
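
To make that concrete, here is a minimal sketch of the drain-and-close idea,
using a java.nio.channels.Pipe as a stand-in for a real subprocess pipe. It is
an illustration only, not the actual ProcessImpl code: the class name
DrainAndClose and the helper drainAndClose are made up for this example, and
readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

public class DrainAndClose {

    // Hypothetical helper: read whatever is left in the stream, close the
    // underlying OS resource right away, and return an in-memory stream
    // that serves the drained bytes.
    static InputStream drainAndClose(InputStream in) throws IOException {
        ByteArrayOutputStream leftover = new ByteArrayOutputStream();
        try (InputStream toClose = in) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = toClose.read(chunk)) != -1) {
                leftover.write(chunk, 0, n);
            }
        } // the stream (and the channel behind it) is closed here
        return new ByteArrayInputStream(leftover.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        pipe.sink().write(ByteBuffer.wrap("straggler".getBytes(StandardCharsets.UTF_8)));
        pipe.sink().close(); // the writer is gone, like an exited subprocess

        Pipe.SourceChannel source = pipe.source();
        InputStream drained = drainAndClose(Channels.newInputStream(source));

        // The read end is already closed, yet the data is still available.
        System.out.println("source open: " + source.isOpen());
        System.out.println(new String(drained.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

The trade-off is the one discussed in the quoted text: a bounded amount of
heap is spent so that the file descriptor can be released immediately.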

On Wed, Apr 15, 2015 at 12:31 PM, Peter Levart <peter.levart at gmail.com>
wrote:

>
>
> On 04/15/2015 07:59 PM, Martin Buchholz wrote:
>
> I was at least partly responsible for the pipe buffer cleanup code.
>
> Subprocess terminates, but may have written some data to the pipe buffer
> (typically 4k on Linux).  Usually the pipe buffer is empty, but in case
> it's not, you don't want to lose the straggler data, you want to drain it
> and close the file descriptor, because it's easier to manage the memory
> than the fd.  Messy, but I didn't see a better way.
>
>
> But the data would stay there (in the pipe's buffer) until it is read by
> the user. The producing end of pipe may already be closed, but the
> consuming end is still open. You would just have to keep the file
> descriptor open and let the user drain and close it (or leave it to the
> FileInputStream finalizer to close it). Yes, a file descriptor would
> potentially stay open a while longer, but you wouldn't lose any data. That's
> how the Windows implementation works, I think. There's no reaper thread on
> Windows that would trigger asynchronous actions when the subprocess exits.
>
> Regards, Peter
>
>
>
> On Tue, Apr 14, 2015 at 11:31 PM, Peter Levart <peter.levart at gmail.com>
> wrote:
>
>> Hi Roger,
>>
>> So I started a new thread...
>>
>>
>> On 04/14/2015 11:33 PM, Roger Riggs wrote:
>>
>>>
>>> On 4/14/2015 11:47 AM, Peter Levart wrote:
>>>
>>>> I have been thinking of another small Process API update. Some people
>>>> find it odd how redirected in/out/err streams are exposed:
>>>>
>>>> http://blog.headius.com/2013/06/the-pain-of-broken-subprocess.html
>>>>
>>> yep, I've read that several times.
>>>
>>
>> To be fair, it's mostly, but not entirely correct. The part that says:
>>
>> " So when the child process exits, the any data waiting to be read from
>> its output stream is drained into a buffer. All of it. In memory.
>>
>> Did you launch a process that writes a gigabyte of data to its output
>> stream and then terminates? Well, friend, I sure hope you have a gigabyte
>> of memory, because the JDK is going to read that sucker in and there's
>> nothing you can do about it. And let's hope there's not more than 2GB of
>> data, since this code basically just grows a byte[], which in Java can only
>> grow to 2GB. If there's more than 2GB of data on that stream, this logic
>> errors out and the data is lost forever."
>>
>> ...is an exaggeration. This does not happen, as the pipe has a bounded
>> buffer. When the subprocess exits, there is at most that much data left in
>> the buffer (64k typically), and only that much is sucked into the Java
>> process before the underlying handle is closed.
>>
>>
>>>> They basically don't like:
>>>>
>>>> - that exposed Input/Output streams are buffered
>>>> - that underlying streams are File(Input/Output)Streams which, although
>>>> the backing OS objects are not files but pipes, don't expose
>>>> selectable channels so that non-blocking event-based IO could be performed
>>>> on them.
>>>> - that exposed IO streams are automatically "managed" in UNIX variants
>>>> of ProcessImpl, which needs subtle "hacks" to do it in a seemingly
>>>> transparent way (delayed close, draining input on exit and making it
>>>> available after the underlying handle is already closed, ...)
>>>>
>>>> So I've been playing with the idea of exposing the "real" pipe channels
>>>> in the last couple of days. Here's the prototype I came up with:
>>>>
>>>>
>>>> http://cr.openjdk.java.net/~plevart/jdk9-sandbox/JDK-8046092-branch/Process.PipeChannel/webrev.01/
>>>>
>>>> This adds a new Redirect type to the API and 3 new methods to Process
>>>> that return Pipe channels when this new Redirect type is used. It's
>>>> interesting that no native code changes were necessary. The behavior of
>>>> pipes on Windows is a little different (perhaps because the Pipe NIO API
>>>> uses sockets under the hood on Windows - why is that? Windows does have a
>>>> pipe equivalent). What bothers me is that file handles opened on files
>>>> (when redirecting to/from File) can be closed as soon as the subprocess is
>>>> started and the subprocess is still able to read/write the files (as
>>>> on UNIX). It's not the same with pipe (i.e. socket) handles on Windows.
>>>> They must be closed only after the subprocess exits.
>>>>
>>>> If this subtle difference between file handles and socket handles on
>>>> Windows could be dealt with (perhaps some options exist that affect
>>>> subprocess spawning), then the extra waiting thread would not be needed on
>>>> Windows.
>>>>
>>>> So what do you think of this API update?
>>>>
>>> Definitely worthy of a separate thread.  It looks promising and
>>> addresses some of the issues
>>> raised, while moving other problems from the implementation to the
>>> application,
>>> such as closing of the channels and cleanup.  I worry about how the
>>> resources are freed
>>> if the code spawning the app doesn't do the cleanup.  Will it require
>>> hooks (like a finalizer)
>>> to do the cleanup?
>>> Also, it doesn't help with Martin's goal of being able to implement
>>> emacs in Java since it doesn't provide pty control.
>>> As you are aware, the complexity in Process is there to ensure timely
>>> cleanup, allowing the Process to terminate and release the process
>>> resources when it is done, without having to wait for the stdout/stderr
>>> consumer.
>>>
>>
>> I wonder how this automatic stream cleanup really helps in real-world
>> programs. It doesn't help the Process to terminate and release the process
>> resources any sooner, as the process terminates on its own (unless killed)
>> and the OS releases its resources without outside help anyway. Draining
>> and closing the stream after the process has already exited just releases
>> one file handle (the consuming side of the pipe) promptly. This
>> could be left to the user and/or finalizer. Draining after the process has
>> already exited does not help the process to exit any sooner, as it happens
>> after the fact. A program that doesn't consume the stream can cause the
>> process to hang forever, as the pipe's buffer is bounded (64k typically). So
>> draining and closing after the process has exited only potentially helps
>> for the last 64k of the stream and only to release one file handle in a
>> potentially more timely manner.
>>
>> OTOH, now that ProcessImpl for UNIX does that (and why does the Windows
>> implementation not do that?), sloppy programs might exist that would
>> potentially break if the status quo is not maintained.
>>
>> But new functionality need not be so permissive. I'll take a look at how
>> and if Channel(s) do any kind of automatic cleanup based on reachability
>> and whether this can be bolted on for Process use. I doubt it is possible
>> to drain and close a Channel without disturbing the ongoing Selector IO
>> processing...
>>
>> Regards, Peter
>>
>>
>>> Thanks, Roger
>>>
>>>
>>>
>>
>
>
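
For reference, the kind of non-blocking, Selector-driven read that selectable
process channels would allow can be sketched with a plain
java.nio.channels.Pipe. This is only a rough illustration under stated
assumptions: it does not use the API from the prototype webrev, a Pipe stands
in for the subprocess pipe, and the class name SelectablePipeDemo and the
helper readWhenReady are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

public class SelectablePipeDemo {

    // Hypothetical helper: wait up to timeoutMillis (must be > 0) for the
    // source to become readable, then return what a single read delivers.
    // Returns null if nothing became readable in time.
    static String readWhenReady(Pipe.SourceChannel source, long timeoutMillis)
            throws IOException {
        source.configureBlocking(false); // required before registering
        try (Selector selector = Selector.open()) {
            source.register(selector, SelectionKey.OP_READ);
            if (selector.select(timeoutMillis) == 0) {
                return null;
            }
            ByteBuffer buf = ByteBuffer.allocate(4096);
            source.read(buf); // does not block; an EOF leaves buf empty
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        try (Pipe.SinkChannel sink = pipe.sink();
             Pipe.SourceChannel source = pipe.source()) {
            sink.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
            System.out.println(readWhenReady(source, 1000));
        } // both ends are closed explicitly, no finalizer involved
    }
}
```

Note that nothing here drains or closes the channel behind the application's
back; cleanup is entirely up to the caller, which is the concern raised in the
quoted discussion.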