ProcessReaper: single thread reaper

Roger Riggs Roger.Riggs at Oracle.com
Tue Mar 25 21:59:34 UTC 2014


Hi Peter,

The stream draining code was added in 2011 to address resource exhaustion
due to poor hygiene.

  * JDK-6944584 <https://bugs.openjdk.java.net/browse/JDK-6944584>
    Improvements to subprocess handling on Unix

Moves the resource 'leak' from the OS kernel/process space to the heap
and allows the process to be cleaned up sooner.

Roger


On 3/25/14 9:50 AM, Peter Levart wrote:
> On 03/25/2014 05:25 PM, Roger Riggs wrote:
>> Hi Peter,
>>
>> Thanks for the comments, I'll fix the races conditions, I intended
>> to go back to the dedicated thread but wanted to check for scalability
>> in the case of a large number of threads and non-trivial termination 
>> overheads
>> where the streams had to be drained.
>>
>
> Why does stream need to be drained right away (in reaper thread)? 
> Isn't it enough for child pid to be waited upon and let the client of 
> the stream pull from the stream until it reaches the EOF or not if it 
> doesn't need the data... Is this just to be able to reclaim the file 
> descriptor of the pipe right away? Wouldn't FileInputStream's close() 
> method do the job (or finalize() if user forgets to call close())? 
> There would be a native resource leakage if user forgets to call 
> close() AND retains the reference to InputStream, but this is the same 
> with all normal streams like FileInputStream...
>
> Regards, Peter
>
>> Thanks, Roger
>>
>>
>>
>> On 3/25/14 7:05 AM, Peter Levart wrote:
>>> On 03/24/2014 10:05 PM, roger riggs wrote:
>>>> Hi Rob, Martin, et.al.
>>>>
>>>> I've prototyped (for 9) a thread reaper[1] that uses a single 
>>>> thread to wait for exiting
>>>> processes and calling back to the process with the exit status.
>>>> the interesting part is getting the exit status back to the Process 
>>>> that needs it
>>>> It needs more testing and hardening.
>>>>
>>>> I had not considered using a signal handler for SIGCHLD but that's 
>>>> an option,
>>>> though we need to be very careful about thread usage.
>>>>
>>>> Roger
>>>>
>>>> p.s. comments on the single thread reaper appreciated (in a new thread)
>>>> [1] http://cr.openjdk.java.net/~rriggs/webrev-waitpid/
>>>>
>>>>
>>> Hi Roger,
>>>
>>> I think I found a little race. Suppose reaper task is still alive 
>>> and that all consumers have been serviced (consumerCount is 0). The 
>>> reaper task waits for 500 millis for next consumer to be registered, 
>>> but times out. Before calling "reaperThread.release()", new consumer 
>>> comes around and registers itself, also calling runReaper(), but 
>>> since reaperThread.release() has not yet been called by old reaper 
>>> task, new reaper task is not submitted to commonPool. The old reaper 
>>> task finishes, leaving one consumer on the waitingList with no 
>>> reaper task to service it. If no new consumers get registered, the 
>>> waiting consumer will never be notified...
>>>
>>> The simplest solution for this race, I think, would be to have a 
>>> dedicated long-running thread. It could be spawned lazily, but then 
>>> it would never finish.
>>>
>>> Otherwise a nice solution with two lists (exitList/waitList) and 
>>> avoidance of race with reversed orders between
>>> - consumer registration: register on waitList 1st then check 
>>> exitList, and
>>> - exit event dispatch: register on exitList 1st then check waitList
>>>
>>> ...but the check you use with consumeCount local variable to detect 
>>> processes spawned by other means (for purposes of logging only) has 
>>> a race:
>>>
>>> thread1: Suppose a new consumer is being registered with 
>>> onExitCall(...), is added on the waitList, but before checking 
>>> exitList().size() and iterating the exitList, ...
>>> thread2: the reaper task detects that the very same process has 
>>> finished (gets it's pid from waitpid) and adds it's pid to exitList. 
>>> Then before iterating waitList, ...
>>> thread1: iterates the exitList, finds a match and consumes the pid, 
>>> removing the matching entries from both exitList and waitList. Then ...
>>> thread2: the reaper task iterates waitList, doesn't find a matching 
>>> entry for exitPid, doesn't increment consumeCount and voila: debug 
>>> log("Unexpected process exit for pid:...").
>>>
>>>
>>> That's enough races for today.
>>>
>>> Regards, Peter
>>>
>>>>
>>>>
>>>> On 3/24/2014 12:38 AM, Rob McKenna wrote:
>>>>> Hi folks,
>>>>>
>>>>> Roger Riggs (cc'd) may want to chip in here as he's looking at the 
>>>>> reaper thread arrangement in 9 at the moment.
>>>>>
>>>>> On another note, I too support the merging of those files. I 
>>>>> didn't think there was much appetite for it at the time so I must 
>>>>> admit this fell down my todo list. Looking at this bug did remind 
>>>>> me that its something worth trying though. As per Alan's mail, I'm 
>>>>> going to tackle it separately if you folks don't mind. I'll have a 
>>>>> look at Peter's changes (thanks Peter!) as soon as I can and see 
>>>>> about getting them in.
>>>>>
>>>>>     -Rob
>>>>>
>>>>> On 23/03/14 22:30, Martin Buchholz wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 23, 2014 at 2:34 AM, Martin Buchholz 
>>>>>> <martinrb at google.com <mailto:martinrb at google.com>> wrote:
>>>>>>
>>>>>>
>>>>>>     We have also thought about whether having reaper threads is
>>>>>>     necessary.  The Unix rule is that child processes should be
>>>>>>     waited for, and some thread needs to do that.  There's no way
>>>>>>     to wait for a set of child pids, or to specify a "completion
>>>>>>     handler".  Well, you might be able to get the newish waitid()
>>>>>>     to do what you want, but it looks like it's not sufficient
>>>>>>     when java is running inside a process that might do
>>>>>>     independent subprocess creation outside of the JVM.
>>>>>>
>>>>>>
>>>>>> Actually, I take it back.  With sufficient work, it looks like 
>>>>>> you can get SIGCHLD to give you pid information in siginfo_t 
>>>>>> si_pid, and that can be used to trigger the reaping.  It looks 
>>>>>> like waitpid is "async-signal-safe", so we can call it from our 
>>>>>> signal handler.
>>>>>>
>>>>>> While we're at it we can fix SIGCHLD handling to do signal 
>>>>>> chaining, as with other signals.
>>>>>
>>>>
>>>
>>
>




More information about the core-libs-dev mailing list