Some Classes with a public void close() don't implement AutoCloseable

Sat Apr 18 19:18:55 UTC 2020

On Apr 18, 2020, at 8:18 AM, Florian Weimer <fw at deneb.enyo.de> wrote:
> 
> * John Rose:
> 
>> On Apr 17, 2020, at 9:35 AM, Florian Weimer <fw at deneb.enyo.de> wrote:
>>> 
>>> * forax:
>>> 
>>>> - until lambdas are retrofitted as inline types, I'm worried people
>>>> will use forceTryable with a ReentrantLock, try(var __ =
>>>> forceTryable(lock::unlock)) { ... }  because most of the time, the
>>>> lambda will not be allocated but not always, sometimes it may still
>>>> be found escaping by the JITs.
>>> 
>>> The JDK itself routinely does something very similar when registering
>>> cleaners.  Is this lack of robustness in the presence of OOMEs a
>>> quality-of-implementation issue?  Or less or more important?
>> 
>> Turning this around in my mind, it seems like the root cause of
>> this non-robustness is a minor mis-feature of TWR.  TWR says
>> to itself, “I only have one job, and that is to call close”.  That’s
>> fine, but that means that the object to be closed
>> must be created *and* opened before TWR takes control.
>> If the closing operation is mediated by an adapter, the adapter
>> will almost certainly be created *after* the original (unadapted)
>> object is created and opened.  This is the point where an OOME
>> could bollix things up.  To defend against such failures, TWR
>> would have to define a distinct point where the object is opened.
>> This point would be a precise state transition immediately before
>> entry into the block covered by catches and the `finally` block.
> 
> I think these issues were discussed and are the reason why
> try-with-resources requires the introduction of a new variable, to
> make clearer that there is no ownership in the evaluation of the
> expression.

The new variable requirement is reasonable, but as you know it
only amounts to a “nudge” in the right direction.  The “var” feature,
and perhaps pattern matching, make it easier to work around that
nudge.

And there’s a deeper problem with this attempt to nail down the life
cycle of the TWR resource to a named var.  Unlike C++, Java vars don’t
have a life cycle with a constructor bound to the occurrence of the var;
they just have a pre-history in whatever expression produced the initial
value of the var.  So the “stuff” that creates the var of a TWR has no
particular relation to the structure of the TWR.  My suggestion about
an AutoOpenable attempts to nail down the part of the “stuff” which
really and necessarily pertains to the TWR’s job, which is the specific
(sometimes atomic) operation that must be reversed by AutoCloseable::close.

> I guess this probably was only partially successful, due to the
> wrapping of streams and code like this:
> 
>  try (PrintStream ps = new PrintStream(new FileOutputStream
>                                        (filename))) {
>      doCertReq(alias, sigAlgName, ps);
>  }
> 
> It should have been:
> 
>  try (OutputStream out = new FileOutputStream(filename);
>      PrintStream ps = new PrintStream(out)) {
>      doCertReq(alias, sigAlgName, ps);
>  }
> 
>> How might this mis-feature be corrected?  Maybe it can’t.
>> Maybe there’s no way to deal with lost closes other than
>> educating programmers to avoid allocations or other
>> computations in the head of the TWR, at least allocations
>> after the logical open.  (Those are the ones which might cause
>> an OOME or SOE that would lead to a lost close.)
> 
> In the streams example above, the consequences of the wrong style may
> not be so bad because (at least in OpenJDK 8), garbage collection
> eventually cleans up the temporary resource leak.

GC is great; it lets us simplify everything.  But there’s a cost, which
is that the much-ignored GC can sometimes create very rare side effects.
(Same point for the virtually-unlimited JVM control stack.)  The corner
case I’m worrying at here is what happens when a very rare GC side effect
causes a lost close.  The solution I’m groping for is a clearer signal from
the source code about which operation is the Point of No Return when
the close must *not* be lost; currently the JVM & JIT only see a pile
of “stuff” including possible GC side effects, leading up to the actual
try/finally underneath the TWR.
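
To make the lost close concrete, here is a minimal sketch of the
stream-wrapping case quoted above.  The classes Inner and Wrapper are
hypothetical stand-ins for FileOutputStream and PrintStream, and the
"fail" flag simulates an OOME in the wrapper's constructor; a real OOME
would of course not be this polite.

```java
import java.util.ArrayList;
import java.util.List;

public class LostCloseDemo {
    static final List<String> log = new ArrayList<>();

    // Hypothetical inner resource: it is "open" as soon as it is constructed.
    static class Inner implements AutoCloseable {
        Inner() { log.add("inner opened"); }
        @Override public void close() { log.add("inner closed"); }
    }

    // Hypothetical wrapper whose constructor can fail after Inner is open,
    // standing in for an OOME while allocating the adapter.
    static class Wrapper implements AutoCloseable {
        private final Inner inner;
        Wrapper(Inner inner, boolean fail) {
            if (fail) throw new IllegalStateException("simulated allocation failure");
            this.inner = inner;
            log.add("wrapper opened");
        }
        @Override public void close() { log.add("wrapper closed"); inner.close(); }
    }

    // Nested style: TWR never sees Inner, so nothing closes it.
    static List<String> nested() {
        log.clear();
        try (Wrapper w = new Wrapper(new Inner(), true)) {
            log.add("body");
        } catch (IllegalStateException e) {
            log.add("caught");
        }
        return new ArrayList<>(log);
    }

    // One variable per resource: TWR closes Inner when Wrapper fails.
    static List<String> separate() {
        log.clear();
        try (Inner in = new Inner();
             Wrapper w = new Wrapper(in, true)) {
            log.add("body");
        } catch (IllegalStateException e) {
            log.add("caught");
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(nested());   // [inner opened, caught]
        System.out.println(separate()); // [inner opened, inner closed, caught]
    }
}
```

In the nested form the close of Inner is simply lost; in the separate
form TWR closes it before the catch clause runs.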

To actually collect benefit from my groped-for solution, the libraries
that define things like the streams in your example would have to
be retrofitted to “speak AutoOpenable”, and (even worse) users might
well need to change their code to use new API points.

(Thought experiment:  What if most streams implemented AutoOpenable?
Then they would distinguish “pre-open” from “really open” states, where today
their constructors return them in fully open states, ready and raring to go.
How would this new “pre-open” state work?  For compatibility it would
have to automagically transition to “fully open” the first time any side-effect
was applied to the stream.  Surprises in store for the user!  But when just
chaining streams together, they would conspire to stay in the pre-open
state, in case a TWR was driving the bus.  Then the TWR would say,
“we are open for business” and all of them would then allocate resources,
with all due attention to fractional intermediate states needing cleanup.
In non-TWR use, the first “hard touch” on the outermost stream would
cause this resource allocation to happen.  Is it all worth it?  I don’t know;
a thought experiment cannot resolve this question IMO.)

> With the unlock example from the proposed function, the effect would
> be permanent, though.
> 
> However, with the move from finalize() to Cleaners, the OpenJDK itself
> has given up on avoiding permanent resource leaks due to OOME at
> inopportune moments, I think.

Yes, cleaners affect the balance here.  In the future, if we invest more
heavily in thread-confined data (see Panama), then missed closes might
also be swept up when a thread dies.  OTOH, thread-confined abstractions
rely heavily on TWR, so there’s possibly more motivation to enhance
TWR reliability.

>> But (and this is why I’m going on at length here) maybe there
>> is a way to give the user more help avoiding lost closes.
>> It seems that if TWR were willing to define open operations
>> separately, some progress could be made.  Suppose the object
>> at the head of a TWR implemented an optional new interface
>> `java.lang.AutoOpenable`.  In such cases, the TWR would
>> call a nullary `open` operation on that object, immediately
>> before entering the block covered by the `finally` clause
>> that calls `close`. Would this help?  Yes, it would, because
>> the designer of the abstraction at the head of the TWR could
>> take care to do all argument validation and storage allocation
>> *without* doing the actual logical open or allocate the resource
>> that needs to be closed (or seize the lock).  The evaluation
>> of the head expression(s) of the TWR would (as before) not
>> be under any finally clause or catches.  The new feature here
>> would be that the designer of the abstraction (at the head
>> of the TWR) would know that the expression creating
>> the abstraction would do all necessary storage allocation,
>> adapter creation, argument validation, and (if necessary)
>> stack banging, *before* the final critical opening step.
>> It would not be up to the end user; it would be more under
>> the control of the abstraction, to sequence the preparatory
>> steps *before* the logical open.
> 
> Is this proposal similar to context managers for Python's with
> statement?

Yes, because it distinguishes __init__ from __enter__, rather than
putting their operations into the same “pile of stuff”.

The state diagram for a TWR resource today is:

created-and-opened =close=> closed

To the extent that we really, really need to distinguish a separate
“open” event that exactly matches the “close” event, we would
benefit from a TWR which supported this life cycle:

created =open=> opened =close=> closed

The use cases that support such an additional distinction
involve worryingly low-level side effects like OOME and SOE
which break through otherwise-intact abstractions, which
(apart from OOME and SOE) would seem only to require the
two-state model instead of the three-state model.
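
A rough sketch of the three-state model, emulated by hand since no
such interface or TWR support exists today.  The names AutoOpenable,
LockResource, and withResource are all hypothetical; withResource is
only my guess at what an AutoOpenable-aware TWR might expand to.  The
sketch shows the ordering only and makes no claim that
ReentrantLock::lock is itself allocation-free.

```java
import java.util.concurrent.locks.ReentrantLock;

public class AutoOpenableSketch {
    // Hypothetical interface; nothing like it exists in java.lang today.
    interface AutoOpenable extends AutoCloseable {
        void open();
        @Override void close(); // narrowed: no checked exception in this sketch
    }

    // created: adapter allocated, lock not held
    // opened:  lock held
    // closed:  lock released
    static final class LockResource implements AutoOpenable {
        private final ReentrantLock lock;
        LockResource(ReentrantLock lock) { this.lock = lock; }
        @Override public void open() { lock.lock(); }
        @Override public void close() { lock.unlock(); }
    }

    // Hand-written stand-in for the proposed expansion: the resource and the
    // body are fully constructed before open(), and open() is the last step
    // before entering the region covered by finally.
    static void withResource(AutoOpenable r, Runnable body) {
        r.open();
        try {
            body.run();
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        LockResource r = new LockResource(lock); // state: created
        withResource(r, () ->
            System.out.println("held: " + lock.isHeldByCurrentThread())); // held: true
        System.out.println("held after: " + lock.isHeldByCurrentThread()); // held after: false
    }
}
```

The adapter is allocated in the "created" state, so a failure there
leaves the lock untouched; only open() seizes it.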

> I don't think this helps with the chaining problem or the cleaners
> issue.  Maybe those issues are unrelated, after all.

I think it *would* help with some chaining problems, as I sketched
above.  It’s not surprising:  If you add a new O-O entry point that
reifies a new state in the state diagram, then the various objects
can get busy and properly model that new state, by means of
appropriate overrides to the new O-O entry point, and refactorings
to their interior state.

If you compose object A on top of object B, as var a = new A(new B()),
then the composite *can* (if both classes agree) implement two
states instead of one.  I assume Eric’s language (Prompto) does
something like this, and so can Python.  With exquisite care, the
transition to the second state (opened) can avoid even low-level side
effects like OOME and SOE, because all such effects can be covered by
previous actions performed for the first state (created).  This can
be ensured by cooperating objects A and B even with encapsulation
of B relative to A and vice versa, and can scale to any number of
objects A, B, C…
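
As a sketch of that cooperation, assuming the same hypothetical
AutoOpenable interface: A's constructor only records B, A's open
delegates to B's open first, and A undoes B's open if its own step
fails.  The logging is for illustration only (it allocates, which a
careful open would avoid).

```java
import java.util.ArrayList;
import java.util.List;

public class ChainedOpenSketch {
    // Hypothetical interface, as in the proposal; not part of the JDK.
    interface AutoOpenable extends AutoCloseable {
        void open();
        @Override void close();
    }

    static final List<String> log = new ArrayList<>();

    static final class B implements AutoOpenable {
        B() { log.add("B created"); }
        @Override public void open() { log.add("B opened"); }
        @Override public void close() { log.add("B closed"); }
    }

    static final class A implements AutoOpenable {
        private final AutoOpenable inner;
        A(AutoOpenable inner) { this.inner = inner; log.add("A created"); }

        @Override public void open() {
            inner.open();
            boolean ok = false;
            try {
                log.add("A opened");
                ok = true;
            } finally {
                if (!ok) inner.close(); // undo the partial open
            }
        }

        @Override public void close() {
            try { log.add("A closed"); } finally { inner.close(); }
        }
    }

    static List<String> run() {
        log.clear();
        A a = new A(new B());   // both objects merely created
        a.open();               // single transition to "opened" for the composite
        try { log.add("body"); } finally { a.close(); }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        // [B created, A created, B opened, A opened, body, A closed, B closed]
        System.out.println(run());
    }
}
```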

Here’s one more important point, and I’m done for now:  Even if
the end-user carelessly loads any number of side-effect-ful items
into the TWR head expression, none of those side effects interacts
with the TWR proper (this is an intentional feature of TWR today).
So TWR is robust about mis-placement of “piles of stuff” in the
TWR head expression, with the exception of the chaining problem,
when the user should have called out several “close points” to the
TWR.  But, if there’s a new state transition (AutoOpenable::open)
then the semantics of the “open” operation, like those of the “close”
operation, are not under the control of the end-user, but rather of
the library writer, who can more carefully take responsibility to
account for an exact state transition (open) that matches exactly to
the final state transition (close).  The library abstraction controls
atomicity and bracket matching, instead of the vagaries of user code.
Rather than requiring named variables to “hint” at the close points,
the object itself, and its abstraction over possible sub-objects, controls
both the opening and closing points.  I think that may be enough to
justify the cost of an upgrade to TWR, *if* the high cost would in fact
be justified at all, which it might not.

I’m done here, for now.  Got other stuff I need to worry about.
I hope an appropriate EG can take this up when a convenient
time rolls around.

— John