[jmm-dev] finalization

Wed Aug 13 14:52:48 UTC 2014

On 08/13/2014 08:38 AM, Doug Lea wrote:
> Consider:
>
> class ResourceHolder {
>    Resource r = createResource();
>    void f() { new Thread(() -> { use(r); }).start(); }
>    protected void finalize() { destroyResource(); }
>    static void use(Resource r) { ... } // maybe native or JNI
> }

To avoid depending on the details of how lambdas are compiled (thanks Jan!),
I should have written it out in a more awkward but unambiguous form:

class ResourceHolder {
   Resource r = createResource();
   void f() { new Thread(new Use(r)).start(); }
   protected void finalize() { destroyResource(); }
   static void use(Resource r) { ... } // maybe native or JNI

   static final class Use implements Runnable {
     final Resource resource;
     Use(Resource r) { this.resource = r; }
     public void run() { ResourceHolder.use(resource); }
   }
}

>
> This seems to be the clearest case in which a ResourceHolder can go
> out of scope (or otherwise become unreachable) while its resource is
> being used.  But under GC/finalization rules that are friendly to
> concurrent collectors (i.e., where the collector, not just the usage,
> may be concurrent), many different-looking constructions can be just
> as problematic.  The use of explicit reachability fences (above,
> inserting one after "use(r)") seems to be applicable to all of them:
> Ensure that every access of a destroyable resource, or its use as an
> argument to another method that accesses it, is followed by a reachability
> fence. This might be a candidate for language integration if applied
> to (only) the associated fields of finalizable classes. But we have no
> way of identifying the relevant fields and usages. This does suggest
> an extension of the tool/IDE approach though: Support a @Finalized
> annotation for fields, and let tools take it from there.  Possibly
> even source compilers (javac) themselves, but I can easily imagine
> cases where programmers would want to hand-craft specializations.  The
> annotation approach also seems to fit other scenarios discussed below.
>
> -Doug
>
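A minimal sketch of the fence placement described above, assuming Java 9 or later, where java.lang.ref.Reference.reachabilityFence exists (it postdates this thread). Resource, FencedResourceHolder, the destroyed flag, and demo() are hypothetical stand-ins, and Use is given a reference to the holder only so that it has something to fence:

```java
import java.lang.ref.Reference;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the native resource; not part of the original example.
final class Resource {
    final AtomicBoolean destroyed = new AtomicBoolean(false);
}

final class FencedResourceHolder {
    final Resource r = new Resource();

    @SuppressWarnings({"deprecation", "removal"}) // finalize() is deprecated in recent JDKs
    @Override
    protected void finalize() { r.destroyed.set(true); }

    // Returns true if the resource had not yet been destroyed when used.
    static boolean use(Resource r) { return !r.destroyed.get(); }

    static final class Use implements Runnable {
        private final FencedResourceHolder holder;
        boolean sawLiveResource;

        Use(FencedResourceHolder holder) { this.holder = holder; }

        @Override
        public void run() {
            sawLiveResource = FencedResourceHolder.use(holder.r);
            // Keeps the holder reachable, and so not finalizable, until use() returns.
            Reference.reachabilityFence(holder);
        }
    }

    // Runs the worker to completion and reports whether it saw a live resource.
    static boolean demo() {
        Use task = new Use(new FencedResourceHolder());
        Thread t = new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return task.sawLiveResource;
    }
}
```

With the fence in place, FencedResourceHolder.demo() should return true even if a collection runs while the worker is inside use().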
> On 08/08/2014 07:54 PM, Hans Boehm wrote:
>> On Wed, Aug 6, 2014 at 2:45 PM, John Rose <john.r.rose at oracle.com> wrote:
>>>
>>> On Aug 5, 2014, at 3:45 PM, Hans Boehm <boehm at acm.org> wrote:
>>>
>>>> Native resource allocation makes for good examples here, but not all
>> use cases involve native memory allocation.  If I want to attach additional
>> fields f to a small subset of objects of type T, it makes sense to use an
>> IdentityHashMap to maintain the extra values, rather than adding a field to
>> the class T.  Again, in the general case, it seems I need a finalizer to
>> clean up the IdentityHashMap entries when a T object is no longer needed.
>>   This has the same problem.  The IdentityHashMap entry may be reclaimed by
>> the finalizer while I'm still accessing it.  And this doesn't sound to me
>> like it should be that advanced a use case.  (The results are likely to be
>> a much more debuggable null pointer exception or the like.  But that's
>> still not good.)
>>>
>>> OK, that's a bad one.
>>>
>>> It cannot be a common case, partly because it will fail if stress-tested.
>>   Also, the author of class T has to bypass two mental warning signs in
>> order to get into that bad spot.  First, define some complicated
>> IdentityHashMap stuff outside of T, consciously choosing to do that instead
>> of the more obvious and natural route of defining T.extensionSlotTable.
>>   Second, fix the storage leak by defining T.finalize(), which should set
>> off alarm bells, since finalizers are mysterious.  Note that the user has
>> to have write access to T's definition in order to do the second step, so
>> the first step could have been done more naturally.
>>
>> [HB] I'm not sure what you're proposing here.  You want to put the
>> extensionSlotTable reference in each object?  That already adds to the size
>> of all objects, when I only want to add the optional field to a few.
>>
>>>
>>> But, some kind of patch like I suggested earlier would apply to this case
>> also:  Somehow tell the JVM, either by a change to bytecode semantics, or
>> by IDE-hinted insertion of fences, that variables (or bytecode locals) of
>> static type T where T is a "drop-sensitive" type (T.finalize() is not
>> Object.finalize(), or perhaps other conditions) extend to either
>> reassignment or to the end of the enclosing scope (block, method, or
>> constructor).  Every non-virtual method (or constructor) of T has 'this' in
>> scope, and therefore requires a fence (implicit or explicit) at the end of
>> the method (or constructor) to keep 'this' live.
>>
>> [HB] Right.  The trick is determining when something is "drop-sensitive".
>>   AFAICT, we've been telling people to use java.lang.ref instead of
>> finalize(), since finalize() doesn't handle dependencies between
>> finalizable objects correctly, etc.  But java.lang.ref makes it extremely
>> difficult for the implementation, IDE (or reader) to identify
>> "drop-sensitive" objects.
>>
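To illustrate why java.lang.ref-based cleanup is hard to spot, here is a sketch using java.lang.ref.Cleaner, which was added in Java 9 and so postdates this thread. CleanedHolder, State, and isReleased() are hypothetical names. Nothing in the static type (no finalize() override) marks the class as "drop-sensitive":

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

final class CleanedHolder implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup state must not refer back to the holder; otherwise the
    // holder would never become unreachable and the action would never run.
    private static final class State implements Runnable {
        final AtomicBoolean released = new AtomicBoolean(false);

        @Override
        public void run() { released.set(true); } // stand-in for freeing a native resource
    }

    private final State state = new State();
    private final Cleaner.Cleanable cleanable = CLEANER.register(this, state);

    boolean isReleased() { return state.released.get(); }

    // Explicit release; the Cleaner runs the same action at most once
    // if the holder becomes unreachable without close() being called.
    @Override
    public void close() { cleanable.clean(); }
}
```

A tool that only looks for finalize() overrides would not flag CleanedHolder, yet any method that touches the underlying resource has the same early-cleanup hazard.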
>>>
>>>> I do not see how to hide this behind some sort of native pointer
>> wrapper class.
>>>
>>> I suppose the user would have to cut a wrapper object into T, in addition
>> to adding T.finalize().  In fact, the wrapper object would be
>> T.extensionSlotTable, and the finalizer is not needed.  The finalizer is a
>> lot of trouble just to save a word of storage.  (Note that finalizers may
>> have hidden storage costs, depending on JVM implementation.)
>>
>> [HB] But if I use java.lang.ref "finalization", I can avoid the
>> finalization overhead on objects without the field.  Another one of those
>> finalization deficiencies that java.lang.ref avoids, at the
>> not-strictly-necessary cost of making potentially finalizable objects
>> difficult to identify.
>>
>>>
>>>> Somehow the programmer still needs to specify that a particular object
>> shouldn't be finalized until a particular point in the code.  And the need
>> for that annotation seems extremely surprising to most people, even experts.
>>>
>>> Granted.  And if (1) education will not help much, then (2) an IDE
>> warning or (3) a tweak to bytecode semantics will help more, if there is a
>> likely way to detect problematic drops.
>>>
>>> Or, (4) create some sort of fail-fast behavior, where (in some kind of
>> test scenario) we have the JVM aggressively drop dead finalized variables
>> and call the GC and finalizers as quickly as possible, at least many times
>> a second.  That is the opposite of the IDE warning or bytecode tweak.  It
>> would move the surprise up early in the development cycle.
>>
>> [HB] Interesting idea, but I also don't know how to implement that one with
>> decent efficiency.  We'd need a really stupid reference count collector
>> that kicks the finalization thread immediately when something is dropped.
>>   In a way that doesn't result in deadlocks and somehow magically deals with
>> cycles without delay.  (RC cycle collectors usually introduce a delay.)
>>   Except when locking requires otherwise.
>>
>> It may be easier to extend a race detector to catch those.  But then we
>> have to get everyone to actually run one of those, in spite of the 10X or
>> so slowdown.
>>
>>>
>>>> I also don't see how to usefully factor BigInteger to separate out this
>> problem.  It is still the case that when I call multiply, some function
>> between the one I call and the native one has to explicitly ensure that the
>> referenced Java BigInteger objects stay around.  That seems to unavoidably
>> require substantial amounts of boilerplate code, at least in the absence of
>> something like AspectJ.
>>>
>>> Put all accesses to the native data inside the wrapper object W, and have
>> the wrapper object (written by a finalizer expert) include suitable fences
>> so that W.this does not go dead until the native access (read or write) is
>> done.  The BigInteger has a W field.  The BigInteger can go dead, but the W
>> will stay live until it finishes its access.  If there is another access to
>> do, the W will be live as a local inside the BigInteger method, even if the
>> BigInteger is dead.  The user of the BigInteger doesn't have to know
>> anything about finalizers or native pointers.
>>
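A rough sketch of the wrapper arrangement described above, assuming Java 9+ for Reference.reachabilityFence. NativeWords plays the role of W and BigNum the role of BigInteger; both names are hypothetical, and a long[] stands in for the native data:

```java
import java.lang.ref.Reference;

// W: the only class that touches the (simulated) native data.
final class NativeWords {
    private volatile long[] handle; // stand-in for a native pointer; null once released

    NativeWords(long value) { handle = new long[] { value }; }

    long read() {
        try {
            return handle[0];
        } finally {
            // W.this must not go dead until the access above has completed.
            Reference.reachabilityFence(this);
        }
    }

    @SuppressWarnings({"deprecation", "removal"}) // finalize() is deprecated in recent JDKs
    @Override
    protected void finalize() { handle = null; } // stand-in for freeing native memory
}

// The user-facing class holds a W and needs no fences of its own.
final class BigNum {
    private final NativeWords w;

    BigNum(long value) { w = new NativeWords(value); }

    BigNum multiply(BigNum other) {
        NativeWords a = this.w;
        NativeWords b = other.w;
        // Each read is fenced inside W, so 'this' and 'other' may go dead here.
        return new BigNum(a.read() * b.read());
    }

    long longValue() { return w.read(); }
}
```

In this sketch the multiplication happens in Java after two separately fenced reads. A native multiply that consumes both handles at once would itself have to live inside NativeWords, which is the concern raised in the reply that follows.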
>> [HB] My concern is that if all native accesses go inside W, then W has to
>> wrap all arithmetic operations etc. on BigIntegers.  W becomes essentially
>> what BigInteger is now in such implementations.  The wrapper W is very
>> BigInteger specific and has to know about all BigInteger operations.
>>
>>>
>>>> Again, this does not just impact finalizers.  It also impacts
>> java.lang.ref.  BigInteger could be using reference queues rather than
>> finalizers.  Thus there doesn't appear to be an easy way for an IDE or
>> compiler to tell whether an object is "finalizable", and hence needs
>> attention.
>>>
>>> Yuck.  More "drop-sensitive" types for the heuristic would be any of the
>> java.lang.ref types.
>>>
>>> My patch begins to fall apart, since drop-sensitivity might arguably
>> depend recursively on drop-sensitivity of child objects (e.g., a
>> WeakReference field of the parent object).  And of course you can create
>> scenarios where a drop-sensitive object is obscured by a normal static type
>> like Object.
>>
>> [HB] Right.  Creating a WeakReference to an object may make the constructor
>> argument "drop-sensitive".  Which means we need something like a whole
>> program flow analysis to identify "drop-sensitivity".  I think in practice
>> everything whose lifetime can't be effectively bounded by escape analysis
>> has to be considered "drop-sensitive".
>>
>>>
>>>> And the "this" object is not really different from other parameters or
>> local variables.
>>>
>>> Agreed.  In bytecode, 'this' is just another local (usually #0).  Static
>> heuristics for detecting drop-sensitivity should extend to all locals, not
>> just 'this'.
>>>
>>>    DS ds = makeDropSensitive();
>>>    workOn(ds);
>>>    // IDE warning: missing reachabilityFence(ds)
>>>    return;
>>>
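One way the warning in the snippet above could be resolved is the try/finally idiom, sketched here under the assumption of Java 9+ (Reference.reachabilityFence). DS, makeDropSensitive(), and workOn() are filled in with hypothetical bodies purely so that the example compiles:

```java
import java.lang.ref.Reference;

final class DropSensitiveDemo {
    // Hypothetical drop-sensitive type; imagine cleanup tied to its reachability.
    static final class DS {
        final int payload = 7;
    }

    static DS makeDropSensitive() { return new DS(); }

    static int workOn(DS ds) { return ds.payload * 6; }

    static int run() {
        DS ds = makeDropSensitive();
        try {
            return workOn(ds);
        } finally {
            // ds stays reachable until workOn has returned, on every exit path.
            Reference.reachabilityFence(ds);
        }
    }
}
```

The finally block covers exceptional exits as well, which a fence placed on the line before the return would not.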
>>>> It seems to me that this is another case, like OOTA values, where there
>> are no easy solutions.
>>>
>>> Yes.  Surprise is hard to define and therefore hard to outlaw.  Maybe
>> surprise and non-enumerable properties are related.  Busy-beaver Turing
>> machines and Chaitin's constant are somehow surprising and also
>> non-enumerable.  So are race conditions.
>>>
>>>> My sense is that in both cases, we're better off going with the
>> performance hit than the questionable/unusable semantics that we now have.
>>   In the finalizer case, we can punt it back to the programmer, but I think
>> we've seen in the last ten years that we will not magically get correct
>> code if we do.  This problem isn't just overlooked by average programmers.
>>    All of the code in which I've been finding these problems was written by
>> experts, and has been examined by many people.
>>>
>>> The worst outcome would be to lose performance everywhere and still have
>> surprises somewhere.  Are there proposals on the table which would
>> absolutely rule out the sort of surprise scenarios we have been discussing?
>>   Probably not, if we cannot define what surprise is (with decidability).
>>>
>>> If instead we are looking for heuristic patches, it seems like we have to
>> tune the heuristics to compromise performance only in the surprise cases we
>> can predict.  It seems like an IDE warning would cover the cases we have
>> discussed so far.  An IDE warning has this big advantage over bytecode
>> semantic tweaks:  It teaches the user about the danger area.
>>
>> [HB] Agreed.  But I'm not sure we can do sufficient analysis in an IDE.
>>   And we don't have much control over which IDEs people do and don't use.
>>
>> I think we could effectively preclude surprises by guaranteeing that if
>> object x was "naively reachable" at the end of a block or full expression
>> (in the C sense) B, then (the end of) B synchronizes with x's finalizer.
>>   And I think that's implicitly enforced in current systems so long as we
>> refrain from eliminating dead references sufficiently.  My guess is that
>> involves a measurable space cost, but otherwise only a small time cost.  We
>> have to add some otherwise gratuitous spills to the stack.  But
>> almost-always-dead stores to the stack should be fairly cheap.  And escape
>> analysis might help some.  It's not free, but I think we're talking low
>> single-digit percent, as for the OOTA problem.
>>
>> Hans
>>
>>>
>>> — John
>>
>
>
>