[jmm-dev] finalization

Wed Aug 13 12:38:00 UTC 2014

Consider:

class ResourceHolder {
   Resource r = createResource();
   void f() { new Thread(() -> { use(r); }).start(); }
   protected void finalize() { destroyResource(); }
   static void use(Resource r) { ... } // maybe native or JNI
}

This seems to be the clearest case in which a ResourceHolder can go
out of scope (or otherwise become unreachable) while its resource is
being used.  But under GC/finalization rules that are friendly to
concurrent collectors (i.e., where the collector, not just the usage,
may be concurrent), many different-looking constructions can be just
as problematic.  The use of explicit reachability fences (above,
inserting one after "use(r)") seems to be applicable to all of them:
Ensure that every access of a destroyable resource, or its use as an
argument to another method that accesses it, is followed by a reachability
fence. This might be a candidate for language integration if applied
to (only) the associated fields of finalizable classes. But we have no
way of identifying the relevant fields and usages. This does suggest
an extension of the tool/IDE approach though: Support a @Finalized
annotation for fields, and let tools take it from there.  Possibly
even source compilers (javac) themselves, but I can easily imagine
cases where programmers would want to hand-craft specializations.  The
annotation approach also seems to fit other scenarios discussed below.
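
A minimal sketch of the fenced version of the example above, assuming
the fence takes the form of java.lang.ref.Reference.reachabilityFence
(the static method that JDK 9 and later provide).  The @Finalized
annotation, the Resource stand-in, and the runAndWait helper are
hypothetical names introduced only for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.ref.Reference;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical marker for fields whose resource is released by a finalizer.
// No such annotation exists in the JDK; tools would have to define it.
@Retention(RetentionPolicy.SOURCE)
@Target(ElementType.FIELD)
@interface Finalized {}

// Hypothetical stand-in for a native resource.
final class Resource {
    final AtomicBoolean destroyed = new AtomicBoolean(false);
}

class FencedResourceHolder {
    @Finalized
    private final Resource r = new Resource();

    // Set by the worker thread: true if the resource was intact during use.
    final AtomicBoolean usedWhileIntact = new AtomicBoolean(false);

    Thread f() {
        Thread t = new Thread(() -> {
            try {
                usedWhileIntact.set(use(r));
            } finally {
                // Keeps the holder reachable until use(r) has returned, so
                // finalize() cannot destroy the resource in the middle of it.
                Reference.reachabilityFence(this);
            }
        });
        t.start();
        return t;
    }

    // Demo helper: start the worker and wait for it to finish.
    boolean runAndWait() {
        Thread t = f();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return usedWhileIntact.get();
    }

    @Override
    @SuppressWarnings({"deprecation", "removal"}) // finalize() is deprecated in later JDKs
    protected void finalize() {
        r.destroyed.set(true);
    }

    static boolean use(Resource r) {
        return !r.destroyed.get();
    }
}
```

A tool that understood the annotation could insert or check the
try/finally fence around every read of r, which is the part that is
easy to forget when writing it by hand.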

-Doug

On 08/08/2014 07:54 PM, Hans Boehm wrote:
> On Wed, Aug 6, 2014 at 2:45 PM, John Rose <john.r.rose at oracle.com> wrote:
>>
>> On Aug 5, 2014, at 3:45 PM, Hans Boehm <boehm at acm.org> wrote:
>>
>>> Native resource allocation makes for good examples here, but not all
> use cases involve native memory allocation.  If I want to attach additional
> fields f to a small subset of objects of type T, it makes sense to use an
> IdentityHashMap to maintain the extra values, rather than adding a field to
> the class T.  Again, in the general case, it seems I need a finalizer to
> clean up the IdentityHashMap entries when a T object is no longer needed.
>   This has the same problem.  The IdentityHashMap entry may be reclaimed by
> the finalizer while I'm still accessing it.  And this doesn't sound to me
> like it should be that advanced a use case.  (The results are likely to be
> a much more debuggable null pointer exception or the like.  But that's
> still not good.)
>>
>> OK, that's a bad one.
>>
>> It cannot be a common case, partly because it will fail if stress-tested.
>   Also, the author of class T has to bypass two mental warning signs in
> order to get into that bad spot.  First, define some complicated
> IdentityHashMap stuff outside of T, consciously choosing to do that instead
> of the more obvious and natural route of defining T.extensionSlotTable.
>   Second, fix the storage leak by defining T.finalize(), which should set
> off alarm bells, since finalizers are mysterious.  Note that the user has
> to have write access to T's definition in order to do the second step, so
> the first step could have been done more naturally.
>
> [HB] I'm not sure what you're proposing here.  You want to put the
> extensionSlotTable reference in each object?  That already adds to the size
> of all objects, when I only want to add the optional field to a few.
>
>>
>> But, some kind of patch like I suggested earlier would apply to this case
> also:  Somehow tell the JVM, either by a change to bytecode semantics, or
> by IDE-hinted insertion of fences, that variables (or bytecode locals) of
> static type T where T is a "drop-sensitive" type (T.finalize() is not
> Object.finalize(), or perhaps other conditions) extend to either
> reassignment or to the end of the enclosing scope (block, method, or
> constructor).  Every non-virtual method (or constructor) of T has 'this' in
> scope, and therefore requires a fence (implicit or explicit) at the end of
> the method (or constructor) to keep 'this' live.
>
> [HB] Right.  The trick is determining when something is "drop-sensitive".
>   AFAICT, we've been telling people to use java.lang.ref instead of
> finalize(), since finalize() doesn't handle dependencies between
> finalizable objects correctly, etc.  But java.lang.ref makes it extremely
> difficult for the implementation, IDE (or reader) to identify
> "drop-sensitive" objects.
>
>>
>>> I do not see how to hide this behind some sort of native pointer
> wrapper class.
>>
>> I suppose the user would have to cut a wrapper object into T, in addition
> to adding T.finalize().  In fact, the wrapper object would be
> T.extensionSlotTable, and the finalizer is not needed.  The finalizer is a
> lot of trouble just to save a word of storage.  (Note that finalizers may
> have hidden storage costs, depending on JVM implementation.)
>
> [HB] But if I use java.lang.ref "finalization", I can avoid the
> finalization overhead on objects without the field.  Another one of those
> finalization deficiencies that java.lang.ref avoids, at the
> not-strictly-necessary cost of making potentially finalizable objects
> difficult to identify.
>
>>
>>> Somehow the programmer still needs to specify that a particular object
> shouldn't be finalized until a particular point in the code.  And the need
> for that annotation seems extremely surprising to most people, even experts.
>>
>> Granted.  And if (1) education will not help much, then (2) an IDE
> warning or (3) a tweak to bytecode semantics will help more, if there is a
> likely way to detect problematic drops.
>>
>> Or, (4) create some sort of fail-fast behavior, where (in some kind of
> test scenario) we have the JVM aggressively drop dead finalized variables
> and call the GC and finalizers as quickly as possible, at least many times
> a second.  That is the opposite of the IDE warning or bytecode tweak.  It
> would move the surprise up early in the development cycle.
>
> [HB] Interesting idea, but I also don't know how to implement that one with
> decent efficiency.  We'd need a really stupid reference count collector
> that kicks the finalization thread immediately when something is dropped.
>   In a way that doesn't result in deadlocks and somehow magically deals with
> cycles without delay.  (RC cycle collectors usually introduce a delay.)
>   Except when locking requires otherwise.
>
> It may be easier to extend a race detector to catch those.  But then we
> have to get everyone to actually run one of those, in spite of the 10X or
> so slowdown.
>
>>
>>> I also don't see how to usefully factor BigInteger to separate out this
> problem.  It is still the case that when I call multiply, some function
> between the one I call and the native one has to explicitly ensure that the
> referenced Java BigInteger objects stay around.  That seems to unavoidably
> require substantial amounts of boilerplate code, at least in the absence of
> something like AspectJ.
>>
>> Put all accesses to the native data inside the wrapper object W, and have
> the wrapper object (written by a finalizer expert) include suitable fences
> so that W.this does not go dead until the native access (read or write) is
> done.  The BigInteger has a W field.  The BigInteger can go dead, but the W
> will stay live until it finishes its access.  If there is another access to
> do, the W will be live as a local inside the BigInteger method, even if the
> BigInteger is dead.  The user of the BigInteger doesn't have to know
> anything about finalizers or native pointers.
>
> [HB] My concern is that if all native accesses go inside W, then W has to
> wrap all arithmetic operations etc. on BigIntegers.  W becomes essentially
> what BigInteger is now in such implementations.  The wrapper W is very
> BigInteger specific and has to know about all BigInteger operations.
>
>>
>>> Again, this does not just impact finalizers.  It also impacts
> java.lang.ref.  BigInteger could be using reference queues rather than
> finalizers.  Thus there doesn't appear to be an easy way for an IDE or
> compiler to tell whether an object is "finalizable", and hence needs
> attention.
>>
>> Yuck.  More "drop-sensitive" types for the heuristic would be any of the
> java.lang.ref types.
>>
>> My patch begins to fall apart, since drop-sensitivity might arguably
> depend recursively on drop-sensitivity of child objects (e.g., a
> WeakReference field of the parent object).  And of course you can create
> scenarios where a drop-sensitive object is obscured by a normal static type
> like Object.
>
> [HB] Right.  Creating a WeakReference to an object may make the constructor
> argument "drop-sensitive".  Which means we need something like a whole
> program flow analysis to identify "drop-sensitivity".  I think in practice
> everything whose lifetime can't be effectively bounded by escape analysis
> has to be considered "drop-sensitive".
>
>>
>>> And the "this" object is not really different from other parameters or
> local variables.
>>
>> Agreed.  In bytecode, 'this' is just another local (usually #0).  Static
> heuristics for detecting drop-sensitivity should extend to all locals, not
> just 'this'.
>>
>>    DS ds = makeDropSensitive();
>>    workOn(ds);
>>    // IDE warning: missing reachabilityFence(ds)
>>    return;
>>
>>> It seems to me that this is another case, like OOTA values, where there
> are no easy solutions.
>>
>> Yes.  Surprise is hard to define and therefore hard to outlaw.  Maybe
> surprise and non-enumerable properties are related.  Busy-beaver Turing
> machines and Chaitin's constant are somehow surprising and also
> non-enumerable.  So are race conditions.
>>
>>> My sense is that in both cases, we're better off going with the
> performance hit than the questionable/unusable semantics that we now have.
>   In the finalizer case, we can punt it back to the programmer, but I think
> we've seen in the last ten years that we will not magically get correct
> code if we do.  This problem isn't just overlooked by average programmers.
>    All of the code in which I've been finding these problems was written by
> experts, and has been examined by many people.
>>
>> The worst outcome would be to lose performance everywhere and still have
> surprises somewhere.  Are there proposals on the table which would
> absolutely rule out the sort of surprise scenarios we have been discussing?
>   Probably not, if we cannot define what surprise is (with decidability).
>>
>> If instead we are looking for heuristic patches, it seems like we have to
> tune the heuristics to compromise performance only in the surprise cases we
> can predict.  It seems like an IDE warning would cover the cases we have
> discussed so far.  An IDE warning has this big advantage over bytecode
> semantic tweaks:  It teaches the user about the danger area.
>
> [HB] Agreed.  But I'm not sure we can do sufficient analysis in an IDE.
>   And we don't have much control over which IDEs people do and don't use.
>
> I think we could effectively preclude surprises by guaranteeing that if
> object x was "naively reachable" at the end of a block or full expression
> (in the C sense) B, then (the end of) B synchronizes with x's finalizer.
>   And I think that's implicitly enforced in current systems so long as we
> refrain from eliminating dead references sufficiently.  My guess is that
> involves a measurable space cost, but otherwise only a small time cost.  We
> have to add some otherwise gratuitous spills to the stack.  But
> almost-always-dead stores to the stack should be fairly cheap.  And escape
> analysis might help some.  It's not free, but I think we're talking low
> single-digit percent, as for the OOTA problem.
>
> Hans
>
>>
>> — John
>
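
For reference, a minimal sketch of the wrapper arrangement John
describes above (a fenced wrapper W held in a field of the
BigInteger-like class), assuming Reference.reachabilityFence from JDK 9
and later.  FakeNative, W.get(), and BigNum are hypothetical names;
native memory is modeled by a map so that the sketch runs without JNI.

```java
import java.lang.ref.Reference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for native memory: handle -> stored value.
final class FakeNative {
    private static final Map<Long, Long> HEAP = new ConcurrentHashMap<>();
    private static final AtomicLong NEXT = new AtomicLong(1);

    static long alloc(long value) {
        long handle = NEXT.getAndIncrement();
        HEAP.put(handle, value);
        return handle;
    }

    static long read(long handle) {
        Long v = HEAP.get(handle);
        if (v == null) throw new IllegalStateException("use after free");
        return v;
    }

    static void free(long handle) {
        HEAP.remove(handle);
    }
}

// The wrapper "W": the only class that touches the handle, and the only
// one that needs a finalizer and fences.
final class W {
    private final long handle;

    W(long value) {
        handle = FakeNative.alloc(value);
    }

    long get() {
        try {
            return FakeNative.read(handle);
        } finally {
            // W.this stays reachable until the access has completed.
            Reference.reachabilityFence(this);
        }
    }

    @Override
    @SuppressWarnings({"deprecation", "removal"}) // finalize() is deprecated in later JDKs
    protected void finalize() {
        FakeNative.free(handle);
    }
}

// Client type in the role of BigInteger; it has no finalizer of its own.
final class BigNum {
    private final W w;

    BigNum(long value) {
        w = new W(value);
    }

    BigNum multiply(BigNum other) {
        // Each get() is fenced inside W, so BigNum needs no fence here.
        return new BigNum(w.get() * other.w.get());
    }

    long longValue() {
        return w.get();
    }
}
```

Because the value here fits in a long, W only needs a fenced get().  If
the multiplication itself ran in native code over both operands, both
handles would have to stay live across that call, which is the point of
Hans's objection that W ends up wrapping every BigInteger operation.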