[jmm-dev] finalization (was: ECOOP, JVMLS)

Wed Aug 6 21:45:15 UTC 2014

On Aug 5, 2014, at 3:45 PM, Hans Boehm <boehm at acm.org> wrote:

> Native resource allocation makes for good examples here, but not all use cases involve native memory allocation.  If I want to attach additional fields f to a small subset of objects of type T, it makes sense to use an IdentityHashMap to maintain the extra values, rather than adding a field to the class T.  Again, in the general case, it seems I need a finalizer to clean up the IdentityHashMap entries when a T object is no longer needed.  This has the same problem.  The IdentityHashMap entry may be reclaimed by the finalizer while I'm still accessing it.  And this doesn't sound to me like it should be that advanced a use case.  (The results are likely to be a much more debuggable null pointer exception or the like.  But that's still not good.)

OK, that's a bad one.

It cannot be a common case, partly because it will fail if stress-tested.  Also, the author of class T has to bypass two mental warning signs in order to get into that bad spot.  First, define some complicated IdentityHashMap stuff outside of T, consciously choosing to do that instead of the more obvious and natural route of defining T.extensionSlotTable.  Second, fix the storage leak by defining T.finalize(), which should set off alarm bells, since finalizers are mysterious.  Note that the user has to have write access to T's definition in order to do the second step, so the first step could have been done more naturally.
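To make the hazard concrete, here is a minimal sketch of the pattern, not code from any real library.  It assumes the side table is keyed by a per-object token rather than by the T itself, since a map that held T as a strong key would keep every T reachable and the finalizer would never run.  The names token, EXTRA, attach, and extra are all hypothetical.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;

class T {
    // Side table living outside the instances (assumption: keyed by a token, not by T).
    static final Map<Object, String> EXTRA =
            Collections.synchronizedMap(new IdentityHashMap<>());

    // Hypothetical identity token used as the map key.
    private final Object token = new Object();

    void attach(String value) {
        EXTRA.put(token, value);
    }

    String extra() {
        Object k = token;   // last use of 'this' in this method
        // Hazard: from here on 'this' is dead as far as the JVM is concerned,
        // so it may be finalized, and finalize() may remove the entry
        // before the lookup below runs, making it return null.
        return EXTRA.get(k);
    }

    @Override
    @SuppressWarnings({"deprecation", "removal"}) // finalize() is deprecated in recent JDKs
    protected void finalize() {
        EXTRA.remove(token);   // the cleanup that fixes the leak and creates the race
    }
}
```

In ordinary runs extra() returns the attached value; the early removal only shows up when an optimizing JIT and a GC cycle line up, which is why stress testing tends to expose it.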

But, some kind of patch like the one I suggested earlier would apply to this case also:  Somehow tell the JVM, either by a change to bytecode semantics, or by IDE-hinted insertion of fences, that variables (or bytecode locals) of static type T where T is a "drop-sensitive" type (T.finalize() is not Object.finalize(), or perhaps other conditions) stay live until either reassignment or the end of the enclosing scope (block, method, or constructor).  Every non-static method (or constructor) of T has 'this' in scope, and therefore requires a fence (implicit or explicit) at the end of the method (or constructor) to keep 'this' live.

> I do not see how to hide this behind some sort of native pointer wrapper class.

I suppose the user would have to cut a wrapper object into T, in addition to adding T.finalize().  In fact, the wrapper object would be T.extensionSlotTable, and the finalizer is not needed.  The finalizer is a lot of trouble just to save a word of storage.  (Note that finalizers may have hidden storage costs, depending on JVM implementation.)
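As a sketch of that alternative, assuming T can be edited:  the extra values hang off a single lazily created field, so they become unreachable together with the T and no finalizer is involved.  Only the name extensionSlotTable comes from the text above; putExtension and getExtension are hypothetical helpers.

```java
import java.util.HashMap;
import java.util.Map;

class T {
    // One word per instance; stays null for the many objects that need no extras.
    private Map<String, Object> extensionSlotTable;

    synchronized void putExtension(String name, Object value) {
        if (extensionSlotTable == null) {
            extensionSlotTable = new HashMap<>();   // created only on first use
        }
        extensionSlotTable.put(name, value);
    }

    synchronized Object getExtension(String name) {
        return extensionSlotTable == null ? null : extensionSlotTable.get(name);
    }
}
```

Because the table is reachable only through its T, the GC reclaims both together and there is nothing to race against.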

> Somehow the programmer still needs to specify that a particular object shouldn't be finalized until a particular point in the code.  And the need for that annotation seems extremely surprising to most people, even experts.

Granted.  And if (1) education will not help much, then (2) an IDE warning or (3) a tweak to bytecode semantics will help more, if there is a likely way to detect problematic drops.

Or, (4) create some sort of fail-fast behavior, where (in some kind of test scenario) we have the JVM aggressively drop dead variables of finalizable types and call the GC and finalizers as quickly as possible, at least many times a second.  That is the opposite of the IDE warning or bytecode tweak.  It would move the surprise up early in the development cycle.

> I also don't see how to usefully factor BigInteger to separate out this problem.  It is still the case that when I call multiply, some function between the one I call and the native one has to explicitly ensure that the referenced Java BigInteger objects stay around.  That seems to unavoidably require substantial amounts of boilerplate code, at least in the absence of something like AspectJ.

Put all accesses to the native data inside the wrapper object W, and have the wrapper object (written by a finalizer expert) include suitable fences so that W.this does not go dead until the native access (read or write) is done.  The BigInteger has a W field.  The BigInteger can go dead, but the W will stay live until it finishes its access.  If there is another access to do, the W will be live as a local inside the BigInteger method, even if the BigInteger is dead.  The user of the BigInteger doesn't have to know anything about finalizers or native pointers.
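A rough sketch of that factoring follows.  It is an illustration under stated assumptions, not the actual BigInteger code:  the native side is simulated with a static map from handles to long values, and FakeNative, BigNum, and every method name are hypothetical.  The fence used is Reference.reachabilityFence, which was added to java.lang.ref.Reference in Java 9.

```java
import java.lang.ref.Reference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for native memory: handle -> value.  Purely simulated.
final class FakeNative {
    static final Map<Long, Long> HEAP = new ConcurrentHashMap<>();
    static final AtomicLong NEXT = new AtomicLong(1);

    static long alloc(long value) {
        long h = NEXT.getAndIncrement();
        HEAP.put(h, value);
        return h;
    }

    static long read(long handle) {
        Long v = HEAP.get(handle);
        if (v == null) throw new IllegalStateException("use after free: " + handle);
        return v;
    }

    static void free(long handle) { HEAP.remove(handle); }
}

// Wrapper written once by the finalizer expert; owns the handle and all accesses.
final class W {
    private final long handle;

    W(long value) { handle = FakeNative.alloc(value); }

    long read() {
        try {
            return FakeNative.read(handle);
        } finally {
            // Keep W.this reachable until the simulated native access is done.
            Reference.reachabilityFence(this);
        }
    }

    @Override
    @SuppressWarnings({"deprecation", "removal"}) // finalize() is deprecated in recent JDKs
    protected void finalize() { FakeNative.free(handle); }
}

// User-facing class: no finalizer and no fences, just a W field.
final class BigNum {
    private final W w;

    BigNum(long value) { w = new W(value); }

    BigNum multiply(BigNum other) {
        // The BigNums may go dead after these loads; the Ws stay live as locals.
        W a = this.w, b = other.w;
        return new BigNum(a.read() * b.read());
    }

    long longValue() { return w.read(); }
}
```

The only fence lives inside W.read(), so code like new BigNum(6).multiply(new BigNum(7)) needs no knowledge of finalizers or handles.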

> Again, this does not just impact finalizers.  It also impacts java.lang.ref.  BigInteger could be using reference queues rather than finalizers.  Thus there doesn't appear to be an easy way for an IDE or compiler to tell whether an object is "finalizable", and hence needs attention.

Yuck.  More "drop-sensitive" types for the heuristic would be any of the java.lang.ref types.

My patch begins to fall apart, since drop-sensitivity might arguably depend recursively on drop-sensitivity of child objects (e.g., a WeakReference field of the parent object).  And of course you can create scenarios where a drop-sensitive object is obscured by a normal static type like Object.

> And the "this" object is not really different from other parameters or local variables.

Agreed.  In bytecode, 'this' is just another local (usually #0).  Static heuristics for detecting drop-sensitivity should extend to all locals, not just 'this'.

  DS ds = makeDropSensitive();
  workOn(ds);
  // IDE warning: missing reachabilityFence(ds)
  return;

> It seems to me that this is another case, like OOTA values, where there are no easy solutions.

Yes.  Surprise is hard to define and therefore hard to outlaw.  Maybe surprise and non-enumerable properties are related.  Busy-beaver Turing machines and Chaitin's constant are somehow surprising and also non-enumerable.  So are race conditions.

> My sense is that in both cases, we're better off going with the performance hit than the questionable/unusable semantics that we now have.  In the finalizer case, we can punt it back to the programmer, but I think we've seen in the last ten years that we will not magically get correct code if we do.  This problem isn't just overlooked by average programmers.   All of the code in which I've been finding these problems was written by experts, and has been examined by many people.

The worst outcome would be to lose performance everywhere and still have surprises somewhere.  Are there proposals on the table which would absolutely rule out the sort of surprise scenarios we have been discussing?  Probably not, if we cannot define what surprise is (with decidability).

If instead we are looking for heuristic patches, it seems like we have to tune the heuristics to compromise performance only in the surprise cases we can predict.  It seems like an IDE warning would cover the cases we have discussed so far.  An IDE warning has this big advantage over bytecode semantic tweaks:  It teaches the user about the danger area.

— John