[jmm-dev] finalization (was: ECOOP, JVMLS)

Fri Aug 8 23:54:52 UTC 2014

On Wed, Aug 6, 2014 at 2:45 PM, John Rose <john.r.rose at oracle.com> wrote:
>
> On Aug 5, 2014, at 3:45 PM, Hans Boehm <boehm at acm.org> wrote:
>
> > Native resource allocation makes for good examples here, but not all
use cases involve native memory allocation.  If I want to attach additional
fields f to a small subset of objects of type T, it makes sense to use an
IdentityHashMap to maintain the extra values, rather than adding a field to
the class T.  Again, in the general case, it seems I need a finalizer to
clean up the IdentityHashMap entries when a T object is no longer needed.
 This has the same problem.  The IdentityHashMap entry may be reclaimed by
the finalizer while I'm still accessing it.  And this doesn't sound to me
like it should be that advanced a use case.  (The results are likely to be
a much more debuggable null pointer exception or the like.  But that's
still not good.)
>
> OK, that's a bad one.
>
> It cannot be a common case, partly because it will fail if stress-tested.
 Also, the author of class T has to bypass two mental warning signs in
order to get into that bad spot.  First, define some complicated
IdentityHashMap stuff outside of T, consciously choosing to do that instead
of the more obvious and natural route of defining T.extensionSlotTable.
 Second, fix the storage leak by defining T.finalize(), which should set
off alarm bells, since finalizers are mysterious.  Note that the user has
to have write access to T's definition in order to do the second step, so
the first step could have been done more naturally.

[HB] I'm not sure what you're proposing here.  You want to put the
extensionSlotTable reference in each object?  That already adds to the size
of all objects, when I only want to add the optional field to a few.

>
> But, some kind of patch like I suggested earlier would apply to this case
also:  Somehow tell the JVM, either by a change to bytecode semantics, or
by IDE-hinted insertion of fences, that variables (or bytecode locals) of
static type T where T is a "drop-sensitive" type (T.finalize() is not
Object.finalize(), or perhaps other conditions) extend to either
reassignment or to the end of the enclosing scope (block, method, or
constructor).  Every non-static method (or constructor) of T has 'this' in
scope, and therefore requires a fence (implicit or explicit) at the end of
the method (or constructor) to keep 'this' live.

[HB] Right.  The trick is determining when something is "drop-sensitive".
 AFAICT, we've been telling people to use java.lang.ref instead of
finalize(), since finalize() doesn't handle dependencies between
finalizable objects correctly, etc.  But java.lang.ref makes it extremely
difficult for the implementation, IDE (or reader) to identify
"drop-sensitive" objects.

>
> > I do not see how to hide this behind some sort of native pointer
wrapper class.
>
> I suppose the user would have to cut a wrapper object into T, in addition
to adding T.finalize().  In fact, the wrapper object would be
T.extensionSlotTable, and the finalizer is not needed.  The finalizer is a
lot of trouble just to save a word of storage.  (Note that finalizers may
have hidden storage costs, depending on JVM implementation.)

[HB] But if I use java.lang.ref "finalization", I can avoid the
finalization overhead on objects without the field.  Another one of those
finalization deficiencies that java.lang.ref avoids, at the
not-strictly-necessary cost of making potentially finalizable objects
difficult to identify.
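
To make the trade-off concrete, here is a minimal sketch of a side table cleaned up through java.lang.ref rather than finalize().  The names SideTable and Key are hypothetical, not from any library, and the weak-key wrapper is an assumption, needed because a plain IdentityHashMap would keep its keys strongly reachable.  Objects that never get an entry pay nothing, but nothing in the key's class reveals that its instances have become "drop-sensitive".

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical side table: attaches a value to a few objects without adding a field
// to their class, and without giving that class a finalizer.
final class SideTable<K, V> {
    // Weak, identity-based key. The hash is cached so a cleared key can still be removed.
    private static final class Key<K> extends WeakReference<K> {
        private final int hash;

        Key(K referent, ReferenceQueue<K> queue) {
            super(referent, queue);
            hash = System.identityHashCode(referent);
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Object mine = get();
            return mine != null && mine == ((Key<?>) o).get();
        }
    }

    private final ReferenceQueue<K> queue = new ReferenceQueue<>();
    private final Map<Key<K>, V> map = new HashMap<>();

    synchronized void put(K key, V value) {
        expunge();
        map.put(new Key<>(key, queue), value);
    }

    synchronized V get(K key) {
        expunge();
        return map.get(new Key<>(key, null)); // lookup key is not registered with the queue
    }

    synchronized int size() {
        expunge();
        return map.size();
    }

    // The "finalization" step: drop entries whose key object has been collected.
    private void expunge() {
        for (Object cleared; (cleared = queue.poll()) != null; ) {
            map.remove(cleared);
        }
    }
}
```

If the cleanup step did more than remove an entry (say, released a resource held by the value), a caller still using that value after its last use of the key would hit exactly the early-cleanup problem discussed in this thread.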

>
> > Somehow the programmer still needs to specify that a particular object
shouldn't be finalized until a particular point in the code.  And the need
for that annotation seems extremely surprising to most people, even experts.
>
> Granted.  And if (1) education will not help much, then (2) an IDE
warning or (3) a tweak to bytecode semantics will help more, if there is a
likely way to detect problematic drops.
>
> Or, (4) create some sort of fail-fast behavior, where (in some kind of
test scenario) we have the JVM aggressively drop dead finalized variables
and call the GC and finalizers as quickly as possible, at least many times
a second.  That is the opposite of the IDE warning or bytecode tweak.  It
would move the surprise up early in the development cycle.

[HB] Interesting idea, but I also don't know how to implement that one with
decent efficiency.  We'd need a really stupid reference count collector
that kicks the finalization thread immediately when something is dropped.
 In a way that doesn't result in deadlocks and somehow magically deals with
cycles without delay.  (RC cycle collectors usually introduce a delay.)
 Except when locking requires otherwise.

It may be easier to extend a race detector to catch those.  But then we
have to get everyone to actually run one of those, in spite of the 10X or
so slowdown.
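
From outside the JVM, a test harness can only crudely approximate idea (4).  The sketch below (GcStress is a hypothetical name) requests collections many times a second while tests run.  It is an assumption-laden stand-in: System.gc() is only a hint, and nothing here makes the JIT drop dead variables any earlier, which is the part that needs JVM support.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical test helper: keeps asking for garbage collections in the background
// so that premature-cleanup bugs have more chances to show up during testing.
final class GcStress implements AutoCloseable {
    private final AtomicLong cycles = new AtomicLong();
    private volatile boolean running = true;
    private final Thread worker;

    GcStress(long pauseMillis) {
        worker = new Thread(() -> {
            do {
                System.gc();                 // a hint only; the JVM may ignore it
                cycles.incrementAndGet();
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    return;                  // close() interrupts the sleep to stop promptly
                }
            } while (running);
        }, "gc-stress");
        worker.setDaemon(true);
        worker.start();
    }

    // Number of collection requests issued so far.
    long cycles() { return cycles.get(); }

    @Override public void close() {
        running = false;
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Wrapping a test body in try (GcStress s = new GcStress(5)) { ... } raises the odds of seeing the failure early, but it does not make the failure deterministic.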

>
> > I also don't see how to usefully factor BigInteger to separate out this
problem.  It is still the case that when I call multiply, some function
between the one I call and the native one has to explicitly ensure that the
referenced Java BigInteger objects stay around.  That seems to unavoidably
require substantial amounts of boilerplate code, at least in the absence of
something like AspectJ.
>
> Put all accesses to the native data inside the wrapper object W, and have
the wrapper object (written by a finalizer expert) include suitable fences
so that W.this does not go dead until the native access (read or write) is
done.  The BigInteger has a W field.  The BigInteger can go dead, but the W
will stay live until it finishes its access.  If there is another access to
do, the W will be live as a local inside the BigInteger method, even if the
BigInteger is dead.  The user of the BigInteger doesn't have to know
anything about finalizers or native pointers.

[HB] My concern is that if all native accesses go inside W, then W has to
wrap all arithmetic operations etc. on BigIntegers.  W becomes essentially
what BigInteger is now in such implementations.  The wrapper W is very
BigInteger specific and has to know about all BigInteger operations.
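
A minimal sketch of the factoring under discussion, with hypothetical names (BigNum standing in for BigInteger) and a plain long standing in for the native data.  It assumes Reference.reachabilityFence, which java.lang.ref provides in Java 9 and later.  Note how multiply has to exist on W as well, which is the objection above: W ends up mirroring the arithmetic API.

```java
import java.lang.ref.Reference;

// Hypothetical wrapper W: the only class that touches the "native" data.
final class W {
    private final long nativeValue; // assumption: stands in for a pointer to native memory

    W(long value) { nativeValue = value; }

    // Each operation on native data lives here and keeps its operands reachable
    // until the access is finished.
    W multiply(W other) {
        try {
            return new W(Math.multiplyExact(nativeValue, other.nativeValue));
        } finally {
            Reference.reachabilityFence(this);
            Reference.reachabilityFence(other);
        }
    }

    long toLong() {
        try {
            return nativeValue;
        } finally {
            Reference.reachabilityFence(this);
        }
    }
}

// The user-facing class holds a W and contains no fences of its own.
final class BigNum {
    private final W w;

    BigNum(long value) { this(new W(value)); }

    private BigNum(W w) { this.w = w; }

    BigNum multiply(BigNum other) { return new BigNum(w.multiply(other.w)); }

    long toLong() { return w.toLong(); }
}
```

With this layout the BigNum objects may go dead mid-call, but the W objects stay reachable for the duration of each native access.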

>
> > Again, this does not just impact finalizers.  It also impacts
java.lang.ref.  BigInteger could be using reference queues rather than
finalizers.  Thus there doesn't appear to be an easy way for an IDE or
compiler to tell whether an object is "finalizable", and hence needs
attention.
>
> Yuck.  More "drop-sensitive" types for the heuristic would be any of the
java.lang.ref types.
>
> My patch begins to fall apart, since drop-sensitivity might arguably
depend recursively on drop-sensitivity of child objects (e.g., a
WeakReference field of the parent object).  And of course you can create
scenarios where a drop-sensitive object is obscured by a normal static type
like Object.

[HB] Right.  Creating a WeakReference to an object may make the constructor
argument "drop-sensitive".  Which means we need something like a whole
program flow analysis to identify "drop-sensitivity".  I think in practice
everything whose lifetime can't be effectively bounded by escape analysis
has to be considered "drop-sensitive".

>
> > And the "this" object is not really different from other parameters or
local variables.
>
> Agreed.  In bytecode, 'this' is just another local (usually #0).  Static
heuristics for detecting drop-sensitivity should extend to all locals, not
just 'this'.
>
>   DS ds = makeDropSensitive();
>   workOn(ds);
>   // IDE warning: missing reachabilityFence(ds)
>   return;
>
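
For comparison, the fragment above with the fence in place might look like the sketch below.  DS, makeDropSensitive and workOn are the placeholder names from the fragment, given hypothetical bodies here.  Reference.reachabilityFence is assumed to be available, as it is in java.lang.ref from Java 9 onward.

```java
import java.lang.ref.Reference;

// Hypothetical drop-sensitive type: some cleanup action (finalizer or
// reference-queue handler) would call release() once the object is unreachable.
final class DS {
    private boolean released; // stands in for the state of a native resource

    void release() { released = true; }

    boolean isReleased() { return released; }
}

final class FenceDemo {
    static DS makeDropSensitive() { return new DS(); }

    // Placeholder for real work; reports whether the resource was still usable.
    static boolean workOn(DS ds) { return !ds.isReleased(); }

    static boolean run() {
        DS ds = makeDropSensitive();
        try {
            return workOn(ds);
        } finally {
            // Keeps ds strongly reachable up to this point, so cleanup cannot
            // run while workOn is still using the underlying resource.
            Reference.reachabilityFence(ds);
        }
    }
}
```

The try/finally shape makes the fence execute on every exit path, including exceptions thrown from workOn.
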
> > It seems to me that this is another case, like OOTA values, where there
are no easy solutions.
>
> Yes.  Surprise is hard to define and therefore hard to outlaw.  Maybe
surprise and non-enumerable properties are related.  Busy-beaver Turing
machines and Chaitin's constant are somehow surprising and also
non-enumerable.  So are race conditions.
>
> > My sense is that in both cases, we're better off going with the
performance hit than the questionable/unusable semantics that we now have.
 In the finalizer case, we can punt it back to the programmer, but I think
we've seen in the last ten years that we will not magically get correct
code if we do.  This problem isn't just overlooked by average programmers.
  All of the code in which I've been finding these problems was written by
experts, and has been examined by many people.
>
> The worst outcome would be to lose performance everywhere and still have
surprises somewhere.  Are there proposals on the table which would
absolutely rule out the sort of surprise scenarios we have been discussing?
 Probably not, if we cannot define what surprise is (with decidability).
>
> If instead we are looking for heuristic patches, it seems like we have to
tune the heuristics to compromise performance only in the surprise cases we
can predict.  It seems like an IDE warning would cover the cases we have
discussed so far.  An IDE warning has this big advantage over bytecode
semantic tweaks:  It teaches the user about the danger area.

[HB] Agreed.  But I'm not sure we can do sufficient analysis in an IDE.
 And we don't have much control over which IDEs people do and don't use.

I think we could effectively preclude surprises by guaranteeing that if
object x was "naively reachable" at the end of a block or full expression
(in the C sense) B, then (the end of) B synchronizes with x's finalizer.
 And I think that's implicitly enforced in current systems so long as we
refrain from eliminating dead references sufficiently.  My guess is that
involves a measurable space cost, but otherwise only a small time cost.  We
have to add some otherwise gratuitous spills to the stack.  But
almost-always-dead stores to the stack should be fairly cheap.  And escape
analysis might help some.  It's not free, but I think we're talking low
single-digit percent, as for the OOTA problem.

Hans

>
> — John