[jmm-dev] finalization
Hans Boehm
boehm at acm.org
Wed Aug 13 23:11:08 UTC 2014
To me, this is really a somewhat different case. Especially for the
second version (what do lambdas actually capture in the first case?), it
seems much clearer that I have an independent use of the resource, which
may persist after the holder goes away. That will always be an issue; I
can store a copy of r in a second object and access it after the finalizer
runs. In the second version, I can't even write the "reachability fence"
without some sort of signature change, which really suggests this is a
different case, and the user has no reason to expect the ResourceHolder
object to still be around. If I had a reference around to write the
"reachability fence", I would have no reason to actually write it with the
more aggressive proposal that logically keeps all references live.
You're suggesting that if a method accesses an @Finalized field (including
passing it to other methods), then all reference variables in that method
are kept around until the end of their scope? Presumably returning an
@Finalized field from a public method yields at least a warning?
Unfortunately declaring a finalizer without an @Finalized field probably
has to be OK to cover the finalizers that just report leaks.
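For concreteness, here is a sketch of what such a marker could look like. No
@Finalized annotation exists in any JDK; the retention policy, the class
shape, and the read() method below are all assumptions for illustration:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker, not part of any JDK: flags fields whose referents
// are destroyed by this class's finalizer.
@Retention(RetentionPolicy.CLASS)
@Target(ElementType.FIELD)
@interface Finalized {}

class AnnotatedHolder {
    @Finalized  // under the proposal, tools would treat accesses specially
    Object r = new Object();

    // Any method reading an @Finalized field would keep all its reference
    // variables (including 'this') live until the end of their scope.
    Object read() { return r; }

    // Returning an @Finalized field from a public method would presumably
    // draw at least a warning, as discussed above.
    @Override protected void finalize() { /* release r's native state */ }
}
```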
That sounds like it might be workable, if this is considered an acceptable
use of annotations. Leaving it off subtly breaks code. If this were C++
instead of Java, we would have to use a proper language construct instead
of an attribute.
We would still be left with a major education problem. But at least it
would be possible to fix code without uglifying it.
I would be opposed to making this only an IDE feature and just putting a
reachability fence in the language. I think it would have to be in the
language.
Hans
On Wed, Aug 13, 2014 at 7:52 AM, Doug Lea <dl at cs.oswego.edu> wrote:
> On 08/13/2014 08:38 AM, Doug Lea wrote:
>
>> Consider:
>>
>> class ResourceHolder {
>> Resource r = createResource();
>> void f() { new Thread(() -> { use(r); }).start(); }
>> void finalize() { destroyResource(); }
>> static void use(Resource r) { ... } // maybe native or JNI
>> }
>>
>
> To avoid details of compilation of lambdas (thanks Jan!),
> I should have written it out in a more awkward unambiguous form:
>
> class ResourceHolder {
> Resource r = createResource();
> void f() { new Thread(new Use(r)).start(); }
>
> void finalize() { destroyResource(); }
> static void use(Resource r) { ... } // maybe native or JNI
>
> static final class Use implements Runnable {
> final Resource resource;
> Use(Resource r) { this.resource = r; }
> public void run() { ResourceHolder.use(resource); }
>
> }
> }
>
>> This seems to be the clearest case in which a ResourceHolder can go
>> out of scope (or otherwise become unreachable) while its resource is
>> being used. But under GC/finalization rules that are friendly to
>> concurrent collectors (i.e., where the collector, not just the usage,
>> may be concurrent), many different-looking constructions can be just
>> as problematic. The use of explicit reachability fences (above,
>> inserting one after "use(r)") seems to be applicable to all of them:
>> Ensure that every access of a destroyable resource, or its use as an
>> argument in a method that accesses one, is followed by a reachability
>> fence. This might be a candidate for language integration if applied
>> to (only) the associated fields of finalizable classes. But we have no
>> way of identifying the relevant fields and usages. This does suggest
>> an extension of the tool/IDE approach though: Support a @Finalized
>> annotation for fields, and let tools take it from there. Possibly
>> even source compilers (javac) themselves, but I can easily imagine
>> cases where programmers would want to hand-craft specializations. The
>> annotation approach also seems to fit other scenarios discussed below.
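For concreteness, the suggested fix could be sketched as below. Note that
java.lang.ref.Reference.reachabilityFence did not exist when this thread was
written (it was only added in Java 9), and the Resource/FencedHolder shapes
here are illustrative stand-ins for the ResourceHolder example:

```java
import java.lang.ref.Reference;
import java.util.concurrent.atomic.AtomicBoolean;

class FencedHolder {
    // Stand-in for a native resource; 'live' simulates its allocated state.
    static final class Resource { final AtomicBoolean live = new AtomicBoolean(true); }

    final Resource r = new Resource();

    Thread f() {
        Thread t = new Thread(() -> {
            use(r);
            // The fix suggested above: a fence after the last use. Note it
            // must fence 'this' (forcing the lambda to capture the holder,
            // not just r), which is exactly the capture/signature issue
            // raised in this reply.
            Reference.reachabilityFence(this);
        });
        t.start();
        return t;
    }

    static void use(Resource r) {
        // In real code this would be a native/JNI access; here we just
        // check the resource has not been destroyed yet.
        if (!r.live.get()) throw new IllegalStateException("use after destroy");
    }

    @Override protected void finalize() { r.live.set(false); }
}
```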
>>
>> -Doug
>>
>> On 08/08/2014 07:54 PM, Hans Boehm wrote:
>>
>>> On Wed, Aug 6, 2014 at 2:45 PM, John Rose <john.r.rose at oracle.com>
>>> wrote:
>>>
>>>>
>>>> On Aug 5, 2014, at 3:45 PM, Hans Boehm <boehm at acm.org> wrote:
>>>>
>>>> Native resource allocation makes for good examples here, but not all
>>>>>
>>>> use cases involve native memory allocation. If I want to attach
>>> additional
>>> fields f to a small subset of objects of type T, it makes sense to use an
>>> IdentityHashMap to maintain the extra values, rather than adding a field
>>> to
>>> the class T. Again, in the general case, it seems I need a finalizer to
>>> clean up the IdentityHashMap entries when a T object is no longer needed.
>>> This has the same problem. The IdentityHashMap entry may be reclaimed
>>> by
>>> the finalizer while I'm still accessing it. And this doesn't sound to me
>>> like it should be that advanced a use case. (The results are likely to
>>> be
>>> a much more debuggable null pointer exception or the like. But that's
>>> still not good.)
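A minimal sketch of the side-table pattern being described, with illustrative
names (the thread gives no concrete code for this case):

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Attach an extra field to a few T instances via a side table, instead of
// widening every T. Names (sideTable, attachExtra) are illustrative.
class T {
    static final Map<T, String> sideTable = new IdentityHashMap<>();

    void attachExtra(String extra) {
        synchronized (sideTable) { sideTable.put(this, extra); }
    }

    String extra() {
        synchronized (sideTable) { return sideTable.get(this); }
    }

    // The cleanup described above: without it the entry leaks; with it, the
    // entry may be removed while another thread is still using this T, if
    // the VM decides 'this' is no longer reachable mid-method.
    @Override protected void finalize() {
        synchronized (sideTable) { sideTable.remove(this); }
    }
}
```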
>>>
>>>>
>>>> OK, that's a bad one.
>>>>
>>>> It cannot be a common case, partly because it will fail if
>>>> stress-tested.
>>>>
>>> Also, the author of class T has to bypass two mental warning signs in
>>> order to get into that bad spot. First, define some complicated
>>> IdentityHashMap stuff outside of T, consciously choosing to do that
>>> instead
>>> of the more obvious and natural route of defining T.extensionSlotTable.
>>> Second, fix the storage leak by defining T.finalize(), which should set
>>> off alarm bells, since finalizers are mysterious. Note that the user has
>>> to have write access to T's definition in order to do the second step, so
>>> the first step could have been done more naturally.
>>>
>>> [HB] I'm not sure what you're proposing here. You want to put the
>>> extensionSlotTable reference in each object? That already adds to the
>>> size
>>> of all objects, when I only want to add the optional field to a few.
>>>
>>>
>>>> But, some kind of patch like I suggested earlier would apply to this
>>>> case
>>>>
>>> also: Somehow tell the JVM, either by a change to bytecode semantics, or
>>> by IDE-hinted insertion of fences, that variables (or bytecode locals) of
>>> static type T where T is a "drop-sensitive" type (T.finalize() is not
>>> Object.finalize(), or perhaps other conditions) extend to either
>>> reassignment or to the end of the enclosing scope (block, method, or
>>> constructor). Every non-virtual method (or constructor) of T has 'this'
>>> in
>>> scope, and therefore requires a fence (implicit or explicit) at the end
>>> of
>>> the method (or constructor) to keep 'this' live.
>>>
>>> [HB] Right. The trick is determining when something is "drop-sensitive".
>>> AFAICT, we've been telling people to use java.lang.ref instead of
>>> finalize(), since finalize() doesn't handle dependencies between
>>> finalizable objects correctly, etc. But java.lang.ref makes it extremely
>>> difficult for the implementation, IDE (or reader) to identify
>>> "drop-sensitive" objects.
>>>
>>>
>>>> I do not see how to hide this behind some sort of native pointer
>>>>>
>>>> wrapper class.
>>>
>>>>
>>>> I suppose the user would have to cut a wrapper object into T, in
>>>> addition
>>>>
>>> to adding T.finalize(). In fact, the wrapper object would be
>>> T.extensionSlotTable, and the finalizer is not needed. The finalizer is
>>> a
>>> lot of trouble just to save a word of storage. (Note that finalizers may
>>> have hidden storage costs, depending on JVM implementation.)
>>>
>>> [HB] But if I use java.lang.ref "finalization", I can avoid the
>>> finalization overhead on objects without the field. Another one of those
>>> finalization deficiencies that java.lang.ref avoids, at the
>>> not-strictly-necessary cost of making potentially finalizable objects
>>> difficult to identify.
>>>
>>>
>>>> Somehow the programmer still needs to specify that a particular object
>>>>>
>>>> shouldn't be finalized until a particular point in the code. And the
>>> need
>>> for that annotation seems extremely surprising to most people, even
>>> experts.
>>>
>>>>
>>>> Granted. And if (1) education will not help much, then (2) an IDE
>>>>
>>> warning or (3) a tweak to bytecode semantics will help more, if there is
>>> a
>>> likely way to detect problematic drops.
>>>
>>>>
>>>> Or, (4) create some sort of fail-fast behavior, where (in some kind of
>>>>
>>> test scenario) we have the JVM aggressively drop dead finalized variables
>>> and call the GC and finalizers as quickly as possible, at least many
>>> times
>>> a second. That is the opposite of the IDE warning or bytecode tweak. It
>>> would move the surprise up early in the development cycle.
>>>
>>> [HB] Interesting idea, but I also don't know how to implement that one
>>> with
>>> decent efficiency. We'd need a really stupid reference count collector
>>> that kicks the finalization thread immediately when something is dropped.
>>> In a way that doesn't result in deadlocks and somehow magically deals
>>> with
>>> cycles without delay. (RC cycle collectors usually introduce a delay.)
>>> Except when locking requires otherwise.
>>>
>>> It may be easier to extend a race detector to catch those. But then we
>>> have to get everyone to actually run one of those, in spite of the 10X or
>>> so slowdown.
>>>
>>>
>>>> I also don't see how to usefully factor BigInteger to separate out this
>>>>>
>>>> problem. It is still the case that when I call multiply some function
>>> between the one I call and the native one has to explicitly ensure that
>>> the
>>> referenced Java BigInteger objects stay around. That seems to
>>> unavoidably
>>> require substantial amounts of boilerplate code, at least in the absence
>>> of
>>> something like AspectJ.
>>>
>>>>
>>>> Put all accesses to the native data inside the wrapper object W, and
>>>> have
>>>>
>>> the wrapper object (written by a finalizer expert) include suitable
>>> fences
>>> so that W.this does not go dead until the native access (read or write)
>>> is
>>> done. The BigInteger has a W field. The BigInteger can go dead, but
>>> the W
>>> will stay live until it finishes its access. If there is another access
>>> to
>>> do, the W will be live as a local inside the BigInteger method, even if
>>> the
>>> BigInteger is dead. The user of the BigInteger doesn't have to know
>>> anything about finalizers or native pointers.
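One possible reading of the wrapper idea, sketched with Java 9's later-added
Reference.reachabilityFence standing in for the "suitable fences"; the names,
the simulated native pointer, and the single read() operation are all
illustrative assumptions:

```java
import java.lang.ref.Reference;

// All native accesses go through W, written by a "finalizer expert", which
// fences itself so W.this stays live across each access even if the
// enclosing BigInteger-like object is already dead.
class W {
    private long nativePtr = 1L;  // stand-in for a real native handle

    long read() {
        long v = nativePtr;       // in real code: a native/JNI read
        // Keep this W reachable until the native read above has completed.
        Reference.reachabilityFence(this);
        return v;
    }

    @Override protected void finalize() { nativePtr = 0L; /* free native state */ }
}

class BigIntLike {
    private final W w = new W();
    // The enclosing object may become unreachable mid-call; only w must
    // survive until read() returns, which W itself guarantees.
    long magnitude() { return w.read(); }
}
```

As noted in the reply, the cost is that W must wrap every operation that
touches the native data, so it ends up mirroring the whole BigInteger API.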
>>>
>>> [HB] My concern is that if all native accesses go inside W, then W has to
>>> wrap all arithmetic operations etc. on BigIntegers. W becomes
>>> essentially
>>> what BigInteger is now in such implementations. The wrapper W is very
>>> BigInteger specific and has to know about all BigInteger operations.
>>>
>>>
>>>> Again, this does not just impact finalizers. It also impacts
>>>>>
>>>> java.lang.ref. BigInteger could be using reference queues rather than
>>> finalizers. Thus there doesn't appear to be an easy way for an IDE or
>>> compiler to tell whether an object is "finalizable", and hence needs
>>> attention.
>>>
>>>>
>>>> Yuck. More "drop-sensitive" types for the heuristic would be any of the
>>>>
>>> java.lang.ref types.
>>>
>>>>
>>>> My patch begins to fall apart, since drop-sensitivity might arguably
>>>>
>>> depend recursively on drop-sensitivity of child objects (e.g., a
>>> WeakReference field of the parent object). And of course you can create
>>> scenarios where a drop-sensitive object is obscured by a normal static
>>> type
>>> like Object.
>>>
>>> [HB] Right. Creating a WeakReference to an object may make the
>>> constructor
>>> argument "drop-sensitive". Which means we need something like a whole
>>> program flow analysis to identify "drop-sensitivity". I think in
>>> practice
>>> everything whose lifetime can't be effectively bounded by escape analysis
>>> has to be considered "drop-sensitive".
>>>
>>>
>>>> And the "this" object is not really different form other parameters or
>>>>>
>>>> local variables.
>>>
>>>>
>>>> Agreed. In bytecode, 'this' is just another local (usually #0). Static
>>>>
>>> heuristics for detecting drop-sensitivity should extend to all locals,
>>> not
>>> just 'this'.
>>>
>>>>
>>>> DS ds = makeDropSensitive();
>>>> workOn(ds);
>>>> // IDE warning: missing reachabilityFence(ds)
>>>> return;
>>>>
>>>> It seems to me that this is another case, like OOTA values, where there
>>>>>
>>>> are no easy solutions.
>>>
>>>>
>>>> Yes. Surprise is hard to define and therefore hard to outlaw. Maybe
>>>>
>>> surprise and non-enumerable properties are related. Busy-beaver Turing
>>> machines and Chaitin's constant are somehow surprising and also
>>> non-enumerable. So are race conditions.
>>>
>>>>
>>>> My sense is that in both cases, we're better off going with the
>>>>>
>>>> performance hit than the questionable/unusable semantics that we now
>>> have.
>>> In the finalizer case, we can punt it back to the programmer, but I
>>> think
>>> we've seen in the last ten years that we will not magically get correct
>>> code if we do. This problem isn't just overlooked by average
>>> programmers.
>>> All of the code in which I've been finding these problems was written
>>> by
>>> experts, and has been examined by many people.
>>>
>>>>
>>>> The worst outcome would be to lose performance everywhere and still have
>>>>
>>> surprises somewhere. Are there proposals on the table which would
>>> absolutely rule out the sort of surprise scenarios we have been
>>> discussing?
>>> Probably not, if we cannot define what surprise is (with decidability).
>>>
>>>>
>>>> If instead we are looking for heuristic patches, it seems like we have
>>>> to
>>>>
>>> tune the heuristics to compromise performance only in the surprise cases
>>> we
>>> can predict. It seems like an IDE warning would cover the cases we have
>>> discussed so far. An IDE warning has this big advantage over bytecode
>>> semantic tweaks: It teaches the user about the danger area.
>>>
>>> [HB] Agreed. But I'm not sure we can do sufficient analysis in an IDE.
>>> And we don't have much control over which IDEs people do and don't use.
>>>
>>> I think we could effectively preclude surprises by guaranteeing that if
>>> object x was "naively reachable" at the end or a block of full expression
>>> (in the C sense) B, then (the end of) B synchronizes with x's finalizer.
>>> And I think that's implicitly enforced in current systems so long as we
>>> refrain from eliminating dead references sufficiently. My guess is that
>>> involves a measurable space cost, but otherwise only a small time cost.
>>> We
>>> have to add some otherwise gratuitous spills to the stack. But
>>> almost-always-dead stores to the stack should be fairly cheap. And
>>> escape
>>> analysis might help some. It's not free, but I think we're talking low
>>> single-digit percent, as for the OOTA problem.
>>>
>>> Hans
>>>
>>>
>>>> — John
>>>>
>>>
>>>
>>
>>
>>
>
>