Finalization and dead references: another proposal

Fri Dec 8 00:01:14 UTC 2017

Some sort of zero-cost wrapper type for native pointers, with semantics
similar to the annotation, would certainly be better than just the
annotation. My best guess is that it would cover something like 90% of use
cases. I certainly don't want to preclude it, once we have such a mechanism.

But I'm not yet convinced that a "native pointer" type covers enough of the
use cases that it can serve as the only mechanism. Not all handles
requiring some sort of deletion are represented as native pointers.
Sometimes this problem does arise in pure Java contexts, e.g.
FinalizableDelegatedExecutorService. And there are cases in which the Java
object being cleaned owns the native pointer indirectly, via another Java
object.
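
To make the non-native-pointer case concrete, here is a minimal sketch of a
handle that is just an int index into Java-side state. All names (HandleDemo,
TABLE, Foo) are made up for illustration, and the explicit close() exists only
so the demo behaves deterministically. Note that length() as written has
exactly the hazard discussed in this thread: once the handle has been read,
nothing keeps "this" reachable, so the cleaning action could in principle run
before the table lookup.

```java
import java.lang.ref.Cleaner;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class HandleDemo {
    // Hypothetical side table holding the Java state logically owned by each Foo.
    static final Map<Integer, StringBuilder> TABLE = new ConcurrentHashMap<>();
    static final AtomicInteger NEXT_HANDLE = new AtomicInteger();
    static final Cleaner CLEANER = Cleaner.create();

    static final class Foo {
        private final int handle;                 // not a native pointer
        private final Cleaner.Cleanable cleanable;

        Foo(String initial) {
            int h = NEXT_HANDLE.incrementAndGet();
            TABLE.put(h, new StringBuilder(initial));
            handle = h;
            // The cleaning action captures only the int, never "this";
            // capturing "this" would keep the Foo reachable forever.
            cleanable = CLEANER.register(this, () -> TABLE.remove(h));
        }

        int length() {
            int h = handle;
            // Hazard: "this" is dead from here on as far as the JIT is
            // concerned, so the table entry may already have been removed.
            return TABLE.get(h).length();
        }

        // Explicit release; runs the cleaning action at most once.
        void close() {
            cleanable.clean();
        }
    }

    public static void main(String[] args) {
        Foo foo = new Foo("abc");
        System.out.println(foo.length());    // prints 3
        foo.close();
        System.out.println(TABLE.isEmpty()); // prints true
    }
}
```

A wrapper type restricted to native pointers would not obviously help with a
field like "handle" here, which is the point of the paragraph above.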

On Wed, Dec 6, 2017 at 6:27 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:

> On Wed, Dec 6, 2017 at 7:38 PM Hans Boehm <hboehm at google.com> wrote:
>
>> We're still trying to deal with a fair amount of code that implicitly
>> assumes that finalization or similar clean-up will not occur while a
>> pointer to the affected object is in scope. Which is of course not true.
>>
>> As a reminder, the canonical offending usage (anti-)pattern with
>> (deprecated, but easier to write) finalizers is
>>
>> class Foo {
>>     private long mPtrToNativeFoo;
>>
>>     private static native void nativeDeallocate(long nativePtr);
>>     private static native void nativeDoSomething(long nativePtr,
>>             long anotherNativePtr);
>>
>>     protected void finalize() { ... nativeDeallocate(mPtrToNativeFoo); ... }
>>
>>     public void doSomething(Foo another) {
>>         ... nativeDoSomething(mPtrToNativeFoo, another.mPtrToNativeFoo) ...
>>     }
>>     ...
>> }
>>
>> This is subtly incorrect in that, while executing the final call to
>> doSomething() on a particular object, just after retrieving
>> mPtrToNativeFoo and another.mPtrToNativeFoo, but before invoking
>> nativeDoSomething(), the garbage collector may run, and "this" and
>> "another" may be finalized, deallocating the native objects their
>> mPtrToNativeFoos refer to. When nativeDoSomething() finally does run, it
>> may see dangling pointers.
>>
>> Examples using java.lang.ref or Cleaners (or even WeakHashMap, if you
>> must) instead of finalize() are as easy to construct, but a bit longer.
>> (The finalizer version is also arguably incorrect in other ways that are
>> not relevant here. Pretend this were written in terms of
>> PhantomReferences.)
>>
>> It is easily possible to construct 100% Java code with the same problem.
>> Instead of mPtrToNativeFoo, each object stores an integer handle that is
>> used to access additional Java state logically associated with the object.
>> But the native pointer case seems to dominate in practice.
>>
>> Various solutions to this have been proposed, but none seem quite
>> attractive enough that I actually feel comfortable asking people to update
>> their code to use them. Noteworthy proposals include:
>>
>> 0) Explicitly call close() on such objects. Great idea when it works. In
>> general it doesn't, since the code needs to know when the enclosing Java
>> object is no longer needed. If we always knew that, we wouldn't need a GC.
>> 1) Various hacks to keep variables live, e.g. the one based on
>> synchronized blocks I advocated in my 2004 JavaOne talk. These are slow
>> and ugly, as we've always admitted. Nobody used them. Incorrect won over
>> slow, ugly, and complicated ~100% of the time.
>> 2) Java 9's reachabilityFence(). This is better. It can be implemented so
>> that it's no longer slow. But in many common cases, it's still quite ugly.
>> For example, the call to nativeDoSomething() above must be followed by two
>> reachabilityFences, one on this and one on another. And to do this really
>> correctly, the whole thing would often need to be in a try...finally
>> block. And in reality, code following this pattern usually doesn't have
>> just a single doSomething method that needs this treatment, but may
>> easily have dozens. And the rules for placing reachabilityFences can
>> become quite
>> subtle if there are e.g. locally allocated objects involved. My assessment
>> is that this isn't good enough. People may no longer write incorrect code
>> 100% of the time, but I'd bet on 70%+.
>> 3) JNI functions can be rewritten, so that the containing Java object is
>> passed in addition to the native pointers. Somewhat accidentally, this
>> happens to be roughly free for single argument functions. (Delete
>> "static".) It adds overhead in other cases, like the one above, and the
>> rewriting can get somewhat convoluted. In some cases, it doesn't work at
>> all. AFAIK, it's never actually guaranteed to be correct; it works because
>> standard implementations don't optimize across the language boundary.
>> That's not too likely to change. Maybe.
>> 4) We could change the language spec to always prohibit premature
>> finalization/cleaning in cases like the above. I could personally live
>> with that solution, and proposed it internally here in the past. But it
>> doesn't seem to go over well among implementers. And AFAICT, doing it
>> well requires significant tooling changes, in that we do want to reliably
>> treat local variables as dead once they go out of scope, a piece of
>> information that doesn't seem to be reliably preserved in class files.
>> One could argue that the current class file design implicitly assumes
>> that we can do dead variable elimination.
>>
>> After going back and forth on this, my conclusion is that we need a
>> linguistic mechanism for identifying the case in which the garbage
>> collector is being used to manage external resources associated with a
>> field.
>
> So kind of the opposite of WeakReference - a SuperStrongReference :).
>
> Kidding aside, it seems like the way you’d want to encapsulate this at the
> language level is via a type that the JVM intrinsically knows about; in
> this way it’s similar to the reference types today.
>
> An annotation probably does the trick when the value doesn’t escape from
> the enclosing instance but I’ve no idea if that assumption covers enough
> code to warrant this approach.  AFAICT, if the value escapes into an
> instance of another type that doesn’t annotate its field then all bets are
> off.
>
> Having a wrapper type would at least make it harder to leak the native
> handle vs the annotation approach.  But of course the wrapper comes with
> footprint and indirection costs.  Maybe Valhalla could allow exposing some
> magic value type that’s a zero-cost wrapper but preserves the type
> information the JIT can track?
>
>> A (still slowly evolving) proposal to add an annotation to do so is at
>>
>> https://docs.google.com/document/d/1yMC4VzZobMQTrVAraMy7xBdWPCJ0p7G5r_43yaqsj-s/edit?usp=sharing
>>
>> In many ways, this is inherently a compromise. But in the vast majority
>> of cases, it greatly reduces the required source code changes over all
>> but (4) above. And I think I could usually explain to an author of
>> currently broken code in under 5 minutes exactly what they need to do to
>> fix it. And I wouldn't have to lie much to do so. I don't think (0)-(3)
>> above share that property.
>>
>> This has already benefited from comments provided by a few people. (Thanks
>> to Doug, Jeremy, and Martin, among others.) But I would really like more
>> feedback, including suggestions about how to proceed.
>>
>> Hans
>>
> --
> Sent from my phone
>
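
For reference, option (2) applied to the doSomething() example quoted above
might look roughly like the sketch below. The native functions are replaced
by plain Java stand-ins (FAKE_HEAP, nativeAllocate) so that the example runs
without JNI; those names are assumptions for illustration only, and
deallocation is omitted entirely. The part that matters is the shape of
doSomething(): one fence per object whose native pointer is in use, placed in
a finally block so that it also covers exceptional exits.

```java
import java.lang.ref.Reference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class FenceDemo {
    // Stand-in for the native heap: maps a fake "pointer" to a value.
    private static final Map<Long, Long> FAKE_HEAP = new ConcurrentHashMap<>();
    private static final AtomicLong NEXT_PTR = new AtomicLong(0x1000);

    // Stand-in for a native allocator; returns a fake pointer.
    private static long nativeAllocate(long value) {
        long ptr = NEXT_PTR.getAndIncrement();
        FAKE_HEAP.put(ptr, value);
        return ptr;
    }

    // Stand-in for the native call; here it just adds the two stored values.
    private static long nativeDoSomething(long nativePtr, long anotherNativePtr) {
        return FAKE_HEAP.get(nativePtr) + FAKE_HEAP.get(anotherNativePtr);
    }

    static final class Foo {
        private final long mPtrToNativeFoo;

        Foo(long value) {
            mPtrToNativeFoo = nativeAllocate(value);
        }

        long doSomething(Foo another) {
            try {
                return nativeDoSomething(mPtrToNativeFoo, another.mPtrToNativeFoo);
            } finally {
                // Keep both objects strongly reachable until the call returns,
                // so neither can be finalized or cleaned while it is running.
                Reference.reachabilityFence(this);
                Reference.reachabilityFence(another);
            }
        }
    }

    public static void main(String[] args) {
        Foo a = new Foo(2);
        Foo b = new Foo(40);
        System.out.println(a.doSomething(b)); // prints 42
    }
}
```

Multiply the try/finally boilerplate by every method that touches
mPtrToNativeFoo and the "quite ugly" assessment in (2) becomes easy to see.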