JNI WeakGlobalRefs

Thu Jul 22 08:20:36 UTC 2021

Hi Hans,

So you are saying that a jweak allows you to peek into the finalizer graph of a
finalizing object from the outside due to having phantom semantics, and that
is sometimes bad. I agree. As you noted, if you don't use finalizers, there is no
problem here.

But let's assume we do what you propose, and make jweak have weak semantics,
instead of phantom semantics. First of all, have we now removed all gimmicks
that allow you to peek through finalizers? I would say no.

Consider the following (same notation as your example):

F ---> ... ---> F

Here we have an object with a finalizer that can reach another object with a
finalizer. They can both get finalized by the same GC cycle. So depending on who
is enqueued for finalization first, you may or may not be able to peek into an
already finalized object here, from the outside, through another finalizer.
One might argue about how likely users are to reach such an object through a
jweak vs. through a finalizer, without knowing how it is implemented. But it's
bad news in both cases.
Once again, if finalizers are not used, there is no problem.

Secondly, as for when you would ever want jweak to have phantom semantics...
I don't think this is uncommon, in particular for native code. It is not uncommon
to use JNI to implement a native mirror component for a Java object. The Java
object and its native data structure are to be considered as one logical unit that
sticks together. The Java object has a pointer to its native part, and the native part
wants a pointer back to its Java object, so they can talk to each other seamlessly.
You don't want the back pointer to be strong because it would create memory leaks,
so you use a jweak. The expectation is that surely, as long as this Java object is
around, you can access it through the jweak. Especially since the spec says so, it's
a very sound assumption. Then some cleaner will delete the data structure when
it is no longer relevant. So in contexts where you are in the native data structure
and haven't passed around the object reference, you can always get the object
through the jweak as long as it isn't dead.
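
To make the pattern concrete, here is a minimal Java-side sketch of such a
mirrored object. The names MirroredObject and NativePeer are made up for
illustration, and the "native" part is faked with a plain Java object so the
sketch runs without any JNI code; in a real implementation the peer would be a
C/C++ structure that holds a jweak back to the Java object.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a Java object that owns a native mirror component.
final class MirroredObject implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // Stand-in for the native data structure. In real JNI code this would be
    // native memory addressed by a long, and it would keep a jweak (not a
    // strong global ref) pointing back at the MirroredObject to avoid a leak.
    static final class NativePeer implements Runnable {
        final AtomicBoolean freed = new AtomicBoolean(false);

        // Cleanup action: must not reference the MirroredObject itself,
        // otherwise the object could never become phantom reachable.
        @Override
        public void run() {
            freed.set(true);
        }
    }

    private final NativePeer peer = new NativePeer();
    private final Cleaner.Cleanable cleanable;

    MirroredObject() {
        // The cleaner deletes the peer once this object is no longer relevant.
        cleanable = CLEANER.register(this, peer);
    }

    NativePeer peer() {
        return peer;
    }

    // Runs the cleanup action immediately (at most once).
    @Override
    public void close() {
        cleanable.clean();
    }
}
```

Calling close() only forces the cleaner action to run early; otherwise it runs
some time after the Java object has become phantom reachable.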

Now if we change the semantics of jweak to weak instead, then every time the
Java object is reachable through a finalizer only, the logic will be wrong. The native
part thinks the object is dead, but it isn't. So if anyone uses this object through
a finalizer, bad things can happen. Kind of like a native object monitor not working
properly in finalizers. You essentially can no longer have an object and its native
component point at each other without running into a bunch of issues: either you
introduce a memory leak, or the object is crippled when reachable from
finalizers. For similar reasons, all native weak references used internally in HotSpot
have phantom semantics, and rely on that being the case.
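
For reference, the two java.lang.ref flavours being contrasted look like this
at the Java level. This is only a sketch (the class name RefFlavours is made
up). It shows the deterministic part of the API, not the GC-timing-dependent
difference in when each kind of reference gets cleared.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical helper class, for illustration only.
final class RefFlavours {
    // A WeakReference hands the referent back for as long as it has not been
    // cleared; here the caller still holds the referent strongly.
    static boolean weakGetReturnsReferent(Object referent) {
        WeakReference<Object> weak = new WeakReference<>(referent);
        return weak.get() == referent;
    }

    // A PhantomReference never hands the referent back: get() always
    // returns null, even while the referent is strongly reachable.
    static boolean phantomGetReturnsNull(Object referent) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);
        return phantom.get() == null;
    }
}
```

A jweak, by contrast, can be turned back into a strong reference (like a
WeakReference) while being cleared on the PhantomReference schedule.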

So it's a double-edged sword, I think. What is true for this whole discussion though is
that if we don't have finalizers, then there isn't really a problem. Finalizers are
deprecated, so hopefully their use will die out over time. What is also true is that
by changing the semantics of jweak, you trade one problem for another one.
This seems particularly nasty to change, as the spec clearly states what users
can expect from this API.

Hope this helps.

/Erik

> [ Moving here from core-libs-dev on David Holmes' recommendation. ]
> 
> I'm concerned that the current semantics of JNI WeakGlobalRefs are still
> dangerous in a very subtle way that is hidden in the spec. The current
> (14+) spec says:
> 
> “Weak global references are related to Java phantom references
> (java.lang.ref.PhantomReference). A weak global reference to a specific
> object is treated as a phantom reference referring to that object when
> determining whether the object is phantom reachable (see java.lang.ref).
> ---> Such a weak global reference will become functionally equivalent to
> NULL at the same time as a PhantomReference referring to that same object
> would be cleared by the garbage collector. <---”
> 
> (This was the result of JDK-8220617, and is IMO a large improvement over the
> prior version, but ...)
> 
> Consider what happens if I have a WeakGlobalRef W that refers to a Java
> object A which, possibly indirectly, relies on an object F, where F is
> finalizable, i.e.
> 
> W - - -> A -----> ... -----> F
> 
> Assume that F becomes invalid once it is finalized, e.g. because the finalizer
> deallocates a native object that F relies on. This seems to be a very common
> case. We are then exposed to the following scenario:
> 
> 0) At some point, there are no longer any other references to A or F.
> 1) F is enqueued for finalization.
> 2) W is dereferenced by Thread 1, yielding a strong reference to A and
> transitively to F.
> 3) F is finalized.
> 4) Thread 1 uses A and F, accessing F, which is no longer valid.
> 5) Crash, or possibly memory corruption followed by a later crash elsewhere.
> 
> (3) and (4) actually race, so there is some synchronization effort and cost
> required to prevent F from corrupting memory. Commonly the implementer
> of W will have no idea that F even exists.
> 
> I believe that typically there is no way to prevent this scenario, unless the
> developer adding W actually knows how every class that A could possibly rely
> on, including those in the Java standard library, is implemented.
> 
> This is reminiscent of finalizer ordering issues. But it seems to be worse, in
> that there isn't even a semi-plausible workaround.
> 
> I believe all of this is exactly the reason PhantomReference.get() always
> returns null, while WeakReference provides significantly different semantics,
> and WeakReferences are enqueued when an object is enqueued for
> finalization.
> 
> The situation improves, but the problem doesn't fully disappear, in a
> hypothetical world without finalizers. It's still possible to use WeakGlobalRef
> to get a strong reference to A after a WeakReference to A has been cleared
> and enqueued. I think the problem does go away if all cleanup code were to
> use PhantomReference-based Cleaners.
> 
> AFAICT, backward-compatibility aside, the obvious solution here is to have
> WeakGlobalRefs behave like WeakReferences. My impression is that this
> would fix significantly more broken clients than it would break correct ones,
> so it is arguably still a viable option.
> 
> There is a case in which the current semantics are actually the desired ones,
> namely when implementing, say, a String intern table. In this case it's
> important the reference not be cleared even if the referent is, at some
> point, only reachable via a finalizer. But this use case again relies on the
> programmer knowing that no part of the referent is invalidated by a finalizer.
> That's a reasonable assumption for the Java-implementation-provided String
> intern table. But I'm not sure it's reasonable for any user-written code.
> 
> There seem to be two ways forward here:
> 
> 1) Make WeakGlobalRefs behave like WeakReferences instead of
> PhantomReferences, or
> 2) Add strong warnings to the spec that basically suggest using a strong
> GlobalRef to a WeakReference instead.
> 
> Has there been prior discussion of this? Are there reasonable use cases for
> the current semantics? Is there something else that I'm overlooking? If not,
> what's the best way forward here?
> 
> (I found some discussion from JDK-8220617, including a message I posted.
> Unfortunately, it seems to me that all of us overlooked this issue?)
> 
> Hans