Ephemerons

Sun Jan 24 20:14:18 UTC 2016

Hi Gil,

The algorithm complexity has been on my mind too and I've been thinking 
of how to maintain a key (or it's address) -> ephemeron mapping most 
efficiently. As I have written in a reply to Mandy, if we say that 
individual phase2 of processing of the normal References has complexity 
O(N) where N is the number of discovered references, then the equivalent 
phase2 processing of Ephemerons has comparable complexity O(N*d) where N 
is the number of discovered Ephemerons and d is the maximum number of 
"hops" in object graph where a "hop" is defined as a point in chain 
where the value of some ephemeron refers directly or indirectly to the 
key of some other ephemeron and the value of that other ephemeron 
continues the chain. Let's hope that in practical data structures, d is 
not large. The maximum value of d is N, as in one of the tests. The 
algorithm in the prototype tries to reduce the number of outer 
iterations to much less than 'd' in many situations by alternating the 
direction of scanning in each pass. The value(ephemeronX) -> 
key(ephemeronY) links could be arranged in a zig-zag pattern relative to 
the order of discovered references in the list though which would be the 
worst case for the algorithm, but I think the chances of that happening 
in a real-world data structure are low. But we have to think of possible 
denial-of-service attacks too, so I agree that worst-case scenario has 
to be improved.

Presently, Reference objects are discovered and hooked on single-linked 
_discoveredXXX lists (using Reference.discovered field to point to the 
next in list). Each reference type has it's own set of lists. Multiple 
lists in a set enable parallel (concurrent?) discovery without 
contention (a list per thread).

So I was thinking that one possibility would be for ephemerons to have a 
bigger table of discovered lists - not just one per discovery thread, 
but quite bigger and that the index of a list to which discovered 
ephemeron is added would be computed from the ephemeron's key. A 
classical hash-table with buckets. Big enough table would minimize 
contention when discovery is parallelized and enable the following 
algorithm (which could also be parallelized):

Let each bucket (each head of the list) have 2 boolean flags: include & 
pending.
At the start, set include flags of all buckets to true and then enter 
the loop:
do {
     reset pending flags of all buckets to false;
     for each bucket with (include == true), do {
         scan the ephemerons and for those with live keys, mark value 
and transitive closure as live.
         while marking the value and transitive closure as live, for 
each object that was newly marked alive,
         compute the index into the ephemeron buckets as though such 
object was ephemeron's key
         and set the pending flag of that bucket.
     }
     set the include flags of all buckets from the pending flags of the 
buckets and count # of pending buckets
} while (# of pending buckets > 0);

Hm, ....

Regards, Peter

On 01/24/2016 06:43 PM, Gil Tene wrote:
> A note on implementation possibilities:
>
> If I read the implementation correctly, a "weakness" of the current 
> implementation approach for making sure value-referents (and their 
> transitively reachable) objects are kept alive if key referents are 
> alive is that it requires multiple passes through the discovered 
> Ephemeron list, with the passes terminating when the list stabilizes. 
> While I think that this is sound (in the sense that it will work), it 
> carries a potentially high cost when large sets of Ephemerons exist in 
> the heap. E.g. if the Ephemerons are linked in a k-v list (where the 
> value referent of one ephemeron is the key of another, in a chain), as 
> in your code example, there is an N^2 scanning thing going on. And 
> e.g. if a large set of ephemeron keys become weakly reachable in a 
> single cycle (e.g. a large cache was discarded) while other ephemeron 
> participate in some linked list relationship, the entire list of 
> [stably] weak-keyed ephemerons has to be traversed in each pass (in 
> case one of them has become live in a previous pass). I'd worry that 
> these computational complexity issues could become prohibitive enough 
> in GC cost that you there would be significant resistance to their 
> adoption.
>
> Note that in comparison (to my understanding), current ref processing 
> work involved in GC handling soft/weak/final/phantom refs remains 
> linear to the number of refs, and does not have an O(N^2) component.
>
> I believe that there is a relatively simple way to bring Ephemeron 
> processing to O(N) by establishing reverse mapping during the scan 
> (the below description assumes STW during the scan):
>
> 1. Start ref processing with no reverse mapping table established.
>
> 2. During ref processing, establish an EphemeronKeyReverseMapping 
> table (logically a Map<Address, List<Ephemerons>>) which would map 
> individual heap addresses to ephemerons who's key referent points to 
> those addresses.
>
> 3. Note that since each heap address can show up in multiple key 
> referents, the map needs to return a (potentially empty) list of 
> Ephemerons who's keys refer to the address.
>
> 4. Specifically, starting with an empty list, and for each discovered 
> Ephemeron, add a reverse-mapping entry to the 
> EphemeronKeyReverseMapping, mapping from the key referent address to 
> the Ephemeron.
>
> 5. During Ephemeron processing (under the case where the referent is 
> found to be alive and the ephemeron then needs to keep the value 
> referent alive) mark down the value referent path using a 
> special ephemeron_keep_alive OopClosure (or a mode flag that affects 
> the normal keep_alive behavior) which, when reaching a 
> not-yet-marked-live object [in addition to marking it live and 
> traversing it as keep_alive would normally do] would look up the 
> object's address in the EphemeronKeyReverseMapping to get a list of 
> Ephemerons to traverse, and traverse each of the mapped Ephemerons' 
> value referent with the same ephemeron_keep_alive closure.
> Note: doing reverse-mapping lookups on each not-yet-marked object in a 
> keep_alive closure will add cost, which is why 
> this ephemeron_keep_alive pass should probably be done after regular 
> keep_alive passes have had their chance to mark objects live. This way 
> only paths that become newly-live via ephemeron processing are subject 
> to the extra reverse-mapping-lookip cost.
>
> While I haven't been poking at this too long to see if it has holes, I 
> think it can produce a reliable result, and is O(N) on the count of 
> Ephemerons.
>
> — Gil.
>
>
>> On Jan 24, 2016, at 2:52 AM, Peter Levart <peter.levart at gmail.com 
>> <mailto:peter.levart at gmail.com>> wrote:
>>
>> Hi Gil,
>>
>> I totally agree with your assessment. We should not introduce another 
>> way of reviving the almost collectable objects and I fully support 
>> tightening the specification so that soft and weak references to the 
>> same referent and to other referents from which this referent is 
>> reachable are required to be cleared together atomically.
>>
>> I modified the prototype to (hopefully) adhere to this new Ephemeron 
>> specification that Gil and I agreed upon. Anyone interested in 
>> experimenting can find it here:
>>
>> http://cr.openjdk.java.net/~plevart/misc/Ephemeron/webrev.jdk.02/
>> http://cr.openjdk.java.net/~plevart/misc/Ephemeron/webrev.hotspot.02/
>>
>> It is rebased to current tip of jdk9-dev repositories (after the bulk 
>> of merges for jdk-9+102), but still contains the change to remove the 
>> Cleaner reference type as it has not yet managed to get in...
>>
>> I have also added a test that is a start for verifying the functionality.
>>
>> Regards, Peter
>>
>> On 01/23/2016 07:25 PM, Gil Tene wrote:
>>>
>>>> On Jan 23, 2016, at 5:14 AM, Peter Levart <peter.levart at gmail.com> 
>>>> wrote:
>>>>
>>>> Hi Gil, it's good to have this discussion. See comments inline...
>>>>
>>>> On 01/23/2016 05:13 AM, Gil Tene wrote:
>>>> ....
>>>>>> On Jan 22, 2016, at 2:49 PM, Peter Levart 
>>>>>> <peter.levart at gmail.com> wrote:
>>>>>>
>>>>>> Ephemeron always touches definitions of at least two consecutive 
>>>>>> strengths of reachabilities. The prototype says:
>>>>>>
>>>>>>  * <li> An object is <em>weakly reachable</em> if it is neither
>>>>>>  * strongly nor softly reachable but can be reached by traversing a
>>>>>>  * weak reference or by traversing an ephemeron through it's 
>>>>>> value while
>>>>>>  * the ephemeron's key is at least weakly reachable.
>>>>>>
>>>>>>  * <li> An object is <em>ephemerally reachable</em> if it is neither
>>>>>>  * strongly, softly nor weakly reachable but can be reached by 
>>>>>> traversing an
>>>>>>  * ephemeron through it's key or by traversing an ephemeron 
>>>>>> through it's value
>>>>>>  * while it's key is at most ephemerally reachable. When the 
>>>>>> ephemerons that
>>>>>>  * refer to ephemerally reachable key object are cleared, the key 
>>>>>> object becomes
>>>>>>  * eligible for finalization.
>>>>>
>>>>> Looking into this a bit more, I don't think the above is quite 
>>>>> right. Specifically, If an ephemeron's key is either strongly of 
>>>>> softly reachable, you want the value to remain appropriately 
>>>>> strongly/softly reachable. Without this quality, Ephemeron value 
>>>>> referents can (and will) be prematurely collected and finalized 
>>>>> while the keys are not. This (IMO) needed quality not provided by 
>>>>> the behavior you specify…
>>>>
>>>> This is not quite true. While ephemeron's value is weakly or even 
>>>> ephemerally-reachable, it is not finalizable, because 
>>>> ephemeraly-reachable is stronger than finaly-reachable. After 
>>>> ephemeron's key becomes ephemeraly-reachable, the ephemeron is 
>>>> cleared by GC which sets it's key *and* value to null atomically. 
>>>> The life of key and value at that moment becomes untangled. Either 
>>>> of them can have a finalizer or not and both of them will 
>>>> eventually be collected if not revived by their finalize() methods. 
>>>> But it can never happen that ephemeron's value is finalized or 
>>>> collected while it's key is still reachable through the ephemeron 
>>>> (while the ephemeron is not cleared yet).
>>>>
>>>> But I agree that it would be desirable for ephemeron's value to 
>>>> follow the reachability of it's key. In above specification, if the 
>>>> key is strongly reachable, the value is weakly reachable, so any 
>>>> WeakReferences or SoftReferences pointing at the Ephemeron's value 
>>>> can already be cleared while the key is still strongly reachable. 
>>>> This is arguably no different than current specification of Soft 
>>>> vs. Weak references. A SoftReference can already be cleared while 
>>>> its referent is still reachable through a WeakReference,
>>>
>>> We seem to agree about the cleaner behavior specification (in both 
>>> of our texts below), so the these next paragraphs are really about 
>>> arguing for why this is an important design choice if/when adding 
>>> Ephemerons to Java:
>>>
>>> It is true the [current] spec allows for soft references to an 
>>> object to be cleared while weak references to the same object are 
>>> not: the "determines" in "Suppose that the garbage collector 
>>> determines at a certain point in time hat an object is RRRR 
>>> reachable..." part [for RRRR = {soft, weak}] does not have to happen 
>>> at the same "certain point in time".
>>>
>>> However, to my knowledge all current implementations present as if 
>>> this determination is happening at the same "point in time" for all 
>>> weakly and softly reachable objects combined. Specifically [in 
>>> implementations]: if soft reachability is determined for an object 
>>> at some point in time, then weak reachability for that object is 
>>> determined at the same point in time. And the weak reachability 
>>> determination for an object depends on whether the collector chose 
>>> to clear existing soft references to that object at that same point 
>>> in time, with the appearance of the choice to clear (or not to 
>>> clear) soft references to a given object atomically affecting the 
>>> determination of it's weak reachability. Since the collector is 
>>> *required* to act on a weak determination when it is made, while it 
>>> *may* act on a soft determination when it is made, making the 
>>> combined determination at the same "point in time" eliminates an 
>>> obviously confusing situation that is not prohibited by the spec: if 
>>> the determination for weak and soft reachability was not done at the 
>>> same point in time, then an object that was softly reachable and had 
>>> it's soft references cleared and queued could later become strongly 
>>> reachable, and even softly reachable again. When reference 
>>> processing is done as a STW thing, this "combined determination" 
>>> effect is a trivial side-effect of STW. When it is done concurrently 
>>> (or incrementally?), implementations still work to maintain the 
>>> appearance of combined atomic determination of soft and weak 
>>> reachability. I know ours does. In our case, we do it because we had 
>>> no desire to be the ones to argue "I know that all implementations 
>>> did this atomically because they were STW, but the spec allows us to 
>>> add this bug to your program…".
>>>
>>> So in actual implementations (to my knowledge), finalization is 
>>> currently the only mechanism that can create this "strange 
>>> situation" where an object was no longer strongly reachable, had 
>>> actions triggered as a result from loss of strong reachability (i.e. 
>>> actually observed by the program as "known to not be strongly 
>>> reachable"), and later became strongly reachable again. E.g. a 
>>> finalizer can propagate a strong reference to a previously 
>>> non-strongly reachable object ('this' in the finalizer, or anything 
>>> that 'this' transitively refers and was not otherwise reachable when 
>>> the finalizer was called).. This is one of those "undesired" things 
>>> that the introduction of Reference types was meant to deal with 
>>> (Reference types were introduced in 1.2, after finalization was 
>>> unfortunately already included and spec'ed. And phantom refs were 
>>> meant to allow for a cleaner form that could replace finalization). 
>>> And while the specifications of SoftReference and WeakReference do 
>>> not prohibit it, implementations are not required to allow it, and 
>>> in practice non of them do (I think), as doing so would most likely 
>>> expose some "interesting" spec-allowed-but-extremely-surprising 
>>> things/bugs that none of us want to have to defend...
>>>
>>> In this context, it would be a "highly undesirable" design choice to 
>>> introduce Ephemerons in a way that would them to return a strong 
>>> reference to an object that has previously been determined to no 
>>> longer be strongly reachable. Structuring the spec to prohibit this 
>>> is a better design choice.
>>>
>>> To highlight the design choice here, let me describe a specific 
>>> problem scenario for which the previous (above) spec would cause 
>>> "re-strengthening" behavior that would break assumptions that are 
>>> allowed under the current spec: in the above/previously specified 
>>> behavior an object V that is known to have no finalizers, but has 
>>> e.g. 3 WeakReference objects that refer to it, can become weakly 
>>> reachable while both a key referent object K in some ephemeron E 
>>> with a value referent of V remain strongly reachable. At such a 
>>> point (V is weakly reachable, K and E are strongly reachable), the 
>>> collector may determine weak reachability for V, [atomically] clear 
>>> all weak references to V, and enqueue those weak reference objects 
>>> on their respective queues. While V is still ephemerally reachable 
>>> under your previous definition, there are no references to it 
>>> anywhere other than in ephemeron value referent fields, and weak 
>>> references that did refer to it have been cleared and queued. Since 
>>> the ephemeron is still there, and the key is still there, and the 
>>> ephemeron has not been cleared, an Ephemeron.getValue() call would 
>>> create a strong reference to an object that was previously 
>>> determined to not be weakly reachable. Re-creating a strong 
>>> reference to V after the point where weak references to V were 
>>> cleared and the weak refs to it were enqueued would be "surprising" 
>>> to current weak reference based code (the only thing that could 
>>> cause this under the current spec would be a finalizer), so allowing 
>>> that (jn the spec) is likely to break all kinds of logic that 
>>> depends on currently spec'ed weak reference behaviors.
>>>
>>> The spec'ed behavior we seem to be agreeing on (below) would 
>>> prohibit this loophole and would [I think] maintain any 
>>> reachability-based expectations that current weak-ref based logic 
>>> can make under the current spec. Maintaining this continuity is an 
>>> important design choice for adding Ephemerons into the current set 
>>> of Reference behaviors.
>>>
>>> And since I suspect that all implementations will continue to choose 
>>> to do the "determination" of soft and weak reachability at the same 
>>> "point in time", this will fit well with how people would build this 
>>> stuff anyway.
>>>
>>> Separate note: It would be separately interesting to consider 
>>> narrowing the SoftRef spec to require JVM implementations to 
>>> atomically clear all soft *and* weak references to an object at the 
>>> same time. I.e. if the garbage collector chooses to clear a soft 
>>> reference to an object that would become weakly reachable as a 
>>> result, then all weak references to that object must be [atomically] 
>>> cleared at the same time. Since I suspect that all current JVM 
>>> implementations actually adhere to this stronger requirement 
>>> already, this would not "hurt" anything or require extra work to 
>>> comply with. [Anyone from Metronome or some other non-STW reference 
>>> processing implementations want to chime in?].
>>>
>>>> but for Ephemeron's value this might be confusing. The easier to 
>>>> understand conceptual model for Ephemerons might be a pair of 
>>>> (WeakReference<K>, WeakReference<V>) where the key has a virtual 
>>>> strong reference to the value. And this is what we get if we say 
>>>> that reachability of the value follows reachability of the key.
>>>>
>>>>>
>>>>> For a correctly specified behavior, I think all strengths (from 
>>>>> strong down) need to be affected by key/value Ephemeron 
>>>>> relationships, but without adding an "ephemerally reachable" 
>>>>> strength. E.g. I think you fundamentally need something like this:
>>>>>
>>>>> - "An object is <em>strongly reachable</em> if it can be reached 
>>>>> by (a) some thread without traversing any reference objects, or by 
>>>>> (b) traversing the value of an Ephemeron whose key is strongly 
>>>>> reachable. A newly-created object is strongly reachable by the 
>>>>> thread that created it"
>>>>>
>>>>> - "An object is <em>softly reachable</em> if it is not 
>>>>> strongly reachable but can be reached by (a) traversing a soft 
>>>>> reference or by (b) traversing the value of an Ephemeron whose key 
>>>>> is softly reachable.
>>>>>
>>>>> - "An object is <em>weakly reachable</em> if it is neither 
>>>>> strongly nor softly reachable but can be reached by (a) traversing 
>>>>> a weak reference or by (b) traversing the value of an ephemeron 
>>>>> whose key is weakly reachable.
>>>>
>>>> ...and that's where we stop, because when we make Ephemeron just a 
>>>> special kind of WeakReference, the next thing that happens is:
>>>>
>>>>  * <p> Suppose that the garbage collector determines at a certain 
>>>> point in time
>>>>  * that an object is <a href="package-summary.html#reachability">weakly
>>>>  * reachable</a>.  At that time it will atomically clear all weak 
>>>> references to
>>>>  * that object and all weak references to any other 
>>>> weakly-reachable objects
>>>>  * from which that object is reachable through a chain of strong 
>>>> and soft
>>>>  * references.  At the same time it will declare all of the formerly
>>>>  * weakly-reachable objects to be finalizable.  At the same time or 
>>>> at some
>>>>  * later time it will enqueue those newly-cleared weak references 
>>>> that are
>>>>  * registered with reference queues.
>>>>
>>>> ...where "clearing of the WeakReference" means reseting the key 
>>>> *and* value to null in case it is an Ephemeron; and
>>>> "all weak references to some object" means Ephemerons that have 
>>>> that object as a key (but not those that only have it as a value!) 
>>>> in case of ephemerons
>>>>
>>>> ...
>>>>> I still think that Ephemeron<K, V> should extend WeakReference<K>, 
>>>>> since that places already established rules and expectation on (a) 
>>>>> when it will be enqueued, (b) when the collector will clear it 
>>>>> (when the the collector encounters the <K> key being weakly 
>>>>> reachable), and (c) that clearing of all Ephemeron *and* 
>>>>> WeakReference instances who share an identical key value is done 
>>>>> atomically, along with (d) all weak references to to any other 
>>>>> weakly-reachable objects from which that object is reachable 
>>>>> through a chain of strong and soft references. These last (c, d) 
>>>>> parts are critically captured since an Ephemeron *is a* 
>>>>> WeakReference, and the statement in WeakReference that says that 
>>>>> "… it will atomically clear all weak references to that object and 
>>>>> all weak references to any other weakly-reachable objects from 
>>>>> which that object is reachable through a chain of strong and soft 
>>>>> references." has a clear application.
>>>>>
>>>>> Here are some suggested edits to the JavaDoc to go with this 
>>>>> suggested spec'ed behavior:
>>>>> /**
>>>>>   * Ephemeron<K, V> objects are a special kind of WeakReference<K> 
>>>>> objects, which
>>>>>   * hold two referents (a key referent and a value referent) and 
>>>>> do not prevent their
>>>>>   * referents from being made finalizable, finalized, and then 
>>>>> reclaimed.
>>>>>   * In addition to the key referent, which adheres to the referent 
>>>>> behavior of a
>>>>>   * WeakReference<K>, an ephemeron also holds a value referent 
>>>>> whose reachabiliy
>>>>>   * strength is affected by the reachability strength of the key 
>>>>> referent:
>>>>>   * The value referent of an Ephemeron instance is considered:
>>>>>   * (a) strongly reachable if the key referent of the same Ephemeron
>>>>>   * object is strongly reachable, or if the value referent is 
>>>>> otherwise strongly reachable.
>>>>>   * (b) softly reachable if it is not strongly reachable, and (i) 
>>>>> the key referent of
>>>>>   * the same Ephemeron object is softly reachable, or (ii) if the 
>>>>> value referent is otherwise
>>>>>   * softly reachable.
>>>>>   * (c) weakly reachable if it is not strongly or softly 
>>>>> reachable, and (i) the key referent of
>>>>>   * the same Ephemeron object is weakly reachable, or (ii) if the 
>>>>> value referent is otherwise
>>>>>   * weakly reachable.
>>>>>   * <p> When the collector clears an Ephemeron object instance 
>>>>> (according to the rules
>>>>>   * expressed for clearing WeakReference object instances), the 
>>>>> Ephemeron instance's
>>>>>   * key referent value referent are simultaneously and atomically 
>>>>> cleared.
>>>>>   * <p> By convenience, the Ephemeron's referent is also called 
>>>>> the key, and can be
>>>>>   * obtained either by invoking {@link #get} or {@link #getKey} 
>>>>> while the value
>>>>>   * can be obtained by invoking {@link #getValue} method.
>>>>>   *...
>>>>
>>>>
>>>> Thanks, this is very nice. I do like this behavior more.
>>>>
>>>> Let me see what it takes to implement this strategy...
>>>>
>>>> Regards, Peter
>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20160124/d30d1fbd/attachment.htm>