<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body style="background-color: rgb(255, 255, 255); color: rgb(0, 0,
0);" bgcolor="#FFFFFF" text="#000000">
Hi Kim,<br>
<br>
<div class="moz-cite-prefix">On 06/09/2015 07:44 AM, Peter Levart
wrote:<br>
</div>
<blockquote cite="mid:55767D37.4080100@gmail.com" type="cite"><!--[if !IE]><DIV style="border-left: 2px solid #009900; border-right: 2px solid #009900; padding: 0px 15px; margin: 2px 0px;"><![endif]-->
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
Hi Kim,<br>
<br>
Thanks for taking a look at this code.<br>
<br>
<div class="moz-cite-prefix">On 06/09/2015 04:03 AM, Kim Barrett
wrote:<br>
</div>
<blockquote
cite="mid:A71299B6-5F49-458D-8A12-ABCAADC37F5B@oracle.com"
type="cite"><!--[if !IE]><DIV style="border-left: 2px solid #009900; border-right: 2px solid #009900; padding: 0px 15px; margin: 2px 0px;"><![endif]--><!--[if !IE]><DIV style="border-left: 2px solid #009900; border-right: 2px solid #009900; padding: 0px 15px; margin: 2px 0px;"><![endif]-->
<pre wrap="">On May 31, 2015, at 3:32 PM, Peter Levart <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:peter.levart@gmail.com"><peter.levart@gmail.com></a> wrote:
</pre>
<blockquote type="cite"><!--[if !IE]><DIV style="border-left: 2px solid #009900; border-right: 2px solid #009900; padding: 0px 15px; margin: 2px 0px;"><![endif]--><!--[if !IE]><DIV style="border-left: 2px solid #009900; border-right: 2px solid #009900; padding: 0px 15px; margin: 2px 0px;"><![endif]-->
<pre wrap="">So, for the curious, here's the improved prototype:
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/%7Eplevart/misc/JEP132/ReferenceHandling/webrev.02/">http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/webrev.02/</a>
And here are the improved benchmarks (with some results inline):
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/%7Eplevart/misc/JEP132/ReferenceHandling/refproc/">http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/refproc/</a>
</pre>
</blockquote>
<pre wrap="">While perusing the offered code (webrev.02) (not a careful review
yet), I think I've found a serious bug:
In Reference.java
in unhookPendingChunk
253 // assert r.next == r;
254 // move a chunk of pending/discovered references to a
255 // temporary local r/next chain
...
262 rd.next = r;
I think that assertion suggested at line 253 does not hold, and line
262 may damage a queue.
The problem is that the application could explicitly enqueue a
reference while it is sitting in the pending list, waiting to be
processed. So the enqueue places the reference on the associated
queue's list, linked through the reference's next field. Then unhook
comes along and clobbers the reference's next field. Oops!
handlePendingChunk has similar problems with an application enqueue
before the reference has been "handled".
I haven't looked carefully at webrev.03, but at a quick glance it
looks like it has the same problem. (Which is what I expected, given
the description of the differences between webrev.02 and webrev.03.)
I'm not sure why unhook is using the next field at all, rather than
just continuing to use the discovered field. I've not studied this
idea carefully, but I think it might work to leave the chunks linked
through the discovered field until addChunk, with that being
responsible for self-looping the discovered field and transferring
linkage to the next field. That is, treat chunks as extensions of the
pending list until addChunk-time. There might be some that makes using
the discovered field that way a problem, but I'm not thinking of
anything right now.
Of course, this problem doesn't exist for FinalReference because
application code doesn't have access to them to perform the
problematic explicit enqueue.
</pre>
</blockquote>
<br>
Oops, you're right. Explicit enqueue-ing is what slipped my mind,
since I was first working on finalizers and then generalized the
solution... I'm afraid to use the 'discovered' field since it is
in the domain of the VM, and I think the Java side can only reset it
to null while holding the lock. I also would not like to add another
field to Reference just to support "chunking". Perhaps the chunk
could be formed as a re-usable array of references. Let me think
about this some more to figure out how to solve it most
elegantly...<br>
<br>
I'll follow up when I have something.<br>
<br>
Regards, Peter<br>
<br>
</blockquote>
<br>
I think I was too afraid to use the 'discovered' field. I now think
the 'discovered' field is the right one to use to form "chunks" of
pending references. If I read the VM reference handling code
correctly, the 'discovered' field is treated as a normal instance
field as far as GC is concerned as soon as the Reference is no
longer active (reference.next != null). All pending references found
by Java code navigating the 'pending' chain have already been
"de-activated" by GC when discovered (reference.next == reference),
so it is safe to manipulate the 'discovered' field in Java code.
Here's the modified prototype that uses the 'discovered' field to
form chunks of references:<br>
<br>
<a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/webrev.05/">http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/webrev.05/</a><br>
<br>
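To illustrate the idea, here is a stripped-down model of the unhook
step (just a sketch for discussion, not the actual webrev.05 code -
the 'Ref' class, lock object and CHUNK_SIZE constant below are
made-up stand-ins for j.l.r.Reference and the real fields):<br>
<pre>
// Simplified model: unhooking a chunk of pending references that stay
// linked through their 'discovered' fields, with only two writes per chunk.
final class PendingListModel {
    static final class Ref {
        Ref discovered;   // link maintained by the VM while pending
        Ref next;         // queue link - deliberately untouched here
    }

    static final Object lock = new Object();  // stands in for the pending-list lock
    static Ref pending;                       // head of the VM-built pending list
    static final int CHUNK_SIZE = 256;        // illustrative chunk size

    /** Detach up to CHUNK_SIZE references, still chained via 'discovered'. */
    static Ref unhookPendingChunk() {
        synchronized (lock) {
            Ref head = pending;
            if (head == null) return null;
            Ref tail = head;
            for (int i = 1; i < CHUNK_SIZE && tail.discovered != null; i++) {
                tail = tail.discovered;
            }
            pending = tail.discovered;   // write #1: advance the pending head
            tail.discovered = null;      // write #2: terminate the chunk
            return head;
        }
    }
}
</pre>
<br>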
Note that enqueue-ing a chunk or a single Reference is a two-phase
process...<br>
<br>
1st, the reference is marked via ReferenceQueue.markEnqueued(Reference),
which is a serialization point. After that point the Reference's .next
field is only manipulated by a single thread. This happens in two places:<br>
- ReferenceQueue.enqueue(reference), line 60 - called by public
Reference.enqueue()<br>
- Reference.handlePendingChunk(chunk), line 317<br>
<br>
2nd, the chain of .next-field-connected References is added to the
queue (ReferenceQueue.addChunk(head, tail), line 83).<br>
<br>
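Roughly, the two phases can be modeled like this (again only a
simplified sketch for discussion - the real markEnqueued/addChunk live
in ReferenceQueue, operate on j.l.r.Reference, and are not written
exactly like this):<br>
<pre>
// Toy model of the two-phase hand-off.
final class QueueModel {
    static final class Ref {
        Object queue;   // owning queue, or ENQUEUED once phase 1 is done
        Ref next;       // written only by the thread that won phase 1
    }
    static final Object ENQUEUED = new Object();

    private Ref head;   // simple LIFO list, guarded by 'this'

    // Phase 1: the serialization point - only the single thread that flips
    // the state to ENQUEUED may touch r.next from now on.
    static boolean markEnqueued(Ref r) {
        synchronized (r) {
            if (r.queue == ENQUEUED) return false;   // lost the race
            r.queue = ENQUEUED;
            return true;
        }
    }

    // Phase 2: splice an already-marked, .next-linked chain into the queue.
    synchronized void addChunk(Ref chunkHead, Ref chunkTail) {
        chunkTail.next = head;
        head = chunkHead;
        notifyAll();   // wake up threads blocked in a remove()-style wait
    }
}
</pre>
<br>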
With this arrangement, I get even less CPU overhead when running the
benchmark, most probably because unhooking a chunk of pending
references no longer involves writes to many fields - just two
writes per chunk: the 'discovered' field of the last Reference in the
chunk is reset to null and the 'pending' static field is set to
point to the first of the remaining References:<br>
<br>
<br>
<pre>
Finalization throughput, ORIGINAL

 construction threads: 1
 instances per thread: 10000000
 cons. per 1 ms sleep: 100

                                         ~5s interval      cumulative     active pool queued
  t[ms] construct.  destruct.  in-flight con./ms des./ms con./ms des./ms  thr.  size  tasks
------- ---------- ---------- ---------- ------- ------- ------- ------- ----- ----- ------
   5000     441600          0     441600      88       0      88       0
  10002     883400          0     883400      88       0      88       0
  15005    1301400     244748    1056652      83      48      86      16
  20007    1747100     244748    1502352      89       0      87      12
  25009    2192600     244748    1947852      89       0      87       9
  30011    2625000     488320    2136680      86      48      87      16
  35013    3069900     488320    2581580      88       0      87      13
  40015    3514600     488320    3026280      88       0      87      12
  45017    3956200     488320    3467880      88       0      87      10
  50018    4397600     488320    3909280      88       0      87       9
  55020    4703200    4501263     201937      61     802      85      81
  60022    5152200    4501263     650937      89       0      85      74
  65023    5604600    4501263    1103337      90       0      86      69
  70024    6051400    4501263    1550137      89       0      86      64
  75025    6503100    4501263    2001837      90       0      86      59
  80026    6880800    6831578      49222      75     466      85      85
  85027    7329700    6831578     498122      89       0      86      80
  90027    7775900    6831578     944322      89       0      86      75
  95028    8221800    6831578    1390222      89       0      86      71
 100029    8666400    6831578    1834822      88       0      86      68
 105030    9111000    6831578    2279422      88       0      86      65
 110031    9559200    6831578    2727622      89       0      86      62
 115032   10000000    6831578    3168422      88       0      86      59
 120792   10000000   10000000          0       0     550      82      82

real 2m0.962s
user 0m32.641s
sys 0m1.897s


Finalization throughput, PATCHED

 construction threads: 1
 instances per thread: 10000000
 cons. per 1 ms sleep: 100

                                         ~5s interval      cumulative     active pool queued
  t[ms] construct.  destruct.  in-flight con./ms des./ms con./ms des./ms  thr.  size  tasks
------- ---------- ---------- ---------- ------- ------- ------- ------- ----- ----- ------
   5000     434800          0     434800      86       0      86       0     0     0      0
  10001     868600          0     868600      86       0      86       0     0     0      0
  15002    1294800     244498    1050302      85      48      86      16     0     0      0
  20005    1727700     244498    1483202      86       0      86      12     0     0      0
  25006    2162200     244498    1917702      86       0      86       9     0     0      0
  30009    2592400     489422    2102978      85      48      86      16     0     0      0
  35010    3026800     489422    2537378      86       0      86      13     0     0      0
  40011    3417000    3349549      67451      78     571      85      83     0     1      0
  45012    3851700    3349549     502151      86       0      85      74     0     0      0
  50013    4286900    3349549     937351      87       0      85      66     0     0      0
  55015    4713500    3611309    1102191      85      52      85      65     0     0      0
  60017    5147400    3611309    1536091      86       0      85      60     0     0      0
  65018    5582500    3611309    1971191      87       0      85      55     0     0      0
  70019    5981300    5656682     324618      79     409      85      80     0     0      0
  75019    6407400    6397329      10071      85     148      85      85     0     1      0
  80020    6844300    6397329     446971      87       0      85      79     0     0      0
  85021    7281100    6397329     883771      87       0      85      75     0     0      0
  90022    7702600    7407984     294616      84     202      85      82     0     0      0
  95022    8136700    7407984     728716      86       0      85      77     0     0      0
 100024    8560400    8412857     147543      84     200      85      84     0     1      0
 105024    8996200    8412857     583343      87       0      85      80     0     0      0
 110025    9430300    8412857    1017443      86       0      85      76     0     0      0
 115025    9853700    9480645     373055      84     213      85      82     0     0      0
 120026   10000000    9480645     519355      29       0      83      78     0     0      0
 125220   10000000   10000000          0       0      99      79      79     0     0      0

real 2m5.323s
user 0m13.868s
sys 0m0.971s

-----

WeakReference processing throughput, ORIGINAL

 construction threads: 1
 instances per thread: 10000000
 cons. per 1 ms sleep: 100

                                         ~5s interval      cumulative     active pool queued
  t[ms] construct.  destruct.  in-flight con./ms des./ms con./ms des./ms  thr.  size  tasks
------- ---------- ---------- ---------- ------- ------- ------- ------- ----- ----- ------
   5000     450800          0     450800      90       0      90       0
  10004     902500          0     902500      90       0      90       0
  15006    1352000     204006    1147994      89      40      90      13
  20009    1807000     204006    1602994      90       0      90      10
  25011    2254600     409643    1844957      89      41      90      16
  30013    2708900     409643    2299257      90       0      90      13
  35015    3132900    2934351     198549      84     504      89      83
  40017    3588100    2934351     653749      91       0      89      73
  45018    4036100    3152591     883509      89      43      89      70
  50020    4491100    3152591    1338509      90       0      89      63
  55022    4946200    3152591    1793609      90       0      89      57
  60023    5376000    4974927     401073      85     364      89      82
  65025    5824300    5555450     268850      89     116      89      85
  70027    6279100    5555450     723650      90       0      89      79
  75029    6726400    6299987     426413      89     148      89      83
  80030    7171800    6988921     182879      89     137      89      87
  85032    7627000    6988921     638079      91       0      89      82
  90033    8073200    7561200     512000      89     114      89      83
  95034    8527900    7561200     966700      90       0      89      79
 100035    8971400    8502933     468467      88     188      89      84
 105036    9426300    8502933     923367      90       0      89      80
 110037    9880500    8502933    1377567      90       0      89      77
 115038   10000000    8502933    1497067      23       0      86      73
 120390   10000000   10000000          0       0     279      83      83

real 2m0.557s
user 0m17.642s
sys 0m1.998s


WeakReference processing throughput, PATCHED

 construction threads: 1
 instances per thread: 10000000
 cons. per 1 ms sleep: 100

                                         ~5s interval      cumulative     active pool queued
  t[ms] construct.  destruct.  in-flight con./ms des./ms con./ms des./ms  thr.  size  tasks
------- ---------- ---------- ---------- ------- ------- ------- ------- ----- ----- ------
   5000     437800          0     437800      87       0      87       0     0     0      0
  10001     873900          0     873900      87       0      87       0     0     0      0
  15003    1308021     203402    1104619      86      40      87      13     0     0      0
  20004    1746100     203402    1542698      87       0      87      10     0     0      0
  25005    2180100     407360    1772740      86      40      87      16     0     0      0
  30007    2616800     407360    2209440      87       0      87      13     0     0      0
  35008    3026600    2902826     123774      81     499      86      82     0     1      0
  40009    3463600    2902826     560774      87       0      86      72     0     0      0
  45010    3900200    2902826     997374      87       0      86      64     0     0      0
  50011    4331300    3121066    1210234      86      43      86      62     0     0      0
  55012    4768314    3121066    1647248      87       0      86      56     0     0      0
  60013    5187400    4923274     264126      83     360      86      82     0     0      0
  65013    5619500    5579423      40077      86     131      86      85     0     1      0
  70014    6061800    5579423     482377      88       0      86      79     0     0      0
  75015    6497100    5688790     808310      87      21      86      75     1     1      0
  80015    6937500    6462159     475341      88     154      86      80     0     0      0
  85019    7378600    6462159     916441      88       0      86      76     0     0      0
  90019    7813800    7083305     730495      87     124      86      78     0     0      0
  95020    8250500    7083305    1167195      87       0      86      74     0     0      0
 100021    8683100    7630681    1052419      86     109      86      76     0     0      0
 105021    9122100    7630681    1491419      87       0      86      72     0     0      0
 110022    9554091    8565311     988780      86     186      86      77     0     0      0
 115023    9993400    8565311    1428089      87       0      86      74     0     0      0
 120023   10000000    8565311    1434689       1       0      83      71     0     0      0
 125372   10000000   10000000          0       0     268      79      79     0     0      0

real 2m5.518s
user 0m12.869s
sys 0m1.093s
</pre>
<br>
The prototype contains other things as well - not all of them might
be suitable for inclusion in JDK 9, and they could be selectively
removed or tailored. But I think they deserve to be mentioned and
considered:<br>
<br>
Using FJPool as a means to execute Finalizer(s) and other
j.l.r.Cleaner(s), which enables easy scaling to more than one worker
thread with less overhead than ReferenceQueue-based processing.<br>
<br>
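As a rough sketch of the shape (not the prototype's actual classes),
handing chunks of cleanup work to the common FJ pool could look like:<br>
<pre>
// Sketch only: fan cleanup work out to ForkJoinPool workers instead of
// draining a ReferenceQueue on a single dedicated thread.
import java.util.List;
import java.util.concurrent.ForkJoinPool;

final class ChunkedCleanupExecutor {
    private final ForkJoinPool pool = ForkJoinPool.commonPool();

    // One task per chunk keeps per-reference overhead low while still
    // letting the pool scale to several worker threads under load.
    void submit(List<Runnable> chunkOfCleaners) {
        pool.execute(() -> chunkOfCleaners.forEach(Runnable::run));
    }
}
</pre>
<br>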
Public j.l.r.Cleaner interface. A Soft, Weak or Phantom Reference
implementing this interface is automatically "cleaned" by a
"finalization" thread. j.l.r.PhantomCleaner is an example
implementation of such a PhantomReference which also maintains a
doubly-linked list of instances to prevent their reclamation before
they are executed. Its API is similar to sun.misc.Cleaner, but it can
be used by user code since it is executed by the same thread(s) as
Finalizer(s). Not all applications need their Cleaner Reference(s)
to maintain a doubly-linked list of instances, hence the public
j.l.r.Cleaner interface which offers the "raw" functionality. For
example, one can imagine WeakReference-based keys of a
ConcurrentHashMap that get expunged automatically by the finalization
thread(s) after their referents become weakly reachable. As the CHM
already holds the WeakReference keys, no separate maintenance of a
doubly-linked list of keys is needed.<br>
<br>
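For the ConcurrentHashMap example, a self-expunging key could look
roughly like this (the single clean() method is only a sketch of the
Cleaner shape - see the webrev for the real interface):<br>
<pre>
// Sketch: a WeakReference-based CHM key that removes itself once its
// referent becomes weakly reachable; no doubly-linked list is needed
// because the map itself keeps the key reachable.
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

final class WeakKey<K> extends WeakReference<K> /* implements j.l.r.Cleaner */ {
    private final int hash;
    private final ConcurrentHashMap<WeakKey<K>, ?> map;

    WeakKey(K key, ConcurrentHashMap<WeakKey<K>, ?> map) {
        super(key);
        this.hash = key.hashCode();
        this.map = map;
    }

    // Invoked by the finalization thread(s) once the referent is cleared.
    public void clean() {
        map.remove(this);
    }

    @Override public int hashCode() { return hash; }

    @Override public boolean equals(Object o) {
        if (o == this) return true;
        if (!(o instanceof WeakKey)) return false;
        K k = get();
        return k != null && k.equals(((WeakKey<?>) o).get());
    }
}
</pre>
<br>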
Public j.l.r.Finalizator is an example of a FinalReference-based
j.l.r.Cleaner. While a [Phantom|Weak|Soft]Cleaner is safer, since it
doesn't allow access to its referent when fired, some JDK code
(File[Input|Output]Stream, for example) still uses finalization,
which would take more effort to convert to, say, a PhantomCleaner.
Finalizator may allow an easier transition of such code, as it lets
the finalization logic keep access to the finalizee's state. So
perhaps it could be made package-private and exposed through
SharedSecrets for internal use only?<br>
<br>
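To make the difference concrete, the idea can be sketched roughly like
this (illustrative only - not the exact Finalizator API in the webrev):<br>
<pre>
// Unlike a Phantom/Weak/Soft cleaner, the cleanup consumer is handed the
// object itself, so finalize()-style logic that inspects instance state
// (e.g. a stream checking and closing its FileDescriptor) can be kept
// largely as-is.
import java.util.function.Consumer;

interface FinalizatorSketch {
    <T> void register(T finalizee, Consumer<T> cleanup);
}
</pre>
<br>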
Are any of the features of this prototype interesting to you and
worth pursuing further?<br>
<br>
Regards, Peter<br>
<br>
</body>
</html>