[G1GC] Slow processing JNI Weak References

Keith L kelaban17 at gmail.com
Mon Oct 12 14:54:42 UTC 2015


I'm not too familiar with debugging native code, what is the best way to
try and find what objects are being weak/global referenced?

On Mon, Oct 12, 2015 at 10:48 AM Yu Zhang <yu.zhang at oracle.com> wrote:

> Keith,
>
> I am adding hotspot-gc-dev, hoping some one can give a better explanation.
>
> Yes, currently the JNI weak reference processing is single threaded.
> You mentioned JNI weak reference pause long for remark and mixed gc. For
> young gc, it is long as well. Just for young gc, the collected set is
> limited. My guess is you can not make much difference by tuning.
>
> There is a possibility for memory leak in the native code. A lot of Solr
> applications use native memory for cache.
>
> Thanks,
> Jenny
>
> On 10/7/2015 11:40 AM, Keith L wrote:
>
> Java:
> - JDK 1.8u40
> - UseG1GC, ParallelRefProcEnabled, Xmx12g,Xms12g
>
> Application:
> - Solr 4.10.4
>
> Application Details:
> Using Solr, which does not use JNI directly although it make heavy use of
> nio and MMaps.
> Possibly relevant is that we run our application with the yourkit agent
> with monitoring disabled by default.
>
> When using G1GC we are seeing very high processing times for JNI Weak
> Reference processing, in both GC Remak phase and during Mixed collections.
> Originally we saw high times during WeakReference processing but
> adding"-XX:+ParallelRefProcEnabled" flag did away with this. JNI Weak
> Reference processing we've seen take as much as 300+ seconds!, although
> more commonly we see it taking 20-60 seconds when it start becoming a
> problem. Currently JNI Weak Reference are not processed in parallel in
> HotSpot so adding the parallel flag has no affect.  (
> https://bugs.openjdk.java.net/browse/JDK-8072498)
>
> Generally we don't see the time for processing be a problem until a few
> days of uptime. At some point the JNI processing gets more expensive and
> time taken climbs up and is expensive nearly every there is a Remark phase.
>
> Possibly related: after a heap dump of a running application we noticed
> anywhere from a couple hundred to hundreds of thousands of unreferenced
> DirectByteBuffers.
>
> Doing a jstack I've seen as many 16 Million JNI Global References being
> reported. Although I haven't been able to find a way to get the JNI Weak
> Reference information to be displayed.
>
> Questions:
> 1) What is causing the JNI Weak References?
>   - Is it from nio? Usage of MMaps, DirectBuffers, etc?
> 2) Why does it become worse during the lifetime of the application?
>   - Is there a leak?
>     - are medium life objects being moved too early
>       - Tune G1NewSizePercent?
>       - Tune Survivor settings?
> 3) Are the JNI Weak References being visited but not cleaned during GC?
>   - Why not? tune G1HeapWastePercent?
> 4) How can more debugging be done to find where the references are coming
> from
>
>
> As of right now we are experimenting with different G1GC flags, however
> without understanding the root cause it sometimes requires days between
> experiments for the problem to present itself.
>
>
> We have not tuned any of the options above which disable the adaptive
> sizing policy, some options we are actively testing are:
>
> - Lowering InitiatingHeapOccupancyPercent
>  - To force dereferenced objects in tenured to be cleaned earlier.
>
> - Increasing MaxGCPauseMillis
>  - To encourage policy for the young generation size to be larger
> (discouraging premature promotion).
>
>
> Example of remark phases below (full gc log can be provided if required)
>
> One example log file (however we have seen worse), using flags in the
> beginning:
>
> 1348278.630: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
> Desired survivor size 41943040 bytes, new threshold 15 (max 15)
> - age   1:   36138128 bytes,   36138128 total
> 1348278.676: [SoftReference, 0 refs, 0.0086523 secs]1348278.684:
> [WeakReference, 7128 refs, 0.0047846 secs]1348278.689: [FinalReference, 414
> refs, 0.0063207 secs]1348278.695: [PhantomReference, 0 refs, 376 refs,
> 0.0129979 secs]1348278.709: [JNI Weak Reference, 0.1018689 secs], 0.1955299
> secs]
>    [Parallel Time: 39.7 ms, GC Workers: 23]
>       [GC Worker Start (ms): Min: 1348278633.2, Avg: 1348278633.7, Max:
> 1348278634.2, Diff: 1.0]
>       [Ext Root Scanning (ms): Min: 11.2, Avg: 11.7, Max: 14.9, Diff: 3.7,
> Sum: 269.2]
>       [Update RS (ms): Min: 9.9, Avg: 13.3, Max: 14.0, Diff: 4.1, Sum:
> 304.8]
>          [Processed Buffers: Min: 23, Avg: 66.9, Max: 117, Diff: 94, Sum:
> 1538]
>       [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6]
>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
> Sum: 0.0]
>       [Object Copy (ms): Min: 11.8, Avg: 12.4, Max: 12.9, Diff: 1.1, Sum:
> 286.3]
>       [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.3]
>       [GC Worker Other (ms): Min: 0.1, Avg: 0.3, Max: 0.7, Diff: 0.6, Sum:
> 6.2]
>       [GC Worker Total (ms): Min: 37.0, Avg: 37.7, Max: 38.6, Diff: 1.6,
> Sum: 867.4]
>       [GC Worker End (ms): Min: 1348278671.2, Avg: 1348278671.4, Max:
> 1348278671.8, Diff: 0.6]
>    [Code Root Fixup: 0.4 ms]
>    [Code Root Purge: 0.0 ms]
>    [Clear CT: 2.8 ms]
>    [Other: 152.6 ms]
>       [Choose CSet: 0.0 ms]
>       [Ref Proc: 137.2 ms]
>       [Ref Enq: 2.6 ms]
>       [Redirty Cards: 3.2 ms]
>       [Humongous Reclaim: 0.0 ms]
>       [Free CSet: 0.6 ms]
>    [Eden: 532.0M(532.0M)->0.0B(532.0M) Survivors: 80.0M->80.0M Heap:
> 6225.6M(12.0G)->5701.0M(12.0G)]
>  [Times: user=0.80 sys=0.04, real=0.20 secs]
> 1348278.827: [GC concurrent-root-region-scan-start]
> 1348278.828: Total time for which application threads were stopped:
> 0.2145574 seconds, Stopping threads took: 0.0012976 seconds
> 1348278.844: [GC concurrent-root-region-scan-end, 0.0170768 secs]
> 1348278.844: [GC concurrent-mark-start]
> 1348279.075: [GC concurrent-mark-end, 0.2306171 secs]
> 1348279.088: [GC remark 1348279.088: [Finalize Marking, 0.0051946 secs]
> 1348279.093: [GC ref-proc1348279.093: [SoftReference, 186 refs, 0.0089505
> secs]1348279.102: [WeakReference, 2668 refs, 0.0051839 secs]1348279.107:
> [FinalReference, 848 refs, 0.0088167 secs]1348279.116: [PhantomReference, 0
> refs, 3836 refs, 0.0130326 secs]1348279.129: [JNI Weak Reference,
> 18.9741270 secs], 19.0130579 secs] 1348298.106: [Unloading, 0.0177559
> secs], 19.0412558 secs]
>  [Times: user=19.59 sys=0.00, real=19.04 secs]
> 1348298.132: Total time for which application threads were stopped:
> 19.0566871 seconds, Stopping threads took: 0.0011100 seconds
> 1348298.147: [GC cleanup 5836M->4256M(12G), 0.0124121 secs]
>  [Times: user=0.11 sys=0.00, real=0.01 secs]
> ...
> 1348358.227: [GC remark 1348358.227: [Finalize Marking, 0.0042513 secs]
> 1348358.231: [GC ref-proc1348358.231: [SoftReference, 197 refs, 0.0085051
> secs]1348358.240: [WeakReference, 5962 refs, 0.0082517 secs]1348358.248:
> [FinalReference, 1881 refs, 0.0074058 secs]1348358.255: [PhantomReference,
> 0 refs, 3802 refs, 0.0116230 secs]1348358.267: [JNI Weak Reference,
> 29.3914835 secs], 29.4310975 secs] 1348387.662: [Unloading, 0.0179274
> secs], 29.4583167 secs]
> 1350907.355: [GC remark 1350907.355: [Finalize Marking, 0.0066394 secs]
> 1350907.362: [GC ref-proc1350907.362: [SoftReference, 189 refs, 0.0084702
> secs]1350907.370: [WeakReference, 2379 refs, 0.0046033 secs]1350907.375:
> [FinalReference, 779 refs, 0.0086055 secs]1350907.383: [PhantomReference, 0
> refs, 4349 refs, 0.0124574 secs]1350907.396: [JNI Weak Reference,
> 29.9550822 secs], 29.9923290 secs] 1350937.354: [Unloading, 0.0171185
> secs], 30.0213910 secs]
> ...
> ...
> ...
> 1377495.992: [GC remark 1377495.992: [Finalize Marking, 0.0039972 secs]
> 1377495.996: [GC ref-proc1377495.996: [SoftReference, 192 refs, 0.0084780
> secs]1377496.005: [WeakReference, 1171 refs, 0.0040591 secs]1377496.009:
> [FinalReference, 619 refs, 0.0068656 secs]1377496.016: [PhantomReference, 0
> refs, 4886 refs, 0.0094143 secs]1377496.025: [JNI Weak Reference,
> 48.3669099 secs], 48.3980356 secs] 1377544.394: [Unloading, 0.0159881
> secs], 48.4228951 secs]
> 1378034.868: [GC remark 1378034.868: [Finalize Marking, 0.0033629 secs]
> 1378034.871: [GC ref-proc1378034.871: [SoftReference, 193 refs, 0.0059063
> secs]1378034.877: [WeakReference, 1332 refs, 0.0036461 secs]1378034.881:
> [FinalReference, 611 refs, 0.0046526 secs]1378034.885: [PhantomReference, 0
> refs, 4254 refs, 0.0059437 secs]1378034.891: [JNI Weak Reference,
> 46.6392836 secs], 46.6624796 secs] 1378081.534: [Unloading, 0.0180802
> secs], 46.6893835 secs]
> 1378677.326: [GC remark 1378677.326: [Finalize Marking, 0.0039769 secs]
> 1378677.330: [GC ref-proc1378677.330: [SoftReference, 193 refs, 0.0080377
> secs]1378677.338: [WeakReference, 1719 refs, 0.0035972 secs]1378677.341:
> [FinalReference, 351 refs, 0.0063767 secs]1378677.348: [PhantomReference, 0
> refs, 4753 refs, 0.0103290 secs]1378677.358: [JNI Weak Reference,
> 43.9230851 secs], 43.9537699 secs] 1378721.284: [Unloading, 0.0150821
> secs], 43.9780128 secs]
> 1379626.009: [GC remark 1379626.009: [Finalize Marking, 0.0040175 secs]
> 1379626.013: [GC ref-proc1379626.013: [SoftReference, 186 refs, 0.0080680
> secs]1379626.021: [WeakReference, 955 refs, 0.0064521 secs]1379626.028:
> [FinalReference, 567 refs, 0.0098877 secs]1379626.038: [PhantomReference, 0
> refs, 4701 refs, 0.0104484 secs]1379626.048: [JNI Weak Reference,
> 51.3181908 secs], 51.3558998 secs] 1379677.369: [Unloading, 0.0207052
> secs], 51.3856987 secs]
> ...
> 1380251.013: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
> Desired survivor size 41943040 bytes, new threshold 1 (max 15)
> - age   1:   46612512 bytes,   46612512 total
> [SoftReference, 0 refs, 0.0074261 secs]1380251.048: [WeakReference, 5189
> refs, 0.0041909 secs]1380251.052: [FinalReference, 393 refs, 0.0042160
> secs]1380251.056: [PhantomReference, 0 refs, 83 refs, 0.0074198
> secs]1380251.064: [JNI Weak Reference, 3.0041020 secs], 3.0678505 secs]
>    [Parallel Time: 23.5 ms, GC Workers: 23]
>       [GC Worker Start (ms): Min: 1380251014.8, Avg: 1380251015.3, Max:
> 1380251015.9, Diff: 1.1]
>       [Ext Root Scanning (ms): Min: 6.7, Avg: 7.5, Max: 10.9, Diff: 4.2,
> Sum: 173.4]
>       [Update RS (ms): Min: 3.7, Avg: 6.5, Max: 7.0, Diff: 3.3, Sum: 150.5]
>          [Processed Buffers: Min: 27, Avg: 66.7, Max: 112, Diff: 85, Sum:
> 1534]
>       [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6]
>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
> Sum: 0.0]
>       [Object Copy (ms): Min: 6.3, Avg: 6.7, Max: 7.3, Diff: 1.0, Sum:
> 154.9]
>       [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
>       [GC Worker Other (ms): Min: 0.1, Avg: 0.4, Max: 0.8, Diff: 0.8, Sum:
> 8.3]
>       [GC Worker Total (ms): Min: 20.4, Avg: 21.2, Max: 21.8, Diff: 1.4,
> Sum: 487.7]
>       [GC Worker End (ms): Min: 1380251036.2, Avg: 1380251036.5, Max:
> 1380251037.0, Diff: 0.8]
>    [Code Root Fixup: 0.4 ms]
>    [Code Root Purge: 0.0 ms]
>    [Clear CT: 2.9 ms]
>    [Other: 3041.1 ms]
>       [Choose CSet: 0.0 ms]
>       [Ref Proc: 3029.5 ms]
>       [Ref Enq: 2.7 ms]
>       [Redirty Cards: 2.7 ms]
>       [Humongous Reclaim: 0.0 ms]
>       [Free CSet: 0.3 ms]
>    [Eden: 532.0M(532.0M)->0.0B(548.0M) Survivors: 80.0M->64.0M Heap:
> 6142.5M(12.0G)->5620.7M(12.0G)]
>  [Times: user=3.46 sys=0.00, real=3.07 secs]
> 1380254.081: [GC concurrent-root-region-scan-start]
> 1380254.082: Total time for which application threads were stopped:
> 3.0847197 seconds, Stopping threads took: 0.0011519 seconds
> 1380254.092: [GC concurrent-root-region-scan-end, 0.0105392 secs]
> 1380254.092: [GC concurrent-mark-start]
> 1380254.275: [GC concurrent-mark-end, 0.1832853 secs]
> 1380254.289: [GC remark 1380254.290: [Finalize Marking, 0.0039580 secs]
> 1380254.294: [GC ref-proc1380254.294: [SoftReference, 183 refs, 0.0068857
> secs]1380254.300: [WeakReference, 1384 refs, 0.0043966 secs]1380254.305:
> [FinalReference, 579 refs, 0.0073469 secs]1380254.312: [PhantomReference, 0
> refs, 4327 refs, 0.0077366 secs]1380254.320: [JNI Weak Reference,
> 38.7211438 secs], 38.7506058 secs] 1380293.044: [Unloading, 0.0174879
> secs], 38.7765019 secs]
>  [Times: user=39.37 sys=0.00, real=38.78 secs]
> 1380293.067: Total time for which application threads were stopped:
> 38.7917823 seconds, Stopping threads took: 0.0011561 seconds
> 1380293.077: [GC cleanup 5659M->4167M(12G), 0.0120391 secs]
>  [Times: user=0.11 sys=0.00, real=0.01 secs]
>
>
>
> _______________________________________________
> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20151012/ae6f54f0/attachment.html>


More information about the hotspot-gc-use mailing list