Crash with Solr on CentOS 7.4
Ere Maijala
ere.maijala at helsinki.fi
Thu Nov 30 10:58:14 UTC 2017
Aleksey Shipilev wrote on 28.11.2017 at 18.04:
> On 11/28/2017 04:47 PM, Ere Maijala wrote:
>> Thanks for the information. I'll update and add ShenandoahVerify to the config. Meanwhile, here is
>> the hs_err log: <https://gist.github.com/EreMaijala/28c60588e5ffd2998eb8513e9c36ad0f>
>
> From a brief look, it seems that native GC thread SEGVed on accessing some native data structure. I
> suspect that is a taskqueue or GCThread itself. fastdebug build would give a telling stack-trace and
> maybe even a good assert message:
> https://builds.shipilev.net/openjdk-shenandoah-jdk8-fastdebug/
I haven't had this happen again with the newer OpenJDK builds. Then
again, it only happened once on one of three nodes to begin with, so
it's probably quite rare.
I'm seeing results that contradict my expectations for Shenandoah,
though. With G1GC, full GC has been quite rare, but when it does happen,
it's long enough to bring the SolrCloud cluster down. With Shenandoah,
when doing some indexing work with Solr, I'm getting frequent Pause
Fulls like this:
--------------
2017-11-29T18:47:08.141+0200: 31981.188: Total time for which application threads were stopped: 0.0002862 seconds, Stopping threads took: 0.0001059 seconds
Concurrent marking triggered. Free: 3485M, Free Threshold: 3491M; Allocated: 19343M, Alloc Threshold: 0M
2017-11-29T18:47:11.884+0200: 31984.931: [Pause Init Mark, 60.475 ms]
2017-11-29T18:47:11.945+0200: 31984.991: Total time for which application threads were stopped: 0.0607656 seconds, Stopping threads took: 0.0000775 seconds
2017-11-29T18:47:11.945+0200: 31984.991: [Concurrent marking2017-11-29T18:47:14.173+0200: 31987.220: Total time for which application threads were stopped: 0.0222083 seconds, Stopping threads took: 0.0220003 seconds
Cancelling concurrent GC: Allocation Failure
26G->29G(31G), 3202.487 ms]
Adjusting free threshold to: 21% (6666M)
2017-11-29T18:47:15.147+0200: 31988.194: [Pause Final MarkAdaptive CSet selection: free target = 8253M, actual free = 13878M; min cset = 0M, max cset = 10408M
Total Garbage: 22774M
Immediate Garbage: 13878M, 1821 regions (61% of total)
Garbage to be collected: 8336M (36% of total), 1135 regions
Live objects to be evacuated: 317M
Live/garbage ratio in collected regions: 3%
Free: 0M, 0 regions (0% of total)
Cancelling concurrent GC: Out of memory for evacuation
, 2107.541 ms]
2017-11-29T18:47:17.255+0200: 31990.302: Total time for which application threads were stopped: 2.1078017 seconds, Stopping threads took: 0.0000444 seconds
Concurrent marking triggered. Free: 6M, Free Threshold: 6666M; Allocated: 11M, Alloc Threshold: 0M
Adjusting free threshold to: 41% (13015M)
2017-11-29T18:47:19.158+0200: 31992.205: [Pause Full 29G->3579M(31G), 10726.904 ms]
2017-11-29T18:47:29.885+0200: 32002.932: Total time for which application threads were stopped: 12.6193783 seconds, Stopping threads took: 0.0000324 seconds
Periodic GC triggered. Time since last GC: 305058 ms, Guaranteed Interval: 300000 ms
2017-11-29T18:47:29.895+0200: 32002.942: [Pause Init Mark, 59.263 ms]
--------------
The heap is quite large for this workload, but lowering it made the
Pause Fulls even more frequent. The example above is from the old JVM
shipped with CentOS 7.4, but the same thing happens with a nightly
build from Nov 29. I'd be happy to test with any additional GC options
or logging options that work with Java 8 to see if they affect the
results, but I decided not to turn the tuning knobs blindly.
--Ere