g1: dealing with high rates of inter-region pointer writes

Peter Schuller peter.schuller at infidyne.com
Wed Dec 29 22:29:34 UTC 2010


> Hi Peter, thanks for the details (not all of which I admit I fully grokked,
> but that can wait for later). (Perhaps you have a test case that you are
> able to share so we can make these observations locally and see if we can
> help address the performance anomaly you have unearthed?)

It is http://github.com/scode/httpgctest - but before anyone spends
time on it, let me come up with a better test case (an actual LRU,
covering both the immutable and the mutable case). I'll try to do that
in time for next week; a rough sketch of what I have in mind follows
below.

(If someone does want to look at it anyway, more information is at
http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2010-May/000642.html)
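
Roughly this shape of test is what I have in mind (purely an
illustrative sketch, not the actual code - the names and sizes are
made up, and the mutable variant would additionally mutate the cached
values in place):

import java.util.LinkedHashMap;
import java.util.Map;

// Churn an LRU cache continuously so that long-lived map entries
// (sitting in old regions) keep acquiring pointers to freshly
// allocated values - i.e. a sustained rate of inter-region writes.
public class LruChurn {
    private static final int CAPACITY = 1000000;

    // An access-ordered LinkedHashMap doubles as a simple LRU cache.
    private static final Map<Long, byte[]> cache =
        new LinkedHashMap<Long, byte[]>(CAPACITY, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> e) {
                return size() > CAPACITY;
            }
        };

    public static void main(String[] args) {
        long i = 0;
        while (true) {
            // Each put allocates a fresh (young) value; the write of
            // that reference into the long-lived map entry is what
            // generates the cross-region pointer.
            cache.put(i % (2L * CAPACITY), new byte[64]);
            i++;
        }
    }
}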

> We'll be adding that information and perhaps that will help clarify some
> of these. I have an RFE out that will be examined by our G1 engineers
> soon.

Yes, thanks! I saw your other e-mail but opted not to respond to it separately.

>> then also expected to have the effect I am seeing (regions being
>> artificially too expensive to collect, leading to them not being
>> collected). My main question is whether there is some mechanism in
>
> I do not see that conclusion following obviously from the first.
> It is true that in a linear regression model for evacuation cost,
> the coefficient for coarser regions will be higher. It will also,
> I am guessing, be more "rough" because of possible variability in the
> number of objects that actually contain pointers into the region that
> is being evacuated. So depending on the regression technique being used
> to estimate these coefficients, I believe it is plausible that the costs
> for evacuation may sometimes be overestimated (but probably equally often
> underestimated as well).

So to clarify: I'm not saying the estimates are wrong. Unless I am
misremembering completely, I tested this by disabling the efficiency
check in the G1 collection policy to force it to keep doing partials
more aggressively. The results indicated that the RS scan time was in
fact in the ballpark of the estimates.

So it is not a matter of estimate vs. reality, but rather that reality
really is too expensive for a given region: entire other regions need
to be scanned rather than a handful of specific cards (and this is
correctly reflected in the predictions for the region).
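
(To put some illustrative numbers on it - these are made-up figures,
since the real region size depends on heap size:

  region size       = 4 MB
  card size         = 512 bytes
  cards per region  = 4 MB / 512 B = 8192

  sparse RS entry   -> scan only the handful of cards recorded
  coarsened entry   -> scan all 8192 cards of the referencing region

So a single coarsened entry can be orders of magnitude more expensive
to process than a few sparse entries.)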

(Btw, playing with -XX:G1RSetSparseRegionEntries might be a good way
to test that the behavior goes away in the absence of overflows.
However, very large values don't seem to scale well (or perhaps some
value was simply large enough to trigger a bug): when I set it really
high, the JVM started spinning for extended periods of time, seemingly
making no progress. Up to 500 or so seems to work well enough to at
least see that it leads to fewer overflows.)
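
(For reference, the sort of invocation I've been testing with - the
jar name and the pause target are placeholders, and G1 still requires
the experimental-options flag in current builds:

  java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC \
       -XX:MaxGCPauseMillis=50 \
       -XX:G1RSetSparseRegionEntries=500 \
       -jar app.jar
)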

With respect to the actual RS scan times: I thought I had posted about
testing this, but I cannot for the life of me find my own post. A
throw-away patch I posted in relation to it is at
http://distfiles.scode.org/mlref/g1/g1_region_live_stats_hack.patch -
searching for 'scodetodo' you'll find that I disabled the requirement
that (get_gc_eff_factor() * cur_efficiency < predict_young_gc_eff()).
The result was that real scan times were indeed large, and partial
collections were much more aggressive - but it did not help those
regions which were *alone* so expensive to collect that pause time
goals would not be achieved.

The patch also prints more detailed information about where predicted
evacuation times are coming from for regions whose estimate exceeds
10 ms, but that just boils down to "yeah, it's RS scan cost", which is
expected in my scenario (limited liveness -> small copy cost, and the
region traversal cost should, under healthy conditions, always be
low).

> I believe the regression model depends on assumptions of uniformity or an
> averaging effect in the large. That assumption breaks down as region sizes
> become larger and pause-time budget becomes small. I do not know whether
> the model makes allowances as we scale along those dimensions, but I am
> guessing not.

So far I have anecdotally felt that prediction works pretty well, at
least in terms of the actual pause times seen during collections.
Collection pauses tend to be higher than asked for (on my particular
hardware etc.), but relatively consistently so, such that it can be
corrected for by decreasing the target. That has been the case both in
my stress testing and in everyday use, such as running IntelliJ or
Cassandra with it.
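
(Concretely, what I mean by correcting for it - the overshoot factor
here is just for illustration, not a measurement:

  observed pause ~= 1.3 * target        (consistent overshoot)
  want ~75 ms    -> ask for ~55 ms, i.e. -XX:MaxGCPauseMillis=55
)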

> As I said earlier above, it might work out to be most efficient if you
> were somehow able to share a test case with us. I'll contact you off-line
> to see if that is possible, and if so open a performance bug for this issue,
> which will be evaluated once the G1 engineers are back from vacation next
> week.

See above, but again I'll try to come up with something better now
that there is active interest.

-- 
/ Peter Schuller
