RFR: 8134963 [Newtests] New tests for G1 remembered set up and down

Mon Nov 23 13:03:58 UTC 2015

Hi Dmitry,

On Fri, 2015-11-20 at 16:06 +0300, Dmitry Fazunenko wrote:
> Hi Thomas,
> 
> Thanks a lot, that you found time to look that code. Appreciate it very 
> much.
> Please see the comments inline
> 
> On 20.11.2015 13:41, Thomas Schatzl wrote:
> > Hi Dmitry,
> >
> > On Fri, 2015-11-13 at 18:15 +0300, Dmitry Fazunenko wrote:
> >> Hi everyone,
> >>
> >> Can I have a couple of reviews for the new test for G1 Remembered Set:
> >>
> >> https://bugs.openjdk.java.net/browse/JDK-8134963
> >> http://cr.openjdk.java.net/~dfazunen/8134963/
> >>
> >> The test creates a lot of cross region references (making RSet to grow
> >> up) and then clear that references up.
> >> The test doesn't check any particular assertions, it just expects that
> >> there will be no crash.
> > Some comments:
> >
> >    - the test does not stress the remembered set operations well:
> >
> >       - runtime is too short. In my test runs, not a single remembered
> > set cleanup occurs, i.e. the remembered set will never shrink.
> May be I need to play with a build where a new remsets is implemented.
> My assumption was that the new remsets implementation tracks not only 
> adding cross region references, but their removing as well:
> 
> objectInRegion1.field = objectInRegion2;  // region1 ---> region2
> objectInRegion1.field = null;                       // region1 -X-> 
> region2  (if there is no other references)
> 
> Perhaps, my assumption is wrong.

Yes, the assumption has been wrong. The main driver for this work is to
increase the average performance and decrease memory consumption to stay
at a much better perf/memory usage trade-off.

Current rsets tend to be either very slow without being really
memory-efficient, or very large on quite a few applications.

Adding null-write barriers would also significantly decrease performance
for many reasons.

> In this case, which event should 
> trigger rescan of the remembered set?

Marking is the usual trigger, eg. heap occupancy. The heap is full
enough in this case, but there is too little time to complete them.

> In my test there are no allocations during up and down phases, therefore 
> no GC.

Changing heap occupancy is also a good idea, as heap occupancy drives
remembered set sizes.

> >       - in general, due to the short runtime, g1 seems to be warming up.
> 
> The test allocates ~90% of heap. Is it still not enough to warm up?

Warm up as in generating a representative set of cross-references.
(Old-gen) Evacuations tend to mix up the remembered sets, and also cause
queuing of remembered set work.

The collector should of course not mess up the remembered sets.

> >       - the test does not stress the need for remsets being MT safe at
> > all. I.e. the main program uses one thread.
> Valid point. I think it should another (similar) test using MT.

Not sure if it is worth having two stress tests with potentially a
considerable runtime (i.e. in minutes). If the remembered set does not
handle multiple threads, the implementation is useless.

Unit tests should be sufficient to test single-thread operation. And
they can be made much more targeted to test certain patterns.

Also, if you needed a "ST test", you could just run the MT one with a
single mutator thread.

> >    - accidentally the test has been run in some jprt jobs. I got a few
> > out-of-native memory errors running it (that may be due to some other
> > changes in that build though).
> The test is moved to test/stress/
>    http://cr.openjdk.java.net/~dfazunen/8134963/webrev.02/
> To not be run in jprt.
> 
> Maybe the fact that it caught out-of-native memory errors proves that 
> the test is good, because it catches issues?

To me this just seemed to be a configuration problem with JPRT machines
(and the remset taking lots of memory if stressed), but I did not
investigate, and it's not important.

> > So I think while the test may be correct, it just does not seem to do a
> > good job on its task.
> >
> > I frequently use a test based on a contribution years ago (well,
> > basically a verbatim copy) when working on remembered sets (see the
> > attachment at
> > http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2012-July/004740.html),
> >running it for a few minutes. Of course, it is a lot more demanding
> >than thetest in this change. Maybe it can be scaled back to a useful size,
> >and potentially the object size used there varied a little.
> 
> Do you think that the test (TestUpAndDownRSet) is totally useless? Or it 
> makes sense to implement enhancement you suggested?

No. The existing test does not seem to run long enough, and actually
test enough application patterns.

Like:
 - "slow" ramping up of remembered set with "slow" decline of remembered
set size.
 - "fast" ramping up of remembered sets with "fast" decline of remembere
set
 - significant amount of perturbation of remembered sets by the GC
moving around data.
 - one/many thread(s) adding remset entries, one/many thread(s)
"removing" them
 - combinations of the above.

As mentioned, in the few runs I did for this test, not even a single
marking round completed, i.e. not any actual removal of remset entries
from the remsets.

They are just added and then made stale. I can almost assure you that
after a few of those 1000 rounds the test does that, the remembered sets
already contain so many entries that none are actually added any more.

> I looked at the BigRamTester you mentioned. It looks like a regular test 
> consuming a certain amount of heap. It's not clear to me why this test 
> is going to stress a new RemSet.

- it creates a lot of remembered set entries across the whole heap by
virtue of the (large) LRU cache.

- evacuations mix up the heap frequently, creating new remembered set
entries (e.g. to make sure that evacuations cause missed entries).

- the LRU cache makes sure that objects die frequently, so that
remembered set entries go stale. These entries are removed over time
(after marking).

- remembered sets contain lots and lots of stale entries.

BigRamTester is not multi-threaded either. But that can be something
added for a dedicated stress test. If BigRamTester runs long enough, GC
times (and overall remembered set sizes) stabilize, indicating that
there are distinct remembered set access patterns before and after. Both
are imo worth checking.

The existing remembered set implementation has the flaw that it does not
scale back used memory with less actual remembered set entries. It will
mostly just grow larger and larger. One goal of the new remembered sets
is to fix that, but this of course requires more (memory) management
activity, which is important to stress.

A designed stress test should at least exhibit above bullet points.

Thanks,
  Thomas