Integration of ParallelGC and CMS

Y. S. Ramakrishna y.s.ramakrishna at oracle.com
Thu Jan 27 17:51:21 UTC 2011


Right. I'd suggest following the precedent set in this regard by some
earlier JVMs: build up the remembered set for the space being
defragmented as part of the marking and precleaning cycles, then rely
on that information, along with the ModUnionTable, to reduce the cost
of identifying the pointers that need to be updated when doing the
actual compaction. Further, the defrag should run periodically, based
on fragmentation metrics collected during the most recent cycle.

As someone commented earlier, however, the existence of G1 probably
makes this CMS feature more of an academic/learning exercise with
no immediately obvious advantages vis-a-vis G1, which (IMHO) greatly
reduces its chances of eventual productization. A legacy CMS user base
might well be an argument in favour of doing something like this, but
recall Jon's comment about opportunity cost, at least in the product
group here; the OpenJDK community is, of course, much more than just
the engineers who work at Oracle or the other direct OpenJDK code
committers. Regardless of all that, it might be a useful Master's
project nonetheless, and who knows, in the process you might learn a
thing or two about CMS and incremental compaction that we didn't know
(or had known and forgotten). So my advice to Clemens is: look at the
existing literature first (you'll probably find a paper or two on past
JVMs that had such incremental defrag capabilities; study them), then
see what you can do in the context of HotSpot/CMS. Go for it if you
find it interesting, but don't let the eventual or guaranteed
productization of what you build be too important a factor in your
choice of project :-) .

$0.02.
-- ramki




On 01/27/11 08:55, Tony Printezis wrote:
> Clemens,
> 
> Clemens Eisserer wrote:
>> If the stw compaction phase were executed after a "normal" CMS
>> run, couldn't the marking results of the concurrent marking phase be
>> used? All objects created since then could simply be treated as alive.
>> Moving all objects (live or dead) would probably be an option too, but
>> of course it would mean more work and worse compaction.
>>   
> 
> (If I understand your intent correctly and to expand a bit on what Jon 
> said) Simply relying on the marking information to compact objects (i.e. 
> move the live ones) is not sufficient. Finding which objects to move is 
> one part of this. But the harder part is to make sure that you find all 
> the references to the objects that are being moved and update them. If 
> you want to move a subset of the marked objects, unless you somehow keep 
> track of where the references into that subset are, you'd have to scan 
> the entire heap to do the reference updating, and this can take a very 
> long time. If instead you want to perform a full parallel compaction at 
> the end of the CMS marking cycle using the marking information that CMS 
> obtained, then this stop-the-world Full GC pause could be shorter than 
> what say ParallelOldGC does (given that it does need to do the marking 
> phase). But it would be of the same order of magnitude as a 
> ParallelOldGC Full GC, so not a huge improvement. Typically, initial-mark 
> / remark pauses in CMS are considerably shorter than Full GCs.
> 
> Anyway, hope this helps,
> 
> Tony
> 
>>>  Look at the
>>> serial old gen collector if you're interested in this.  The 
>>> ParallelOldGC
>>> will be somewhat more complicated.  We're not particularly
>>> interested in it because that's what G1 does.
>>>     
>>
>> Hmm, if it worked well and the code were clean, small and
>> maintainable, would there be a chance of integration?
>> I asked here because it would be great if the stuff I produce while
>> working on a master's thesis could be useful ;)
>>
>> Thanks, Clemens
>>   
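
Tony's cost argument in the quoted reply, that without tracked
references you must scan every reference field in the heap to update
pointers into the moved subset, can be reduced to a count of fields
visited. Again a toy illustration, not HotSpot code, with invented
names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many reference fields must be visited to update pointers into an
// evacuated range [lo, hi), with and without a remembered set.
struct UpdateCost {
    // Full-heap scan: every reference field in the heap is examined.
    static size_t full_scan(const std::vector<int>& refs) {
        return refs.size();
    }
    // With a remembered set built during marking, only the recorded fields
    // (those already known to point into [lo, hi)) are visited.
    static size_t with_remset(const std::vector<int>& refs, int lo, int hi) {
        size_t n = 0;
        for (int r : refs)
            if (r >= lo && r < hi) ++n;
        return n;
    }
};
```

When the evacuated region is a small fraction of the heap, the tracked
update visits correspondingly few fields, which is exactly why keeping
the remembered set current during the concurrent phases pays off.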



More information about the hotspot-gc-dev mailing list