RFR: 8057632 - Remove auxiliary code used to handle the generations array

Fri Sep 19 13:18:43 UTC 2014

Hi Kim,

On Thu, 2014-09-18 at 14:56 -0400, Kim Barrett wrote:
> On Sep 4, 2014, at 7:54 PM, Jesper Wilhelmsson <jesper.wilhelmsson at oracle.com> wrote:
> > 
> > This is the next part of the generation array removal. I have split this change into several parts to ease the review. These webrevs build on top of each other.
> 
> Quoting from https://bugs.openjdk.java.net/browse/JDK-8055702
> 
>  Today GenCollectedHeap contains an array with room for ten
>  generations, and there is plenty of code that is written in a very
>  generic way to allow for this many generations. However, there is
>  also plenty of code that assumes we only use two generations and any
>  attempt to use more would fail with the current code. In practice we
>  only use two generations and there are no reason to maintain the
>  illusion that we could use more.
> 
> Has there been any discussion of whether removing support for many
> generations is indeed desirable, vs fixing the places where that
> possibility isn't properly supported?  It was a long time ago, and I

Only CMS and Serial GC use this abstraction, and both support only two
generations. I do not know the exact background for why it is still
here.

> haven't even looked for references yet, but I vaguely recall seeing
> measurements showing some applications could benefit significantly
> from such.  I think it had to do with applications that undergo major
> phase changes, where substantial data structures lasted for much of
> the duration of a phase but then were dropped on those phase
> transitions - moving such data to some intermediate generation that
> was processed more often than "old" but less often than "young" was
> thought to be beneficial.  [It's also entirely possible that I'm
> completely misremembering, as this would have been from back in the
> early 1990s.]

You are correct that there is research about this.

More than three generations (young, old, and, until JDK 8, perm) does
not seem to be practical.

The current framework only supports static sizing, which is already hard
for many people to get right with only two or three generations. See
the bug tracker for requests to either automate that or remove the
need for it.

It would be even harder with more generations, particularly since, as
you correctly point out, it would require the VM to detect program
phases and to resize the generations on the fly to be efficient.
Neither of these is implemented.

Moving the generation boundaries is rather slow too (data needs to be
moved), which would probably eat up a lot of the gains.

The young/old split works pretty well. In fact, relatively long-lived
data often accumulates in the "stable prefix" of the old gen, which
means it is skipped during compaction anyway.

This code is just technical debt that regularly costs time, either
when changing these collectors or merely by requiring us to look at
false positives from static analysis tools.

> Maybe the answer to such an application now is G1?

It's pretty easy to define a set of regions as logically belonging to a
"generation". Implementing what delivers most of the gains (remembered
sets etc.) is not very hard either.

I am sure there are implementations that do that for specialized
purposes out there.
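
To make the "regions logically belonging to a generation" idea a bit
more concrete, here is a rough sketch. It is purely illustrative and is
not HotSpot or G1 code: the class RegionGenerations, the INTERMEDIATE
tag, and the newHeap/retag/collectionSet helpers are all hypothetical
names made up for this example. The point is only that, in such a
model, a generation is a tag on a region, so changing a generation's
size means retagging regions rather than moving a boundary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not HotSpot code: generations as tags on regions.
public class RegionGenerations {
    // Logical generation tags; INTERMEDIATE is the assumed extra one.
    public enum Gen { YOUNG, INTERMEDIATE, OLD }

    // A heap region that carries only an index and a generation tag.
    public static final class Region {
        public final int index;
        public Gen gen;
        public Region(int index, Gen gen) { this.index = index; this.gen = gen; }
    }

    // Build a heap of n regions, all initially tagged YOUNG.
    public static List<Region> newHeap(int n) {
        List<Region> heap = new ArrayList<>();
        for (int i = 0; i < n; i++) heap.add(new Region(i, Gen.YOUNG));
        return heap;
    }

    // In this model, growing or shrinking a generation is a retag.
    public static void retag(List<Region> heap, int index, Gen gen) {
        heap.get(index).gen = gen;
    }

    // Pick the indices of all regions at or below the given generation.
    public static List<Integer> collectionSet(List<Region> heap, Gen upTo) {
        List<Integer> result = new ArrayList<>();
        for (Region r : heap) {
            if (r.gen.ordinal() <= upTo.ordinal()) result.add(r.index);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Region> heap = newHeap(6);
        retag(heap, 2, Gen.INTERMEDIATE);
        retag(heap, 3, Gen.INTERMEDIATE);
        retag(heap, 4, Gen.OLD);
        retag(heap, 5, Gen.OLD);
        System.out.println(collectionSet(heap, Gen.YOUNG));
        System.out.println(collectionSet(heap, Gen.INTERMEDIATE));
    }
}
```

A real implementation would additionally need remembered sets between
the region groups and a policy for when to promote or retag; none of
that is modelled here.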

Thanks,
 Thomas