RFR(M): 7163191: G1: introduce a "heap spanning table" abstraction

Thu Feb 21 08:50:58 UTC 2013

Hi,

  (cc'ed hotspot-gc-dev because it belongs there)

On Wed, 2013-02-20 at 15:51 -0800, John Cuthbertson wrote:
Hi Thomas,
> 
> Thanks (again)  for looking at the code changes. Detailed responses 
> inline...
> 
> >    - iirc in other gc code the naming of the "bump pointer"
> > allocation related pointers is bottom/top/end, not
> > bottom/end/max_end.
> 
> max_end is a high water mark. Pointer bumps are used when allocating 
> from a contiguous block of heap - not the heap itself. Remember with
> the heap we have the amount of space reserved for the heap and amount
> of space actually committed and that can change over the length of the
> JVM as we expand and shrink the heap.
> 

It is okay, I'm somewhat neutral on that: I was just thinking that
within this table, what you do is also some sort of allocating from the
heap - just in region quantities. Whether you allow backwards-movement
of the top pointer is a minor quirk imo.

My analogy may not be that good. :)

>    - not sure about that right now, but iirc the typical naming for
> the total/max size of arrays in hotspot code was "_size", not
> "_max_length".
> > While I know that heapRegionSeq originally used that, if you grep
> > the hotspot code for "_size" and "_max_length" you'll see that
> > mostly "_size" is used.
> 
> IME size has always been related to the memory size of a thing (i.e.
> how many words, how many bytes); length is the number of elements.
> Length is correct here.

Okay.

> > One option is to introduce DefaultValue only in the
> > G1HeapSpanningTable<T>, and subclasses must pass it in the
> > initialize method; maybe you could add a default value of "0" in the
> > method signature for that parameter. Another one is a virtual method
> > that returns an instance of DefaultValue and must be overriden by
> > descendents.
> 
> I like the idea of using the virtual method that gets overridden by 
> subclasses. If you are following the comments from Bengt he dislikes
> the inheritance relationship here. I'll have to check the patch that
> depends on this to see how much a pain merging it would be if
> inheritance was not used.
> 

I think it might be good to give an updated webrev so that everyone can
have a look at the current state.

> >    - the "shift_by" argument to initialize is superfluous, isn't it?
> > It can be derived from elem_size_bytes I assume.
> >
> >    - I'm not convinced that storing _elem_size_bytes buys you
> > anything, also in terms of convenience. It's only use is in
> > array_size_bytes() as a multiplicand. Why not use shift_by here?
> > This saves the extra maintenance and calculating/checking that value
> > everywhere. (It's also faster, but that's not the point here I
> > guess).
> 
> Not quite. For example the _elem_size_bytes for HeapRegioSeq would be
> 4 or 8 (size of HeapRegion*) while the shift_by amount would be 
> HeapRegion::LogOfHRGrainBytes. Better to keep them separate.

Sure, sorry, my misunderstanding. With the merge of the two classes,
sizeof(T) can be now used in the initializer instead of passing
elem_size_bytes.

> >    - I am not completely sure why the G1HeapSpanningTable<T> does
> > not already provide the biased pointer. I assume that every
> > descendent will want to use it for performance reasons. The
> > implemented HeapRegionSeq already does, and the fast cset table will
> > as well (it has "fast" in its name). I think that all these tables
> > are primarily used for fast lookup.
> > So why not put that functionality in G1HeapSpanningTable<T> for
> > everyone to use?
> 
> Not sure. But there's no real need to store it. It is easily
> calculated.

The point I wanted to make is that every subclass we know so far seems
to need it, so every subclass is going to implement that functionality
again. Even if it is easy to do, that seems unnecessary code
duplication.

E.g. the cset_fast_test uses the biased pointers in in_cset_fast_test()
that is used in the oop closures. Oop closures are typically a place
that are performance sensitive.

> >    - the code (at least HeapRegionSeq) allocates memory from the C
> > heap.
> > My personal guideline for such classes is to provide a virtual
> > destructor that automatically deallocates that memory it allocates.
> > Even if it is known that the class using it (G1CollectedHeap) is
> > never deinitialized anywhere.
> 
> There's no real policy for this in Hotspot. I've done both and
> removed both (at the request of reviewers). :)

:)

> >    - also not really sure that the watermark is really useful. It
> > implies that allocation in this "array" is always in a bump-pointer
> > style fashion. G1 may allocate/deallocate/map/unmap any region at
> > any time. So determining whether the access is to a valid array
> > element or not simply cannot be done at this place. Additionally
> > *all* code updating the size or accessing the array already has to
> > do that validity checking anyway at upper layers. Imho.
> 
> I didn't get pointer-bump from this use. There are (or were) a couple
> of places where a high water mark was used and then used in checks.

I meant, that a single watermark validates whether a region is within
[0, high_watermark[. I.e. if you check validity that way, you (imo)
implicitly state that all regions within that area are valid (and
accessible) when looking at that code.

Further my argument was that G1 may commit/uncommit any region within
the area; so a single watermark is not sufficient.

I had a look at g1 memory sizing, G1 does not seem to give back memory
to the OS like that at the moment so this is not an immediate problem.

However this imo needlessly complicates code in g1 because it needs to
preferentially allocate in lower address regions to be able to give back
memory to the OS (sort the free region list or so).

I am not sure if G1 actually manages to give back memory to the OS
because of that in a meaningful way.

Thomas