Request for review: 6976350 G1: deal with fragmentation while copying objects during GC

Tue Feb 26 19:01:22 UTC 2013

Hi Bengt,

When I was refactoring to get another layer of abstraction, I found 
there would be many duplicated naming and similar functionality in the 
existing class G1ParGCAllocBuffer and the new class 
MultiG1ParGCAllocBuffer, surely, in different abstraction level (e.g. 
allocate(), words_remaining(), retire(), set_word_size(), etc.).

I'd like to know whether we are talking about the same idea. What do you 
have in your mind to lay out the abstraction?

FYI, i prototyped the abstraction anyway as in diff.

Thanks.
Tao

diff --git a/src/share/vm/gc_implementation/g1/g1CollectedHeap.hpp 
b/src/share/vm/gc_implementation/g1/g1CollectedHeap.hpp
--- a/src/share/vm/gc_implementation/g1/g1CollectedHeap.hpp
+++ b/src/share/vm/gc_implementation/g1/g1CollectedHeap.hpp
@@ -1745,6 +1745,25 @@
    }
  };

+class MultiG1ParGCAllocBuffer {
+private:
+  enum GCAllocSeq {
+    GCAllocSeqStart = 0,
+    GCAllocSeq1 = GCAllocSeqStart,
+    GCAllocSeq2 = GCAllocSeqStart + 1,
+    GCAllocSeqCount = GCAllocSeqStart + 2
+  };
+  G1ParGCAllocBuffer _bufffer_sequence[GCAllocSeqStart];
+  GCAllocSeq _primary_buffer;
+public:
+  HeapWord* allocate(size_t word_sz);
+  size_t words_remaining();
+  void retire(bool end_of_gc, bool retain);
+  void set_word_size(size_t new_word_sz);
+  void set_buf(HeapWord* buf);
+  void switch_buf();
+};
+
  class G1ParScanThreadState : public StackObj {
  protected:
    G1CollectedHeap* _g1h;
@@ -1753,9 +1772,9 @@
    CardTableModRefBS* _ct_bs;
    G1RemSet* _g1_rem;

-  G1ParGCAllocBuffer  _surviving_alloc_buffer;
-  G1ParGCAllocBuffer  _tenured_alloc_buffer;
-  G1ParGCAllocBuffer* _alloc_buffers[GCAllocPurposeCount];
+  MultiG1ParGCAllocBuffer  _surviving_alloc_buffer;
+  MultiG1ParGCAllocBuffer  _tenured_alloc_buffer;
+  MultiG1ParGCAllocBuffer* _alloc_buffers[GCAllocPurposeCount];
    ageTable            _age_table;

    size_t           _alloc_buffer_waste;
@@ -1819,7 +1838,7 @@
    RefToScanQueue*   refs()            { return _refs;             }
    ageTable*         age_table()       { return &_age_table;       }

-  G1ParGCAllocBuffer* alloc_buffer(GCAllocPurpose purpose) {
+  MultiG1ParGCAllocBuffer* alloc_buffer(GCAllocPurpose purpose) {
      return _alloc_buffers[purpose];
    }

@@ -1849,7 +1868,7 @@
      HeapWord* obj = NULL;
      size_t gclab_word_size = _g1h->desired_plab_sz(purpose);
      if (word_sz * 100 < gclab_word_size * ParallelGCBufferWastePct) {
-      G1ParGCAllocBuffer* alloc_buf = alloc_buffer(purpose);
+      MultiG1ParGCAllocBuffer* alloc_buf = alloc_buffer(purpose);
        add_to_alloc_buffer_waste(alloc_buf->words_remaining());
        alloc_buf->retire(false /* end_of_gc */, false /* retain */);

@@ -1858,6 +1877,7 @@
        // Otherwise.
        alloc_buf->set_word_size(gclab_word_size);
        alloc_buf->set_buf(buf);
+      alloc_buf->switch_buf();

        obj = alloc_buf->allocate(word_sz);
        assert(obj != NULL, "buffer was definitely big enough...");



On 2/11/13 7:39 AM, Bengt Rutisson wrote:
>
> Hi Tao,
>
> On 2/8/13 1:31 AM, Tao Mao wrote:
>> a new webrev updated
>> http://cr.openjdk.java.net/~tamao/6976350/webrev.02/
>>
>> Jon, two suggestions are implemented.
>> (1) 2-dimensional buffers
>> (2) making for loop iterators more readable (But, the iterators are 
>> still defined as int as you can see; I have to override operator + 
>> for the two enum's. It's worth doing so. Maybe not.)
>>
>> Bengt,
>> This seems a good example for discussion on refactoring, as we tried 
>> to initiate at the offsite meeting.
>> "My turn": i can imagine that you wanted to wrap up 
>> buffers_for_tenured and buffers_for_survived, and don't expose the 
>> array structure when you acquire a buffer/buffers.
>
> Yes, this sounds about right. I think you got what I was thinking.
>
>> With the new design, what do you want to achieve? Clearer data 
>> structure? Readability? Extensibility? Or, other benefits?
>
> All of the above. :) I think this will look cleaner and be more 
> readable and sustainable.
>
> Best would be if you can put together an example of what it would look 
> like and we can compare the webrevs.
>
> Thanks,
> Bengt
>
>>
>> Thank you all.
>> Tao
>>
>>
>> On 2/7/2013 9:05 AM, Jon Masamitsu wrote:
>>>
>>>
>>> On 02/06/13 17:11, Tao Mao wrote:
>>>> Please see inline, Jon. Thanks.
>>>> Tao
>>>>
>>>> On 2/4/2013 9:46 PM, Jon Masamitsu wrote:
>>>>>
>>>>>
>>>>> On 2/4/2013 12:56 PM, Tao Mao wrote:
>>>>>> a new webrev created (I need more reviews!)
>>>>>> http://cr.openjdk.java.net/~tamao/6976350/webrev.01/
>>>>>>
>>>>>> I took Jon's suggestion for bracketing for loops.
>>>>>>
>>>>>> For the rest of suggestions, I thought of them when coding but I 
>>>>>> also had another concern to myself not to have them. So, I'd like 
>>>>>> to discuss before I do further changes.
>>>>>>
>>>>>> Please see inline and provide some insights on my concerns.
>>>>>>
>>>>>> Thank you.
>>>>>> Tao
>>>>>>
>>>>>> On 2/1/2013 2:58 PM, Jon Masamitsu wrote:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~tamao/6976350/webrev.00/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp.frames.html 
>>>>>>>
>>>>>>>
>>>>>>> line 4386
>>>>>>>
>>>>>>> Please add the brackets for the outer for loop.  I'm not used to 
>>>>>>> seeing it
>>>>>>> without the brackets even though they are not strictly needed.
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~tamao/6976350/webrev.00/src/share/vm/gc_implementation/g1/g1CollectedHeap.hpp.frames.html 
>>>>>>>
>>>>>>>
>>>>>>> line 1821 same comment about brackets for for-statement.
>>>>>>>
>>>>>>> line 1869
>>>>>>>
>>>>>>> You always retire alloc_buf1.  Did you consider retiring the 
>>>>>>> buffer that has the
>>>>>>> less words_remaining?
>>>>>> Why not to leverage info of words_remaining:
>>>>>> 1) simplicity.
>>>>>> 2) The current mechanism to use the alloc_buffers in order (in 
>>>>>> routine allocate() ) makes alloc_buf1 stochastically filled up 
>>>>>> first.
>>>>>> 3) The current changeset will make the situation better. Do we 
>>>>>> want an additional overhead (getting words_remaining) to trade 
>>>>>> for a little more benefits? Maybe...
>>>>>
>>>>> Ok.  In practical  terms you're probably right that the benefit 
>>>>> would be small.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> line 1762
>>>>>>>
>>>>>>> Can you make _alloc_buffers two dimensional?  Then you wouldn't 
>>>>>>> need the
>>>>>>> buf_idx() method.
>>>>>> I rarely saw any 2D array implementation at least within GC code 
>>>>>> base. I didn't know whether this was a coincidence or some 
>>>>>> "convention". So I was some reluctant to do so.
>>>>>
>>>>> I don't think there is a convention that says we shouldn't use 
>>>>> multidimensional arrays.
>>>>>
>>>>> If I'm wrong, someone please let me know.
>>>> I will make the array two-dimensional, if no one opposes to it or 
>>>> any convention rules here.
>>>
>>> Thanks.
>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> GCAllocPriority is an enum but you use them as integers in the 
>>>>>>> for loops.
>>>>>>> Can you make the for loop index variable an enum GCAllocPriority?
>>>>>> Making enum GCAllocPriority as for loop index is not a big deal. 
>>>>>> But the problem is that, for consistency, we should make 
>>>>>> GCAllocPurpose as for loop index as well, which has values of 
>>>>>> GCAllocForTenured and GCAllocForSurvived, and don't make as much 
>>>>>> sense as GCAllocPriority1 and GCAllocPriority2 do.
>>>>>
>>>>> True.   GCAllocPurpose's don't have an order the way the 
>>>>> GCAllocPriority
>>>>> but I think the code ends up more readable.
>>>> Then, the current two-layer for loop
>>>>
>>>> for (int ap = 0; ap < GCAllocPurposeCount; ++ap) {
>>>>   for (int pr = 0; pr < GCAllocPriorityCount; ++pr) {
>>>>   }
>>>> }
>>>>
>>>> would become
>>>>
>>>> for (int ap = GCAllocForTenured; ap < GCAllocPurposeCount; ++ap) {
>>>>   for (int pr = GCAllocPriority1; pr < GCAllocPriorityCount; ++pr) {
>>>>   }
>>>> }
>>>
>>> Wouldn't it become
>>>
>>> for (GCAllocPurpose ap = GCAllocPurposeStart; ap < 
>>> GCAllocPurposeCount; ++ap) {
>>>   for (GCAllocPriority pr = GCAllocPriorityStart: pr < 
>>> GCAllocPriorityCount; ++pr) {
>>>
>>> Where you would have to add the start values.  For example
>>>
>>> 1745   enum GCAllocPriority {
>>>          GCAllocPriorityStart=0,
>>> 1746     GCAllocPriority1 =GCAllocPriorityStart,
>>> 1747     GCAllocPriority2 =GCAllocPriority1  + 1,
>>> 1748     GCAllocPriorityCount =GCAllocPriority2  + 1
>>>        };
>>>
>>>
>>> Jon
>>>>
>>>> Do you still see the initial value for iterator "ap" is more 
>>>> readable? Or, you get puzzled about whether it's the true starting 
>>>> value?
>>>>
>>>>>
>>>>> Jon
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Jon
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 1/28/2013 12:21 PM, Tao Mao wrote:
>>>>>>>> 6976350 G1: deal with fragmentation while copying objects 
>>>>>>>> during GC
>>>>>>>> https://jbs.oracle.com/bugs/browse/JDK-6976350
>>>>>>>>
>>>>>>>> webrev:
>>>>>>>> http://cr.openjdk.java.net/~tamao/6976350/webrev.00/
>>>>>>>>
>>>>>>>> changeset:
>>>>>>>> Basically, we want to reuse more of par-allocation buffers 
>>>>>>>> instead of retiring it immediately when it encounters an object 
>>>>>>>> larger than its remaining part.
>>>>>>>>
>>>>>>>> (1) instead of previously using one allocation buffer per GC 
>>>>>>>> purpose, we use N(=2) buffers per GC purpose and modify the 
>>>>>>>> corresponding code. The changeset would easily scale up to 
>>>>>>>> whatever N (though Tony Printezis suggests 2, or 3 may be good 
>>>>>>>> enough)
>>>>>>>>
>>>>>>>> *(2) Two places of cleanup: allocate_during_gc_slow() is 
>>>>>>>> removed due to its never being called.
>>>>>>>>                                               access modifier 
>>>>>>>> (public) before trim_queue() is redundant.
>>>>>>>>
>>>>>>>>
>