Parallel GC and array object layout: way off the base and laid out in reverse?

Fri Sep 6 21:16:09 UTC 2013

Duh. Noob mistake. Of course PS doesn't use oop_iterate_backwards()! 
It's objArrayKlass::oop_push_contents(...) that I think you need to 
change (again, a nice localized change, though).

Tony

On 9/6/13 11:09 PM, Tony Printezis wrote:
> I agree with Igor. This might be OK for a quick experiment. However, 
> it'd be best to provide a backwards iteration method (there's already 
> an oop_iterate_backwards(), isn't there? Can't you override that for 
> obj arrays?).
>
> Tony
>
> On 9/6/13 9:11 PM, Igor Veresov wrote:
>> It's probably not a good idea to tweak it like that. It will affect 
>> all the collectors, for example those that still use BFS. The proper 
>> way would be to provide separate "backward" iteration methods, 
>> like we have for objects. Or, for PS, do it locally and call the 
>> chunked copier function for small arrays as well.
>>
>> Btw, any change in performance?
>>
>> igor
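A toy model (not HotSpot code) of Igor's point about BFS collectors: a FIFO queue (breadth-first) hands elements back in the order they were pushed, while a LIFO stack (depth-first) hands them back reversed. So reversing the shared macro fixes the stack case but reverses the queue case. `scan_order`, `Order` and `Container` are hypothetical names, and the array is modeled as its element indices.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Toy model, not HotSpot code: the order in which one array's elements
// come back out of the collector's pending-work container.
enum class Order { Forward, Backward };
enum class Container { Fifo, Lifo };  // Fifo ~ breadth-first, Lifo ~ depth-first

// Pushes indices 0..length-1 (Forward) or length-1..0 (Backward), then
// drains the deque as a queue (Fifo) or a stack (Lifo) and returns the
// drain order.
std::vector<int> scan_order(int length, Order push, Container c) {
  std::deque<int> pending;
  if (push == Order::Forward) {
    for (int i = 0; i < length; ++i) pending.push_back(i);
  } else {
    for (int i = length - 1; i >= 0; --i) pending.push_back(i);
  }
  std::vector<int> drained;
  while (!pending.empty()) {
    if (c == Container::Fifo) {
      drained.push_back(pending.front());  // oldest first
      pending.pop_front();
    } else {
      drained.push_back(pending.back());   // newest first
      pending.pop_back();
    }
  }
  return drained;
}
```

Forward pushes into a FIFO and backward pushes into a LIFO both drain as 0, 1, 2, ...; that is the argument for keeping the forward iteration and adding a separate backward one that only the depth-first collector calls.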
>>
>> On Sep 6, 2013, at 4:56 AM, Aleksey Shipilev 
>> <aleksey.shipilev at oracle.com> wrote:
>>
>>> Igor's suggestion seems to touch only the work-stealing part. For
>>> small arrays, this should also be done:
>>>
>>> $ hg diff
>>> diff -r 428025878417 src/share/vm/oops/objArrayKlass.cpp
>>> --- a/src/share/vm/oops/objArrayKlass.cpp    Wed Sep 04 12:56:03 2013 -0700
>>> +++ b/src/share/vm/oops/objArrayKlass.cpp    Fri Sep 06 15:45:14 2013 +0400
>>> @@ -412,11 +412,11 @@
>>>
>>>  #define ObjArrayKlass_SPECIALIZED_OOP_ITERATE(T, a, p, do_oop) \
>>>  {                                   \
>>> -  T* p         = (T*)(a)->base();   \
>>> -  T* const end = p + (a)->length(); \
>>> -  while (p < end) {                 \
>>> +  T* const b = (T*)(a)->base();     \
>>> +  T* p       = b + (a)->length();   \
>>> +  while (b < p) {                   \
>>> +    p--;                            \
>>>      do_oop;                         \
>>> -    p++;                            \
>>>    }                                 \
>>>  }
>>>
>>> ...and also in a few other relevant places.
>>>
>>> This very limited and untested change "fixes" the layout in the
>>> original test. I have submitted CR 8024394 to track this.
>>>
>>> Thanks,
>>> -Aleksey.
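For reference, the reversed loop in the patch can be written as a plain function template and exercised outside the VM. This is a sketch assuming a raw element pointer and a length in place of `(a)->base()` and `(a)->length()`; `iterate_array_backwards` is a hypothetical name, not a HotSpot function.

```cpp
#include <cassert>
#include <vector>

// Stand-alone analogue of the reversed ObjArrayKlass_SPECIALIZED_OOP_ITERATE
// loop. `base` and `length` stand in for (T*)(a)->base() and (a)->length();
// `do_oop` is any callable taking a T*. (Hypothetical helper, not VM code.)
template <class T, class DoOop>
void iterate_array_backwards(T* base, int length, DoOop do_oop) {
  T* const b = base;
  T* p       = b + length;  // one past the last element
  while (b < p) {
    p--;                    // step back first, so the end slot is never visited
    do_oop(p);
  }
}
```

Because `p` starts one past the last element and is decremented before `do_oop` runs, the slot at `base + length` is never touched, and a zero-length array runs no iterations.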
>>>
>>>
>>> On 09/05/2013 12:50 AM, Igor Veresov wrote:
>>>> For PS, look in psPromotionManager.cpp; here's the kernel you need
>>>> to (trivially) tweak:
>>>>
>>>> template <class T> void PSPromotionManager::process_array_chunk_work(
>>>>                                                  oop obj,
>>>>                                                  int start, int end) {
>>>>   assert(start <= end, "invariant");
>>>>   T* const base      = (T*)objArrayOop(obj)->base();
>>>>   T* p               = base + start;
>>>>   T* const chunk_end = base + end;
>>>>   while (p < chunk_end) {
>>>>     if (PSScavenge::should_scavenge(p)) {
>>>>       claim_or_forward_depth(p);
>>>>     }
>>>>     ++p;
>>>>   }
>>>> }
>>>>
>>>> As Tony and Thomas said before, you'll still be seeing "surprises"
>>>> due to array chunking and work stealing. Those, I guess, you'll just
>>>> have to live with.
>>>>
>>>> igor
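Igor's kernel could be flipped the same way. The following is a stand-alone sketch, not the HotSpot code: `should_scavenge` and `claim_or_forward_depth` are passed in as hypothetical callables standing in for `PSScavenge::should_scavenge()` and `claim_or_forward_depth()`, and the chunk is the half-open range `[start, end)` as in the original loop. It only changes the order within one chunk; as noted above, chunking and work stealing can still reorder things.

```cpp
#include <cassert>
#include <vector>

// Backwards variant of the quoted chunk kernel, as a stand-alone sketch.
// Elements in [start, end) are visited from end - 1 down to start, so a
// LIFO task stack later pops the pushed ones back in increasing index order.
// `should_scavenge` / `claim_or_forward_depth` are hypothetical callables.
template <class T, class ShouldScavenge, class ClaimOrForward>
void process_array_chunk_backwards(T* base, int start, int end,
                                   ShouldScavenge should_scavenge,
                                   ClaimOrForward claim_or_forward_depth) {
  assert(start <= end && "invariant");
  T* const chunk_start = base + start;
  T* p                 = base + end;  // one past the chunk
  while (chunk_start < p) {
    --p;
    if (should_scavenge(p)) {
      claim_or_forward_depth(p);
    }
  }
}
```

In the usage below, zero stands in for a slot that need not be scavenged, and a `std::vector<int*>` plays the task stack.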
>>>>
>>>> On Sep 4, 2013, at 1:34 PM, Aleksey Shipilev
>>>> <aleksey.shipilev at oracle.com <mailto:aleksey.shipilev at oracle.com>> 
>>>> wrote:
>>>>
>>>>> Here you have it, thanks Igor.
>>>>> Any pointers to the relevant block of code?
>>>>> I can probably try to fix this in the background.
>>>>>
>>>>> -Aleksey.
>>>>>
>>>>> On 05.09.2013, at 0:00, Igor Veresov <iggy.veresov at gmail.com
>>>>> <mailto:iggy.veresov at gmail.com>> wrote:
>>>>>
>>>>>> Yup, that's a depth-first array-scanning quirk. Work stealing is
>>>>>> done using stacks, so in order to have the first fields followed
>>>>>> first, the references need to be pushed onto the stack in reverse.
>>>>>> That's done for regular objects, but not for arrays.
>>>>>>
>>>>>> igor
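To see why this leads to arrays being "laid out in reverse", here is a toy single-threaded model (not HotSpot code) in which each popped object gets the next bump-pointer slot in to-space. It assumes, as Aleksey notes below, that the array elements are leaves, so popping one pushes nothing further; there is no chunking or stealing. `copy_addresses` is a hypothetical name and object ids are plain ints.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy model, not HotSpot code: a depth-first scavenger keeps pending
// references on a LIFO stack and gives each popped object the next
// bump-pointer slot in to-space. Elements are assumed to be leaves.
std::map<int, int> copy_addresses(const std::vector<int>& elements,
                                  bool push_in_reverse) {
  std::vector<int> stack;
  if (push_in_reverse) {
    for (auto it = elements.rbegin(); it != elements.rend(); ++it) {
      stack.push_back(*it);
    }
  } else {
    for (int e : elements) stack.push_back(e);
  }
  std::map<int, int> address;  // object id -> to-space slot
  int top = 0;                 // bump pointer, in object-sized slots
  while (!stack.empty()) {
    int obj = stack.back();    // LIFO: last pushed is copied first
    stack.pop_back();
    address[obj] = top++;
  }
  return address;
}
```

Pushed in array order, element 0 ends up in the highest slot; pushed in reverse, as is already done for instance fields, element 0 is copied first and the referents end up laid out in index order.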
>>>>>>
>>>>>> On Sep 4, 2013, at 12:51 PM, Aleksey Shipilev
>>>>>> <aleksey.shipilev at oracle.com 
>>>>>> <mailto:aleksey.shipilev at oracle.com>> wrote:
>>>>>>
>>>>>>> Hi Jon,
>>>>>>>
>>>>>>> On 09/04/2013 10:19 PM, Jon Masamitsu wrote:
>>>>>>>> I haven't followed this thread carefully enough, but the ParallelGC
>>>>>>>> collector uses a depth-first traversal while the other collectors
>>>>>>>> use a breadth-first one. Would that explain the difference?
>>>>>>> The referenced objects in the array are the leaves in the
>>>>>>> reachability graph. I thought there is no difference between
>>>>>>> depth-first and breadth-first in this case? It looks more like we
>>>>>>> record the traversed objects on some LIFO structure, which pops the
>>>>>>> elements in reverse order.
>>>>>>>
>>>>>>> -Aleksey.
>

-- 
Tony Printezis | Staff Software Engineer | Twitter

@TonyPrintezis
tprintezis at twitter.com