Parallel GC and array object layout: way off the base and laid out in reverse?

Fri Sep 6 21:52:46 UTC 2013

Aha. And if you look further, you will hit exactly the same macro I had
changed ;) I know that because I discovered it by going exactly the
objArrayKlass::oop_push_contents(...) route.

I'll give Igor's second suggestion a try some time later, along with
the performance runs.

Thanks,
-Aleksey.

On 09/07/2013 01:16 AM, Tony Printezis wrote:
> Duh. Noob mistake. PS doesn't of course use oop_iterate_backwards()!
> It's objArrayKlass::oop_push_contents(...) that I think you need to
> change (again, nice localized change though).
> 
> Tony
> 
> On 9/6/13 11:09 PM, Tony Printezis wrote:
>> I agree with Igor. This might be OK for a quick experiment. However,
>> it'd be best to provide a backwards iteration method (there's already
>> an oop_iterate_backwards(), isn't there? can't you override that for
>> obj arrays)?
>>
>> Tony
>>
>> On 9/6/13 9:11 PM, Igor Veresov wrote:
>>> It's probably not a good idea to tweak it like that. It will affect
>>> all the collectors, for example those that still use BFS. The proper
>>> way would be to provide the separate "backward" iteration methods
>>> like we have for objects. Or, for PS, do it locally and call the
>>> chunked copier function for small arrays as well.
>>>
>>> Btw, any change in performance?
>>>
>>> igor
>>>
>>> On Sep 6, 2013, at 4:56 AM, Aleksey Shipilev
>>> <aleksey.shipilev at oracle.com> wrote:
>>>
>>>> Igor's suggestion seems to only touch the work-stealing part. For small
>>>> arrays, this should also be done:
>>>>
>>>> $ hg diff
>>>> diff -r 428025878417 src/share/vm/oops/objArrayKlass.cpp
>>>> --- a/src/share/vm/oops/objArrayKlass.cpp    Wed Sep 04 12:56:03 2013 -0700
>>>> +++ b/src/share/vm/oops/objArrayKlass.cpp    Fri Sep 06 15:45:14 2013 +0400
>>>> @@ -412,11 +412,11 @@
>>>>
>>>> #define ObjArrayKlass_SPECIALIZED_OOP_ITERATE(T, a, p, do_oop) \
>>>> {                                   \
>>>> -  T* p         = (T*)(a)->base();   \
>>>> -  T* const end = p + (a)->length(); \
>>>> -  while (p < end) {                 \
>>>> +  T* const b = (T*)(a)->base();     \
>>>> +  T* p       = b + (a)->length();   \
>>>> +  while (b < p) {                   \
>>>> +    p--;                            \
>>>>      do_oop;                         \
>>>> -    p++;                            \
>>>>    }                                 \
>>>> }
>>>>
>>>> ...and also in a few other relevant places.
>>>>
>>>> This very limited and untested change "fixes" the layout in the original
>>>> test. I have submitted CR 8024394 to track this.
>>>>
>>>> Thanks,
>>>> -Aleksey.
>>>>
>>>>
>>>> On 09/05/2013 12:50 AM, Igor Veresov wrote:
>>>>> For PS, look in psPromotionManager.cpp; here is the kernel you need to
>>>>> trivially tweak:
>>>>>
>>>>> template <class T> void PSPromotionManager::process_array_chunk_work(
>>>>>                                                  oop obj,
>>>>>                                                  int start, int end) {
>>>>>   assert(start <= end, "invariant");
>>>>>   T* const base      = (T*)objArrayOop(obj)->base();
>>>>>   T* p               = base + start;
>>>>>   T* const chunk_end = base + end;
>>>>>   while (p < chunk_end) {
>>>>>     if (PSScavenge::should_scavenge(p)) {
>>>>>       claim_or_forward_depth(p);
>>>>>     }
>>>>>     ++p;
>>>>>   }
>>>>> }
>>>>>
>>>>> Like Tony and Thomas said before, you'll still be seeing "surprises" due
>>>>> to array chunking and work stealing. Those, I guess, you'll just have to
>>>>> live with.
>>>>>
>>>>> igor
>>>>>
>>>>> On Sep 4, 2013, at 1:34 PM, Aleksey Shipilev
>>>>> <aleksey.shipilev at oracle.com> wrote:
>>>>>
>>>>>> Here you have it, thanks Igor.
>>>>>> Any reference to the relevant block of code?
>>>>>> I can probably try to fix this in background.
>>>>>>
>>>>>> -Aleksey.
>>>>>>
>>>>>> On 05.09.2013, at 0:00, Igor Veresov <iggy.veresov at gmail.com> wrote:
>>>>>>
>>>>>>> Yup, that's a depth-first array-scanning quirk. The work-stealing is
>>>>>>> done using stacks, so in order to have the first fields followed
>>>>>>> first, the references need to be pushed onto the stack in reverse.
>>>>>>> That's done for regular objects, but for arrays it's not.
>>>>>>>
>>>>>>> igor
>>>>>>>
>>>>>>> On Sep 4, 2013, at 12:51 PM, Aleksey Shipilev
>>>>>>> <aleksey.shipilev at oracle.com> wrote:
>>>>>>>
>>>>>>>> Hi Jon,
>>>>>>>>
>>>>>>>> On 09/04/2013 10:19 PM, Jon Masamitsu wrote:
>>>>>>>>> I haven't followed this thread carefully enough, but the ParallelGC
>>>>>>>>> collector uses a depth-first traversal while the other collectors use
>>>>>>>>> a breadth-first one. Would that explain the difference?
>>>>>>>> The referenced objects in the array are the leaves in the reachability
>>>>>>>> graph. I thought there is no difference between depth- and breadth-first
>>>>>>>> in this case? It looks more like we record the traversed objects on some
>>>>>>>> LIFO structure, which polls the elements in the reverse order.
>>>>>>>>
>>>>>>>> -Aleksey.
>>
>
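
For readers following the thread, the LIFO effect discussed above can be
reproduced outside HotSpot. The sketch below is only an illustration, not
HotSpot code: visit_order() and the use of std::vector as the stack are
hypothetical stand-ins for the per-thread marking stack, and plain ints
stand in for array element slots. It pushes the elements either forward
(like the original loop) or backward (like the loop in the patch), then
drains the stack and records the order in which the elements come off.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: 'elems' plays the role of an object array's slots,
// and 'stack' plays the role of a LIFO marking stack.
std::vector<int> visit_order(const std::vector<int>& elems, bool push_backwards) {
    std::vector<int> stack;

    if (push_backwards) {
        // Same loop shape as the patched macro: start one past the end
        // and walk down to the base.
        const int* const b = elems.data();
        const int* p       = b + elems.size();
        while (b < p) {
            --p;
            stack.push_back(*p);
        }
    } else {
        // Same loop shape as the original macro: base to end.
        const int* p         = elems.data();
        const int* const end = p + elems.size();
        while (p < end) {
            stack.push_back(*p);
            ++p;
        }
    }

    // Drain the stack; the pop order is the order in which a depth-first
    // collector would get to process (and hence copy) the elements.
    std::vector<int> order;
    while (!stack.empty()) {
        order.push_back(stack.back());
        stack.pop_back();
    }
    assert(order.size() == elems.size());
    return order;
}
```

With forward pushes the elements come off the stack last-to-first; with
backward pushes they come off first-to-last. This single-threaded model
leaves out array chunking and work stealing, which, as noted in the
thread, can still perturb the order.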