Parallel GC and array object layout: way off the base and laid out in reverse?

Igor Veresov iggy.veresov at gmail.com
Fri Sep 6 22:19:16 UTC 2013


On Sep 6, 2013, at 2:52 PM, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:

> Aha. And if you look further, you will hit exactly the same macro I had
> changed ;) I know that because I discovered this by going exactly the
> objArrayKlass::oop_push_contents(...) route.

Our collective point was that the change will affect other code. For example, oop_update_pointers() and oop_adjust_pointers(), which the PS old-gen collectors use, rely on the same macro. The array traversal loop kernel is also reused by the framework collectors. So there will be unintended side effects.

A proper fix would add a separate macro that does the reverse iteration; see InstanceKlass::oop_push_contents() for the regular-object equivalent.
The current change is cool, however, for testing the ordering effects after PS young GCs.

igor
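To illustrate the separate-macro approach, here is a minimal standalone sketch. It assumes a toy `ToyArray` type that only mimics the `base()`/`length()` accessors the HotSpot macro uses; `TOY_OOP_ITERATE` and `TOY_OOP_ITERATE_BACKWARDS` are hypothetical names, not HotSpot's. The forward kernel stays exactly as it was, so callers that only need to visit every slot (such as the pointer update/adjust paths) are unaffected, and only the push path would opt into the backward variant.

```cpp
// Toy stand-in for an objArray (assumption): only base() and length(),
// mirroring the accessors used by the HotSpot macro.
template <class T>
struct ToyArray {
  T* data;
  int len;
  T* base() const { return data; }
  int length() const { return len; }
};

// Forward kernel, same loop as the original macro. Left untouched so that
// order-insensitive callers keep their current behavior.
#define TOY_OOP_ITERATE(T, a, p, do_oop) \
{                                        \
  T* p         = (T*)(a)->base();        \
  T* const end = p + (a)->length();      \
  while (p < end) {                      \
    do_oop;                              \
    p++;                                 \
  }                                      \
}

// Separate backward kernel (hypothetical name), same loop as Aleksey's
// patch in the quoted diff: decrement first, so p never leaves
// [base, base + length).
#define TOY_OOP_ITERATE_BACKWARDS(T, a, p, do_oop) \
{                                                  \
  T* const b = (T*)(a)->base();                    \
  T* p       = b + (a)->length();                  \
  while (b < p) {                                  \
    p--;                                           \
    do_oop;                                        \
  }                                                \
}
```

In this sketch a push routine would switch to `TOY_OOP_ITERATE_BACKWARDS` while every other caller keeps using `TOY_OOP_ITERATE`.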

> 
> I'll give Igor's second suggestion a try some time later, along with
> the performance runs.
> 
> Thanks,
> -Aleksey.
> 
> On 09/07/2013 01:16 AM, Tony Printezis wrote:
>> Duh. Noob mistake. Of course PS doesn't use oop_iterate_backwards()!
>> It's objArrayKlass::oop_push_contents(...) that I think you need to
>> change (again, a nice localized change, though).
>> 
>> Tony
>> 
>> On 9/6/13 11:09 PM, Tony Printezis wrote:
>>> I agree with Igor. This might be OK for a quick experiment. However,
>>> it'd be best to provide a backwards iteration method (there's already
>>> an oop_iterate_backwards(), isn't there? Can't you override that for
>>> obj arrays?).
>>> 
>>> Tony
>>> 
>>> On 9/6/13 9:11 PM, Igor Veresov wrote:
>>>> It's probably not a good idea to tweak it like that. It will affect
>>>> all the collectors, for example those that still use BFS. The proper
>>>> way would be to provide separate "backward" iteration methods, like
>>>> we have for objects. Or, for PS, do it locally and call the chunked
>>>> copier function for small arrays as well.
>>>> 
>>>> Btw, any change in performance?
>>>> 
>>>> igor
>>>> 
>>>> On Sep 6, 2013, at 4:56 AM, Aleksey Shipilev
>>>> <aleksey.shipilev at oracle.com> wrote:
>>>> 
>>>>> Igor's suggestion seems to touch only the work-stealing part. For small
>>>>> arrays, this should also be done:
>>>>> 
>>>>> $ hg diff
>>>>> diff -r 428025878417 src/share/vm/oops/objArrayKlass.cpp
>>>>> --- a/src/share/vm/oops/objArrayKlass.cpp    Wed Sep 04 12:56:03 2013 -0700
>>>>> +++ b/src/share/vm/oops/objArrayKlass.cpp    Fri Sep 06 15:45:14 2013 +0400
>>>>> @@ -412,11 +412,11 @@
>>>>>
>>>>>  #define ObjArrayKlass_SPECIALIZED_OOP_ITERATE(T, a, p, do_oop) \
>>>>>  {                                   \
>>>>> -  T* p         = (T*)(a)->base();   \
>>>>> -  T* const end = p + (a)->length(); \
>>>>> -  while (p < end) {                 \
>>>>> +  T* const b = (T*)(a)->base();     \
>>>>> +  T* p       = b + (a)->length();   \
>>>>> +  while (b < p) {                   \
>>>>> +    p--;                            \
>>>>>      do_oop;                         \
>>>>> -    p++;                            \
>>>>>    }                                 \
>>>>>  }
>>>>> 
>>>>> ...and also in a few other relevant places.
>>>>> 
>>>>> This very limited and untested change "fixes" the layout in the
>>>>> original test. I have submitted CR 8024394 to track this.
>>>>> 
>>>>> Thanks,
>>>>> -Aleksey.
>>>>> 
>>>>> 
>>>>> On 09/05/2013 12:50 AM, Igor Veresov wrote:
>>>>>> For PS, look in psPromotionManager.cpp; here is the kernel you need to
>>>>>> tweak (it's a trivial change):
>>>>>> 
>>>>>> template <class T> void PSPromotionManager::process_array_chunk_work(
>>>>>>                                                 oop obj,
>>>>>>                                                 int start, int end) {
>>>>>>  assert(start <= end, "invariant");
>>>>>>  T* const base      = (T*)objArrayOop(obj)->base();
>>>>>>  T* p               = base + start;
>>>>>>  T* const chunk_end = base + end;
>>>>>>  while (p < chunk_end) {
>>>>>>    if (PSScavenge::should_scavenge(p)) {
>>>>>>      claim_or_forward_depth(p);
>>>>>>    }
>>>>>>    ++p;
>>>>>>  }
>>>>>> }
>>>>>> 
>>>>>> As Tony and Thomas said before, you'll still see "surprises" due to
>>>>>> array chunking and work stealing. Those, I guess, you'll just have
>>>>>> to live with.
>>>>>> 
>>>>>> igor
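Applied to the chunk kernel quoted above, the tweak amounts to walking [start, end) from the top down, so that once the slots are pushed on the LIFO task stack the lowest index is popped first. This is a standalone sketch, not the actual PSPromotionManager code: `process_array_chunk_backward` is a hypothetical name, and `should_scavenge` / `claim_or_forward_depth` are passed in as callables standing in for PSScavenge::should_scavenge() and claim_or_forward_depth().

```cpp
#include <cassert>

// Hypothetical standalone version of the chunk kernel that walks the slots
// in [start, end) from the highest index down. The two callables stand in
// for PSScavenge::should_scavenge() and claim_or_forward_depth(), which are
// not available outside HotSpot.
template <class T, class ShouldScavenge, class ClaimOrForward>
void process_array_chunk_backward(T* base, int start, int end,
                                  ShouldScavenge should_scavenge,
                                  ClaimOrForward claim_or_forward_depth) {
  assert(start <= end && "invariant");
  T* const chunk_start = base + start;
  T* p                 = base + end;
  while (chunk_start < p) {
    --p;  // step down first so p stays within [chunk_start, base + end)
    if (should_scavenge(p)) {
      // In PS this may push p on the per-thread task stack; pushing the
      // highest index first leaves the lowest index on top.
      claim_or_forward_depth(p);
    }
  }
}
```

As Igor notes, this only fixes the order within one chunk; how large arrays are split into chunks and stolen by other GC threads can still reorder the copies.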
>>>>>> 
>>>>>> On Sep 4, 2013, at 1:34 PM, Aleksey Shipilev
>>>>>> <aleksey.shipilev at oracle.com> wrote:
>>>>>> 
>>>>>>> Here you have it, thanks, Igor.
>>>>>>> Any pointer to the relevant block of code?
>>>>>>> I can probably try to fix this in the background.
>>>>>>> 
>>>>>>> -Aleksey.
>>>>>>> 
>>>>>>> On 05.09.2013, at 0:00, Igor Veresov <iggy.veresov at gmail.com> wrote:
>>>>>>> 
>>>>>>>> Yup, that's a depth-first array-scanning quirk. The work stealing is
>>>>>>>> done using stacks, so in order to have the first fields followed
>>>>>>>> first, the references need to be put on the stack in reverse. That's
>>>>>>>> done for regular objects, but not for arrays.
>>>>>>>> 
>>>>>>>> igor
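The effect Igor describes can be reproduced with a toy model. Assume (a simplification, not how PS actually allocates) that each array element points to one leaf object, that the element slots go onto a single LIFO stack, and that each popped leaf is copied to the next free offset of a bump-pointer to-space. `copy_offsets` is a hypothetical helper returning the to-space offset each element's leaf ends up at.

```cpp
#include <cstddef>
#include <vector>

// Toy model (a simplification, not PS's real allocation path): element i of
// the array refers to leaf object i; the element slots are pushed onto one
// LIFO stack, then popped, and each leaf is copied to the next free offset
// of a bump-pointer to-space. Returns the to-space offset of each leaf.
std::vector<std::size_t> copy_offsets(std::size_t length, bool push_in_reverse,
                                      std::size_t object_size) {
  std::vector<std::size_t> stack;
  if (push_in_reverse) {
    for (std::size_t i = length; i > 0; --i) stack.push_back(i - 1);  // last slot first
  } else {
    for (std::size_t i = 0; i < length; ++i) stack.push_back(i);      // first slot first
  }
  std::vector<std::size_t> offset(length);
  std::size_t top = 0;  // bump pointer into to-space
  while (!stack.empty()) {
    std::size_t idx = stack.back();  // LIFO: most recently pushed slot
    stack.pop_back();
    offset[idx] = top;
    top += object_size;
  }
  return offset;
}
```

With length 3 and 16-byte objects, pushing in index order yields offsets {32, 16, 0}, i.e. the leaves laid out in reverse, while pushing in reverse yields {0, 16, 32}. The model has one thread and no chunking or stealing.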
>>>>>>>> 
>>>>>>>> On Sep 4, 2013, at 12:51 PM, Aleksey Shipilev
>>>>>>>> <aleksey.shipilev at oracle.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi Jon,
>>>>>>>>> 
>>>>>>>>> On 09/04/2013 10:19 PM, Jon Masamitsu wrote:
>>>>>>>>>> I haven't followed this thread carefully enough, but the ParallelGC
>>>>>>>>>> collector uses a depth-first traversal while the other collectors
>>>>>>>>>> use a breadth-first one. Would that explain the difference?
>>>>>>>>> The referenced objects in the array are the leaves in the reachability
>>>>>>>>> graph. I thought there was no difference between depth- and
>>>>>>>>> breadth-first in this case? It looks more like we record the
>>>>>>>>> traversed objects on some LIFO structure, which pops the elements
>>>>>>>>> in reverse order.
>>>>>>>>> 
>>>>>>>>> -Aleksey.
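Jon's question and Aleksey's LIFO hypothesis can be separated with an even smaller model. Since the leaves have no outgoing references, the traversal style only matters through the work list's discipline: drained FIFO (breadth-first style) the slots come out in index order, drained LIFO (stack-based, depth-first style) they come out reversed. `drain_order` is a hypothetical helper, and the model ignores chunking and work stealing.

```cpp
#include <deque>
#include <vector>

// Toy model: slots 0..length-1 of one array are pushed onto a work list in
// index order, then drained. The referents are leaves (no outgoing
// references), so the drain order is the copy order.
std::vector<int> drain_order(int length, bool lifo) {
  std::deque<int> work;
  for (int i = 0; i < length; ++i) work.push_back(i);
  std::vector<int> order;
  while (!work.empty()) {
    int idx;
    if (lifo) {  // stack-based, depth-first style
      idx = work.back();
      work.pop_back();
    } else {     // queue-based, breadth-first style
      idx = work.front();
      work.pop_front();
    }
    order.push_back(idx);
  }
  return order;
}
```

Here `drain_order(4, false)` returns {0, 1, 2, 3} and `drain_order(4, true)` returns {3, 2, 1, 0}, which matches the LIFO explanation rather than anything specific to depth-first traversal of deeper graphs.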




More information about the hotspot-gc-dev mailing list