Parallel GC and array object layout: way off the base and laid out in reverse?
Tony Printezis
tprintezis at twitter.com
Fri Sep 6 21:16:09 UTC 2013
Duh. Noob mistake. PS doesn't of course use oop_iterate_backwards()!
It's objArrayKlass::oop_push_contents(...) that I think you need to
change (again, nice localized change though).
Tony
On 9/6/13 11:09 PM, Tony Printezis wrote:
> I agree with Igor. This might be OK for a quick experiment. However,
> it'd be best to provide a backwards iteration method (there's already
> an oop_iterate_backwards(), isn't there? can't you override that for
> obj arrays)?
>
> Tony
>
> On 9/6/13 9:11 PM, Igor Veresov wrote:
>> It's probably not a good idea to tweak it like that. It will affect
>> all the collectors, for example those that still use BFS. The proper
>> way would be to provide separate "backward" iteration methods
>> like we have for objects. Or, for PS, do it locally and call the
>> chunked copier function for small arrays as well.
>>
>> Btw, any change in performance?
>>
>> igor
>>
>> On Sep 6, 2013, at 4:56 AM, Aleksey Shipilev
>> <aleksey.shipilev at oracle.com> wrote:
>>
>>> Igor's suggestion seems to only touch the work-stealing part. For small
>>> arrays, this should also be done:
>>>
>>> $ hg diff
>>> diff -r 428025878417 src/share/vm/oops/objArrayKlass.cpp
>>> --- a/src/share/vm/oops/objArrayKlass.cpp Wed Sep 04 12:56:03 2013 -0700
>>> +++ b/src/share/vm/oops/objArrayKlass.cpp Fri Sep 06 15:45:14 2013 +0400
>>> @@ -412,11 +412,11 @@
>>>
>>> #define ObjArrayKlass_SPECIALIZED_OOP_ITERATE(T, a, p, do_oop) \
>>> { \
>>> - T* p = (T*)(a)->base(); \
>>> - T* const end = p + (a)->length(); \
>>> - while (p < end) { \
>>> + T* const b = (T*)(a)->base(); \
>>> + T* p = b + (a)->length(); \
>>> + while (b < p) { \
>>> + p--; \
>>> do_oop; \
>>> - p++; \
>>> } \
>>> }
>>>
>>> ...and also in a few other relevant places.
>>>
>>> This very limited and untested change "fixes" the layout in the original
>>> test. I have submitted CR 8024394 to track this.
>>>
>>> Thanks,
>>> -Aleksey.
>>>
>>>
>>> On 09/05/2013 12:50 AM, Igor Veresov wrote:
>>>> For PS, look in psPromotionManager.cpp; here's the kernel you need to
>>>> trivially tweak:
>>>>
>>>> template <class T>
>>>> void PSPromotionManager::process_array_chunk_work(oop obj,
>>>>                                                   int start, int end) {
>>>>   assert(start <= end, "invariant");
>>>>   T* const base = (T*)objArrayOop(obj)->base();
>>>>   T* p = base + start;
>>>>   T* const chunk_end = base + end;
>>>>   while (p < chunk_end) {
>>>>     if (PSScavenge::should_scavenge(p)) {
>>>>       claim_or_forward_depth(p);
>>>>     }
>>>>     ++p;
>>>>   }
>>>> }
>>>>
>>>> Like Tony and Thomas said before, you'll still be seeing "surprises" due
>>>> to array chunking and work stealing. Those, I guess, you'll just have to
>>>> live with.
>>>>
>>>> igor
>>>>
>>>> On Sep 4, 2013, at 1:34 PM, Aleksey Shipilev
>>>> <aleksey.shipilev at oracle.com> wrote:
>>>>
>>>>> Here you have it, thanks Igor.
>>>>> Any reference to the relevant block of code?
>>>>> I can probably try to fix this in background.
>>>>>
>>>>> -Aleksey.
>>>>>
>>>>> On 05.09.2013, at 0:00, Igor Veresov <iggy.veresov at gmail.com> wrote:
>>>>>
>>>>>> Yup, that's a depth-first array-scanning quirk. The work-stealing is
>>>>>> done using stacks, so in order to have the first fields followed
>>>>>> first the references need to be put on the stack in reverse. That's done
>>>>>> for regular objects but for arrays it's not.
>>>>>>
>>>>>> igor
>>>>>>
>>>>>> On Sep 4, 2013, at 12:51 PM, Aleksey Shipilev
>>>>>> <aleksey.shipilev at oracle.com> wrote:
>>>>>>
>>>>>>> Hi Jon,
>>>>>>>
>>>>>>> On 09/04/2013 10:19 PM, Jon Masamitsu wrote:
>>>>>>>> I haven't followed this thread carefully enough but the ParallelGC
>>>>>>>> collector uses a depth-first traversal while the other collectors use
>>>>>>>> a breadth-first one. Would that explain the difference?
>>>>>>> The referenced objects in the array are the leaves in the reachability
>>>>>>> graph. I thought there is no difference in depth- vs. breadth-first in
>>>>>>> this case? It looks more like we record the traversed objects on some
>>>>>>> LIFO structure, which polls the elements in the reverse order.
>>>>>>>
>>>>>>> -Aleksey.
>
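
The ordering effect discussed in this thread can be sketched outside
HotSpot. This is only an illustration, not HotSpot code: visit_order() and
its std::vector "stack" are hypothetical stand-ins for the real marking
stack, and array slots are modelled as plain indices. It shows that pushing
slots front-to-back onto a LIFO structure makes them come off in reverse,
while pushing them back-to-front (as the patched macro does) restores
element order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a LIFO work stack: push every slot of an array of
// the given length, then pop until empty. Returns the order in which the
// slots come off the stack, i.e. the order they would be processed.
std::vector<int> visit_order(std::size_t length, bool push_backwards) {
    std::vector<int> stack;  // back() is the top of the stack
    if (push_backwards) {
        // Walk from the last slot down to the first.
        for (std::size_t i = length; i > 0; --i) {
            stack.push_back(static_cast<int>(i - 1));
        }
    } else {
        // Walk from the first slot up to the last.
        for (std::size_t i = 0; i < length; ++i) {
            stack.push_back(static_cast<int>(i));
        }
    }

    std::vector<int> order;
    while (!stack.empty()) {
        order.push_back(stack.back());  // most recently pushed slot first
        stack.pop_back();
    }
    assert(order.size() == length);
    return order;
}
```

Calling visit_order(4, false) yields slots 3, 2, 1, 0, and
visit_order(4, true) yields 0, 1, 2, 3. The sketch ignores array chunking
and work stealing, which can still perturb the order in the real collector.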
--
Tony Printezis | Staff Software Engineer | Twitter
@TonyPrintezis
tprintezis at twitter.com