RFC: Parallel deferred updates
Thomas Schatzl
thomas.schatzl at oracle.com
Fri Aug 19 10:47:01 UTC 2022
Hi again,
On 19.08.22 12:42, Thomas Schatzl wrote:
> Hi,
>
> On 12.08.22 17:42, Nick Gasson wrote:
>> Hi Thomas,
>
> apologies for the late replies (and potentially more late ones in the
> near future), as I'm on and off on vacation right now.
>
>>
>> Thanks for the feedback. I've created JDK-8292296 in the JBS for this.
>>
>> On 11/08/22 17:30 pm, Thomas Schatzl wrote:
>>>
>>> Sounds good, although I would make it just a little more complicated:
>>> maybe it's useful to actually know the number of valid RegionData,
>>> counting them as they are set, and use that to size the number of
>>> threads. And/or the number of object arrays (of a particular size,
>>> e.g. larger than the threshold to split them during marking?) as a
>>> proxy for "lots of work for that object crossing the region"/"worth
>>> spinning up a thread" (if that is possible at all).
>>
>> One thing we could do is add a shared counter that's incremented
>> whenever RegionData::set_deferred_obj_addr() is called with a non-NULL
>> pointer. That could be used to decide how many threads to use for the
>> update processing, but it doesn't tell you how many of those have a
>> large number of embedded oops. For that you'd need to examine the class
>> and I'm not sure that's safe to do inside PSParallelCompact::fill_region
>> - what if the class pointer was spilled into the next region?
>
> During this phase the object's size can be inferred from the bitmaps;
> actually, if you look e.g. at psParallelCompact.cpp:2900, it looks like
> that object's end is always searched for before getting into the
> "ParMarkBitMap::would_overflow" branch. Only the final size calculation
> is conditional, and that's just a pointer_delta().
>
> So it looks like we have the object sizes at our disposal after all.
I was wrong about that: the bitmap iteration call at line 2898 might
directly return ParMarkBitMap::would_overflow, I guess, at least from
cursorily browsing the code.
However, the ParMarkBitMapClosure::do_addr called by
ParMarkBitMap::iterate already needs the size calculated, so maybe that
value could be reused with some refactoring.
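
To make the counting idea a bit more concrete, here is a rough,
standalone sketch (plain C++, not HotSpot code). Everything in it is an
assumption for illustration: the class DeferredUpdateStats, its methods,
the size threshold and the "one worker per large object" policy are all
hypothetical, and object size is only a crude proxy for "lots of
embedded oops".

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of HotSpot: gathers statistics about
// deferred objects while regions are being filled by multiple threads.
class DeferredUpdateStats {
  std::atomic<std::size_t> _deferred_objects{0};  // all deferred objects seen
  std::atomic<std::size_t> _large_objects{0};     // those at/above the threshold
  const std::size_t _large_threshold_words;       // assumed tuning value

public:
  explicit DeferredUpdateStats(std::size_t large_threshold_words)
    : _large_threshold_words(large_threshold_words) {}

  // Meant to be called wherever a non-null deferred object address is
  // recorded, with the object size that is already known at that point.
  void note_deferred_object(std::size_t size_in_words) {
    _deferred_objects.fetch_add(1, std::memory_order_relaxed);
    if (size_in_words >= _large_threshold_words) {
      _large_objects.fetch_add(1, std::memory_order_relaxed);
    }
  }

  std::size_t deferred_objects() const {
    return _deferred_objects.load(std::memory_order_relaxed);
  }

  std::size_t large_objects() const {
    return _large_objects.load(std::memory_order_relaxed);
  }

  // Assumed policy: no workers if nothing was deferred, otherwise one
  // worker per large object, at least one, capped at max_workers.
  std::size_t worker_count(std::size_t max_workers) const {
    assert(max_workers > 0);
    if (deferred_objects() == 0) {
      return 0;
    }
    return std::min(max_workers, std::max<std::size_t>(large_objects(), 1));
  }
};
```

Relaxed atomics should be enough in a sketch like this because the
counters would only be read after the filling phase has finished; the
actual threshold and policy would need to be chosen from measurements.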
>
>>>
>>> While in SPECjbb according to your description it seems fairly clear
>>> that it is useful to parallelize every time with all resources, as
>>> each/most of these objects cause lots of work, it seems
>>> disadvantageous to spin up lots of threads when they do not.
>>>
>>> I would be interested in the Deferred Updates timing changes for the
>>> other benchmarks. Maybe there is nothing to see here, but idk whether
>>> you looked only at overall scores for them or did some more detailed
>>> analysis.
>>>
>>
>> OK sure, I can collect more data for benchmarks other than SPECjbb.
>
> Thanks, would be nice.
Thomas