Reduce SharedHeap::process_strong_roots time

Wed Jul 24 17:09:26 UTC 2013

On 07/24/2013 08:32 AM, Coleen Phillimore wrote:
>
> Ioi,
>
> Do you have any benchmarks showing the times spent processing #2?
Not yet :-(
> Another idea is to make Klass::_mirror and ArrayKlass::_component 
> mirror jobjects, which are java handles and store them in the 
> ClassLoaderData::_handles area, just as the 
> ConstantPool::_resolved_references array is stored.
>
I looked at ClassLoaderData::_handles, which is a JNIHandleBlock. It 
looks like JNIHandleBlock::oops_do() still needs to scan all the 
references one-by-one. So we are just shifting the problem around.

[disclaimer: I have zero knowledge of how the various HotSpot GCs work] 
-- I was hoping that if we use a proper Java object array to store the 
references (to interned strings, mirrors, etc), the array will be 
eventually moved to old generation. Then, scanning of the roots in this 
object array can be done using card tables (more efficient than linear 
scanning??). Also, if the mirrors are also moved into old generation, 
then we will have no need to scan them at all (during young GCs).
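
To make the card-table idea concrete, here is a toy sketch. It is not
HotSpot code: ToyCardTable, kSlotsPerCard, mark() and scan_dirty() are
hypothetical names, and the card size of 64 slots is an arbitrary
assumption. It only shows the general technique: a write to a slot marks
its card dirty, and a later scan visits only the slots on dirty cards
instead of walking the whole array.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of card marking over an array of reference slots.
// All names here are hypothetical; this is not the HotSpot card table.
struct ToyCardTable {
    // Assumed card granularity, measured in slots rather than bytes.
    static constexpr std::size_t kSlotsPerCard = 64;

    // One dirty flag per card.
    std::vector<std::uint8_t> dirty;

    explicit ToyCardTable(std::size_t num_slots)
        : dirty((num_slots + kSlotsPerCard - 1) / kSlotsPerCard, 0) {}

    // Stand-in for a write barrier: remember that this slot was updated.
    void mark(std::size_t slot) { dirty[slot / kSlotsPerCard] = 1; }

    // Visit only the slots that live on dirty cards, then clean the cards.
    // Returns the number of slots that were examined.
    template <typename Visitor>
    std::size_t scan_dirty(std::size_t num_slots, Visitor visit) {
        std::size_t examined = 0;
        for (std::size_t card = 0; card < dirty.size(); ++card) {
            if (!dirty[card]) continue;  // clean card: skip all its slots
            const std::size_t begin = card * kSlotsPerCard;
            const std::size_t end = std::min(begin + kSlotsPerCard, num_slots);
            for (std::size_t s = begin; s < end; ++s) {
                visit(s);
                ++examined;
            }
            dirty[card] = 0;
        }
        return examined;
    }
};
```

In this toy model, updating two slots in a 1000-slot array means a scan
touches at most two cards' worth of slots. Whether the real collectors
get a comparable win for these roots is exactly the open question.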

Is my understanding correct?

Thanks
- Ioi

> Then we have zero oops to follow in any metadata.
>
> It has an extra complication with MVM because I need to virtualize 
> them but I need to virtualize a lot of things in ClassLoaderData for MVM.
>
> It might also cost a lock creating them but we only create the mirrors 
> once.
>
> Coleen
>
> On 07/19/2013 05:16 PM, Ioi Lam wrote:
>> There are two large groups of "strong roots" scanned inside 
>> SharedHeap::process_strong_roots:
>>
>>     #1. interned strings
>>     #2. Klass::_mirror
>>
>> For some large apps, it's common for #1 to be on the order of 
>> hundreds of thousands, and #2 to be in the tens of thousands.
>>
>> (There's also #3, system dictionary, but I am addressing that with 
>> JDK-8003420).
>> (#1 is partially addressed with parallel scanning JDK-8015237, but 
>> it's still a large number per parallel task).
>>
>> I am wondering if we can simplify the scanning by using a form of 
>> handles. E.g., instead of having
>>
>>     class Klass {
>>       oop _java_mirror;
>>       oop java_mirror() const { return _java_mirror; }
>>     };
>>
>> we have
>>
>>     class Klass {
>>       int _java_mirror_index;
>>       oop java_mirror() const {
>>         return _globalArrayOop->obj_at(_java_mirror_index);
>>       }
>>     };
>>
>> This way, SharedHeap::process_strong_roots only has to scan a single 
>> pointer: &_globalArrayOop.
>>
>> The up side would be reduced GC pauses.
>>
>> The down side is increased footprint, and potentially slower 
>> performance. Because _globalArrayOop is a regular Java obj array, we 
>> will need a separate int array to keep track of the free slots. So on 
>> 64-bit platforms, we need an additional 12 bytes per reference.
>>
>> Also, there will be internal fragmentation in the _globalArrayOop 
>> after class unloading, so in the worst case, more work needs to be 
>> done (scanning a lot of NULL entries in this array).
>>
>> What do you think?
>>
>> - Ioi
>
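
The index-based handle scheme from the original message, including the
separate int array of free slots, can be sketched roughly as follows.
This is a standalone toy, not a HotSpot patch: ToyOop, ToyHandleTable,
ToyKlass, allocate() and release() are hypothetical names, and a
std::vector stands in for the Java object array (_globalArrayOop).

```cpp
#include <cstddef>
#include <vector>

// Stand-in for HotSpot's oop type (assumption for this sketch).
using ToyOop = const void*;

// Toy model of a global reference array plus a free-slot list.
class ToyHandleTable {
public:
    // Store a reference and return the index that now refers to it.
    // Freed slots are reused before the array grows.
    int allocate(ToyOop ref) {
        if (!free_slots_.empty()) {
            const int idx = free_slots_.back();
            free_slots_.pop_back();
            slots_[idx] = ref;
            return idx;
        }
        slots_.push_back(ref);
        return static_cast<int>(slots_.size()) - 1;
    }

    // Release a slot, e.g. after class unloading. The entry becomes NULL
    // and stays in the array until reused (internal fragmentation).
    void release(int idx) {
        slots_[idx] = nullptr;
        free_slots_.push_back(idx);
    }

    ToyOop obj_at(int idx) const { return slots_[idx]; }

    // Number of entries a full scan of the array would have to look at.
    std::size_t capacity() const { return slots_.size(); }

private:
    std::vector<ToyOop> slots_;    // models _globalArrayOop
    std::vector<int> free_slots_;  // models the separate int array
};

// A class-like holder that keeps an index instead of a direct reference.
struct ToyKlass {
    int _java_mirror_index;
    ToyOop java_mirror(const ToyHandleTable& table) const {
        return table.obj_at(_java_mirror_index);
    }
};
```

Every lookup goes through one extra indirection, and released entries
remain as NULL holes until reused, which matches the footprint and
fragmentation trade-offs described in the original message.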