RFR(M) 8041984: CompilerThread seems to occupy all CPU in a very rare situation

Vladimir Kozlov vladimir.kozlov at oracle.com
Tue Oct 14 20:40:35 UTC 2014


On 10/14/14 10:30 AM, David Chase wrote:
> I’m curious about the granularity of the completion-prediction —
> you’re checking every 4 iterations and extrapolating from there.
> Do we know that the times are that uniform, or would we be better off with larger sample sizes?

Yes, I observed uniformly, slowly growing times only for PARTS of the 
iterations, not for all of them. That is why I want to take several small 
samples. For example, the first iterations were fast, but once array 
allocation objects began to be processed, the times grew rapidly.
If I take a very big sample, the fast iterations skew the result and 
delay the abort.

I did an implementation similar to your "alternate approach" suggestion 
and saw that the abort happened much later than with my current approach.

> (How long does a single iteration take, anyway?)

As I said, one slow iteration may take up to 1 sec. So I want to abort as 
soon as I see such a case (while still tolerating isolated spikes). Taking 
larger sample sizes may delay the abort by several seconds.
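
For reference, the per-sample check described here looks roughly like the 
standalone sketch below (the timer helper, the loop body and the limit are 
illustrative stand-ins, not the webrev code, and the exact bailout 
condition in the webrev may differ):

  #include <chrono>

  static const double TIME_LIMIT  = 30.0;  // seconds, like the product bailout
  static const int    SAMPLE_SIZE = 4;

  static double seconds_now() {
    using namespace std::chrono;
    return duration<double>(steady_clock::now().time_since_epoch()).count();
  }

  static void do_iteration(int i) { (void)i; /* stand-in for per-object work */ }

  bool build_with_abort(int java_objects_length) {
    double start_time = seconds_now();
    for (int next = 1; next <= java_objects_length; next++) {
      do_iteration(next);
      if ((next & (SAMPLE_SIZE - 1)) == 0) {   // every 4 iterations
        double stop_time = seconds_now();
        // Rate of the last 4 iterations only, so earlier fast iterations
        // cannot mask a slow stretch and delay the abort.
        double time_per_iter = (stop_time - start_time) * 0.25;
        if (time_per_iter * (java_objects_length - next) > TIME_LIMIT) {
          return false;  // projected to exceed the budget: bail out now
        }
        start_time = stop_time;  // restart the sampling window
      }
    }
    return true;
  }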

>
> And in the sample code, I see the choice of sample size (4) encoded as
>    (next & 3) == 0
>    // Each 4 iterations calculate how much time it will take
>    double time_per_iter = (stop_time - start_time) * 0.25;
>
> why not write this as (say)
>
>    static const int LOG_SAMPLE_SIZE = 2;
>    static const int SAMPLE_SIZE = (1 << LOG_SAMPLE_SIZE);
>    next % SAMPLE_SIZE == 0   // next could be a uint, couldn’t it?
>    double time_per_iter = (stop_time - start_time) * (1.0/SAMPLE_SIZE);
>
> perhaps rewriting 1.0/SAMPLE_SIZE as another static const as necessary
> to get it properly precalculated; I’m not sure what C++ compilers are doing nowadays.

I am optimizing :) by avoiding % and / operations. But I agree, and I can 
add a SAMPLE_SIZE constant so the code is clearer.
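
With the constant named, the check could look like this sketch (still 
avoiding % and /; the helper names are illustrative):

  // Named constants instead of the magic 4, still using cheap operations:
  // '&' with (SAMPLE_SIZE - 1) matches '% SAMPLE_SIZE' because SAMPLE_SIZE
  // is a power of two, and '* INV_SAMPLE_SIZE' replaces '/ SAMPLE_SIZE'.
  static const int    LOG_SAMPLE_SIZE = 2;
  static const int    SAMPLE_SIZE     = (1 << LOG_SAMPLE_SIZE);
  static const double INV_SAMPLE_SIZE = 1.0 / SAMPLE_SIZE;  // compile-time constant

  static inline bool at_sample_boundary(unsigned int next) {
    return (next & (SAMPLE_SIZE - 1)) == 0;
  }

  static inline double per_iter_time(double start_time, double stop_time) {
    return (stop_time - start_time) * INV_SAMPLE_SIZE;
  }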

Thanks,
Vladimir

>
> An alternate approach would be to compute the overall rate thus far, checking at
> (say) iterations 4, 8, 16, 32, etc and computing the average from there, starting
> at whatever power of two is a decent fraction of the iterations needed to timeout.
>
> e.g., don’t reset start_time, just compute elapsed at each sample:
>
> if ((next & (ptwo - 1)) == 0) {
>    time.stop();
>    double elapsed = time.seconds() - start_time;
>    double projected = elapsed * (double) java_objects_length / next;
>
> Or you can compute a java_objects_length that would surely lead to failure, and
> always do that (fast, integer) comparison:
>    ...
>    unsigned int length_bound = (unsigned int) (next * (CG_BUILD_TIME_LIMIT / elapsed));
>    if (java_objects_length > length_bound) { … timeout
>
> David
>
> On 2014-10-10, at 11:41 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>
>> https://bugs.openjdk.java.net/browse/JDK-8041984
>> http://cr.openjdk.java.net/~kvn/8041984/webrev/
>>
>> The following method brings C2's EA to its knees:
>>
>> com.graphhopper.coll.GHLongIntBTree$BTreeEntry::put()
>>
>> https://github.com/karussell/graphhopper/blob/master/core/src/main/java/com/graphhopper/coll/GHLongIntBTree.java
>>
>> It shows that current EA connection graph build code is very inefficient when a graph has a lot of connections (edges).
>>
>> With inlining, the BTreeEntry::put() method has about 100 allocations, of which about 80 are array allocations. Most of them are object arrays generated by Arrays.copyOf(). Once the number of edges in the EA Connection Graph reaches about 50000, it takes more than 1 sec to find all nodes which reference an allocation.
>>
>> The current EA code can bail out of graph building on a timeout. But the timeout (30 sec for a product VM) is checked only after all object nodes are processed, so it may take > 2 mins before EA is aborted:
>>
>> <phase name='connectionGraph' nodes='18787' live='10038' stamp='4.417'>
>> <connectionGraph_bailout reason='reached time limit'/>
>> <phase_done name='connectionGraph' nodes='18787' live='10038' stamp='150.967'/>
>>
>> Also, ConnectionGraph::add_java_object_edges() does not check whether a node is already present when adding it to the worklist (in the add_to_worklist() method), which leads to hundreds of MB of memory used by EA when BTreeEntry::put() is processed.
>>
>> To address these issues, new timeout checks were added.
>> A new _pidx index field is added to the graph's nodes, and a VectorSet is used to reduce the size of the worklist in the add_java_object_edges() method.
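
The dedup idea, sketched outside HotSpot (std::vector<bool> stands in for 
VectorSet here, and the node and worklist types are illustrative, not the 
webrev code):

  #include <vector>

  // In the webrev each connection-graph node gets a new _pidx field,
  // a dense per-graph index assigned at node creation.
  struct PointsToNode {
    unsigned int _pidx;
  };

  struct Worklist {
    std::vector<PointsToNode*> _list;
    std::vector<bool>          _in_worklist;  // one bit per _pidx

    explicit Worklist(size_t node_count) : _in_worklist(node_count, false) {}

    // Push only nodes that are not already queued, so the worklist can
    // no longer grow with duplicates as add_java_object_edges() did.
    void add_to_worklist(PointsToNode* n) {
      if (!_in_worklist[n->_pidx]) {
        _in_worklist[n->_pidx] = true;
        _list.push_back(n);
      }
    }
  };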
>>
>> The escaping node CreateEx is mapped to phantom_obj to reduce the number of objects processed in the connection graph. We do the same for other escaping objects.
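
Conceptually (an illustration, not HotSpot's actual ConnectionGraph code), 
mapping every globally escaping allocation to one shared phantom node 
collapses their per-allocation nodes and edges into a single representative:

  #include <deque>
  #include <unordered_map>

  struct PointsToNode { /* edges, escape state, ... */ };

  struct ConnectionGraphSketch {
    PointsToNode phantom_obj;                        // one shared escaped node
    std::deque<PointsToNode> _nodes;                 // per-allocation nodes
    std::unordered_map<const void*, PointsToNode*> _node_map;

    // CreateEx-like allocations are known to escape globally, so they all
    // share phantom_obj instead of getting their own node and edges.
    void map_allocation(const void* ideal_node, bool escapes_globally) {
      if (escapes_globally) {
        _node_map[ideal_node] = &phantom_obj;
      } else {
        _nodes.emplace_back();
        _node_map[ideal_node] = &_nodes.back();
      }
    }
  };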
>>
>> Tested with graphhopper server application:
>>
>> <phase name='connectionGraph' nodes='18787' live='10038' stamp='6.829'>
>> <connectionGraph_bailout reason='reached time limit'/>
>> <phase_done name='connectionGraph' nodes='18787' live='10038' stamp='22.355'/>
>>
>> Thanks,
>> Vladimir
>

