[9] RFR(M) 8041984: CompilerThread seems to occupy all CPU in a very rare situation

Tue Oct 14 23:52:17 UTC 2014

I updated changes.

http://cr.openjdk.java.net/~kvn/8041984/webrev.01/

As David suggested I added SAMPLE_SIZE constant:

#define SAMPLE_SIZE 4
         if ((next % SAMPLE_SIZE) == 0) {
           // Each 4 iterations calculate how much time it will take
           // to complete graph construction.
           time.stop();
           double stop_time = time.seconds();
           double time_per_iter = (stop_time - start_time) / 
(double)SAMPLE_SIZE;
           double time_until_end = time_per_iter * 
(double)(java_objects_length - next);
           if ((start_time + time_until_end) >= EscapeAnalysisTimeout) {

I also replaced TIME_LIMIT constant with flag:

   product(double, EscapeAnalysisTimeout, 20. DEBUG_ONLY(+40.), 
      \
           "Abort EA when it reaches time limit (in sec)") 
      \

Thanks,
Vladimir

On 10/14/14 1:40 PM, Vladimir Kozlov wrote:
> On 10/14/14 10:30 AM, David Chase wrote:
>> I’m curious about the granularity of the completion-prediction —
>> you’re checking every 4 iterations and extrapolating from there.
>> Do we know that the times are that uniform, or would we be better off
>> with larger sample sizes?
>
> Yes, I observed uniformly slowly growing times for PARTS of iterations
> and not all iterations. That is why I want to take several samples.
> For example first iterations where fast but when array allocation
> objects begin processed time are rapidly growing.
> If I take very big sample fast iterations affect the result - delay abort.
>
> I did implementation similar to your "alternate approach" suggestion and
> I saw that the abort happened much later than with current my approach.
>
>> (How long does a single iteration take, anyway?)
>
> As I said one slow iteration may take upto 1 sec. So I want to abort as
> soon as I see a such case (but allow spikes). Taking larger sample sizes
> may delay an abort for several seconds.
>
>>
>> And in the sample code, I see the choice of sample size (4) encoded as
>>    (next & 3) == 0
>>    // Each 4 iterations calculate how much time it will take
>>    double time_per_iter = (stop_time - start_time) * 0.25;
>>
>> why not write this as (say)
>>
>>    static const int LOG_SAMPLE_SIZE = 2;
>>    static const int SAMPLE_SIZE = (1 << LOG_SAMPLE_SIZE);
>>    next % SAMPLE_SIZE == 0   // next could be a uint, couldn’t it?
>>    double time_per_iter = (stop_time - start_time) * (1.0/SAMPLE_SIZE);
>>
>> perhaps rewriting 1.0/SAMPLE_SIZE as another static const as necessary
>> to get it properly precalculated; I’m not sure what C++ compilers are
>> doing nowadays.
>
> I am optimizing :) by avoiding % and / operations. But I agree and can
> add SAMPLE_SIZE constant so code can be more clear.
>
> Thanks,
> Vladimir
>
>>
>> An alternate approach would be to compute the overall rate thus far,
>> checking at
>> (say) iterations 4, 8, 16, 32, etc and computing the average from
>> there, starting
>> at whatever power of two is a decent fraction of the iterations needed
>> to timeout.
>>
>> e.g., don’t reset start_time, just compute elapsed at each sample:
>>
>> if ((next & (ptwo - 1) == 0) {
>>    time.stop();
>>    double elapsed = time.seconds() - start_time;
>>    double projected = elapsed * (double) java_objects_length / next;
>>
>> Or you can compute a java_objects_length that would surely lead to
>> failure, and
>> always do that (fast, integer) comparison:
>>    ...
>>    unsigned int length_bound = (int) next * (CG_BUILD_TIME_LIMIT /
>> elapsed)
>>    …
>>    if (java_objects_length > length_bound) { … timeout
>>
>> David
>>
>> On 2014-10-10, at 11:41 PM, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com> wrote:
>>
>>> https://bugs.openjdk.java.net/browse/JDK-8041984
>>> http://cr.openjdk.java.net/~kvn/8041984/webrev/
>>>
>>> Next method puts C2's EA to its knees:
>>>
>>> com.graphhopper.coll.GHLongIntBTree$BTreeEntry::put()
>>>
>>> https://github.com/karussell/graphhopper/blob/master/core/src/main/java/com/graphhopper/coll/GHLongIntBTree.java
>>>
>>>
>>> It shows that current EA connection graph build code is very
>>> inefficient when a graph has a lot of connections (edges).
>>>
>>> With inlining the BTreeEntry::put() method has about 100 allocations
>>> from which about 80 are arrays allocations. Most of them are object
>>> arrays which are generated by Arrays.copyOf(). After number of edges
>>> in EA Connection Graph reaches about 50000 it takes more then 1 sec
>>> to find all nodes which reference an allocation.
>>>
>>> Current EA code has bailout by timeout from graph building code. But
>>> the timeout (30 sec for product VM) is checked only after all object
>>> nodes are processed. So it may take > 2 mins before EA is aborted:
>>>
>>> <phase name='connectionGraph' nodes='18787' live='10038' stamp='4.417'>
>>> <connectionGraph_bailout reason='reached time limit'/>
>>> <phase_done name='connectionGraph' nodes='18787' live='10038'
>>> stamp='150.967'/>
>>>
>>> Also ConnectionGraph::add_java_object_edges() method does not check
>>> the presence of a node when it adds new one to the worklist (in
>>> add_to_worklist() method). Which leads to hundreds MB of used by EA
>>> memory when BTreeEntry::put() is processed.
>>>
>>> To address issues new timeout checks were added.
>>> New _pidx index field is added to graph's nodes and VectorSet is used
>>> to reduce size of worklist in add_java_object_edges() method.
>>>
>>> Escaping node CreateEx is mapped phantom_obj to reduce number of
>>> process objects in connection graph. We do the same for other
>>> escaping objects.
>>>
>>> Tested with graphhopper server application:
>>>
>>> <phase name='connectionGraph' nodes='18787' live='10038' stamp='6.829'>
>>> <connectionGraph_bailout reason='reached time limit'/>
>>> <phase_done name='connectionGraph' nodes='18787' live='10038'
>>> stamp='22.355'/>
>>>
>>> Thanks,
>>> Vladimir
>>