RFR(M) 8041984: CompilerThread seems to occupy all CPU in a very rare situation

Vladimir Kozlov vladimir.kozlov at oracle.com
Tue Oct 21 21:32:29 UTC 2014


Please, I need reviews.

Thanks,
Vladimir

On 10/14/14 4:52 PM, Vladimir Kozlov wrote:
> I updated changes.
>
> http://cr.openjdk.java.net/~kvn/8041984/webrev.01/
>
> As David suggested, I added a SAMPLE_SIZE constant:
>
> #define SAMPLE_SIZE 4
>          if ((next % SAMPLE_SIZE) == 0) {
>            // Every SAMPLE_SIZE iterations, estimate how much time it
>            // will take to complete graph construction.
>            time.stop();
>            double stop_time = time.seconds();
>            double time_per_iter = (stop_time - start_time) / (double)SAMPLE_SIZE;
>            double time_until_end = time_per_iter * (double)(java_objects_length - next);
>            if ((start_time + time_until_end) >= EscapeAnalysisTimeout) {
>
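> For example (illustrative numbers, not from a real run): if the current
> sample began at start_time = 10 s, the 4 sampled iterations took 2 s
> (0.5 s per iteration), and 40 objects remain, the projected finish is
> 10 + 0.5 * 40 = 30 s >= 20 s, so EA aborts.
>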
> I also replaced the TIME_LIMIT constant with a flag:
>
>    product(double, EscapeAnalysisTimeout, 20. DEBUG_ONLY(+40.),      \
>            "Abort EA when it reaches time limit (in sec)")      \
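>
> For readers outside HotSpot, here is a minimal standalone sketch of the
> same sampling scheme (plain C++; std::chrono stands in for HotSpot's
> elapsedTimer and the loop body is a placeholder, so this is an
> illustration rather than the actual escape.cpp code):
>
>   #include <chrono>
>
>   static const double EscapeAnalysisTimeout = 20.0; // product default, sec
>   #define SAMPLE_SIZE 4
>
>   // Returns false if construction is projected to exceed the time limit.
>   bool build_connection_graph(int java_objects_length) {
>     using clock = std::chrono::steady_clock;
>     const clock::time_point begin = clock::now();
>     double start_time = 0.0; // elapsed seconds when the current sample began
>     for (int next = 0; next < java_objects_length; next++) {
>       // ... process the next java object node ...
>       if ((next % SAMPLE_SIZE) == 0) {
>         // Extrapolate the remaining time from the last sample alone, so
>         // a sudden slowdown is noticed within SAMPLE_SIZE iterations.
>         double stop_time =
>             std::chrono::duration<double>(clock::now() - begin).count();
>         double time_per_iter = (stop_time - start_time) / (double)SAMPLE_SIZE;
>         double time_until_end =
>             time_per_iter * (double)(java_objects_length - next);
>         if ((start_time + time_until_end) >= EscapeAnalysisTimeout) {
>           return false; // abort EA: projected to run past the limit
>         }
>         start_time = stop_time; // the next sample starts here
>       }
>     }
>     return true;
>   }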
>
> Thanks,
> Vladimir
>
> On 10/14/14 1:40 PM, Vladimir Kozlov wrote:
>> On 10/14/14 10:30 AM, David Chase wrote:
>>> I’m curious about the granularity of the completion-prediction —
>>> you’re checking every 4 iterations and extrapolating from there.
>>> Do we know that the times are that uniform, or would we be better off
>>> with larger sample sizes?
>>
>> Yes, I observed slowly, uniformly growing times for PARTS of the
>> iterations, not for all of them. That is why I want to take several
>> samples. For example, the first iterations were fast, but when the array
>> allocation objects began to be processed, the times grew rapidly.
>> If I take a very big sample, the fast iterations skew the result and
>> delay the abort.
>>
>> I did an implementation similar to your "alternate approach" suggestion
>> and saw that the abort happened much later than with my current approach.
>>
>>> (How long does a single iteration take, anyway?)
>>
>> As I said, one slow iteration may take up to 1 sec, so I want to abort as
>> soon as I see such a case (while still allowing spikes). Taking a larger
>> sample size may delay the abort by several seconds.
>>
>>>
>>> And in the sample code, I see the choice of sample size (4) encoded as
>>>    (next & 3) == 0
>>>    // Each 4 iterations calculate how much time it will take
>>>    double time_per_iter = (stop_time - start_time) * 0.25;
>>>
>>> why not write this as (say)
>>>
>>>    static const int LOG_SAMPLE_SIZE = 2;
>>>    static const int SAMPLE_SIZE = (1 << LOG_SAMPLE_SIZE);
>>>    next % SAMPLE_SIZE == 0   // next could be a uint, couldn’t it?
>>>    double time_per_iter = (stop_time - start_time) * (1.0/SAMPLE_SIZE);
>>>
>>> perhaps rewriting 1.0/SAMPLE_SIZE as another static const as necessary
>>> to get it properly precalculated; I’m not sure what C++ compilers are
>>> doing nowadays.
>>
>> I am optimizing :) by avoiding % and / operations. But I agree, and I can
>> add a SAMPLE_SIZE constant to make the code clearer.
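>>
>> (Side note: for an unsigned counter and a power-of-two constant divisor,
>> optimizing compilers already strength-reduce % and / to masking and
>> shifting, so the readable spelling should cost nothing:
>>
>>   static const unsigned int SAMPLE_SIZE = 4;
>>
>>   bool at_sample_boundary(unsigned int next) {
>>     // With SAMPLE_SIZE a power of two, this compiles to (next & 3u) == 0.
>>     return (next % SAMPLE_SIZE) == 0;
>>   }
>> )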
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> An alternate approach would be to compute the overall rate thus far,
>>> checking at (say) iterations 4, 8, 16, 32, etc., and computing the
>>> average from there, starting at whatever power of two is a decent
>>> fraction of the iterations needed to time out.
>>>
>>> e.g., don’t reset start_time, just compute elapsed at each sample:
>>>
>>> if ((next & (ptwo - 1)) == 0) {
>>>    time.stop();
>>>    double elapsed = time.seconds() - start_time;
>>>    double projected = elapsed * (double) java_objects_length / next;
>>>
>>> Or you can compute a java_objects_length that would surely lead to
>>> failure, and always do that (fast, integer) comparison:
>>>    ...
>>>    unsigned int length_bound = (unsigned int)(next * (CG_BUILD_TIME_LIMIT / elapsed));
>>>    if (java_objects_length > length_bound) { ... // timeout
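>>>
>>> Spelling out where that bound comes from: the projection times out iff
>>> elapsed * java_objects_length / next >= CG_BUILD_TIME_LIMIT, i.e. iff
>>> java_objects_length >= next * (CG_BUILD_TIME_LIMIT / elapsed). A
>>> standalone sketch of that computation (illustrative names as above;
>>> elapsed > 0 assumed):
>>>
>>>   static const double CG_BUILD_TIME_LIMIT = 20.0; // seconds
>>>
>>>   // Recomputed once per sample; the per-iteration check is then a
>>>   // single integer comparison against java_objects_length.
>>>   unsigned int length_bound_for(unsigned int next, double elapsed) {
>>>     return (unsigned int)((double)next * (CG_BUILD_TIME_LIMIT / elapsed));
>>>   }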
>>>
>>> David
>>>
>>> On 2014-10-10, at 11:41 PM, Vladimir Kozlov
>>> <vladimir.kozlov at oracle.com> wrote:
>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8041984
>>>> http://cr.openjdk.java.net/~kvn/8041984/webrev/
>>>>
>>>> The following method brings C2's EA to its knees:
>>>>
>>>> com.graphhopper.coll.GHLongIntBTree$BTreeEntry::put()
>>>>
>>>> https://github.com/karussell/graphhopper/blob/master/core/src/main/java/com/graphhopper/coll/GHLongIntBTree.java
>>>>
>>>>
>>>>
>>>> It shows that the current EA connection graph build code is very
>>>> inefficient when a graph has a lot of connections (edges).
>>>>
>>>> With inlining, the BTreeEntry::put() method has about 100 allocations,
>>>> of which about 80 are array allocations. Most of them are object
>>>> arrays generated by Arrays.copyOf(). After the number of edges in the
>>>> EA Connection Graph reaches about 50000, it takes more than 1 sec
>>>> to find all the nodes which reference an allocation.
>>>>
>>>> The current EA code can bail out of graph building on a timeout. But
>>>> the timeout (30 sec for a product VM) is checked only after all object
>>>> nodes are processed, so it may take > 2 mins before EA is aborted:
>>>>
>>>> <phase name='connectionGraph' nodes='18787' live='10038' stamp='4.417'>
>>>> <connectionGraph_bailout reason='reached time limit'/>
>>>> <phase_done name='connectionGraph' nodes='18787' live='10038' stamp='150.967'/>
>>>>
>>>> Also, the ConnectionGraph::add_java_object_edges() method does not
>>>> check for the presence of a node when it adds a new one to the worklist
>>>> (in the add_to_worklist() method), which leads to EA using hundreds of
>>>> MB of memory when BTreeEntry::put() is processed.
>>>>
>>>> To address these issues, new timeout checks were added.
>>>> A new _pidx index field is added to the graph's nodes, and a VectorSet
>>>> is used to reduce the size of the worklist in the
>>>> add_java_object_edges() method.
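>>>>
>>>> A minimal sketch of that de-duplication idea (std::vector<bool> stands
>>>> in for HotSpot's VectorSet, and the node index plays the role of _pidx;
>>>> this is an illustration, not the actual escape.cpp code):
>>>>
>>>>   #include <vector>
>>>>
>>>>   struct PointsToNode {
>>>>     unsigned int _pidx; // dense per-graph node index
>>>>     // ... edges, escape state, etc. ...
>>>>   };
>>>>
>>>>   // Push 'n' only if its index has not been seen, so each node enters
>>>>   // the worklist at most once per add_java_object_edges() pass.
>>>>   void add_to_worklist(PointsToNode* n,
>>>>                        std::vector<PointsToNode*>& worklist,
>>>>                        std::vector<bool>& on_worklist) {
>>>>     if (!on_worklist[n->_pidx]) {
>>>>       on_worklist[n->_pidx] = true;
>>>>       worklist.push_back(n);
>>>>     }
>>>>   }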
>>>>
>>>> The escaping node CreateEx is mapped to phantom_obj to reduce the
>>>> number of processed objects in the connection graph. We do the same for
>>>> other escaping objects.
>>>>
>>>> Tested with the graphhopper server application:
>>>>
>>>> <phase name='connectionGraph' nodes='18787' live='10038' stamp='6.829'>
>>>> <connectionGraph_bailout reason='reached time limit'/>
>>>> <phase_done name='connectionGraph' nodes='18787' live='10038' stamp='22.355'/>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>

