RFR(M) 8041984: CompilerThread seems to occupy all CPU in a very rare situation
Igor Veresov
igor.veresov at oracle.com
Wed Oct 22 06:05:13 UTC 2014
Alright, since nobody is taking it on... The timeout feature looks good; I suppose a JIT needs to be adaptive to available CPU resources.
A couple of small questions:
1. Why is it necessary to add phantom_obj to the worklists?
130 ptnodes_worklist.append(phantom_obj);
131 java_objects_worklist.append(phantom_obj);
2. What does pidx_bias do here?
399 inline void add_to_worklist(PointsToNode* pt) {
400   PointsToNode* ptf = pt;
401   uint pidx_bias = 0;
402   if (PointsToNode::is_base_use(pt)) {
403     ptf = PointsToNode::get_use_node(pt)->as_Field();
404     pidx_bias = _next_pidx;
405   }
406   if (!_in_worklist.test_set(ptf->pidx() + pidx_bias)) {
407     _worklist.append(pt);
408   }
409 }
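If I read it right, the bias is there because the same Field node can be
queued in two roles, as a plain entry and as a tagged "base use" entry,
and the two must not dedup against each other. Shifting the base-use
index by _next_pidx gives each role its own half of the _in_worklist
bitset. A stand-alone sketch of that reading (illustrative names, not
the webrev's types):

#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative stand-ins only -- not the webrev code.
struct Node { uint32_t pidx; };      // dense index assigned at node creation

class DedupWorklist {
  std::vector<Node*> _worklist;
  std::vector<bool>  _in_worklist;   // plays the role of VectorSet
  uint32_t           _next_pidx;     // one past the largest assigned pidx
public:
  explicit DedupWorklist(uint32_t next_pidx)
    : _in_worklist(2 * next_pidx, false), _next_pidx(next_pidx) {}

  // Biasing the base-use index by _next_pidx keeps the two roles in
  // disjoint halves of one bitset, so the test only filters a
  // duplicate of the same role.
  void add(Node* n, bool as_base_use) {
    uint32_t idx = n->pidx + (as_base_use ? _next_pidx : 0);
    if (!_in_worklist[idx]) {        // VectorSet::test_set() equivalent
      _in_worklist[idx] = true;
      _worklist.push_back(n);
    }
  }
  size_t size() const { return _worklist.size(); }
};

int main() {
  Node f = { 7 };
  DedupWorklist wl(100);
  wl.add(&f, false);                 // plain entry: added
  wl.add(&f, true);                  // base-use entry: added, different role
  wl.add(&f, false);                 // duplicate plain entry: filtered
  assert(wl.size() == 2);
  return 0;
}

Is that the intent?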
Thanks,
igor
On Oct 21, 2014, at 11:32 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> Please, I need reviews.
>
> Thanks,
> Vladimir
>
> On 10/14/14 4:52 PM, Vladimir Kozlov wrote:
>> I updated changes.
>>
>> http://cr.openjdk.java.net/~kvn/8041984/webrev.01/
>>
>> As David suggested, I added a SAMPLE_SIZE constant:
>>
>> #define SAMPLE_SIZE 4
>>   if ((next % SAMPLE_SIZE) == 0) {
>>     // Every 4 iterations, estimate how much time it will take
>>     // to complete graph construction.
>>     time.stop();
>>     double stop_time = time.seconds();
>>     double time_per_iter = (stop_time - start_time) / (double)SAMPLE_SIZE;
>>     double time_until_end = time_per_iter * (double)(java_objects_length - next);
>>     if ((start_time + time_until_end) >= EscapeAnalysisTimeout) {
>>
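>> Self-contained, the check looks roughly like this (a sketch only:
>> std::chrono stands in for elapsedTimer, the flag is modeled as a
>> constant, and the loop body is elided):
>>
>> #include <chrono>
>>
>> static const int    SAMPLE_SIZE           = 4;
>> static const double EscapeAnalysisTimeout = 20.0;  // product default, sec
>>
>> static double now_seconds() {              // elapsedTimer stand-in
>>   using namespace std::chrono;
>>   return duration<double>(steady_clock::now().time_since_epoch()).count();
>> }
>>
>> // Returns false when construction is projected to exceed the limit.
>> bool build_connection_graph(int java_objects_length) {
>>   double t0 = now_seconds();
>>   double start_time = 0.0;     // elapsed at the start of the window
>>   for (int next = 0; next < java_objects_length; next++) {
>>     // ... process java_objects_worklist.at(next) ...
>>     if (next > 0 && (next % SAMPLE_SIZE) == 0) {
>>       double stop_time      = now_seconds() - t0;  // elapsed so far
>>       double time_per_iter  = (stop_time - start_time) / (double)SAMPLE_SIZE;
>>       double time_until_end = time_per_iter * (double)(java_objects_length - next);
>>       // Abort when the elapsed time at the window start plus the
>>       // projected remaining time already exceeds the limit.
>>       if ((start_time + time_until_end) >= EscapeAnalysisTimeout) {
>>         return false;          // bail out of EA instead of running on
>>       }
>>       start_time = stop_time;  // restart the 4-iteration window
>>     }
>>   }
>>   return true;
>> }
>>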
>> I also replaced the TIME_LIMIT constant with a flag:
>>
>> product(double, EscapeAnalysisTimeout, 20. DEBUG_ONLY(+40.), \
>> "Abort EA when it reaches time limit (in sec)") \
>>
>> Thanks,
>> Vladimir
>>
>> On 10/14/14 1:40 PM, Vladimir Kozlov wrote:
>>> On 10/14/14 10:30 AM, David Chase wrote:
>>>> I’m curious about the granularity of the completion-prediction —
>>>> you’re checking every 4 iterations and extrapolating from there.
>>>> Do we know that the times are that uniform, or would we be better off
>>>> with larger sample sizes?
>>>
>>> Yes, I observed uniformly slowly growing times for PARTS of the
>>> iterations, not for all of them. That is why I want to take several
>>> small samples. For example, the first iterations were fast, but once
>>> array allocation objects started being processed the times grew
>>> rapidly. If I take a very big sample, the fast iterations skew the
>>> result and delay the abort.
>>>
>>> I did an implementation similar to your "alternate approach"
>>> suggestion, and I saw that the abort happened much later than with my
>>> current approach.
>>>
>>>> (How long does a single iteration take, anyway?)
>>>
>>> As I said, one slow iteration may take up to 1 sec, so I want to
>>> abort as soon as I see such a case (while still allowing short
>>> spikes). With a larger sample the average still contains many fast
>>> iterations, so the abort could be delayed for several seconds.
>>>
>>>>
>>>> And in the sample code, I see the choice of sample size (4) encoded as
>>>> (next & 3) == 0
>>>> // Each 4 iterations calculate how much time it will take
>>>> double time_per_iter = (stop_time - start_time) * 0.25;
>>>>
>>>> why not write this as (say)
>>>>
>>>> static const int LOG_SAMPLE_SIZE = 2;
>>>> static const int SAMPLE_SIZE = (1 << LOG_SAMPLE_SIZE);
>>>> next % SAMPLE_SIZE == 0 // next could be a uint, couldn’t it?
>>>> double time_per_iter = (stop_time - start_time) * (1.0/SAMPLE_SIZE);
>>>>
>>>> perhaps rewriting 1.0/SAMPLE_SIZE as another static const as necessary
>>>> to get it properly precalculated; I’m not sure what C++ compilers are
>>>> doing nowadays.
>>>
>>> I am optimizing :) by avoiding % and / operations. But I agree and can
>>> add SAMPLE_SIZE constant so code can be more clear.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> An alternate approach would be to compute the overall rate thus far,
>>>> checking at
>>>> (say) iterations 4, 8, 16, 32, etc and computing the average from
>>>> there, starting
>>>> at whatever power of two is a decent fraction of the iterations needed
>>>> to timeout.
>>>>
>>>> e.g., don’t reset start_time, just compute elapsed at each sample:
>>>>
>>>> if ((next & (ptwo - 1)) == 0) {
>>>> time.stop();
>>>> double elapsed = time.seconds() - start_time;
>>>> double projected = elapsed * (double) java_objects_length / next;
>>>>
>>>> Or you can compute a java_objects_length that would surely lead to
>>>> failure, and
>>>> always do that (fast, integer) comparison:
>>>> ...
>>>> unsigned int length_bound = (unsigned int)(next * (CG_BUILD_TIME_LIMIT / elapsed))
>>>> …
>>>> if (java_objects_length > length_bound) { … timeout
>>>>
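>>>> Pulled together, the whole scheme would be something like this
>>>> sketch (seconds() stands in for the elapsedTimer, CG_BUILD_TIME_LIMIT
>>>> is the existing constant, and a guard against a zero elapsed time is
>>>> added):
>>>>
>>>> #include <chrono>
>>>>
>>>> static const double CG_BUILD_TIME_LIMIT = 30.0;  // current constant
>>>>
>>>> static double seconds() {           // elapsedTimer stand-in
>>>>   using namespace std::chrono;
>>>>   return duration<double>(steady_clock::now().time_since_epoch()).count();
>>>> }
>>>>
>>>> bool build_graph(unsigned int java_objects_length) {
>>>>   double start_time = seconds();
>>>>   unsigned int ptwo = 4;            // sample at 4, 8, 16, 32, ...
>>>>   unsigned int length_bound = 0xffffffffu;  // no bound until sampled
>>>>   for (unsigned int next = 1; next <= java_objects_length; next++) {
>>>>     // ... process one object ...
>>>>     if ((next & (ptwo - 1)) == 0) {
>>>>       double elapsed = seconds() - start_time;
>>>>       if (elapsed > 0.0) {
>>>>         // Largest length that can finish within the limit at the
>>>>         // average rate observed so far.
>>>>         length_bound =
>>>>             (unsigned int)(next * (CG_BUILD_TIME_LIMIT / elapsed));
>>>>       }
>>>>       ptwo <<= 1;
>>>>     }
>>>>     if (java_objects_length > length_bound) {
>>>>       return false;                 // projected timeout: abort
>>>>     }
>>>>   }
>>>>   return true;
>>>> }
>>>>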
>>>> David
>>>>
>>>> On 2014-10-10, at 11:41 PM, Vladimir Kozlov
>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8041984
>>>>> http://cr.openjdk.java.net/~kvn/8041984/webrev/
>>>>>
>>>>> The following method brings C2's EA to its knees:
>>>>>
>>>>> com.graphhopper.coll.GHLongIntBTree$BTreeEntry::put()
>>>>>
>>>>> https://github.com/karussell/graphhopper/blob/master/core/src/main/java/com/graphhopper/coll/GHLongIntBTree.java
>>>>>
>>>>>
>>>>>
>>>>> It shows that the current EA connection graph construction code is
>>>>> very inefficient when the graph has a lot of connections (edges).
>>>>>
>>>>> With inlining, the BTreeEntry::put() method has about 100
>>>>> allocations, of which about 80 are array allocations. Most of them
>>>>> are object arrays generated by Arrays.copyOf(). Once the number of
>>>>> edges in the EA connection graph reaches about 50000, it takes more
>>>>> than 1 sec to find all nodes which reference an allocation.
>>>>>
>>>>> The current EA code can bail out of graph building on a timeout,
>>>>> but the timeout (30 sec for a product VM) is checked only after all
>>>>> object nodes are processed. So it may take > 2 mins before EA is
>>>>> aborted:
>>>>>
>>>>> <phase name='connectionGraph' nodes='18787' live='10038' stamp='4.417'>
>>>>> <connectionGraph_bailout reason='reached time limit'/>
>>>>> <phase_done name='connectionGraph' nodes='18787' live='10038'
>>>>> stamp='150.967'/>
>>>>>
>>>>> Also, the ConnectionGraph::add_java_object_edges() method does not
>>>>> check whether a node is already present when adding it to the
>>>>> worklist (in the add_to_worklist() method), which leads to EA using
>>>>> hundreds of MB of memory when BTreeEntry::put() is processed.
>>>>>
>>>>> To address these issues, new timeout checks were added.
>>>>> A new _pidx index field is added to the graph's nodes, and a
>>>>> VectorSet is used to reduce the size of the worklist in the
>>>>> add_java_object_edges() method.
>>>>>
>>>>> The escaping CreateEx node is mapped to phantom_obj to reduce the
>>>>> number of processed objects in the connection graph. We do the same
>>>>> for other escaping objects.
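>>>>>
>>>>> Conceptually (a sketch of the idea only; the names below are
>>>>> illustrative, not the webrev code): everything already known to
>>>>> escape globally can share one placeholder node, so the graph keeps
>>>>> a single points-to record instead of per-object nodes and edges:
>>>>>
>>>>> enum EscapeState { NoEscape, ArgEscape, GlobalEscape };
>>>>>
>>>>> struct PTNode { EscapeState es; };
>>>>>
>>>>> // One shared node for every globally-escaping allocation
>>>>> // (e.g. the exception object created by CreateEx).
>>>>> static PTNode phantom = { GlobalEscape };
>>>>>
>>>>> PTNode* node_for(bool known_to_escape) {
>>>>>   if (known_to_escape) {
>>>>>     return &phantom;              // collapse: no per-object state
>>>>>   }
>>>>>   return new PTNode{ NoEscape };  // otherwise a real per-object node
>>>>> }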
>>>>>
>>>>> Tested with graphhopper server application:
>>>>>
>>>>> <phase name='connectionGraph' nodes='18787' live='10038' stamp='6.829'>
>>>>> <connectionGraph_bailout reason='reached time limit'/>
>>>>> <phase_done name='connectionGraph' nodes='18787' live='10038'
>>>>> stamp='22.355'/>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>