RFR(M) 8041984: CompilerThread seems to occupy all CPU in a very rare situation

Vladimir Kozlov vladimir.kozlov at oracle.com
Thu Oct 23 06:17:06 UTC 2014


Thank you, Igor

Vladimir

On 10/22/14 11:13 PM, Igor Veresov wrote:
> Thanks for the explanation! It’s a bit clearer now. :) Reviewed.
>
> igor
>
> On Oct 22, 2014, at 1:06 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>
>> Thank you, Igor
>>
>> On 10/21/14 11:05 PM, Igor Veresov wrote:
>>> Alright, since nobody is taking on it.. The timeout feature looks good,
>>> I suppose a JIT needs to be adaptive to available CPU resources.
>>>
>>> A couple of small questions:
>>>
>>> 1. Why is it necessary to add phantom_obj to worklists?
>>>
>>>    130   ptnodes_worklist.append(phantom_obj);
>>>    131   java_objects_worklist.append(phantom_obj);
>>
>> To avoid duplicated entries for phantom_obj on these lists.
>>
>> Even though processed ideal nodes are unique on the ideal_nodes list, several ideal nodes are mapped to phantom_obj:
>>
>>       case Op_CreateEx: {
>>         // assume that all exception objects globally escape
>>         map_ideal_node(n, phantom_obj);
>>         break;
>>
>> As a result, ptnode_adr(n->_idx) will return phantom_obj several times and it would be added to the above lists more than once.
>>
>> I added the following comment:
>>
>>   // Processed ideal nodes are unique on ideal_nodes list
>>   // but several ideal nodes are mapped to the phantom_obj.
>>   // To avoid duplicated entries on the following worklists
>>   // add the phantom_obj only once to them.
>>   ptnodes_worklist.append(phantom_obj);
>>
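The deduplication idea can be shown outside HotSpot. A minimal standalone sketch (Node, phantom and build_worklist are illustrative stand-ins, not the real PointsToNode types):

```cpp
#include <cassert>
#include <vector>

// Illustrative stand-ins: many ideal nodes (e.g. every CreateEx) are
// mapped to one shared sentinel node, like phantom_obj.
struct Node { int id; };

static Node phantom{0};  // shared sentinel

// Build a worklist from the mapped nodes: the sentinel is appended
// exactly once up front and skipped afterwards, so it cannot appear
// as a duplicate even if many ideal nodes map to it.
static std::vector<Node*> build_worklist(const std::vector<Node*>& mapped) {
  std::vector<Node*> worklist;
  worklist.push_back(&phantom);
  for (Node* n : mapped) {
    if (n == &phantom) continue;  // already on the list
    worklist.push_back(n);
  }
  return worklist;
}
```

Seeding the shared sentinel once up front keeps the worklist free of duplicates no matter how many ideal nodes map to it.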
>>>
>>> 2. What does pidx_bias do here?
>>>
>>>   399   inline void add_to_worklist(PointsToNode* pt) {
>>>   400     PointsToNode* ptf = pt;
>>>   401     uint pidx_bias = 0;
>>>   402     if (PointsToNode::is_base_use(pt)) {
>>>   403       ptf = PointsToNode::get_use_node(pt)->as_Field();
>>>   404       pidx_bias = _next_pidx;
>>>   405     }
>>>   406     if (!_in_worklist.test_set(ptf->pidx() + pidx_bias)) {
>>>   407       _worklist.append(pt);
>>>   408     }
>>>   409   }
>>
>> It is complicated :(
>>
>> To avoid creating a separate list in PointsToNode, some edges are encoded specially (the low bit is set in the pointer):
>>
>>   // Mark base edge use to distinguish from stored value edge.
>>   bool add_base_use(FieldNode* use) { return _uses.append_if_missing((PointsToNode*)((intptr_t)use + 1)); }
>>
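The tagging trick itself is easy to demonstrate in isolation. A sketch with an illustrative FieldNode type (not the HotSpot declarations); it assumes nodes are at least 2-byte aligned, so bit 0 of a valid pointer is always free:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative node type; real nodes are word-aligned, so bit 0 of any
// valid pointer to them is zero and can carry a mark.
struct FieldNode { int id; };

// Encode a base-edge use by setting the low bit of the pointer.
static inline FieldNode* mark_base_use(FieldNode* use) {
  return reinterpret_cast<FieldNode*>(reinterpret_cast<intptr_t>(use) + 1);
}

// Test whether a pointer carries the base-use mark.
static inline bool is_base_use(const FieldNode* p) {
  return (reinterpret_cast<intptr_t>(p) & 1) != 0;
}

// Strip the mark to recover the original node pointer.
static inline FieldNode* get_use_node(FieldNode* p) {
  return reinterpret_cast<FieldNode*>(reinterpret_cast<intptr_t>(p) & ~(intptr_t)1);
}
```

This lets one uses list hold both kinds of edges without an extra per-node list.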
>> It is used in add_java_object_edges():
>>
>>     PointsToNode* use = _worklist.at(l);
>>     if (PointsToNode::is_base_use(use)) {
>>       // Add reference from jobj to field and from field to jobj (field's base).
>>       use = PointsToNode::get_use_node(use)->as_Field();
>>       if (add_base(use->as_Field(), jobj)) {
>>         new_edges++;
>>
>> So the _worklist may contain two pointers to the same node: one with the bit set and one without. add_to_worklist() wants to make sure only one of each kind is present on the _worklist. As a result, the VectorSet _in_worklist needs a separate entry for each kind of pointer.
>>
>> I added the new _pidx index to PointsToNode because I did not want the VectorSet _in_worklist to be big - the number of created EA nodes is usually only about 10% of the number of Ideal nodes. _next_pidx is the total number of created EA nodes, so adding it as a bias guarantees a unique entry in the VectorSet.
>>
>> I will add the following comment to add_to_worklist():
>>
>>     if (PointsToNode::is_base_use(pt)) {
>>       // Create a separate entry in _in_worklist for a marked base edge
>>       // because _worklist may have an entry for a normal edge pointing
>>       // to the same node. To separate them use _next_pidx as bias.
>>       ptf = PointsToNode::get_use_node(pt)->as_Field();
>>       pidx_bias = _next_pidx;
>>
>>
>> I hope I answered your question :)
>>
>> Thanks,
>> Vladimir
>>
>>>
>>>
>>> Thanks,
>>> igor
>>>
>>>
>>> On Oct 21, 2014, at 11:32 AM, Vladimir Kozlov
>>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>> wrote:
>>>
>>>> Please, I need reviews.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 10/14/14 4:52 PM, Vladimir Kozlov wrote:
>>>>> I updated changes.
>>>>>
>>>>> http://cr.openjdk.java.net/~kvn/8041984/webrev.01/
>>>>>
>>>>> As David suggested I added SAMPLE_SIZE constant:
>>>>>
>>>>> #define SAMPLE_SIZE 4
>>>>>         if ((next % SAMPLE_SIZE) == 0) {
>>>>>           // Each 4 iterations calculate how much time it will take
>>>>>           // to complete graph construction.
>>>>>           time.stop();
>>>>>           double stop_time = time.seconds();
>>>>>           double time_per_iter = (stop_time - start_time) /
>>>>> (double)SAMPLE_SIZE;
>>>>>           double time_until_end = time_per_iter *
>>>>> (double)(java_objects_length - next);
>>>>>           if ((start_time + time_until_end) >= EscapeAnalysisTimeout) {
>>>>>
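The extrapolation in that snippet can be pulled out into a standalone function (names mirror the patch; the EscapeAnalysisTimeout flag becomes a plain parameter here, all times in seconds):

```cpp
#include <cassert>

// Mirrors the sampling check from the patch: every SAMPLE_SIZE
// iterations, compute how long the last sample took, extrapolate over
// the remaining java objects, and abort if the projected completion
// time exceeds the timeout.
const int SAMPLE_SIZE = 4;

static bool should_abort(double start_time, double stop_time,
                         int next, int java_objects_length,
                         double timeout) {
  double time_per_iter = (stop_time - start_time) / (double)SAMPLE_SIZE;
  double time_until_end = time_per_iter * (double)(java_objects_length - next);
  return (start_time + time_until_end) >= timeout;
}
```

For example, if 4 iterations took 4 seconds and 96 objects remain, the projection is ~96 more seconds, well past a 20-second budget, so the build aborts immediately.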
>>>>> I also replaced TIME_LIMIT constant with flag:
>>>>>
>>>>>   product(double, EscapeAnalysisTimeout, 20. DEBUG_ONLY(+40.),      \
>>>>>           "Abort EA when it reaches time limit (in sec)")      \
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 10/14/14 1:40 PM, Vladimir Kozlov wrote:
>>>>>> On 10/14/14 10:30 AM, David Chase wrote:
>>>>>>> I’m curious about the granularity of the completion-prediction —
>>>>>>> you’re checking every 4 iterations and extrapolating from there.
>>>>>>> Do we know that the times are that uniform, or would we be better off
>>>>>>> with larger sample sizes?
>>>>>>
>>>>>> Yes, I observed uniform, slowly growing times only for PARTS of the
>>>>>> iterations, not for all of them. That is why I want to take several
>>>>>> small samples. For example, the first iterations were fast, but once
>>>>>> array allocation objects began to be processed the times grew rapidly.
>>>>>> If I take a very big sample, the fast iterations skew the result and
>>>>>> delay the abort.
>>>>>>
>>>>>> I did an implementation similar to your "alternate approach" suggestion
>>>>>> and saw that the abort happened much later than with my current approach.
>>>>>>
>>>>>>> (How long does a single iteration take, anyway?)
>>>>>>
>>>>>> As I said, one slow iteration may take up to 1 sec, so I want to abort
>>>>>> as soon as I see such a case (while still allowing isolated spikes).
>>>>>> Taking larger sample sizes may delay the abort by several seconds.
>>>>>>
>>>>>>>
>>>>>>> And in the sample code, I see the choice of sample size (4) encoded as
>>>>>>>   (next & 3) == 0
>>>>>>>   // Each 4 iterations calculate how much time it will take
>>>>>>>   double time_per_iter = (stop_time - start_time) * 0.25;
>>>>>>>
>>>>>>> why not write this as (say)
>>>>>>>
>>>>>>>   static const int LOG_SAMPLE_SIZE = 2;
>>>>>>>   static const int SAMPLE_SIZE = (1 << LOG_SAMPLE_SIZE);
>>>>>>>   next % SAMPLE_SIZE == 0   // next could be a uint, couldn’t it?
>>>>>>>   double time_per_iter = (stop_time - start_time) * (1.0/SAMPLE_SIZE);
>>>>>>>
>>>>>>> perhaps rewriting 1.0/SAMPLE_SIZE as another static const as necessary
>>>>>>> to get it properly precalculated; I’m not sure what C++ compilers are
>>>>>>> doing nowadays.
>>>>>>
>>>>>> I was optimizing :) by avoiding the % and / operations. But I agree and
>>>>>> can add a SAMPLE_SIZE constant to make the code clearer.
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>>>
>>>>>>> An alternate approach would be to compute the overall rate thus far,
>>>>>>> checking at
>>>>>>> (say) iterations 4, 8, 16, 32, etc and computing the average from
>>>>>>> there, starting
>>>>>>> at whatever power of two is a decent fraction of the iterations needed
>>>>>>> to timeout.
>>>>>>>
>>>>>>> e.g., don’t reset start_time, just compute elapsed at each sample:
>>>>>>>
>>>>>>> if ((next & (ptwo - 1)) == 0) {
>>>>>>>   time.stop();
>>>>>>>   double elapsed = time.seconds() - start_time;
>>>>>>>   double projected = elapsed * (double) java_objects_length / next;
>>>>>>>
>>>>>>> Or you can compute a java_objects_length that would surely lead to
>>>>>>> failure, and
>>>>>>> always do that (fast, integer) comparison:
>>>>>>>   ...
>>>>>>>   unsigned int length_bound = (int) next * (CG_BUILD_TIME_LIMIT /
>>>>>>> elapsed)
>>>>>>>   if (java_objects_length > length_bound) { … timeout
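For completeness, this suggested integer-bound variant as a standalone sketch (names follow the snippet above; CG_BUILD_TIME_LIMIT becomes a plain parameter, and this is not actual HotSpot code):

```cpp
#include <cassert>

// Sketch of the integer-comparison variant: precompute the object
// count that would exceed the time limit at the observed rate, then
// compare counts instead of recomputing projected times each check.
static bool over_time_bound(unsigned next, double elapsed,
                            unsigned java_objects_length,
                            double time_limit) {
  unsigned length_bound = (unsigned)(next * (time_limit / elapsed));
  return java_objects_length > length_bound;
}
```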
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> On 2014-10-10, at 11:41 PM, Vladimir Kozlov
>>>>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8041984
>>>>>>>> http://cr.openjdk.java.net/~kvn/8041984/webrev/
>>>>>>>>
>>>>>>>> The following method brings C2's EA to its knees:
>>>>>>>>
>>>>>>>> com.graphhopper.coll.GHLongIntBTree$BTreeEntry::put()
>>>>>>>>
>>>>>>>> https://github.com/karussell/graphhopper/blob/master/core/src/main/java/com/graphhopper/coll/GHLongIntBTree.java
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> It shows that the current EA connection graph build code is very
>>>>>>>> inefficient when the graph has a lot of connections (edges).
>>>>>>>>
>>>>>>>> With inlining, the BTreeEntry::put() method has about 100 allocations,
>>>>>>>> of which about 80 are array allocations. Most of them are object
>>>>>>>> arrays generated by Arrays.copyOf(). After the number of edges in the
>>>>>>>> EA connection graph reaches about 50000, it takes more than 1 sec to
>>>>>>>> find all nodes which reference an allocation.
>>>>>>>>
>>>>>>>> The current EA code has a timeout bailout in the graph building code,
>>>>>>>> but the timeout (30 sec for a product VM) is checked only after all
>>>>>>>> object nodes are processed. So it may take > 2 mins before EA is
>>>>>>>> aborted:
>>>>>>>>
>>>>>>>> <phase name='connectionGraph' nodes='18787' live='10038'
>>>>>>>> stamp='4.417'>
>>>>>>>> <connectionGraph_bailout reason='reached time limit'/>
>>>>>>>> <phase_done name='connectionGraph' nodes='18787' live='10038'
>>>>>>>> stamp='150.967'/>
>>>>>>>>
>>>>>>>> Also, the ConnectionGraph::add_java_object_edges() method does not
>>>>>>>> check for the presence of a node when adding a new one to the worklist
>>>>>>>> (in the add_to_worklist() method), which leads to hundreds of MB of
>>>>>>>> memory used by EA while BTreeEntry::put() is processed.
>>>>>>>>
>>>>>>>> To address these issues, new timeout checks were added.
>>>>>>>> A new _pidx index field was added to the graph's nodes, and a
>>>>>>>> VectorSet is used to reduce the size of the worklist in the
>>>>>>>> add_java_object_edges() method.
>>>>>>>>
>>>>>>>> The escaping node CreateEx is mapped to phantom_obj to reduce the
>>>>>>>> number of processed objects in the connection graph. We do the same
>>>>>>>> for other escaping objects.
>>>>>>>>
>>>>>>>> Tested with graphhopper server application:
>>>>>>>>
>>>>>>>> <phase name='connectionGraph' nodes='18787' live='10038'
>>>>>>>> stamp='6.829'>
>>>>>>>> <connectionGraph_bailout reason='reached time limit'/>
>>>>>>>> <phase_done name='connectionGraph' nodes='18787' live='10038'
>>>>>>>> stamp='22.355'/>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>
>>>
>


More information about the hotspot-compiler-dev mailing list