RFR(M) 8041984: CompilerThread seems to occupy all CPU in a very rare situation
Igor Veresov
igor.veresov at oracle.com
Thu Oct 23 06:13:39 UTC 2014
Thanks for the explanation! It’s a bit clearer now. :) Reviewed.
igor
On Oct 22, 2014, at 1:06 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> Thank you, Igor
>
> On 10/21/14 11:05 PM, Igor Veresov wrote:
>> Alright, since nobody is taking it on... The timeout feature looks good;
>> I suppose a JIT needs to be adaptive to available CPU resources.
>>
>> A couple of small questions:
>>
>> 1. Why is it necessary to add phantom_obj to worklists?
>>
>> ptnodes_worklist.append(phantom_obj);
>> java_objects_worklist.append(phantom_obj);
>
> To avoid duplicated entries for phantom_obj on these lists.
>
> Even though processed ideal nodes are unique on the ideal_nodes list, several ideal nodes are mapped to phantom_obj:
>
>   case Op_CreateEx: {
>     // assume that all exception objects globally escape
>     map_ideal_node(n, phantom_obj);
>     break;
>   }
>
> As a result, ptnode_adr(n->_idx) will return phantom_obj several times, and it would be added to the above lists more than once.
>
> I added the following comment:
>
> // Processed ideal nodes are unique on ideal_nodes list
> // but several ideal nodes are mapped to the phantom_obj.
> // To avoid duplicated entries on the following worklists
> // add the phantom_obj only once to them.
> ptnodes_worklist.append(phantom_obj);
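>
> For illustration, a simplified sketch (my reconstruction, not the webrev
> code) of why the single up-front append is enough:
>
> // Several ideal nodes (e.g. every Op_CreateEx) map to the one and only
> // phantom_obj, so appending it per ideal node would add it repeatedly.
> // Append it once up front and skip it while walking ideal_nodes.
> ptnodes_worklist.append(phantom_obj);
> java_objects_worklist.append(phantom_obj);
> for (int next = 0; next < ideal_nodes.length(); next++) {
>   Node* n = ideal_nodes.at(next);
>   PointsToNode* ptn = ptnode_adr(n->_idx);
>   if (ptn == phantom_obj) {
>     continue; // already on both worklists
>   }
>   ptnodes_worklist.append(ptn);
>   // ... java objects additionally go to java_objects_worklist ...
> }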
>
>>
>> 2. What does pidx_bias do here?
>>
>> inline void add_to_worklist(PointsToNode* pt) {
>>   PointsToNode* ptf = pt;
>>   uint pidx_bias = 0;
>>   if (PointsToNode::is_base_use(pt)) {
>>     ptf = PointsToNode::get_use_node(pt)->as_Field();
>>     pidx_bias = _next_pidx;
>>   }
>>   if (!_in_worklist.test_set(ptf->pidx() + pidx_bias)) {
>>     _worklist.append(pt);
>>   }
>> }
>
> It is complicated :(
>
> To avoid creating a separate list in PointsToNode, some edges are encoded specially (the low bit is set in the pointer):
>
> // Mark base edge use to distinguish from stored value edge.
> bool add_base_use(FieldNode* use) { return _uses.append_if_missing((PointsToNode*)((intptr_t)use + 1)); }
>
> It is used in add_java_object_edges():
>
> PointsToNode* use = _worklist.at(l);
> if (PointsToNode::is_base_use(use)) {
>   // Add reference from jobj to field and from field to jobj (field's base).
>   use = PointsToNode::get_use_node(use)->as_Field();
>   if (add_base(use->as_Field(), jobj)) {
>     new_edges++;
>
> So _worklist may contain two pointers to the same node: one with the bit set and one without. add_to_worklist() wants to make sure only one of each kind is present on _worklist. As a result, the VectorSet _in_worklist needs a separate entry for each kind of pointer.
>
> I added the new _pidx index to PointsToNode because I did not want the VectorSet _in_worklist to be big - the number of created EA nodes is usually only about 10% of the number of Ideal nodes. _next_pidx is the total number of created EA nodes, so adding it as a bias guarantees a unique entry in the VectorSet.
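>
> To make the encoding concrete, here is a sketch of the tagging scheme (my
> reconstruction for illustration only - the real helpers live in
> PointsToNode):
>
> // A base-use edge is the use pointer with its low bit set. This is safe
> // because PointsToNode objects are at least 2-byte aligned, so bit 0 of
> // a real pointer is always zero.
> static bool is_base_use(PointsToNode* n) {
>   return ((intptr_t)n & 1) != 0;
> }
> static PointsToNode* get_use_node(PointsToNode* n) {
>   return (PointsToNode*)((intptr_t)n & ~1); // strip the tag bit
> }
> // With pidx() in [0, _next_pidx), the bias splits the VectorSet into two
> // halves: plain entries land in [0, _next_pidx) and marked base-use
> // entries in [_next_pidx, 2 * _next_pidx), so both kinds of pointers to
> // the same node get distinct bits in _in_worklist.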
>
> I will add the following comment to add_to_worklist():
>
> if (PointsToNode::is_base_use(pt)) {
>   // Create a separate entry in _in_worklist for a marked base edge
>   // because _worklist may have an entry for a normal edge pointing
>   // to the same node. To separate them use _next_pidx as bias.
>   ptf = PointsToNode::get_use_node(pt)->as_Field();
>   pidx_bias = _next_pidx;
> }
>
>
> I hope I answered your question :)
>
> Thanks,
> Vladimir
>
>>
>>
>> Thanks,
>> igor
>>
>>
>> On Oct 21, 2014, at 11:32 AM, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com> wrote:
>>
>>> Please, I need reviews.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 10/14/14 4:52 PM, Vladimir Kozlov wrote:
>>>> I updated changes.
>>>>
>>>> http://cr.openjdk.java.net/~kvn/8041984/webrev.01/
>>>>
>>>> As David suggested, I added a SAMPLE_SIZE constant:
>>>>
>>>> #define SAMPLE_SIZE 4
>>>> if ((next % SAMPLE_SIZE) == 0) {
>>>>   // Each 4 iterations calculate how much time it will take
>>>>   // to complete graph construction.
>>>>   time.stop();
>>>>   double stop_time = time.seconds();
>>>>   double time_per_iter = (stop_time - start_time) / (double)SAMPLE_SIZE;
>>>>   double time_until_end = time_per_iter * (double)(java_objects_length - next);
>>>>   if ((start_time + time_until_end) >= EscapeAnalysisTimeout) {
>>>>
>>>> I also replaced the TIME_LIMIT constant with a flag:
>>>>
>>>> product(double, EscapeAnalysisTimeout, 20. DEBUG_ONLY(+40.), \
>>>> "Abort EA when it reaches time limit (in sec)") \
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 10/14/14 1:40 PM, Vladimir Kozlov wrote:
>>>>> On 10/14/14 10:30 AM, David Chase wrote:
>>>>>> I’m curious about the granularity of the completion-prediction —
>>>>>> you’re checking every 4 iterations and extrapolating from there.
>>>>>> Do we know that the times are that uniform, or would we be better off
>>>>>> with larger sample sizes?
>>>>>
>>>>> Yes, I observed uniformly slowly growing times only for PARTS of the
>>>>> iterations, not for all of them. That is why I want to take several
>>>>> samples. For example, the first iterations were fast, but when array
>>>>> allocation objects began to be processed the times grew rapidly.
>>>>> If I take a very big sample, the fast iterations affect the result and
>>>>> delay the abort.
>>>>>
>>>>> I did an implementation similar to your "alternate approach" suggestion
>>>>> and saw that the abort happened much later than with my current approach.
>>>>>
>>>>>> (How long does a single iteration take, anyway?)
>>>>>
>>>>> As I said, one slow iteration may take up to 1 sec, so I want to abort as
>>>>> soon as I see such a case (but allow spikes). Taking larger sample sizes
>>>>> may delay the abort for several seconds.
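>>>>>
>>>>> For example (hypothetical numbers): if one sample of 4 iterations takes
>>>>> 2 sec, then time_per_iter = 0.5 sec, and with 1000 objects still to go
>>>>> the projected remaining time is 500 sec - far over a 20 sec budget - so
>>>>> the abort fires at the very next sample point instead of minutes later.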
>>>>>
>>>>>>
>>>>>> And in the sample code, I see the choice of sample size (4) encoded as
>>>>>> (next & 3) == 0
>>>>>> // Each 4 iterations calculate how much time it will take
>>>>>> double time_per_iter = (stop_time - start_time) * 0.25;
>>>>>>
>>>>>> why not write this as (say)
>>>>>>
>>>>>> static const int LOG_SAMPLE_SIZE = 2;
>>>>>> static const int SAMPLE_SIZE = (1 << LOG_SAMPLE_SIZE);
>>>>>> next % SAMPLE_SIZE == 0 // next could be a uint, couldn’t it?
>>>>>> double time_per_iter = (stop_time - start_time) * (1.0/SAMPLE_SIZE);
>>>>>>
>>>>>> perhaps rewriting 1.0/SAMPLE_SIZE as another static const as necessary
>>>>>> to get it properly precalculated; I’m not sure what C++ compilers are
>>>>>> doing nowadays.
>>>>>
>>>>> I am optimizing :) by avoiding % and / operations. But I agree, and I can
>>>>> add a SAMPLE_SIZE constant to make the code clearer.
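>>>>>
>>>>> For a power-of-two sample size the two forms are equivalent for unsigned
>>>>> values anyway, so the named constant costs nothing. A tiny illustration
>>>>> (not from the webrev):
>>>>>
>>>>> static const uint SAMPLE_SIZE = 4; // must stay a power of two
>>>>> static bool is_sample_point(uint next) {
>>>>>   // For a power-of-two N and unsigned x, (x % N) == (x & (N - 1)),
>>>>>   // so this compiles to the same mask test as (next & 3) == 0.
>>>>>   return (next % SAMPLE_SIZE) == 0;
>>>>> }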
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>>
>>>>>> An alternate approach would be to compute the overall rate thus far,
>>>>>> checking at
>>>>>> (say) iterations 4, 8, 16, 32, etc and computing the average from
>>>>>> there, starting
>>>>>> at whatever power of two is a decent fraction of the iterations needed
>>>>>> to timeout.
>>>>>>
>>>>>> e.g., don’t reset start_time, just compute elapsed at each sample:
>>>>>>
>>>>>> if ((next & (ptwo - 1)) == 0) {
>>>>>> time.stop();
>>>>>> double elapsed = time.seconds() - start_time;
>>>>>> double projected = elapsed * (double) java_objects_length / next;
>>>>>>
>>>>>> Or you can compute a java_objects_length that would surely lead to
>>>>>> failure, and
>>>>>> always do that (fast, integer) comparison:
>>>>>> ...
>>>>>> unsigned int length_bound = (unsigned int)(next * (CG_BUILD_TIME_LIMIT / elapsed));
>>>>>> …
>>>>>> if (java_objects_length > length_bound) { … timeout
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On 2014-10-10, at 11:41 PM, Vladimir Kozlov
>>>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8041984
>>>>>>> http://cr.openjdk.java.net/~kvn/8041984/webrev/
>>>>>>>
>>>>>>> The following method brings C2's EA to its knees:
>>>>>>>
>>>>>>> com.graphhopper.coll.GHLongIntBTree$BTreeEntry::put()
>>>>>>>
>>>>>>> https://github.com/karussell/graphhopper/blob/master/core/src/main/java/com/graphhopper/coll/GHLongIntBTree.java
>>>>>>>
>>>>>>> It shows that the current EA connection graph build code is very
>>>>>>> inefficient when the graph has a lot of connections (edges).
>>>>>>>
>>>>>>> With inlining, the BTreeEntry::put() method has about 100 allocations,
>>>>>>> of which about 80 are array allocations. Most of them are object
>>>>>>> arrays generated by Arrays.copyOf(). After the number of edges in the
>>>>>>> EA connection graph reaches about 50000, it takes more than 1 sec to
>>>>>>> find all nodes which reference an allocation.
>>>>>>>
>>>>>>> The current EA code can bail out of graph building on a timeout. But
>>>>>>> the timeout (30 sec for a product VM) is checked only after all object
>>>>>>> nodes are processed, so it may take > 2 mins before EA is aborted:
>>>>>>>
>>>>>>> <phase name='connectionGraph' nodes='18787' live='10038' stamp='4.417'>
>>>>>>>   <connectionGraph_bailout reason='reached time limit'/>
>>>>>>>   <phase_done name='connectionGraph' nodes='18787' live='10038' stamp='150.967'/>
>>>>>>>
>>>>>>> Also, the ConnectionGraph::add_java_object_edges() method does not
>>>>>>> check for the presence of a node when it adds a new one to the worklist
>>>>>>> (in the add_to_worklist() method), which leads to EA using hundreds of
>>>>>>> MB of memory when BTreeEntry::put() is processed.
>>>>>>>
>>>>>>> To address these issues, new timeout checks were added.
>>>>>>> A new _pidx index field is added to the graph's nodes, and a VectorSet
>>>>>>> is used to reduce the size of the worklist in the
>>>>>>> add_java_object_edges() method.
>>>>>>>
>>>>>>> The escaping node CreateEx is mapped to phantom_obj to reduce the
>>>>>>> number of processed objects in the connection graph; we do the same
>>>>>>> for other escaping objects.
>>>>>>>
>>>>>>> Tested with the graphhopper server application:
>>>>>>>
>>>>>>> <phase name='connectionGraph' nodes='18787' live='10038' stamp='6.829'>
>>>>>>>   <connectionGraph_bailout reason='reached time limit'/>
>>>>>>>   <phase_done name='connectionGraph' nodes='18787' live='10038' stamp='22.355'/>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>
>>