RFR(M) 8041984: CompilerThread seems to occupy all CPU in a very rare situation
Igor Veresov
igor.veresov at oracle.com
Thu Oct 23 06:13:39 UTC 2014
Thanks for the explanation! It’s a bit clearer now. :) Reviewed.
igor
On Oct 22, 2014, at 1:06 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> Thank you, Igor
>
> On 10/21/14 11:05 PM, Igor Veresov wrote:
>> Alright, since nobody is taking it on... The timeout feature looks good;
>> I suppose a JIT needs to be adaptive to available CPU resources.
>>
>> A couple of small questions:
>>
>> 1. Why is it necessary to add phantom_obj to worklists?
>>
>> ptnodes_worklist.append(phantom_obj);
>> java_objects_worklist.append(phantom_obj);
>
> To avoid duplicated entries for phantom_obj on these lists.
>
> Even though processed ideal nodes are unique on the ideal_nodes list, several ideal nodes are mapped to phantom_obj:
>
>   case Op_CreateEx: {
>     // assume that all exception objects globally escape
>     map_ideal_node(n, phantom_obj);
>     break;
>   }
>
> As a result, ptnode_adr(n->_idx) will return phantom_obj several times, and it would be added to the above lists more than once.
>
> I added the following comment:
>
> // Processed ideal nodes are unique on ideal_nodes list
> // but several ideal nodes are mapped to the phantom_obj.
> // To avoid duplicated entries on the following worklists
> // add the phantom_obj only once to them.
> ptnodes_worklist.append(phantom_obj);
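>
> For illustration, a simplified sketch (my reconstruction, not the webrev
> code) of why the single up-front append is enough:
>
> // Several ideal nodes (e.g. every Op_CreateEx) map to the one and only
> // phantom_obj, so appending it per ideal node would add it repeatedly.
> // Append it once up front and skip it while walking ideal_nodes.
> ptnodes_worklist.append(phantom_obj);
> java_objects_worklist.append(phantom_obj);
> for (int next = 0; next < ideal_nodes.length(); next++) {
>   Node* n = ideal_nodes.at(next);
>   PointsToNode* ptn = ptnode_adr(n->_idx);
>   if (ptn == phantom_obj) {
>     continue; // already on both worklists
>   }
>   ptnodes_worklist.append(ptn);
>   // ... java objects additionally go to java_objects_worklist ...
> }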
>
>>
>> 2. What does pidx_bias do here?
>>
>> inline void add_to_worklist(PointsToNode* pt) {
>>   PointsToNode* ptf = pt;
>>   uint pidx_bias = 0;
>>   if (PointsToNode::is_base_use(pt)) {
>>     ptf = PointsToNode::get_use_node(pt)->as_Field();
>>     pidx_bias = _next_pidx;
>>   }
>>   if (!_in_worklist.test_set(ptf->pidx() + pidx_bias)) {
>>     _worklist.append(pt);
>>   }
>> }
>
> It is complicated :(
>
> To avoid creating a separate list in PointsToNode, some edges are encoded specially (the low bit is set in the pointer):
>
> // Mark base edge use to distinguish from stored value edge.
> bool add_base_use(FieldNode* use) { return _uses.append_if_missing((PointsToNode*)((intptr_t)use + 1)); }
>
> It is used in add_java_object_edges():
>
> PointsToNode* use = _worklist.at(l);
> if (PointsToNode::is_base_use(use)) {
>   // Add reference from jobj to field and from field to jobj (field's base).
>   use = PointsToNode::get_use_node(use)->as_Field();
>   if (add_base(use->as_Field(), jobj)) {
>     new_edges++;
>
> So _worklist may contain two pointers to the same node: one with the bit set and one without. add_to_worklist() wants to make sure only one of each kind is present on _worklist. As a result, the VectorSet _in_worklist needs a separate entry for each kind of pointer.
>
> I added the new _pidx index to PointsToNode because I did not want the VectorSet _in_worklist to be big - the number of created EA nodes is usually only about 10% of the number of Ideal nodes. _next_pidx is the total number of created EA nodes, so adding it as a bias guarantees a unique entry in the VectorSet.
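>
> To make the encoding concrete, here is a sketch of the tagging scheme (my
> reconstruction for illustration only - the real helpers live in
> PointsToNode):
>
> // A base-use edge is the use pointer with its low bit set. This is safe
> // because PointsToNode objects are at least 2-byte aligned, so bit 0 of
> // a real pointer is always zero.
> static bool is_base_use(PointsToNode* n) {
>   return ((intptr_t)n & 1) != 0;
> }
> static PointsToNode* get_use_node(PointsToNode* n) {
>   return (PointsToNode*)((intptr_t)n & ~1); // strip the tag bit
> }
> // With pidx() in [0, _next_pidx), the bias splits the VectorSet into two
> // halves: plain entries land in [0, _next_pidx) and marked base-use
> // entries in [_next_pidx, 2 * _next_pidx), so both kinds of pointers to
> // the same node get distinct bits in _in_worklist.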
>
> I will add the following comment to add_to_worklist():
>
> if (PointsToNode::is_base_use(pt)) {
>   // Create a separate entry in _in_worklist for a marked base edge
>   // because _worklist may have an entry for a normal edge pointing
>   // to the same node. To separate them use _next_pidx as bias.
>   ptf = PointsToNode::get_use_node(pt)->as_Field();
>   pidx_bias = _next_pidx;
> }
>
>
> I hope I answered your question :)
>
> Thanks,
> Vladimir
>
>>
>>
>> Thanks,
>> igor
>>
>>
>> On Oct 21, 2014, at 11:32 AM, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com> wrote:
>>
>>> Please, I need reviews.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 10/14/14 4:52 PM, Vladimir Kozlov wrote:
>>>> I updated changes.
>>>>
>>>> http://cr.openjdk.java.net/~kvn/8041984/webrev.01/
>>>>
>>>> As David suggested, I added a SAMPLE_SIZE constant:
>>>>
>>>> #define SAMPLE_SIZE 4
>>>> if ((next % SAMPLE_SIZE) == 0) {
>>>>   // Each 4 iterations calculate how much time it will take
>>>>   // to complete graph construction.
>>>>   time.stop();
>>>>   double stop_time = time.seconds();
>>>>   double time_per_iter = (stop_time - start_time) / (double)SAMPLE_SIZE;
>>>>   double time_until_end = time_per_iter * (double)(java_objects_length - next);
>>>>   if ((start_time + time_until_end) >= EscapeAnalysisTimeout) {
>>>>
>>>> I also replaced the TIME_LIMIT constant with a flag:
>>>>
>>>> product(double, EscapeAnalysisTimeout, 20. DEBUG_ONLY(+40.), \
>>>> "Abort EA when it reaches time limit (in sec)") \
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 10/14/14 1:40 PM, Vladimir Kozlov wrote:
>>>>> On 10/14/14 10:30 AM, David Chase wrote:
>>>>>> I’m curious about the granularity of the completion-prediction —
>>>>>> you’re checking every 4 iterations and extrapolating from there.
>>>>>> Do we know that the times are that uniform, or would we be better off
>>>>>> with larger sample sizes?
>>>>>
>>>>> Yes, I observed uniformly slowly growing times only for PARTS of the
>>>>> iterations, not for all of them. That is why I want to take several
>>>>> samples. For example, the first iterations were fast, but when array
>>>>> allocation objects began to be processed the times grew rapidly.
>>>>> If I take a very big sample, the fast iterations affect the result and
>>>>> delay the abort.
>>>>>
>>>>> I did an implementation similar to your "alternate approach" suggestion
>>>>> and saw that the abort happened much later than with my current approach.
>>>>>
>>>>>> (How long does a single iteration take, anyway?)
>>>>>
>>>>> As I said, one slow iteration may take up to 1 sec, so I want to abort as
>>>>> soon as I see such a case (but allow spikes). Taking larger sample sizes
>>>>> may delay the abort for several seconds.
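>>>>>
>>>>> For example (hypothetical numbers): if one sample of 4 iterations takes
>>>>> 2 sec, then time_per_iter = 0.5 sec, and with 1000 objects still to go
>>>>> the projected remaining time is 500 sec - far over a 20 sec budget - so
>>>>> the abort fires at the very next sample point instead of minutes later.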
>>>>>
>>>>>>
>>>>>> And in the sample code, I see the choice of sample size (4) encoded as
>>>>>> (next & 3) == 0
>>>>>> // Each 4 iterations calculate how much time it will take
>>>>>> double time_per_iter = (stop_time - start_time) * 0.25;
>>>>>>
>>>>>> why not write this as (say)
>>>>>>
>>>>>> static const int LOG_SAMPLE_SIZE = 2;
>>>>>> static const int SAMPLE_SIZE = (1 << LOG_SAMPLE_SIZE);
>>>>>> next % SAMPLE_SIZE == 0 // next could be a uint, couldn’t it?
>>>>>> double time_per_iter = (stop_time - start_time) * (1.0/SAMPLE_SIZE);
>>>>>>
>>>>>> perhaps rewriting 1.0/SAMPLE_SIZE as another static const as necessary
>>>>>> to get it properly precalculated; I’m not sure what C++ compilers are
>>>>>> doing nowadays.
>>>>>
>>>>> I am optimizing :) by avoiding % and / operations. But I agree, and I can
>>>>> add a SAMPLE_SIZE constant to make the code clearer.
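>>>>>
>>>>> For a power-of-two sample size the two forms are equivalent for unsigned
>>>>> values anyway, so the named constant costs nothing. A tiny illustration
>>>>> (not from the webrev):
>>>>>
>>>>> static const uint SAMPLE_SIZE = 4; // must stay a power of two
>>>>> static bool is_sample_point(uint next) {
>>>>>   // For a power-of-two N and unsigned x, (x % N) == (x & (N - 1)),
>>>>>   // so this compiles to the same mask test as (next & 3) == 0.
>>>>>   return (next % SAMPLE_SIZE) == 0;
>>>>> }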
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>>
>>>>>> An alternate approach would be to compute the overall rate thus far,
>>>>>> checking at
>>>>>> (say) iterations 4, 8, 16, 32, etc and computing the average from
>>>>>> there, starting
>>>>>> at whatever power of two is a decent fraction of the iterations needed
>>>>>> to timeout.
>>>>>>
>>>>>> e.g., don’t reset start_time, just compute elapsed at each sample:
>>>>>>
>>>>>> if ((next & (ptwo - 1)) == 0) {
>>>>>> time.stop();
>>>>>> double elapsed = time.seconds() - start_time;
>>>>>> double projected = elapsed * (double) java_objects_length / next;
>>>>>>
>>>>>> Or you can compute a java_objects_length that would surely lead to
>>>>>> failure, and
>>>>>> always do that (fast, integer) comparison:
>>>>>> ...
>>>>>> unsigned int length_bound = (unsigned int)(next * (CG_BUILD_TIME_LIMIT / elapsed));
>>>>>> …
>>>>>> if (java_objects_length > length_bound) { … timeout
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On 2014-10-10, at 11:41 PM, Vladimir Kozlov
>>>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8041984
>>>>>>> http://cr.openjdk.java.net/~kvn/8041984/webrev/
>>>>>>>
>>>>>>> The following method brings C2's EA to its knees:
>>>>>>>
>>>>>>> com.graphhopper.coll.GHLongIntBTree$BTreeEntry::put()
>>>>>>>
>>>>>>> https://github.com/karussell/graphhopper/blob/master/core/src/main/java/com/graphhopper/coll/GHLongIntBTree.java
>>>>>>>
>>>>>>> It shows that the current EA connection graph build code is very
>>>>>>> inefficient when the graph has a lot of connections (edges).
>>>>>>>
>>>>>>> With inlining, the BTreeEntry::put() method has about 100 allocations,
>>>>>>> of which about 80 are array allocations. Most of them are object
>>>>>>> arrays generated by Arrays.copyOf(). After the number of edges in the
>>>>>>> EA connection graph reaches about 50000, it takes more than 1 sec to
>>>>>>> find all nodes which reference an allocation.
>>>>>>>
>>>>>>> The current EA code can bail out of graph building on a timeout. But
>>>>>>> the timeout (30 sec for a product VM) is checked only after all object
>>>>>>> nodes are processed, so it may take > 2 mins before EA is aborted:
>>>>>>>
>>>>>>> <phase name='connectionGraph' nodes='18787' live='10038' stamp='4.417'>
>>>>>>>   <connectionGraph_bailout reason='reached time limit'/>
>>>>>>>   <phase_done name='connectionGraph' nodes='18787' live='10038' stamp='150.967'/>
>>>>>>>
>>>>>>> Also, the ConnectionGraph::add_java_object_edges() method does not
>>>>>>> check for the presence of a node when it adds a new one to the worklist
>>>>>>> (in the add_to_worklist() method), which leads to EA using hundreds of
>>>>>>> MB of memory when BTreeEntry::put() is processed.
>>>>>>>
>>>>>>> To address these issues, new timeout checks were added.
>>>>>>> A new _pidx index field is added to the graph's nodes, and a VectorSet
>>>>>>> is used to reduce the size of the worklist in the
>>>>>>> add_java_object_edges() method.
>>>>>>>
>>>>>>> The escaping node CreateEx is mapped to phantom_obj to reduce the
>>>>>>> number of processed objects in the connection graph; we do the same
>>>>>>> for other escaping objects.
>>>>>>>
>>>>>>> Tested with the graphhopper server application:
>>>>>>>
>>>>>>> <phase name='connectionGraph' nodes='18787' live='10038' stamp='6.829'>
>>>>>>>   <connectionGraph_bailout reason='reached time limit'/>
>>>>>>>   <phase_done name='connectionGraph' nodes='18787' live='10038' stamp='22.355'/>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>
>>