RFR(M) 8041984: CompilerThread seems to occupy all CPU in a very rare situation
Igor Veresov
igor.veresov at oracle.com
Wed Oct 22 06:05:13 UTC 2014
Alright, since nobody is taking it on... The timeout feature looks good; I suppose a JIT needs to be adaptive to available CPU resources.
A couple of small questions:
1. Why is it necessary to add phantom_obj to the worklists?
130 ptnodes_worklist.append(phantom_obj);
131 java_objects_worklist.append(phantom_obj);
2. What does pidx_bias do here?
399 inline void add_to_worklist(PointsToNode* pt) {
400   PointsToNode* ptf = pt;
401   uint pidx_bias = 0;
402   if (PointsToNode::is_base_use(pt)) {
403     ptf = PointsToNode::get_use_node(pt)->as_Field();
404     pidx_bias = _next_pidx;
405   }
406   if (!_in_worklist.test_set(ptf->pidx() + pidx_bias)) {
407     _worklist.append(pt);
408   }
409 }
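If I read it right, the bias is there because the same Field node can be
queued in two roles, as a plain entry and as a tagged "base use" entry,
and the two must not dedup against each other. Shifting the base-use
index by _next_pidx gives each role its own half of the _in_worklist
bitset. A stand-alone sketch of that reading (illustrative names, not
the webrev's types):

#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative stand-ins only -- not the webrev code.
struct Node { uint32_t pidx; };      // dense index assigned at node creation

class DedupWorklist {
  std::vector<Node*> _worklist;
  std::vector<bool>  _in_worklist;   // plays the role of VectorSet
  uint32_t           _next_pidx;     // one past the largest assigned pidx
public:
  explicit DedupWorklist(uint32_t next_pidx)
    : _in_worklist(2 * next_pidx, false), _next_pidx(next_pidx) {}

  // Biasing the base-use index by _next_pidx keeps the two roles in
  // disjoint halves of one bitset, so the test only filters a
  // duplicate of the same role.
  void add(Node* n, bool as_base_use) {
    uint32_t idx = n->pidx + (as_base_use ? _next_pidx : 0);
    if (!_in_worklist[idx]) {        // VectorSet::test_set() equivalent
      _in_worklist[idx] = true;
      _worklist.push_back(n);
    }
  }
  size_t size() const { return _worklist.size(); }
};

int main() {
  Node f = { 7 };
  DedupWorklist wl(100);
  wl.add(&f, false);                 // plain entry: added
  wl.add(&f, true);                  // base-use entry: added, different role
  wl.add(&f, false);                 // duplicate plain entry: filtered
  assert(wl.size() == 2);
  return 0;
}

Is that the intent?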
Thanks,
igor
On Oct 21, 2014, at 11:32 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> Please, I need reviews.
>
> Thanks,
> Vladimir
>
> On 10/14/14 4:52 PM, Vladimir Kozlov wrote:
>> I updated changes.
>>
>> http://cr.openjdk.java.net/~kvn/8041984/webrev.01/
>>
>> As David suggested, I added a SAMPLE_SIZE constant:
>>
>> #define SAMPLE_SIZE 4
>>   if ((next % SAMPLE_SIZE) == 0) {
>>     // Every 4 iterations, estimate how much time it will take
>>     // to complete graph construction.
>>     time.stop();
>>     double stop_time = time.seconds();
>>     double time_per_iter = (stop_time - start_time) / (double)SAMPLE_SIZE;
>>     double time_until_end = time_per_iter * (double)(java_objects_length - next);
>>     if ((start_time + time_until_end) >= EscapeAnalysisTimeout) {
>>
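>> Self-contained, the check looks roughly like this (a sketch only:
>> std::chrono stands in for elapsedTimer, the flag is modeled as a
>> constant, and the loop body is elided):
>>
>> #include <chrono>
>>
>> static const int    SAMPLE_SIZE           = 4;
>> static const double EscapeAnalysisTimeout = 20.0;  // product default, sec
>>
>> static double now_seconds() {              // elapsedTimer stand-in
>>   using namespace std::chrono;
>>   return duration<double>(steady_clock::now().time_since_epoch()).count();
>> }
>>
>> // Returns false when construction is projected to exceed the limit.
>> bool build_connection_graph(int java_objects_length) {
>>   double t0 = now_seconds();
>>   double start_time = 0.0;     // elapsed at the start of the window
>>   for (int next = 0; next < java_objects_length; next++) {
>>     // ... process java_objects_worklist.at(next) ...
>>     if (next > 0 && (next % SAMPLE_SIZE) == 0) {
>>       double stop_time      = now_seconds() - t0;  // elapsed so far
>>       double time_per_iter  = (stop_time - start_time) / (double)SAMPLE_SIZE;
>>       double time_until_end = time_per_iter * (double)(java_objects_length - next);
>>       // Abort when the elapsed time at the window start plus the
>>       // projected remaining time already exceeds the limit.
>>       if ((start_time + time_until_end) >= EscapeAnalysisTimeout) {
>>         return false;          // bail out of EA instead of running on
>>       }
>>       start_time = stop_time;  // restart the 4-iteration window
>>     }
>>   }
>>   return true;
>> }
>>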
>> I also replaced the TIME_LIMIT constant with a flag:
>>
>> product(double, EscapeAnalysisTimeout, 20. DEBUG_ONLY(+40.), \
>> "Abort EA when it reaches time limit (in sec)") \
>>
>> Thanks,
>> Vladimir
>>
>> On 10/14/14 1:40 PM, Vladimir Kozlov wrote:
>>> On 10/14/14 10:30 AM, David Chase wrote:
>>>> I’m curious about the granularity of the completion-prediction —
>>>> you’re checking every 4 iterations and extrapolating from there.
>>>> Do we know that the times are that uniform, or would we be better off
>>>> with larger sample sizes?
>>>
>>> Yes, I observed uniformly slowly growing times for PARTS of the
>>> iterations, not for all of them. That is why I want to take several
>>> small samples. For example, the first iterations were fast, but once
>>> array allocation objects started being processed the times grew
>>> rapidly. If I take a very big sample, the fast iterations skew the
>>> result and delay the abort.
>>>
>>> I did an implementation similar to your "alternate approach"
>>> suggestion, and I saw that the abort happened much later than with my
>>> current approach.
>>>
>>>> (How long does a single iteration take, anyway?)
>>>
>>> As I said, one slow iteration may take up to 1 sec, so I want to
>>> abort as soon as I see such a case (while still allowing short
>>> spikes). With a larger sample the average still contains many fast
>>> iterations, so the abort could be delayed for several seconds.
>>>
>>>>
>>>> And in the sample code, I see the choice of sample size (4) encoded as
>>>> (next & 3) == 0
>>>> // Each 4 iterations calculate how much time it will take
>>>> double time_per_iter = (stop_time - start_time) * 0.25;
>>>>
>>>> why not write this as (say)
>>>>
>>>> static const int LOG_SAMPLE_SIZE = 2;
>>>> static const int SAMPLE_SIZE = (1 << LOG_SAMPLE_SIZE);
>>>> next % SAMPLE_SIZE == 0 // next could be a uint, couldn’t it?
>>>> double time_per_iter = (stop_time - start_time) * (1.0/SAMPLE_SIZE);
>>>>
>>>> perhaps rewriting 1.0/SAMPLE_SIZE as another static const as necessary
>>>> to get it properly precalculated; I’m not sure what C++ compilers are
>>>> doing nowadays.
>>>
>>> I am optimizing :) by avoiding % and / operations. But I agree and can
>>> add SAMPLE_SIZE constant so code can be more clear.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> An alternate approach would be to compute the overall rate thus far,
>>>> checking at
>>>> (say) iterations 4, 8, 16, 32, etc and computing the average from
>>>> there, starting
>>>> at whatever power of two is a decent fraction of the iterations needed
>>>> to timeout.
>>>>
>>>> e.g., don’t reset start_time, just compute elapsed at each sample:
>>>>
>>>> if ((next & (ptwo - 1)) == 0) {
>>>> time.stop();
>>>> double elapsed = time.seconds() - start_time;
>>>> double projected = elapsed * (double) java_objects_length / next;
>>>>
>>>> Or you can compute a java_objects_length that would surely lead to
>>>> failure, and
>>>> always do that (fast, integer) comparison:
>>>> ...
>>>> unsigned int length_bound = (unsigned int)(next * (CG_BUILD_TIME_LIMIT / elapsed))
>>>> …
>>>> if (java_objects_length > length_bound) { … timeout
>>>>
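>>>> Pulled together, the whole scheme would be something like this
>>>> sketch (seconds() stands in for the elapsedTimer, CG_BUILD_TIME_LIMIT
>>>> is the existing constant, and a guard against a zero elapsed time is
>>>> added):
>>>>
>>>> #include <chrono>
>>>>
>>>> static const double CG_BUILD_TIME_LIMIT = 30.0;  // current constant
>>>>
>>>> static double seconds() {           // elapsedTimer stand-in
>>>>   using namespace std::chrono;
>>>>   return duration<double>(steady_clock::now().time_since_epoch()).count();
>>>> }
>>>>
>>>> bool build_graph(unsigned int java_objects_length) {
>>>>   double start_time = seconds();
>>>>   unsigned int ptwo = 4;            // sample at 4, 8, 16, 32, ...
>>>>   unsigned int length_bound = 0xffffffffu;  // no bound until sampled
>>>>   for (unsigned int next = 1; next <= java_objects_length; next++) {
>>>>     // ... process one object ...
>>>>     if ((next & (ptwo - 1)) == 0) {
>>>>       double elapsed = seconds() - start_time;
>>>>       if (elapsed > 0.0) {
>>>>         // Largest length that can finish within the limit at the
>>>>         // average rate observed so far.
>>>>         length_bound =
>>>>             (unsigned int)(next * (CG_BUILD_TIME_LIMIT / elapsed));
>>>>       }
>>>>       ptwo <<= 1;
>>>>     }
>>>>     if (java_objects_length > length_bound) {
>>>>       return false;                 // projected timeout: abort
>>>>     }
>>>>   }
>>>>   return true;
>>>> }
>>>>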
>>>> David
>>>>
>>>> On 2014-10-10, at 11:41 PM, Vladimir Kozlov
>>>> <vladimir.kozlov at oracle.com> wrote:
>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8041984
>>>>> http://cr.openjdk.java.net/~kvn/8041984/webrev/
>>>>>
>>>>> The following method brings C2's EA to its knees:
>>>>>
>>>>> com.graphhopper.coll.GHLongIntBTree$BTreeEntry::put()
>>>>>
>>>>> https://github.com/karussell/graphhopper/blob/master/core/src/main/java/com/graphhopper/coll/GHLongIntBTree.java
>>>>>
>>>>>
>>>>>
>>>>> It shows that the current EA connection graph construction code is
>>>>> very inefficient when the graph has a lot of connections (edges).
>>>>>
>>>>> With inlining, the BTreeEntry::put() method has about 100
>>>>> allocations, of which about 80 are array allocations. Most of them
>>>>> are object arrays generated by Arrays.copyOf(). Once the number of
>>>>> edges in the EA connection graph reaches about 50000, it takes more
>>>>> than 1 sec to find all nodes which reference an allocation.
>>>>>
>>>>> The current EA code can bail out of graph building on a timeout,
>>>>> but the timeout (30 sec for a product VM) is checked only after all
>>>>> object nodes are processed. So it may take > 2 mins before EA is
>>>>> aborted:
>>>>>
>>>>> <phase name='connectionGraph' nodes='18787' live='10038' stamp='4.417'>
>>>>> <connectionGraph_bailout reason='reached time limit'/>
>>>>> <phase_done name='connectionGraph' nodes='18787' live='10038'
>>>>> stamp='150.967'/>
>>>>>
>>>>> Also, the ConnectionGraph::add_java_object_edges() method does not
>>>>> check whether a node is already present when adding it to the
>>>>> worklist (in the add_to_worklist() method), which leads to EA using
>>>>> hundreds of MB of memory when BTreeEntry::put() is processed.
>>>>>
>>>>> To address these issues, new timeout checks were added.
>>>>> A new _pidx index field is added to the graph's nodes, and a
>>>>> VectorSet is used to reduce the size of the worklist in the
>>>>> add_java_object_edges() method.
>>>>>
>>>>> The escaping CreateEx node is mapped to phantom_obj to reduce the
>>>>> number of processed objects in the connection graph. We do the same
>>>>> for other escaping objects.
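>>>>>
>>>>> Conceptually (a sketch of the idea only; the names below are
>>>>> illustrative, not the webrev code): everything already known to
>>>>> escape globally can share one placeholder node, so the graph keeps
>>>>> a single points-to record instead of per-object nodes and edges:
>>>>>
>>>>> enum EscapeState { NoEscape, ArgEscape, GlobalEscape };
>>>>>
>>>>> struct PTNode { EscapeState es; };
>>>>>
>>>>> // One shared node for every globally-escaping allocation
>>>>> // (e.g. the exception object created by CreateEx).
>>>>> static PTNode phantom = { GlobalEscape };
>>>>>
>>>>> PTNode* node_for(bool known_to_escape) {
>>>>>   if (known_to_escape) {
>>>>>     return &phantom;              // collapse: no per-object state
>>>>>   }
>>>>>   return new PTNode{ NoEscape };  // otherwise a real per-object node
>>>>> }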
>>>>>
>>>>> Tested with graphhopper server application:
>>>>>
>>>>> <phase name='connectionGraph' nodes='18787' live='10038' stamp='6.829'>
>>>>> <connectionGraph_bailout reason='reached time limit'/>
>>>>> <phase_done name='connectionGraph' nodes='18787' live='10038'
>>>>> stamp='22.355'/>
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>