[9] RFR(S): 8159016: Over-unrolled loop is partially removed
Vladimir Kozlov
vladimir.kozlov at oracle.com
Tue Jun 21 19:59:50 UTC 2016
Thank you, Tobias
Your changes are good to have.
You can still have a problem if unrolling is done before CCP phase which calculates array size and/or limit.
Try to delay size calculation in test until CCP phase. Something like next (please, verify):
// Simplified during CCP:
int size = 2;
for (; size < 4; size *= 2);
Object arr[] = new Object[size - 1];
Why over-unrolling check code in IdealLoopTree::policy_unroll() does not work?
I think the trip count check before main loop should collapse whole main loop code. Then we will not have this problem.
Why it is not collapsed?
About test.
You do & 3 two times:
int lim = (arg & 3);
test(i & 3);
Use 11_000 iterations because 10000 may not trigger compilation.
Let array escape to avoid EA. For example, return it.
Thanks,
Vladimir
On 6/21/16 8:19 AM, Tobias Hartmann wrote:
> Hi,
>
> please review the following fix:
>
> https://bugs.openjdk.java.net/browse/JDK-8159016
> http://cr.openjdk.java.net/~thartmann/8159016/webrev.00/
>
> I was able to reproduce the problem with a simple test:
>
> Object arr[] = new Object[3];
> int lim = (arg & 3);
> for (int i = 0; i < lim; ++i) {
> arr[i] = new Object();
> }
>
> The loop is split into pre, main and post loops. The pre loop is executed for at least one iteration, initializing arr[0]. The main loop is unrolled twice, initializing at least arr[1], arr[2], arr[3] and arr[4]. The arr[3] and arr[4] stores are always out of bounds and removed. However, C2 is unable to remove the "over-unrolled", dead main loop. As a result, there is a control path from the main loop to the PostScalarRce loop (see JDK-8151573) without a memory path since the last store was replaced by TOP. We crash because we use a memory edge from a non-dominating region.
>
> The out of bound stores in the over-unrolled loop are removed because their range check dependent CastII nodes are removed by the first optimization in CastIINode::Ideal():
> CastII (AddI Phi 2) -> AddI (CastII Phi) 2 -> AddI TOP 2
>
> The CastIINodes are replaced by TOP because the type of the loop Phi is >= 1 and the CastII is [1..2], i.e. there is no value >= 1 that is in the [1..2] range if +2 is added.
>
> The underlying problem is the over-unrolling of the loop. Since lim < 3, we always only execute at least 3 loop iterations. With the pre loop executing at least one iteration, the main loop should not be unrolled more than once. This problem is similar to JDK-4834191 where we over-unrolled a loop with a constant limit.
>
> I changed the implementation of IdealLoopTree::compute_exact_trip_count() to not only compute the exact trip count if the loop limit is constant but to also set the _trip_count field if we are able to determine an upper bound. Like this, the checks in IdealLoopTree::policy_unroll() prevent additional loop unrolling if the trip_count became too small and we keep the maximally unrolled version.
>
> An alternative fix would be to disable the CastII optimization for range check dependent CastII nodes but this does not fix the underlying problem of over-unrolling.
>
> Tested with regression test, JPRT and RBT (running).
>
> Thanks,
> Tobias
>
More information about the hotspot-compiler-dev
mailing list