[9] RFR(M): 6675699: need comprehensive fix for unconstrained ConvI2L with narrowed type

Vladimir Kozlov vladimir.kozlov at oracle.com
Fri Jan 15 18:25:05 UTC 2016


This looks good.

Thanks,
Vladimir

On 1/15/16 6:28 AM, Tobias Hartmann wrote:
> Thanks, Vladimir.
>
> On 14.01.2016 19:59, Vladimir Kozlov wrote:
>> You have to update code for 8146999 changes when Roland push it.
>
> Yes, I'll do so but Roland mentioned that he still has problems with his 814699 fix.
>
>> The only thing I don't like about changes is using #ifdef _LP64 for part of changes.
>> I know where it is coming from (ConvI2L for loop indexing) but as you said ConvI2L could be generated in other cases too. Should the test cast->has_range_check() return 'false' in 32-bit?
>
> I added the _LP64 ifdefs because we only emit a narrowed ConvI2L on 64 bit. But I agree - it's cleaner without those. As you suggested, I removed the ifdefs and changed has_range_check() to return false on 32 bit.
>
> Here is the new webrev:
> http://cr.openjdk.java.net/~thartmann/6675699/webrev.02/
>
> Thanks,
> Tobias
>
>> On 1/14/16 8:00 AM, Tobias Hartmann wrote:
>>> Hi,
>>>
>>> please review the following patch.
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-6675699
>>> http://cr.openjdk.java.net/~thartmann/6675699/webrev.01/
>>>
>>> *Problem*
>>> The problem is that ConvI2L nodes with a narrow type (used to convert integer array indices to long values) are not dependent on the corresponding range check that proves that the input value is always in the (integer-)range. As a result, the ConvI2L node may flow above the range check during loop optimizations and end up with an input that is not in its type range. The node is then replaced by TOP causing the data path to be eliminated. However, because there is no control dependency on the corresponding range check, the control path from the peeled iteration that uses the result of the ConvI2L may not be eliminated. We crash because we are potentially using a value that is not available.
>>>
>>> For example, TestLoopPeeling::testArrayAccess() triggers loop peeling because the loop contains an invariant check. The array store in line 66 is moved out of the loop and reachable from the peeled and old iterations of the loop. However, the array index computation consisting of a LShiftL(ConvI2L(Phi)) remains in each loop because it has loop variant usages and is not dependent on the range check that was moved out of the loop. The peeled iteration of the loop uses storeIndex == -1 causing the ConvI2L to be replaced by TOP because -1 is not in its [0, MAX_INT] range. The TOP is propagated downwards and ends up as one of the inputs to the Phi that merges the array index from the peeled and old loop exits. The Phi replaced by it's only remaining input and the store ends up using the index from the old iteration although it's still reachable from the peeled iteration. We crash because we potentially use the index value from the old iteration while coming from the peeled it!
 e!
>   r!
>> at!
>>>    ion (of co
>>>
>>> urse, the range check would catch this at runtime).
>>>
>>> This problem may show up with array accesses but also with other code for which we emit a ConvI2L node with a narrow type. For example, array allocation uses a ConvI2L to convert the integer array size to a long value (see TestLoopPeeling::testArrayAllocation). We solved several different instances of this problem in the past with "workaround-fixes" that just disabled loop optimizations in special cases (see below). Such a workaround fix is not feasible to fix all potential occurrences of this problem. TestLoopPeeling.java crashes JDK 7, 8 and 9.
>>>
>>> *Solution*
>>> To make the ConvI2L dependent on a range check, I added code to emit a narrow CastII node with a control dependency on the range check that is then used as input to the ConvI2L. Like this, we explicitly express the dependency and prevent loop optimizations from moving the ConvI2L above the range check.
>>>
>>> To make sure that the impact is as small as possible, the range check dependent CastII nodes are removed right after loop optimizations. Further, all optimizations that depend on the old shape of array address computations are adapted to be aware of the CastII node.
>>>
>>> With the fix, we could now remove the following old "workaround-fixes":
>>> https://bugs.openjdk.java.net/browse/JDK-4781451
>>> https://bugs.openjdk.java.net/browse/JDK-4799512
>>> https://bugs.openjdk.java.net/browse/JDK-6659207
>>> https://bugs.openjdk.java.net/browse/JDK-6663854
>>> For reference, the individual patches can be found here:
>>> http://cr.openjdk.java.net/~thartmann/6675699/backouts/
>>>
>>> However, performance evaluation showed that backing out the old fixes causes significant regressions. It seems that aggressive splitting of ConvI2L nodes through phis leads to less optimal code due to more register spilling. I suspect that additional changes to the loop optimizations are necessary and would therefore like to leave the workaround fixes in for now. I filed JDK-8145313 to remove them later. Like this, we also reduce the impact/risk when backporting this fix to JDK 8 and potentially JDK 7.
>>>
>>> Roland pointed out that the changes in ConvI2LNode::Ideal() could potentially be merged into the CastIINode::Ideal() optimization introduced by his fix for JDK-8145322. After some investigation it turned out that the CastII optimization does not only affect memory addressing but also other CastII(AddI(..)) graph shapes. Making it more generic has a broader impact and therefore needs more investigation. I filed JDK-8147394 for this.
>>>
>>> ConvI2L nodes with a narrow type are also emitted by intrinsics:
>>> - GraphKit::array_element_address()
>>> - PhaseMacroExpand::array_element_address()
>>> - ArrayCopyNode::prepare_array_copy()
>>> I was not able to reproduce the problem with intrinsics. It's also not easily possible to make the CastII node range check dependent here because the range check is not always available from within the intrinsic.
>>>
>>> *Testing*
>>> I did extensive testing to make sure the fix does not introduce correctness or performance issues.
>>> - Different RBT test suites [1] with and without -Xcomp.
>>> - Full run of multiple CTW suites.
>>> - Verified changes in "PhaseIdealLoop::match_fill_loop" (loopTransform.cpp) by manually checking the output of [2] with -XX:+TraceOptimizeFill.
>>> - Verified changes in "IfNode::improve_address_types" (ifnode.cpp) by manually checking the output of [3] with -XX:+PrintOptoAssembly to make sure all range checks are folded.
>>> - Verified changes in superword.cpp by comparing output with -XX:+TraceSuperWord.
>>> - Performance runs (Footprint, JMH-Javac, SPECjbb2005, SPECjvm2008, Startup, Volano) on x86 and SPARC showed no regression
>>>
>>> Thanks,
>>> Tobias
>>>
>>> [1] RBT test suites:
>>> - hotspot/test/:hotspot_all
>>> - noncolo.testlist
>>> - vm.compiler.testlist
>>> - vm.regression.testlist
>>> - nsk.regression.testlist
>>> - nsk.split_verifier.testlist
>>> - nsk.stress.testlist
>>> - nsk.stress.jck.testlist
>>> - jdk/test/:jdk_jfr
>>> - jdk/test/:svc_tools
>>> - jdk/test/:jdk_instrument
>>> - jdk/test/:jdk_lang
>>> - jdk/test/:jdk_svc
>>> - nashorn/test/:tier1
>>> - nashorn/test/:tier2
>>> - nashorn/test/:tier3
>>> Only without -Xcomp:
>>> - Kitchensink
>>> - runThese
>>> - Weblogic12medrec
>>> [2] test/compiler/intrinsics/6982370/Test6982370.java
>>> [3] test/compiler/rangechecks/TestExplicitRangeChecks.java
>>>


More information about the hotspot-compiler-dev mailing list