[9] RFR(S): 8155608: String intrinsic range checks are not strict enough

Tue May 10 19:14:54 UTC 2016

> On May 9, 2016, at 10:51 PM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> Hi Chris,
> 
> On 09.05.2016 21:52, Christian Thalinger wrote:
>> 
>>> On May 9, 2016, at 1:49 AM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>> 
>>> Hi John,
>>> 
>>> thanks for the feedback!
>>> 
>>> On 30.04.2016 02:47, John Rose wrote:
>>>> 
>>>>> On Apr 29, 2016, at 1:00 PM, Christian Thalinger <christian.thalinger at oracle.com <mailto:christian.thalinger at oracle.com>> wrote:
>>>>> 
>>>>> 
>>>>>> On Apr 28, 2016, at 9:23 PM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>>>>> 
>>>>>> Hi Chris,
>>>>>> 
>>>>>> On 28.04.2016 22:11, Christian Thalinger wrote:
>>>>>>> 
>>>>>>>> On Apr 28, 2016, at 12:45 AM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com> <mailto:tobias.hartmann at oracle.com>> wrote:
>>>>>>>> 
>>>>>>>> Hi
>>>>>>>> 
>>>>>>>> please review the following patch:
>>>>>>>> 
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8155608
>>>>>>>> http://cr.openjdk.java.net/~thartmann/8155608/jdk/webrev.00/
>>>>>>> + checkBoundsOffCount(dstOff << 1, len << 1, dst.length);
>>>>>>> 
>>>>>>> It’s funny that we still do << 1 instead of * 2 when every compiler on this planet can optimize that.   Yeah, yeah, I know, it’s because of the interpreter but does it really matter?
>>>>>> 
>>>>>> I used it more for consistency because we use "<< 1" in all the other places in StringLatin1, StringUTF16 and String as well. I think this originated from the "value.length >> String.coder()" use case to get the length depending on the String encoding. Besides that, I'm not sure if interpreter speed really matters here but the String methods are executed a lot (especially during startup).
>>>>>> 
>>>>>>> Actually, I would prefer:
>>>>>>> 
>>>>>>> + checkBoundsOffCount(dstOff * Character.BYTES, len * Character.BYTES, dst.length);
>>>>>> 
>>>>>> I agree that this is more readable but for consistency I would like to go with the "<< 1" approach.
>>>>> 
>>>>> Again, a loss for maintainability versus consistency (a.k.a. “we’ve always done it this way”).
>>>> 
>>>> That '1' is an anti-pattern I call "naked constant".  Which "1" is it, after all?
>>>> The maintainer of code needs to know which condition (of thousands of possibilities) dictates a "1" here.
>>>> In HotSpot we discourage such nakedness, in favor of named constants:
>>>> https://wiki.openjdk.java.net/display/HotSpot/StyleGuide#StyleGuide-NamedCons
>>>> 
>>>> The Java code should follow this practice also.
>>> 
>>> Right, I agree. I filed JDK-8156530 [1] to fix this.
>>> 
>>>> Also, independently, I'm having a hard time figuring out how to prove that, between the Java code and the HotSpot intrinsic (and interpreter/C1/C2), the range checking logic is consistently applied.
>>>> 
>>>> My best advice for this is, always, put the range checking logic in one place, in Java code.  This makes me profoundly suspicious of *any* public method that is also marked @HSIC, if it takes any array arguments.  If the public method is an intrinsic, it means we are trusting C++ IR-assembly code to securely check Java array bounds.  That is unnatural, and (as history proves repeatedly) subject to errors.
>>> 
>>> Yes, we had that initially but I had to move some of the checks from the Java code into the intrinsic because of JDK-8142303 (see [2]).
>>> 
>>> I filed JDK-8156534 [3] to check if we can move the checks back into the Java code without problems similar to JDK-8142303. I will also change the implementation of some of the intrinsics with JDK-8139132 and JDK-8146547 and do some refactoring.
>>> 
>>>> Instead, we should have a Java routine which checks bounds, and a *private* intrinsic which is provably called *only after* the Java checker is called.
>>> 
>>> The problem is that some of the intrinsified String methods are heavily used and C2 is not always able to remove the range checks introduced by the public wrapper method. For example, StringUTF16::getChar() is used in places where we implicitly know that the access is in bounds (for example, because the array length is always a multiple of two) but C2 cannot prove this. 
>> 
>> Maybe we could sprinkle some Java code to help the compilers.
> 
> Yes, but that only helps if the Java code is compiled which may not be the case if the explicit Java range checks are for example in a different method that is not inlined. Otherwise, there is no way the compiler knows about them.

If it’s not compiled than it can’t matter for performance.  Also, you have to start thinking in an AOT world.  Methods like these are likely AOT compiled and should (almost) never run interpreted.

> 
> Looking at the JDK-8142303 again, it may be sufficient to have the null checks in the intrinsic. Also, we can probably make use of the intrinsified Preconditions.checkIndex(). I'll investigate this with JDK-8156534.
> 
> If there are no objections, I would like to push webrev.00.
> 
> Thanks,
> Tobias
> 
>>> Therefore, if we want to avoid a regression, we have to call the unchecked version of the intrinsic in some cases.
>>> 
>>> Thanks,
>>> Tobias
>>> 
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8156530
>>> [2] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-November/019834.html
>>> [3] https://bugs.openjdk.java.net/browse/JDK-8156534
>>