[9] RFR(S): 8155608: String intrinsic range checks are not strict enough

Sat Apr 30 00:47:26 UTC 2016

> On Apr 29, 2016, at 1:00 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
> 
> 
>> On Apr 28, 2016, at 9:23 PM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>> 
>> Hi Chris,
>> 
>> On 28.04.2016 22:11, Christian Thalinger wrote:
>>> 
>>>> On Apr 28, 2016, at 12:45 AM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>>> 
>>>> Hi
>>>> 
>>>> please review the following patch:
>>>> 
>>>> https://bugs.openjdk.java.net/browse/JDK-8155608
>>>> http://cr.openjdk.java.net/~thartmann/8155608/jdk/webrev.00/
>>> + checkBoundsOffCount(dstOff << 1, len << 1, dst.length);
>>> 
>>> It’s funny that we still do << 1 instead of * 2 when every compiler on this planet can optimize that.   Yeah, yeah, I know, it’s because of the interpreter but does it really matter?
>> 
>> I used it more for consistency because we use "<< 1" in all the other places in StringLatin1, StringUTF16 and String as well. I think this originated from the "value.length >> String.coder()" use case to get the length depending on the String encoding. Besides that, I'm not sure if interpreter speed really matters here but the String methods are executed a lot (especially during startup).
>> 
>>> Actually, I would prefer:
>>> 
>>> + checkBoundsOffCount(dstOff * Character.BYTES, len * Character.BYTES, dst.length);
>> 
>> I agree that this is more readable but for consistency I would like to go with the "<< 1" approach.
> 
> Again, a loss for maintainability versus consistency (a.k.a. “we’ve always done it this way”).

That '1' is an anti-pattern I call "naked constant".  Which "1" is it, after all?
The maintainer of code needs to know which condition (of thousands of possibilities) dictates a "1" here.
In HotSpot we discourage such nakedness, in favor of named constants:
  https://wiki.openjdk.java.net/display/HotSpot/StyleGuide#StyleGuide-NamedCons

The Java code should follow this practice also.

Also, independently, I'm having a hard time figuring out how to prove that, between the Java code and the HotSpot intrinsic (and interpreter/C1/C2), the range checking logic is consistently applied.

My best advice for this is, always, put the range checking logic in one place, in Java code.  This makes me profoundly suspicious of *any* public method that is also marked @HSIC, if it takes any array arguments.  If the public method is an intrinsic, it means we are trusting C++ IR-assembly code to securely check Java array bounds.  That is unnatural, and (as history proves repeatedly) subject to errors.

Instead, we should have a Java routine which checks bounds, and a *private* intrinsic which is provably called *only after* the Java checker is called.

— John