RFR(S): 8215792: AArch64: String.indexOf generates incorrect result

Pengfei Li (Arm Technology China) Pengfei.Li at arm.com
Thu Jan 3 09:42:30 UTC 2019


Hi,

This is a patch to fix an AArch64 string intrinsics issue. It can be reproduced by below code and JVM options.

public class Test {
  public static void main(String[] args) {
    StringBuilder str = new StringBuilder("ABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890123456789");
    str.setLength(str.length() - 10);
    System.out.println(str.indexOf("01234567890123456789"));
  }
}

$ java Test
-1
$ java -Xcomp -XX:-Inline Test
26

In the case above, we firstly have a long string "ABC...Z012...9012...9" (hereinafter called "the main string") and then truncate it by removing its last 10 characters. After doing this, we can incorrectly find the pattern string ("012...9012...9") inside the main string. This bug is caused by the boundary of the main string not being checked while working on the matching in AArch64 String.indexOf() intrinsics.

In the intrinsic implementation, we firstly find indexes of the first character of the pattern string (0x30 in this case) inside the main string. Each of the indexes could be a potential return value of the String.indexOf() method. And then for each index value, we compare the remaining characters inside the two strings. In this step, as Java strings in memory do not necessarily end with '\0' like C strings, we should explicitly check if the length of the remaining part of the main string is shorter than that of the pattern string.

In my fix, the length of the remaining part of the main string is calculated after we found a first-character-match. The length value is put into the ch2 register (as it can be used as a temp according to the code context) and then compared to the length of the pattern string (in cnt1). The compare and branch code is like below.

__ cmp(ch2, cnt1);
__ br(__ LT, NOMATCH);

Here we directly branch to the NOMATCH label since if the remaining part of the main string has fewer characters, there would not be any other pattern string match after current first-character-match index.

The length calculation and compare code is added at two positions in my patch, as there are two different first-character-match exits (L_HAS_ZERO and L_SMALL_HAS_ZERO) in the original intrinsic code. I also fixed the cnt2 value (which is used to count the number of bytes not processed in the main string) as well as some branch conditions in my patch. Because cnt2 always counts one more byte than the actual length. Fixing that makes the number of remaining bytes in the main string easier to be calculated.

JBS: https://bugs.openjdk.java.net/browse/JDK-8215792
webrev: http://cr.openjdk.java.net/~pli/rfr/8215792/webrev.00/

Could anyone help review this fix?

--
Thanks,
Pengfei



More information about the hotspot-compiler-dev mailing list