RFR(S): 8215792: AArch64: String.indexOf generates incorrect result

Thu Jan 10 15:10:55 UTC 2019

Hi Andrew,

I’ll focus on addressing your technical questions about testing this 
patch and intrinsic first.

By 'Regular' in previous email I meant all JCK and current jtreg tests 
which were also run [1]. This was to highlight the difference with 
IndexOfTest and IndexOfSameTest tests [2] developed for this intrinsic 
which was run for the original webrev and this patch. These tests cover 
all combinations of strings and substring lengths up to a specified 
length (IndexOfTest uses unique characters as padding and 
IndexOfSameTest is using first character from pattern to have cases with 
partial match tested). These tests are parameterized and require 
source_size parameter as first argument. Calling it with 30 and 300 
results in testing all modified indexof algorithms by brute force:

A. pattern size = 1: covered if source_size >= 1
B. pattern size = 2: covered if source_size >= 2
C. pattern size = 3: covered if source_size >= 3
D. special case with pattern size = 4: algorithm wasn't changed within 
the original webrev
E. pattern_size in [8, 256) and pattern_size < source_size/4: 
Boyer-Moore implementation: covered if source_size > 32. This is the 
algorithm that has the comment you mentioned in your email.
F. "pattern_size in [5, 8)" or "pattern_size in [8, 15] and pattern_size 
 >= source_size/4": Simple linear search, which loads and compares 
char-by-char: covered if source_size in [5, 32]
G. This is the one added by me. "pattern_size in [16, 256) and 
pattern_size >= source_size/4" or pattern_size >= 256. Block linear 
search (loads data by 8 byte chunks in search of first symbol): covered 
if source_size >= 16.

Below is listing of all branches in algorithm G and coverage test cases 
with sample parameter values when the branches are taken (test name, 
test parameter, pattern_size and expected_index inputs used in test 
during iterations, which leads to each branch 
taken/not_taken(fallthrough) at least once. Suffix U/L/UL denotes 
different encoding cases, where U = UTF-16 source string, L = Latin1 
source string, UL = both cases).

line 4399: __ br(__ LE, L_SMALL);                    // IndexOfTest with 
parameter 300: taken(UL): pattern_size=298, expected_index=0. Not 
taken(UL): pattern_size=250, expected_index=0
line 4410: __ br(__ NE, L_HAS_ZERO);                 // IndexOfTest with 
parameter 300: taken(UL): pattern_size=250, expected_index=0. Not 
taken(UL): pattern_size=250, expected_index=-1
line 4414: __ br(__ LT, L_POST_LOOP);                // IndexOfTest with 
parameter 300: taken(U):  pattern_size=295, expected_index=5. Not 
taken(U):  pattern_size=250, expected_index=8
                                                      // IndexOfTest 
with parameter 300: taken(L):  pattern_size=290, expected_index=8. Not 
taken(L):  pattern_size=250, expected_index=8
line 4421: __ br(__ NE, L_HAS_ZERO);                 // IndexOfTest with 
parameter 300: taken(U):  pattern_size=250, expected_index=5. Not 
taken(U):  pattern_size=250, expected_index=8
                                                      // IndexOfTest 
with parameter 300: taken(L):  pattern_size=250, expected_index=8. Not 
taken(L):  pattern_size=250, expected_index=16
line 4426: __ br(__ GE, L_LOOP);                     // IndexOfTest with 
parameter 300: taken(U):  pattern_size=250, expected_index=20. Not 
taken(U):  pattern_size=290, expected_index=8
                                                      // IndexOfTest 
with parameter 300: taken(L):  pattern_size=250, expected_index=30. Not 
taken(L):  pattern_size=280, expected_index=16
line 4429: __ br(__ LE, NOMATCH);                    // IndexOfTest with 
parameter 300: taken(U):  pattern_size=293, expected_index=-1. Not 
taken(U):  pattern_size=290, expected_index=-1
                                                      // IndexOfTest 
with parameter 300: taken(L):  pattern_size=285, expected_index=-1. Not 
taken(L):  pattern_size=280, expected_index=-1
line 4455: __ br(__ EQ, NOMATCH);                    // IndexOfTest with 
parameter 300: taken(UL): pattern_size=298, expected_index=-1. Not 
taken(UL): pattern_size=298, expected_index=0
line 4459: __ br(__ LE, L_SMALL_CMP_LOOP_LAST_CMP2); // this branch is 
not reached with current performance heuristic for algorithm selection 
(see MacroAssembler_aarch64.cpp:4599). It was also tested with heuristic 
disabled to keep algorithm generic and allow changes to heuristics
line 4478: __ br(__ NE, L_SMALL_CMP_LOOP_NOMATCH);   // IndexOfSameTest 
with parameter 300: taken(UL): pattern_size=298, expected_index=-1. Not 
taken(UL): pattern_size=298,expected_index=0
line 4486: __ br(__ GE, L_SMALL_CMP_LOOP_LAST_CMP);  // IndexOfTest with 
parameter 300: taken(UL): pattern_size=298, expected_index=0. Not 
taken(UL): pattern_size=298, expected_index=-1
line 4488: __ br(__ EQ, L_SMALL_CMP_LOOP);           // IndexOfTest with 
parameter 300: taken(UL): pattern_size=298, expected_index=0. Not 
taken(UL): pattern_size=298, expected_index=-1
line 4490: __ cbz(tmp2, NOMATCH);                    // IndexOfSameTest 
with parameter 300: taken UL: pattern_size=298, expected_index=-1. Not 
taken(UL): pattern_size=298, expected_index=0
line 4498: __ br(__ NE, L_SMALL_CMP_LOOP_NOMATCH);   // IndexOfTest with 
parameter 300: taken(UL): pattern_size=298, expected_index=-1. Not 
taken(UL): pattern_size=298, expected_index=0
line 4519: __ br(__ NE, L_SMALL_CMP_LOOP_NOMATCH);   // this branch is 
not reached with current performance heuristic for algorithm selection 
(see MacroAssembler_aarch64.cpp:4599). It was also tested with heuristic 
disabled to keep algorithm generic and allow changes to heuristics
line 4533: __ br(__ GE, L_CMP_LOOP_LAST_CMP2);       // this branch is 
not taken with current performance heuristic for algorithm selection 
(see MacroAssembler_aarch64.cpp:4599). It was also tested with heuristic 
disabled to keep algorithm generic and allow changes to heuristics
line 4557: __ br(__ NE, L_CMP_LOOP_NOMATCH);         // IndexOfTest with 
parameter 300: taken(UL): pattern_size=250, expected_index=1. Not 
taken(UL): pattern_size=250, expected_index=0
line 4565: __ br(__ GE, L_CMP_LOOP_LAST_CMP);        // IndexOfTest with 
parameter 300: taken(UL): pattern_size=250, expected_index=0. Not 
taken(UL): pattern_size=250, expected_index=-1
line 4567: __ br(__ EQ, L_CMP_LOOP);                 // IndexOfTest with 
parameter 300: taken(UL): pattern_size=250, expected_index=0. Not 
taken(UL): pattern_size=250, expected_index=-1
line 4570: __ cbz(tmp2, L_HAS_ZERO_LOOP_NOMATCH);    // IndexOfSameTest 
with parameter 300: taken(UL): pattern_size=250, expected_index=20. Not 
taken(UL): pattern_size=250, expected_index=0
line 4577: __ br(__ NE, L_CMP_LOOP_NOMATCH);         // IndexOfSameTest 
with parameter 300: taken(UL): pattern_size=250, expected_index=-1. 
IndexOfTest with parameter 300: Not taken(UL): pattern_size=250, 
expected_index=0
line 4601: __ br(__ NE, L_CMP_LOOP_NOMATCH);         // this branch is 
not reached with current performance heuristic for algorithm selection 
(see MacroAssembler_aarch64.cpp:4599). It was also tested with heuristic 
disabled to keep algorithm generic and allow changes to heuristics

source_size = 0 is covered by a pre-condition and the intrinsic is not 
called.

I referenced this test in initial review request for this intrinsic. It 
takes a long time to run, so I did not include it in the webrev. I'm 
going to update the webrev to include a subset of this test as jtreg.

Even brute force tests with 100% code coverage don't guarantee 100% 
correctness. The search-garbage-after-string test case for "algorithm G" 
and StringBuilder::setLength usage is a good catch by Stefan and 
Pengfei. And recent webrev addresses this case. I also tested a case 
symmetric to Pengfei's case checking that no "garbage" is read before 
specified source string [4]. I also am going to include it in the webrev.

Indeed it is hard to review complex algorithms. The Boyer-Moore comments 
you referenced were updated as part of the original webrev to describe 
changes in algorithm E, which is in macroAssembler_aarch64.cpp. I once 
asked to validate the level of comments with you during pow function 
review [3]. If this is the level of comments you find reasonable, I’ll 
be happy to improve it here and elsewhere to this level.

Once again, this is to address your question around testing for this 
intrinsic and patch. We are working on testing and review complex 
intrinsics to handle the wider problem of ensuring better quality of 
AArch64 intrinsics. We’ll follow up in a different email on that.

-Dmitrij

[1] all JCK, hotspot jtreg and jdk tier1-tier3 jtreg tests, including 
http://hg.openjdk.java.net/jdk/jdk/file/tip/test/hotspot/jtreg/compiler/intrinsics 
and http://hg.openjdk.java.net/jdk/jdk/file/tip/test/jdk/java/lang/String/
[2] http://cr.openjdk.java.net/~dpochepk/8189103/IndexOfTest.java, 
http://cr.openjdk.java.net/~dpochepk/8189103/IndexOfSameTest.java
[3] 
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-October/031092.html
[4] http://cr.openjdk.java.net/~dpochepk/8215792/IndexOfBeforeTest.java

On 09/01/2019 6:55 PM, Andrew Dinn wrote:
> Hello Dmitrij,
>
> On 09/01/2019 14:50, Dmitrij Pochepko wrote:
>> here is my version of this patch consisting of single "sub" instruction
>> (I haven't changed test):
>> http://cr.openjdk.java.net/~dpochepk/8215792/webrev.01/
>>
>> cnt2 is a counter for characters yet to be checked. So, instead of
>> checking all characters in source string for first character match
>> (which was initial reason for this bug), now it check (str2len - str1len
>> + 1).
> That looks like a simpler fix than Pengfei's although I think his is
> also correct. However, when I say 'correct' note that I can only make
> that judgement relative to this current bug. I have no confidence that
> there are no other bugs in your code.
>
>> Actually I think this "sub" instruction was initially lost while working
>> on this instrinsic and moving this instruction between this block
>> (generate_string_indexof_linear) and caller code. Regular tests couldn't
>> catch this problem.
> That's a somewhat contentious and, I would suggest, dubious statement.
> If you design code based on some algorithm -- especially a complex one
> like the one employed here -- then you need to put at least as much work
> into designing tests that check for problems in the encoding of that
> algorithm as you put into the code. 'Regular' is rather a weasel word to
> use at this point when it is clear that the test provision was not adequate.
>
> Having looked at your code I am at a loss to see how it is accurately
> described by the piece of C code -- i.e. the original Boyer-Moore
> algorithm -- that sits in macroAssembler_aarch64.cpp and purports to
> explain it. As happened with the trig/log code, your code actually
> follows an algorithm that is significantly more complex that that C
> original. Also, once again, it employs various coding tricks that are
> not explained at all. The latter can be understood with study but proper
> commenting would make maintenance and bug-fixing much easier and
> quicker. This is exactly the same problem and just as major a problem as
> it was with the trig/log code for *all* the same reasons.
>
>> I run some testing to ensure regular usecases are not affected and it
>> seems fine. Affected testcase and your test pass as well.
> 'some testing'? I'd really like to have full details of those tests.
> Ideally, they should be comprehensive. That really means they should
> come with a test plan that identifies all the different possible paths
> through the code and provides a measure of the coverage the tests
> actually provide that is high enough to instil some confidence in the
> testing. There are indeed quite a few such paths (not just in the stubs
> but also the intrinsics that cover the small cases) so I would expect
> the test plan and test suite to be fairly large. Do you have such a test
> plan and suite?
>
> Given your previous lack of success at testing your own code I'm not at
> all happy to accept your say so that 'oh, the code is fine'. I'm
> currently more inclined to ask you to revert your first patch and go
> back to the original Boyer-Moore code we had before you injected this
> bug (and who knows what others?).
>
>> btw: now this code is even faster, because less characters will be
>> loaded and checked
> Well, of course, you could make it even faster by deleting half the
> code. If you don't place too much priority on correctness you can
> achieve incredible performance.
>
> Unfortunately, speed has to be secondary to correctness. So, you need to
> stop concentrating on shaving cycles and concentrate on writing
> readable, maintainable code that clearly implements a well-defined
> algorithm. Can you provide any credible assurance that this code is
> worth keeping? If not then I'd personally recommend reversion of all
> your changes. Of course, I'll see what Andrew Haley has to say before
> pressing for that action.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander