RFR: 8254807: Optimize startsWith() for String.substring() [v5]
Xin Liu
xliu at openjdk.java.net
Wed Dec 16 03:57:55 UTC 2020
On Tue, 15 Dec 2020 22:03:47 GMT, Xin Liu <xliu at openjdk.org> wrote:
>> Thanks for the feedbacks. They are extremely valuable for me. I am working on the issues that Vladimir Ivanov's comment about 'late inlining skips the overloaded methods" and the bug of "too short substring" from John's comment.
>>
>> I just want to say that it's an opener PR. I try sell a technique I called it "API simplification". From my observation, many java code can be more efficient if Java programmers just know "one more API". Even they’ve known, we can't blame them for the hindsight because sometimes certain patterns such as s.substring(1).startsWith("a") only emerge after deep inlining.
>>
>> Api simplification is here for help them out. I guess the real gain will be seen in this split case, even though it follows the same recipe here.
>> https://github.com/navyxliu/StringFunc/blob/master/src/com/amazon/jdkteam/brownbag/SplitAndPick.java#L57
>>
>> For the startsWith case, I did spend some time to research it. C2 can eliminate AllocateNode of newstring indeed, but I found c2 failed to eliminate two expensive nodes: AllocateArray and ArrayCopy for the underlying byte buffer. I don’t think this problem can be perfectly solved by EA and Scalar Replacement. Essentially, it’s redundancy and no new object should be allocated all all.
>
>> _Mailing list message from [Vladimir Ivanov](mailto:vladimir.x.ivanov at oracle.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_
>>
>> > > - You introduce a call to a method (String::startsWith(String, int))
>> > > which is not present in bytecode. It means (unless the method is called
>> > > from different places) there won't be any profiling info present, and
>> > > even if there is, it is unrelated to the call site being compiled. If
>> > > the code is recompiled at some point (e.g., it hits an uncommon trap in
>> > > startsWith(String,int) overload), there won't be any re-profiling
>> > > happening. So, it can hit the very same condition on the consequent
>> > > recompilations.
>> >
>> >
>> > Thank you to raise this! I didn't consider the profiling data before.
>> > it is an issue. Maybe it can explain why I observe only 80% performance of hand-craft code.
>> > TBH, I haven't had a clear and solid answer for it yet.
>> > The transformation actually won't alter control-flow. Perhaps, I can find a way to amend methoddata from the old method.
>> > Or, is that possible I request to reprofile it after I modify some bytecodes?
>>
>> Sorry, I overlooked your question.
>>
>> To be clear: the transformations during C2 compilation don't affect
>> bytecodes in any way.
>>
>> If a method is inlined, a deoptimization may lead to a surprising
>> resumption of execution in the middle of unexpected method, but it's a
>> one time occurrence and consequent executions will follow original path
>> in the bytecode.
>>
> Hi, Vladmir,
>
> I manage to inline the rewired method in c5a25bf. I attempt to address your question "Will the new target
> method be taken into account during inlining?". From my benchmark, it is 10% faster than the last revision.
>
> Sorry, I still don't understand your concern about the inlined methods. This is my first non-bugfix patch. please forgive me if I look dumb.
> If startsWith(String, int) is deoptimized, the control will give back to the interpreter. IMHO, I can resume execution back to the interpreter as long as jvms are all correct. I don't think I mess them up.
>
>> If inlining doesn't happen, then the JVM should be able to finish
>> successully finish profiling in the callee if needed (reprofile request
>> invalidates the nmethod and it forces the new call in C2 to go through
>> either interpreter or C1 until it is compiled with C2).
>>
>> So, I'm concerned specifically about inlined case. And I consider it a
>> blocking issue for adapting the proposed approach of rewriting rules in
>> C2. I don't see a way to reliably fix it without intrinsifying the
>> operation.
>>
> I do mark substring and startsWith as intrinsics here.
> The reason I didn't mark "substring" pure because all nodes will be eliminated after late inlining if they are dead.
> yes, if I mark it pure, it will drop the call node immediately in do_late_inline().
>
>> Best regards,
>> Vladimir Ivanov
>
> Hi, John,
> In my defense, this patch is just 300 lines of change if you exclude test. You can see that I try to reuse the existing codebase to minimize foosteps. As a beginner, I just deliberately make it reviewable-size. I don't think it's non-maintainable or a scramble patch.
>
> Yes, I admit that the gain is not so attractive so far. it just opens a possibility. A possibility of interprocedural optimization of java APIs. I can also apply the same recipe from "StringBuilder.append(substring)" to "StringBuilder.append(base, begin, end)" etc.
>
> The bottom line is I have to learn in action. All your feedbacks here are helping me to grow. I really appreciate that and I think I can return it in other patches. I plan to explore in Vladimir kolzov's direction next. We can rewire the underlying bytebuffer of a string in substring pattern. Of course, it's only safe to do so if EA tells us that the object never escape.
hi, Vladimir,
Now I understand what you mean. vframeArray::unpack_to_stack() resumes vframe which points to the new overloaded method. It doesn't match the original bytecode and the entry of cpcache! It might require appalling hack to make it work. I guess that's why you guys all say it's not the right direction.
-------------
PR: https://git.openjdk.java.net/jdk/pull/974
More information about the hotspot-compiler-dev
mailing list