RFR: 8254807: Optimize startsWith() for String.substring() [v5]
Vladimir Kozlov
kvn at openjdk.java.net
Wed Dec 16 21:13:01 UTC 2020
On Wed, 16 Dec 2020 03:54:09 GMT, Xin Liu <xliu at openjdk.org> wrote:
>>> _Mailing list message from [Vladimir Ivanov](mailto:vladimir.x.ivanov at oracle.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_
>>>
>>> > > - You introduce a call to a method (String::startsWith(String, int))
>>> > > which is not present in bytecode. It means (unless the method is called
>>> > > from different places) there won't be any profiling info present, and
>>> > > even if there is, it is unrelated to the call site being compiled. If
>>> > > the code is recompiled at some point (e.g., it hits an uncommon trap in
>>> > > startsWith(String,int) overload), there won't be any re-profiling
>>> > > happening. So, it can hit the very same condition on the consequent
>>> > > recompilations.
>>> >
>>> >
>>> > Thank you to raise this! I didn't consider the profiling data before.
>>> > it is an issue. Maybe it can explain why I observe only 80% performance of hand-craft code.
>>> > TBH, I haven't had a clear and solid answer for it yet.
>>> > The transformation actually won't alter control-flow. Perhaps, I can find a way to amend methoddata from the old method.
>>> > Or, is that possible I request to reprofile it after I modify some bytecodes?
>>>
>>> Sorry, I overlooked your question.
>>>
>>> To be clear: the transformations during C2 compilation don't affect
>>> bytecodes in any way.
>>>
>>> If a method is inlined, a deoptimization may lead to a surprising
>>> resumption of execution in the middle of unexpected method, but it's a
>>> one time occurrence and consequent executions will follow original path
>>> in the bytecode.
>>>
>> Hi, Vladmir,
>>
>> I manage to inline the rewired method in c5a25bf. I attempt to address your question "Will the new target
>> method be taken into account during inlining?". From my benchmark, it is 10% faster than the last revision.
>>
>> Sorry, I still don't understand your concern about the inlined methods. This is my first non-bugfix patch. please forgive me if I look dumb.
>> If startsWith(String, int) is deoptimized, the control will give back to the interpreter. IMHO, I can resume execution back to the interpreter as long as jvms are all correct. I don't think I mess them up.
>>
>>> If inlining doesn't happen, then the JVM should be able to finish
>>> successully finish profiling in the callee if needed (reprofile request
>>> invalidates the nmethod and it forces the new call in C2 to go through
>>> either interpreter or C1 until it is compiled with C2).
>>>
>>> So, I'm concerned specifically about inlined case. And I consider it a
>>> blocking issue for adapting the proposed approach of rewriting rules in
>>> C2. I don't see a way to reliably fix it without intrinsifying the
>>> operation.
>>>
>> I do mark substring and startsWith as intrinsics here.
>> The reason I didn't mark "substring" pure because all nodes will be eliminated after late inlining if they are dead.
>> yes, if I mark it pure, it will drop the call node immediately in do_late_inline().
>>
>>> Best regards,
>>> Vladimir Ivanov
>>
>> Hi, John,
>> In my defense, this patch is just 300 lines of change if you exclude test. You can see that I try to reuse the existing codebase to minimize foosteps. As a beginner, I just deliberately make it reviewable-size. I don't think it's non-maintainable or a scramble patch.
>>
>> Yes, I admit that the gain is not so attractive so far. it just opens a possibility. A possibility of interprocedural optimization of java APIs. I can also apply the same recipe from "StringBuilder.append(substring)" to "StringBuilder.append(base, begin, end)" etc.
>>
>> The bottom line is I have to learn in action. All your feedbacks here are helping me to grow. I really appreciate that and I think I can return it in other patches. I plan to explore in Vladimir kolzov's direction next. We can rewire the underlying bytebuffer of a string in substring pattern. Of course, it's only safe to do so if EA tells us that the object never escape.
>
> hi, Vladimir,
>
> Now I understand what you mean. vframeArray::unpack_to_stack() resumes vframe which points to the new overloaded method. It doesn't match the original bytecode and the entry of cpcache! It might require appalling hack to make it work. I guess that's why you guys all say it's not the right direction.
@navyxliu during your investigation did you look on other applications in addition to `SpecJVM2008/xml.validation`?
I would like to know what patterns you see. For example, we would not benefit much if an extracted substring is used after startsWith() call. Also I don't think we should optimize Java code which can be replaced with existing String API as in RFE's example.
Note, in RFE's example you added length check which is not needed because startsWith(prefix, offset) does it:
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L1455
And if you see a lot of cases like `return substring(base, beg, end).startsWith(prefix)` or code where substring is used based on startsWith() check:
String sub=substring(base, beg, end);
if (sub.startsWith(prefix)) {
// Do something with sub
we should consider extending String API with `startsWith(prefix, beg, end)`.
As always fixing Java code gives a lot more performance improvement than JIT compiler can.
-------------
PR: https://git.openjdk.java.net/jdk/pull/974
More information about the hotspot-compiler-dev
mailing list