RFR: 8254807: Optimize startsWith() for String.substring() [v2]
Xin Liu
xliu at openjdk.java.net
Wed Nov 18 10:14:05 UTC 2020
On Wed, 11 Nov 2020 08:07:20 GMT, Xin Liu <xliu at openjdk.org> wrote:
>> here is the result of updated micro benchmark.
>> DoubeleBytes group is the utf-16 string. SingleByte group the ascii string.
>> without the optimization, the throughput is just 1/4~1/5 than hand-crafted code.
>>
>> With the optimization, the throughput can reach over 80% of hand-crafted code. The reason it can't reach 100% throughput of hand-crafted equivalence is that it has to generate stricter boundary checks.
>>
>> // without OptimizeSubstring
>> Benchmark (substrLength) Mode Cnt Score Error Units
>> SubstringStartsWith.substr2StartsWith_doubleBytes 4 thrpt 125 40847.644 ? 1269.274 ops/ms
>> SubstringStartsWith.substr2StartsWith_doubleBytes 24 thrpt 125 39284.961 ? 155.109 ops/ms
>> SubstringStartsWith.substr2StartsWith_doubleBytes 256 thrpt 125 12370.385 ? 622.280 ops/ms
>> SubstringStartsWith.substr2StartsWith_singleByte 4 thrpt 125 60633.544 ? 1989.512 ops/ms
>> SubstringStartsWith.substr2StartsWith_singleByte 24 thrpt 125 60613.490 ? 1024.846 ops/ms
>> SubstringStartsWith.substr2StartsWith_singleByte 256 thrpt 125 30212.033 ? 1106.752 ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes 4 thrpt 125 154666.457 ? 7.132 ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes 24 thrpt 125 154659.583 ? 7.663 ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes 256 thrpt 125 154665.357 ? 6.414 ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_singleByte 4 thrpt 125 162833.972 ? 8.170 ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_singleByte 24 thrpt 125 162834.059 ? 7.862 ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_singleByte 256 thrpt 125 162456.046 ? 217.304 ops/ms
>>
>>
>> // with OptimizeSubstring
>> Benchmark (substrLength) Mode Cnt Score Error Units
>> SubstringStartsWith.substr2StartsWith_doubleBytes 4 thrpt 125 119789.181 ? 3374.123 ops/ms
>> SubstringStartsWith.substr2StartsWith_doubleBytes 24 thrpt 125 123740.637 ? 31.982 ops/ms
>> SubstringStartsWith.substr2StartsWith_doubleBytes 256 thrpt 125 123701.525 ? 68.741 ops/ms
>> SubstringStartsWith.substr2StartsWith_singleByte 4 thrpt 125 134529.257 ? 6.331 ops/ms
>> SubstringStartsWith.substr2StartsWith_singleByte 24 thrpt 125 134517.222 ? 6.373 ops/ms
>> SubstringStartsWith.substr2StartsWith_singleByte 256 thrpt 125 134527.929 ? 4.660 ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes 4 thrpt 125 154668.784 ? 6.990 ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes 24 thrpt 125 154567.971 ? 69.286 ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_doubleBytes 256 thrpt 125 154630.115 ? 36.157 ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_singleByte 4 thrpt 125 162852.894 ? 5.933 ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_singleByte 24 thrpt 125 162850.630 ? 5.990 ops/ms
>> SubstringStartsWith.substr2StartsWith_noalloc_singleByte 256 thrpt 125 162841.513 ? 6.441 ops/ms
>
> hello,
>
> May I ping to review this patch?
>
> Many business-oriented Java applications manipulate strings a lot. Comparing to arithmetic scalar, string as oop is more expensive. furthermore, many string objects need to be allocated on heap, so they increase workload of GC.
>
> By analyzing the construction of string object, I found that one source is String.substring(). My target is to eliminate substring as many as possible. This is the first attempt for me to enable substring optimization. If it works out, I will apply the api-substitution approach on other APIs such as String.charAt, StringBuilder::append, and even String.split().
>
> thanks you in advance.
hi, Vladimir,
Thank you for taking time to review it.
I would like to call out that I plan to optimize substring(). My single aim to cut off a use of substring in this PR.
This is a gateway patch. I plan to apply a series of API substitutions for substrings. eg. StringBuilder::append, String.split and String.charAt etc.
> _Mailing list message from [Vladimir Ivanov](mailto:vladimir.x.ivanov at oracle.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_
>
> Hi Xin,
>
> > the optimization transforms code from s=substring(base, beg, end); s.startsWith(prefix)
> > to substring(base, beg, end) | base.startsWith(prefix, beg).
> > it reduces an use of substring. hopefully c2 optimizer can remove the used substring()
>
> It would be very helpful to see a more elaborate description of intended
> behavior to understand better what is the desired goal of the enhancement.
>
> Though it looks attractive to address the problem in the JIT-compiler,
> it poses some new challenges which makes proposed approach questionable.
> I understand your desire to rely on existing String-related
> optimizations, but coalescing multiple concatenations differs
> significantly from your case.
>
> Some of the concerns/questions I had while briefly looking through the
> patch:
>
> - You introduce a call to a method (String::startsWith(String, int))
> which is not present in bytecode. It means (unless the method is called
> from different places) there won't be any profiling info present, and
> even if there is, it is unrelated to the call site being compiled. If
> the code is recompiled at some point (e.g., it hits an uncommon trap in
> startsWith(String,int) overload), there won't be any re-profiling
> happening. So, it can hit the very same condition on the consequent
> recompilations.
>
Thank you to raise this! I didn't consider the profiling data before.
it is an issue. Maybe it can explain why I observe only 80% performance of hand-craft code.
TBH, I haven't had a clear and solid answer for it yet.
The transformation actually won't alter control-flow. Perhaps, I can find a way to amend methoddata from the old method.
Or, is that possible I request to reprofile it after I modify some bytecodes?
> - "s.startsWith(prefix)" call node is reused and rewired to call
> unrelated method "base.startsWith(prefix, beg)". Will the new target
> method be taken into account during inlining? I would be much more
> comfortable seeing fresh call populated using standard process
> (involving CallGenerators et al) which then substitutes the original
> node. That way you make it less fragile longer term.
>
I can make that happen. actually, I generated a brand-new call-generator in early revision of this patch.
I drop it because it requires more code.
> - "hopefully c2 optimizer can remove the used substring()"
> If everything is inlined in the end, it can happen, but it's fragile.
> Instead, you could teach C2 that the method is "pure" (no interesting
> side effects to care about) and cut the call early. It already happens
> for boxing methods (see LateInlineCallGenerator::_is_pure_call for
> details).
>
I do have the exact logic in early revision. I remarked substring() as late-inlining candidate and pure.
I delete it because I found substring() is "always" inlined and c2 optimizer can do the deadcode elimination job if nobody uses it.
I understand your concern. I can improve it in the future.
> Overall, if you want to keep the enhancement C2-specific, I'd suggest to
> look into intrinsifying String::startsWith(String, int) and not relying
> on its bytecodes at all. That way you would avoid fighting against the
> rest of the JVM in some situations.
>
Because this optimization isn't one-off thing for startsWith, I can't intrinsify it.
I'd like to explore an extensible approach.
> Best regards,
> Vladimir Ivanov
-------------
PR: https://git.openjdk.java.net/jdk/pull/974
More information about the hotspot-compiler-dev
mailing list