RFR: 8278518: String(byte[], int, int, Charset) constructor and String.translateEscapes() miss bounds check elimination [v4]
John R Rose
jrose at openjdk.java.net
Sat Jan 15 04:32:23 UTC 2022
On Fri, 14 Jan 2022 16:26:12 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> In the micro benchmark for this bug, the bytecode has a single loop
>> head with 2 backedges. One branch is taken a lot more frequently than
>> the other (for instance, profiling reports 3820938 for the most
>> common, 134068 for the least common). When ciTypeFlow builds its loop
>> tree, the least common loop ends up as the inner loop (ciTypeFlow uses
>> the tail block number to order loops with a shared header). Then
>> ciTypeFlow clones the inner loop head.
>>
>> The results I get then is:
>>
>> StringBenchmark.safeDecoding avgt 5 141.965 ± 53.511 ns/op
>> StringBenchmark.unsafeDecoding avgt 5 123.051 ± 77.855 ns/op
>>
>> The unsafe version uses unsafe instead of standard array accesses. The
>> reason for the unsafe version is to demonstrate that a missed range
>> check elimination in the common path hurts performance (the range
>> check is not performed on the loop iv but another phi so can't be
>> eliminated).
>>
>> The patch I propose takes back branch count (from profile data) into
>> account when building the ciTypeFlow tree. As a consequence, the inner
>> loop (the one for which the loop head is cloned), is the most most
>> frequent one. The range check in the common path is also hoisted in
>> that case (it's now performed on the iv). Running the micro benchmark
>> again, I get:
>>
>> StringBenchmark.safeDecoding avgt 5 53.368 ± 2.232 ns/op
>> StringBenchmark.unsafeDecoding avgt 5 55.565 ± 5.124 ns/op
>>
>> Both are improved (2-3x) and the unsafe version doesn't perform better.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> review
Thank you; that's much clearer.
Reviewed!
A couple of minor style issues I noticed which you might address (or not):
- The name insertion_point should be probably be named is_insertion_point or at_insertion_point since it's a predicate
- In `assert((succs->at(ciTypeFlow::IF_NOT_TAKEN) == loop->head()) == (tail->limit() == loop->head()->start()), "bytecode and CFG not consistent")` the top-level `==` should be reversed so that the assert looks more like the previous assert. It's easier to tell what they are doing if their structures line up with each other.
I wish we could handle profile counts on `switch` arms. But that's a matter for another day.
-------------
Marked as reviewed by jrose (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/7034
More information about the hotspot-compiler-dev
mailing list