RFR: 8276066: Reset LoopPercentProfileLimit for x86 due to suboptimal performance

Jie Fu jiefu at openjdk.java.net
Mon Nov 1 08:49:09 UTC 2021


On Wed, 27 Oct 2021 14:26:49 GMT, Jie Fu <jiefu at openjdk.org> wrote:

> Hi all,
> 
> I'd like to reset the value of `LoopPercentProfileLimit` (from 30 to the original 10) for x86 to fix performance degradation.
> 
> We observed that, for the same Java application, x86 runs slower than aarch64.
> But according to some SPEC benchmark results, x86 should not be that much worse than aarch64.
> 
> After some investigation, it seems that the slowness of x86 is caused by the different default settings of `LoopPercentProfileLimit` (30 for x86, but 10 for other platforms).
> If we change `LoopPercentProfileLimit` from 30 to 10, x86 would run faster.
> 
> `LoopPercentProfileLimit` [1] was first added in JDK-8149421, where it was set to 30 for x86 and 10 for other platforms.
> Logically, the default value of `LoopPercentProfileLimit` was 10 on all platforms even before JDK-8149421.
> This is because the hard-coded `10.0` [2] equals `100.0 / LoopPercentProfileLimit` [3] when `LoopPercentProfileLimit=10`.
> So if we set `LoopPercentProfileLimit=10`, this unrolling rule [3] would be the same as the original design before JDK-8149421.
> 
> The most important fact is that, from the very beginning of the OpenJDK source code, the default value of `LoopPercentProfileLimit` has (logically) been 10 on all platforms.
> So I suggest resetting `LoopPercentProfileLimit` to the original value (10) for x86, the same as on other platforms.
> 
> I noticed that the review thread for JDK-8149421 mentioned it would benefit some SPECjvm2008 benchmarks [4].
> So I ran SPECjvm2008 with `LoopPercentProfileLimit=10` and found no performance drop on x86.
> So this change won't undo JDK-8149421's optimizations for SPECjvm2008.
> 
> To show the potential improvement of this change, I've added a JMH test to the patch.
> According to this microbenchmark, performance improves by 1.25x ~ 2.0x.
> 
> Any comments?
> 
> Thanks.
> Best regards,
> Jie
> 
>   
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L908
> [2] https://github.com/openjdk/jdk8u/blob/master/hotspot/src/share/vm/opto/loopTransform.cpp#L673
> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopTransform.cpp#L903
> [4] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021205.html
> 
> <img width="420" alt="ratio" src="https://user-images.githubusercontent.com/19923746/139084273-aa2e2eae-4a74-4fcb-8430-d2e2e49d5d5c.png">
> 
> <img width="615" alt="before" src="https://user-images.githubusercontent.com/19923746/139084508-793f1109-1ce3-4427-a2e3-660c91758a7c.png">
> 
> <img width="617" alt="after" src="https://user-images.githubusercontent.com/19923746/139084542-c8a8a705-d7ed-499f-9ceb-7175671c0e3b.png">
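
The equivalence argued in the description above (that `LoopPercentProfileLimit=10` reproduces the old hard-coded `10.0`) can be sketched with a little arithmetic. This is only an illustration, not the HotSpot code: it assumes the rule multiplies `(future_unroll_cnt - 1)` by `100.0 / LoopPercentProfileLimit` and compares the result with the profiled trip count, and it ignores every other condition of the real check. The class and method names are hypothetical, and the sample values (unroll count 8, trip count 40) are made up.

```java
public class UnrollScaleSketch {
    // Multiplier used by the newer rule [3]; the older code [2] hard-coded 10.0.
    public static double scale(int loopPercentProfileLimit) {
        return 100.0 / loopPercentProfileLimit;
    }

    // ASSUMPTION: only the trip-count clause of the rule is modelled here.
    // When this clause holds (together with the other conditions, omitted),
    // further unrolling is rejected.
    public static boolean tripCountClauseHolds(int futureUnrollCnt,
                                               double profileTripCnt,
                                               int loopPercentProfileLimit) {
        return (futureUnrollCnt - 1) * scale(loopPercentProfileLimit) > profileTripCnt;
    }

    public static void main(String[] args) {
        // With limit 10 the multiplier is exactly the old constant.
        System.out.println("scale(10) = " + scale(10));
        System.out.println("scale(30) = " + scale(30));

        // Hypothetical loop: next unroll count 8, profiled trip count 40.
        System.out.println("limit=10 clause holds: " + tripCountClauseHolds(8, 40.0, 10));
        System.out.println("limit=30 clause holds: " + tripCountClauseHolds(8, 40.0, 30));
    }
}
```

Under these assumptions the clause holds for the sample loop with a limit of 10 but not with 30, so a limit of 30 lets unrolling continue further on the same loop. To compare the two settings on a real JVM, one would pass `-XX:LoopPercentProfileLimit=10`, assuming the flag is settable in the build at hand.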

> > I'll run this through our performance testing and report back.
> 
> Performance results look good.
> 
> Is this change still required after re-enabling post loop vectorization?




This is part of the loop unrolling rule, so I think it would be better to change it now to improve performance on the current code base.
Then all future optimizations on x86 can be evaluated against that improved baseline.
We may also backport the change to other repos such as jdk11u.
So I would suggest resetting it to 10 on x86.
Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6142


More information about the hotspot-compiler-dev mailing list