RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks

Tue Nov 19 16:57:52 UTC 2019

Looks good, Lutz.

..Thomas

On Tue, Nov 19, 2019 at 3:36 PM Schmidt, Lutz <lutz.schmidt at sap.com> wrote:

> Hi Andrew,
>
> finally(!) I was able to create some measurements which show kind of an
> effect on a real-world problem.
>
> I added my timers when running the renaissance benchmark (
> https://renaissance.dev). I am well aware of the limitations. One could
> argue this benchmark does not solve a real-world problem. Furthermore, the
> optimizations do not have a visible effect on the overall runtime (> 1
> hour) of the test. But at least, deep down, the inner mechanics of CodeHeap
> management show some timing difference. I have attached a file with some
> measurement data to this mail for convenience. The same file was also
> uploaded to the bug. The measurements are from runs on linuxppc64. Other
> platforms show similar results.
>
> Here is what you can see (and my interpretation of what is visible):
>
> CodeHeap::mark_segmap_as_used()
> ===============================
> The number of segment map entries to be processed per call is reduced by a
> factor of 2.5 to 5. As a consequence, the time spent in the method
> decreases as well, but not by the same factor. This is due to the added
> check for fragmentation and the defragmentation itself which occurs twice
> and eliminates roughly 3,500 excess fragments.
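
To illustrate the trade-off described above, here is a minimal sketch of a segment map in which each entry stores how many segments to hop back towards the block start. It is not the HotSpot code: the names mark_used, merge_cheap, find_block_start and the cap kMaxHop are assumptions made for this example. A cheap merge touches a single entry but lengthens the hop chain (a "fragment"); rewriting the whole range removes the fragment at the cost of processing every entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified segment map: entry i holds the number of segments
// to hop back towards the start of the block (0 marks the block start).
constexpr std::uint8_t kMaxHop = 0xFE;  // assumed cap on a single hop

// Full (re)initialization of all entries for the block [beg, end).
// Cost is proportional to the number of segments in the block.
void mark_used(std::vector<std::uint8_t>& segmap, std::size_t beg, std::size_t end) {
    std::uint8_t hop = 0;
    for (std::size_t i = beg; i < end; ++i) {
        segmap[i] = hop;
        // Once the cap is reached, restart at 1 so that lookups chain
        // through the capped entry.
        hop = (hop == kMaxHop) ? std::uint8_t{1} : static_cast<std::uint8_t>(hop + 1);
    }
}

// Cheap merge: the block starting at `second` (second > 0) is appended to the
// block right before it. Only one entry is written, but every lookup into the
// appended part now needs extra hops.
void merge_cheap(std::vector<std::uint8_t>& segmap, std::size_t second) {
    segmap[second] = 1;
}

// Follow the hops back to the block start; optionally report the hop count.
std::size_t find_block_start(const std::vector<std::uint8_t>& segmap,
                             std::size_t i, std::size_t* hops = nullptr) {
    std::size_t n = 0;
    while (segmap[i] != 0) {
        i -= segmap[i];
        ++n;
    }
    if (hops != nullptr) *hops = n;
    return i;
}
```

Calling mark_used() over the merged range afterwards plays the role of the defragmentation step in this sketch: it restores the shortest hop chains, which is why doing it only occasionally can still pay off.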
>
> CodeHeap::add_to_freelist()
> ===========================
> Here, the free list length controls the effort spent. Depending on the
> platform, the length increases by a factor of 2 (with optimizations turned
> on) or decreases by the same factor. Even with increased free list length,
> the total time spent in the method decreases. That's obviously an effect of
> not having to search the free list from the beginning every time.
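
The effect of not searching from the beginning can be sketched with an address-ordered singly linked list that remembers the last insertion point. This is only an illustration under stated assumptions: FreeBlock, FreeList and the hint_ member are hypothetical names, blocks are never removed, and the actual CodeHeap logic may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a free-list node, ordered by address.
struct FreeBlock {
    std::uintptr_t addr;
    FreeBlock* next;
    explicit FreeBlock(std::uintptr_t a) : addr(a), next(nullptr) {}
};

class FreeList {
public:
    // Insert `b` so that the list stays sorted by address.
    // Returns the number of nodes stepped over during the search.
    std::size_t add(FreeBlock* b) {
        FreeBlock* prev = nullptr;
        FreeBlock* cur = head_;
        // If the last inserted block lies below `b`, resume the search there
        // instead of walking from the head again.
        if (hint_ != nullptr && hint_->addr < b->addr) {
            prev = hint_;
            cur = hint_->next;
        }
        std::size_t visited = 0;
        while (cur != nullptr && cur->addr < b->addr) {
            prev = cur;
            cur = cur->next;
            ++visited;
        }
        b->next = cur;
        if (prev == nullptr) {
            head_ = b;
        } else {
            prev->next = b;
        }
        hint_ = b;  // remember the insertion point for the next call
        return visited;
    }

    FreeBlock* head() const { return head_; }

private:
    FreeBlock* head_ = nullptr;
    FreeBlock* hint_ = nullptr;  // assumed: nodes are never removed in this sketch
};
```

When blocks are freed in roughly ascending address order, each add() in this sketch steps over few or no nodes, so the total work stays well below the quadratic worst case even if the list itself is longer.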
>
>
> I have created a new webrev, mainly to reflect the changes I applied,
> based on Thomas' comments:
> http://cr.openjdk.java.net/~lucy/webrevs/8231460.02/
>
> jdk/submit tests pending...
>
> Please let me know if we have reached a state now where this change can be
> considered reviewed.
>
> Thanks a lot,
> Lutz
>
>
>
> On 07.11.19, 22:33, "Schmidt, Lutz" <lutz.schmidt at sap.com> wrote:
>
>     Hi Andrew,
>
>     thanks for giving this matter more thought - and for updating
>     your opinion.
>
>     The instrumentation and measurement of other tests will take longer
>     than expected. It got delayed by JDK-8233787. The fix for this bug
>     will enable my timing code to run more smoothly.
>
>     Side note: this timing code I have mentioned now several times is
>     nothing secret. It's just not suitable to contribute, among other
>     reasons because it's only available for ppc and s390. I can give you
>     more information in case you are interested - no problem if you say
>     "ahhh, never mind...".
>
>     Thanks,
>     Lutz
>
>     On 07.11.19, 17:34, "Andrew Dinn" <adinn at redhat.com> wrote:
>
>         On 04/11/2019 15:35, Schmidt, Lutz wrote:
>         > thank you for your thoughts. I do not agree with your conclusion,
>         > though.
>         >
>         > There are two bottlenecks in the CodeHeap management code. One
>         > is in CodeHeap::mark_segmap_as_used(), uncovered by
>         > OverflowCodeCacheTest.java.  The other is in
>         > CodeHeap::add_to_freelist(), uncovered by
>         > StressCodeCacheTest.java.
>         >
>         > Both bottlenecks are tackled by the recommended changeset.
>           . . .
>         > CodeHeap::add_to_freelist() is still O(n*n), with n being the
>         > free list length. But the kick-in point of the non-linearity
>         > could be significantly shifted towards larger n. The time
>         > reduction from approx. 8 seconds to 160 milliseconds supports
>         > this statement.
>
>         Ah sorry, I was not clear from your original post that the proposed
>         change had significantly improved the time spent in free list
>         management in the second test by significantly cutting down the
>         free list size. As you say, a reduction factor of 1/K in list size
>         will give a 1/(K*K) reduction in execution time. Since this test is
>         a lot nearer to reality than the overflow test I think the current
>         result is perhaps enough to justify its value.
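
The scaling argument above can be written out as follows. This assumes a purely quadratic cost model with an unspecified constant c; it is an idealization for illustration, not a measurement.

```latex
T(n) \approx c\,n^{2}
\quad\Longrightarrow\quad
\frac{T(n/K)}{T(n)} = \frac{c\,(n/K)^{2}}{c\,n^{2}} = \frac{1}{K^{2}}
```

Under that model, the quoted drop from approx. 8 seconds to 160 milliseconds (a factor of 50) would correspond to K = sqrt(50), i.e. a list roughly 7 times shorter. Any saving from searching fewer nodes per insertion is not captured by this simple model.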
>
>         > I agree it would be helpful to have a "real-world" example
>         > showing some improvement. Providing such evidence is hard,
>         > though. I could instrument the code and print some values from
>         > time to time. It's certain this additional output will mess up
>         > success/failure decisions in our test environment. Not sure
>         > everybody likes that. But I will give it a try and take the
>         > hits. This will be a multi-day effort.
>
>         Well, that would be nice to have but not if it stops other work.
>         The one thing about the Stress test that I fear may be 'unreal' is
>         the potentially over-high probability of generating long(ish) runs
>         of adjacent free segments. That might be giving an artificial win
>         that we will not in fact see. However, given the current numbers
>         I'd be happy to risk that and let this patch go in as is.
>
>         > On a general note, I am always uncomfortable knowing of an O(n*n)
>         > effort, in particular when it could be removed or at least tamed
>         > considerably. Experience tells me (at least) that, at some
>         > point in time, n will be large enough to hurt.
>
>         Well, yes, although salesmen do travel /and/ make money ... ;-)
>
>         > I'll be back.
>
>         Sure, thanks for following up. This is all very interesting.
>
>         regards,
>
>
>         Andrew Dinn
>         -----------
>         Senior Principal Software Engineer
>         Red Hat UK Ltd
>         Registered in England and Wales under Company Registration No.
>         03798903
>         Directors: Michael Cunningham, Michael ("Mike") O'Neill