RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks
Thomas Stüfe
thomas.stuefe at gmail.com
Tue Nov 19 16:57:52 UTC 2019
Looks good, Lutz.
..Thomas
On Tue, Nov 19, 2019 at 3:36 PM Schmidt, Lutz <lutz.schmidt at sap.com> wrote:
> Hi Andrew,
>
> finally(!) I was able to create some measurements which show kind of an
> effect on a real-world problem.
>
> I added my timers when running the renaissance benchmark (
> https://renaissance.dev). I am well aware of the limitations. One could
> argue this benchmark does not solve a real-world problem. Furthermore, the
> optimizations do not have a visible effect on the overall runtime (> 1
> hour) of the test. But at least, deep down, the inner mechanics of CodeHeap
> management show some timing difference. I have attached a file with some
> measurement data to this mail for convenience. The same file was also
> uploaded to the bug. The measurements are from runs on linuxppc64. Other
> platforms show similar results.
>
> Here is what you can see (and my interpretation of it):
>
> CodeHeap::mark_segmap_as_used()
> ===============================
> The number of segment map entries to be processed per call is reduced by a
> factor of 2.5 to 5. As a consequence, the time spent in the method
> decreases as well, but not by the same factor. This is due to the added
> check for fragmentation and the defragmentation itself which occurs twice
> and eliminates roughly 3,500 excessive fragments.
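
To illustrate the trade-off described above, here is a minimal sketch of a segment map in which joining two blocks touches a single entry instead of rewriting the whole range. It is a simplified model, not the HotSpot code: the names `mark_full`, `join_cheap`, `find_start` and `kMaxHop`, as well as the exact entry encoding, are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding: entry 0 marks the first segment of a block; any
// other value is a hop distance (capped at kMaxHop) back toward that start.
constexpr std::size_t kMaxHop = 254;

// Full marking: rewrites every entry in [beg, end).
// Returns the number of entries written.
std::size_t mark_full(std::vector<std::uint8_t>& map,
                      std::size_t beg, std::size_t end) {
    for (std::size_t i = beg; i < end; ++i) {
        map[i] = static_cast<std::uint8_t>(std::min(i - beg, kMaxHop));
    }
    return end - beg;
}

// Cheap join: the block starting at `mid` is appended to the block that ends
// just before it. Only the old start entry is redirected into the predecessor.
// Precondition: mid > 0 and segment mid-1 belongs to the preceding block.
std::size_t join_cheap(std::vector<std::uint8_t>& map, std::size_t mid) {
    map[mid] = 1;
    return 1;
}

// Follows hop entries back to the block start; optionally reports the number
// of hops taken, which grows as the map becomes fragmented.
std::size_t find_start(const std::vector<std::uint8_t>& map,
                       std::size_t seg, std::size_t* hops = nullptr) {
    std::size_t n = 0;
    while (map[seg] != 0) {
        seg -= map[seg];
        ++n;
    }
    if (hops != nullptr) {
        *hops = n;
    }
    return seg;
}
```

In this model the cheap join writes one entry where the full marking writes one per segment, but lookups through a joined block need extra hops until the range is rewritten. That is the kind of fragmentation a later defragmentation pass would remove.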
>
> CodeHeap::add_to_freelist()
> ===========================
> Here, the free list length controls the effort spent. Depending on the
> platform, the length increases by a factor of 2 (with optimizations turned
> on) or decreases by the same factor. Even with increased free list length,
> the total time spent in the method decreases. That's obviously an effect of
> not having to search the free list from the beginning every time.
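
The effect of not restarting the search at the list head can be sketched with an address-sorted, singly linked free list that remembers its last insertion point. This is an illustration under assumptions, not the code from the webrev: `FreeNode`, `FreeList`, the `hint` field and `steps_for_ascending` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

struct FreeNode {               // hypothetical stand-in for a free block header
    std::size_t addr = 0;
    FreeNode*   next = nullptr;
};

struct FreeList {
    FreeNode*   head  = nullptr;
    FreeNode*   hint  = nullptr;  // last insertion point (assumed name)
    std::size_t steps = 0;        // nodes walked past while searching

    // Inserts n so that the list stays sorted by ascending address.
    void insert(FreeNode* n, bool use_hint) {
        if (head == nullptr || n->addr < head->addr) {
            n->next = head;
            head = n;
            hint = n;
            return;
        }
        FreeNode* prev = head;
        // Start at the hint when it is known to precede the new node.
        if (use_hint && hint != nullptr && hint->addr < n->addr) {
            prev = hint;
        }
        while (prev->next != nullptr && prev->next->addr < n->addr) {
            prev = prev->next;
            ++steps;
        }
        n->next = prev->next;
        prev->next = n;
        hint = n;
    }
};

// Inserts `count` nodes in ascending address order and returns the total
// number of search steps taken.
std::size_t steps_for_ascending(std::size_t count, bool use_hint) {
    std::vector<FreeNode> nodes(count);
    FreeList list;
    for (std::size_t i = 0; i < count; ++i) {
        nodes[i].addr = i * 16;
        list.insert(&nodes[i], use_hint);
    }
    return list.steps;
}
```

For blocks freed in ascending address order, the search without the hint walks the whole list on every insert, while the hinted search starts right at the insertion point. A real implementation would also have to reset the hint whenever the block it points to is removed from the list; this sketch omits removal entirely.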
>
>
> I have created a new webrev, mainly to reflect the changes I applied,
> based on Thomas' comments:
> http://cr.openjdk.java.net/~lucy/webrevs/8231460.02/
>
> jdk/submit tests pending...
>
> Please let me know if we have reached a state now where this change can be
> considered reviewed.
>
> Thanks a lot,
> Lutz
>
>
>
> On 07.11.19, 22:33, "Schmidt, Lutz" <lutz.schmidt at sap.com> wrote:
>
> Hi Andrew,
>
> thanks for giving this matter more thought - and for updating
> your opinion.
>
> The instrumentation and measurement of other tests will take longer
> than expected. It got delayed by JDK-8233787. The fix for this bug will
> enable my timing code to run smoother.
>
> Side note: this timing code I have mentioned now several times is
> nothing secret. It's just not suitable to contribute, among other reasons
> because it's only available for ppc and s390. I can give you more
> information in case you are interested - no problem if you say "ahhh, never
> mind...".
>
> Thanks,
> Lutz
>
> On 07.11.19, 17:34, "Andrew Dinn" <adinn at redhat.com> wrote:
>
> On 04/11/2019 15:35, Schmidt, Lutz wrote:
> > thank you for your thoughts. I do not agree with your conclusion,
> > though.
> >
> > There are two bottlenecks in the CodeHeap management code. One is in
> > CodeHeap::mark_segmap_as_used(), uncovered by
> > OverflowCodeCacheTest.java. The other is in
> > CodeHeap::add_to_freelist(), uncovered by StressCodeCacheTest.java.
> >
> > Both bottlenecks are tackled by the recommended changeset.
> . . .
> > CodeHeap::add_to_freelist() is still O(n*n), with n being the free
> > list length. But the kick-in point of the non-linearity could be
> > significantly shifted towards larger n. The time reduction from
> > approx. 8 seconds to 160 milliseconds supports this statement.
>
> Ah sorry, it was not clear to me from your original post that the
> proposed change had significantly improved the time spent in free list
> management in the second test by significantly cutting down the free
> list size. As you say, a reduction factor of 1/K in list size will give
> a 1/(K*K) reduction in execution time. Since this test is a lot nearer
> to reality than the overflow test I think the current result is perhaps
> enough to justify its value.
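
As a worked version of the scaling argument above: assuming a purely quadratic cost model $T(n) = c\,n^2$ for free list management (the constant $c$ and the model itself are simplifying assumptions), shrinking the list by a factor $K$ gives

```latex
\frac{T(n/K)}{T(n)} = \frac{c\,(n/K)^2}{c\,n^2} = \frac{1}{K^2}
```

Plugging in the figures quoted in this thread (approx. 8 seconds down to 160 milliseconds) gives a time ratio of $0.16 / 8 = 1/50$, which under this model would correspond to $K = \sqrt{50} \approx 7.1$. This is only an illustration: other effects of the change may contribute to the measured reduction, so the actual reduction in list length may differ.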
>
> > I agree it would be helpful to have a "real-world" example showing
> > some improvement. Providing such evidence is hard, though. I could
> > instrument the code and print some values from time to time. It's
> > certain this additional output will mess up success/failure decisions
> > in our test environment. Not sure everybody likes that. But I will
> > give it a try and take the hits. This will be a multi-day effort.
>
> Well, that would be nice to have but not if it stops other work. The
> one thing about the Stress test that I fear may be 'unreal' is the
> potentially over-high probability of generating long(ish) runs of
> adjacent free segments. That might be giving an artificial win that we
> will not in fact see. However, given the current numbers I'd be happy
> to risk that and let this patch go in as is.
>
> > On a general note, I am always uncomfortable knowing of an O(n*n)
> > effort, in particular when it could be removed or at least tamed
> > considerably. Experience tells me that, at some point in time, n
> > will be large enough to hurt.
>
> Well, yes, although salesmen do travel /and/ make money ... ;-)
>
> > I'll be back.
>
> Sure, thanks for following up. This is all very interesting.
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill