RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks

Thomas Stüfe thomas.stuefe at gmail.com
Mon Nov 4 17:04:41 UTC 2019


Hi Andrew, Lutz,

I agree with Lutz in this case. The patch complexity could be reduced if
that is a concern (see my mail to Lutz), but I like most of these changes,
and the comment improvements are nice.

Just my 5 cents.

Cheers, Thomas


On Mon, Nov 4, 2019 at 4:36 PM Schmidt, Lutz <lutz.schmidt at sap.com> wrote:

> Hi Andrew,
>
> Thank you for your thoughts. I do not agree with your conclusion, though.
>
> There are two bottlenecks in the CodeHeap management code. One is in
> CodeHeap::mark_segmap_as_used(), uncovered by OverflowCodeCacheTest.java.
> The other is in CodeHeap::add_to_freelist(), uncovered by
> StressCodeCacheTest.java.
>
> Both bottlenecks are tackled by the recommended changeset.
>
> CodeHeap::mark_segmap_as_used() is no longer O(n*n) for the critical
> "FreeBlock-join" case; it is actually O(1) now. The time reduction from
> more than 80 seconds to just a few milliseconds is proof of that statement.
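>
> As a toy sketch of that effect (illustrative code only, not the actual
> HotSpot sources): segment map entries record the distance back to the
> start of their block, and during a FreeBlock join only the head entry
> of the merged block needs to stay valid, so the join degenerates to
> marking a single segment.
>
>   #include <cstddef>
>   #include <cstdint>
>   #include <vector>
>
>   struct ToySegMap {
>     std::vector<uint8_t> map;   // one entry per CodeHeap segment
>
>     explicit ToySegMap(size_t nsegs) : map(nsegs, 0) {}
>
>     // Unoptimized behaviour, conceptually: re-mark every segment of
>     // the merged block, so the cost grows with the block length.
>     void mark_block_full(size_t beg, size_t len) {
>       for (size_t i = 0; i < len; i++) {
>         map[beg + i] = (uint8_t)(i < 255 ? i : 255);  // capped backlink
>       }
>     }
>
>     // Optimized FreeBlock join: only the head entry is touched
>     // (len is forced to 1), which is why the join becomes O(1).
>     void mark_block_joined(size_t beg) {
>       map[beg] = 0;   // the block starts here
>     }
>   };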
>
> CodeHeap::add_to_freelist() is still O(n*n), with n being the free list
> length, but the point at which the non-linearity kicks in is shifted
> significantly towards larger n. The time reduction from approx. 8 seconds
> to 160 milliseconds supports this statement.
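>
> For illustration, a minimal model of the remaining cost (names are made
> up, not taken from the HotSpot sources): in this model the free list is
> kept sorted by block address, so every insert still walks the list to
> find its position, i.e. O(n) per call and O(n*n) over a run that frees
> n blocks.
>
>   #include <cstddef>
>
>   struct ToyFreeBlock {
>     char*         start;
>     size_t        size;
>     ToyFreeBlock* next;
>   };
>
>   // Insert while keeping the list sorted by start address. The while
>   // loop below is the linear scan in question.
>   void toy_add_to_freelist(ToyFreeBlock*& head, ToyFreeBlock* blk) {
>     if (head == nullptr || blk->start < head->start) {
>       blk->next = head;          // new head of the list
>       head = blk;
>       return;
>     }
>     ToyFreeBlock* cur = head;
>     while (cur->next != nullptr && cur->next->start < blk->start) {
>       cur = cur->next;           // O(n) walk to the insert position
>     }
>     blk->next = cur->next;       // splice in, address order preserved
>     cur->next = blk;
>   }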
>
> I agree it would be helpful to have a "real-world" example showing some
> improvement. Providing such evidence is hard, though. I could instrument
> the code and print some values from time to time. This additional output
> is certain to mess up success/failure decisions in our test environment,
> and I am not sure everybody will like that. But I will give it a try and
> take the hits. This will be a multi-day effort.
>
> On a general note, I am always uncomfortable knowing of an O(n*n) effort,
> in particular when it could be removed or at least tamed considerably.
> Experience tells me that, at some point in time, n will be large enough
> to hurt.
>
> I'll be back.
>
> Thanks,
> Lutz
>
>
> On 04.11.19, 11:08, "Andrew Dinn" <adinn at redhat.com> wrote:
>
>     Hi Lutz,
>
>     I'll summarize my thoughts here rather than answer point by point.
>
>     The patch successfully addresses the worst-case performance, but it
>     seems to me extremely unlikely that we will see anything approaching
>     that case in real applications. So that does not argue for pushing
>     the patch.
>
>     The patch does not seem to make a significant difference to the
>     stress test. This test is also not necessarily 'representative' of
>     real cases, but it is much more likely to be so than the worst-case
>     test. That suggests to me that the current patch is perhaps not
>     worth pursuing (it ain't really broke so ...), especially given
>     that it is not possible to distinguish any benefit when running the
>     SPEC benchmark apps. One could argue that the patch looks like it
>     will do no harm and may do good in pathological cases, but that's
>     not really a good enough reason to make a change. We really need
>     evidence that this is worth doing.
>
>     The free list 'search bottleneck' certainly looks like a more promising
>     problem to tackle than the 'merge problem'. However, once again this
>     'problem' may just be an artefact of running this specific test rather
>     than anything that might happen in real life.
>
>     I think the only way to find out for sure whether the current patch,
>     or a patch that addresses the 'search bottleneck', is going to be
>     beneficial is to instrument the JVM to record traces of code-cache
>     use from real apps and then replay the allocations/frees from those
>     traces. That would show what difference a patch makes and how much
>     it might help the overall execution time.
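>
>     Something along these lines might do (purely a sketch; the trace
>     format and all names are invented, and the map below merely stands
>     in for the CodeHeap calls a real harness would make): record one
>     line per code-cache event from a real run, then replay the file and
>     time the run with and without the patch applied.
>
>       // replay.cpp - replays "alloc <id> <size>" / "free <id>" lines.
>       #include <chrono>
>       #include <cstddef>
>       #include <fstream>
>       #include <iostream>
>       #include <sstream>
>       #include <string>
>       #include <unordered_map>
>
>       int main(int argc, char** argv) {
>         if (argc < 2) { std::cerr << "usage: replay <trace>\n"; return 1; }
>         std::ifstream trace(argv[1]);
>         std::unordered_map<long, std::size_t> live;  // id -> size
>         std::string line;
>         auto t0 = std::chrono::steady_clock::now();
>         while (std::getline(trace, line)) {
>           std::istringstream in(line);
>           std::string op; long id = 0; std::size_t size = 0;
>           if (!(in >> op >> id)) continue;      // skip malformed lines
>           if (op == "alloc" && (in >> size)) {
>             live[id] = size;                    // stand-in for an allocation
>           } else if (op == "free") {
>             live.erase(id);                     // stand-in for a deallocation
>           }
>         }
>         auto t1 = std::chrono::steady_clock::now();
>         auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0);
>         std::cout << "replayed in " << ms.count() << " ms, "
>                   << live.size() << " blocks still live\n";
>         return 0;
>       }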
>
>     regards,
>
>
>     Andrew Dinn
>     -----------
>
>     On 31/10/2019 16:55, Schmidt, Lutz wrote:
>     > Hi Andrew, (and hi to the interested crowd),
>     >
>     > Please accept my apologies for taking so long to get back.
>     >
>     > These tests (OverflowCodeCacheTest and StressCodeCacheTest) were
>     > causing me quite a few headaches. Some layer between me and the
>     > test prevents the VM (in particular, the VMThread) from terminating
>     > normally, so the final output from my time measurements is either
>     > not generated or thrown away. Adding to that were some test machine
>     > unavailabilities and a bug in my measurement code that caused
>     > crashes.
>     >
>     > Anyway, I added some on-the-fly output, printing the timer values
>     > after every 10k measurement intervals. This reveals some interesting
>     > additional facts about the tests and the CodeHeap management
>     > methods. For detailed numbers, refer to the files attached to the
>     > bug (https://bugs.openjdk.java.net/browse/JDK-8231460). For even
>     > more detail, I can provide the jtr files on request.
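>     >
>     > Roughly, the measurement works like this (illustrative sketch only,
>     > not the code I actually added): accumulate the elapsed time per
>     > call and print a summary line every 10,000 intervals instead of
>     > only at VM shutdown, so the numbers survive even when the VMThread
>     > does not terminate normally.
>     >
>     >   #include <chrono>
>     >   #include <cstdio>
>     >
>     >   struct ToyPhaseTimer {
>     >     long   calls    = 0;
>     >     double total_ms = 0.0;
>     >
>     >     template <typename Fn>
>     >     void measure(const char* name, Fn&& fn) {
>     >       auto t0 = std::chrono::steady_clock::now();
>     >       fn();                        // the method being timed
>     >       auto t1 = std::chrono::steady_clock::now();
>     >       total_ms += std::chrono::duration<double, std::milli>(t1 - t0).count();
>     >       if (++calls % 10000 == 0) {  // report every 10k intervals
>     >         std::printf("%s: %ld calls, %.1f ms\n", name, calls, total_ms);
>     >       }
>     >     }
>     >   };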
>     >
>     >
>     > OverflowCodeCacheTest
>     > =====================
>     > This test runs (in my setup) with a 1GB CodeCache.
>     >
>     > For this test, CodeHeap::mark_segmap_as_used() is THE performance
>     > hog. 40% of all calls have to mark more than 16k segment map
>     > entries (in the unoptimized case). Basically all of these calls
>     > turn into len=1 calls with the optimization enabled; note that
>     > during FreeBlock joining, the segment count is forced to 1 (one).
>     > No wonder the time spent in CodeHeap::mark_segmap_as_used()
>     > collapses from >80 sec (half of the test runtime) to <100 msec.
>     >
>     > CodeHeap::add_to_freelist(), on the other hand, is barely
>     > noticeable. The average free list length is two elements, which
>     > makes even a linear search really quick.
>     >
>     >
>     > StressCodeCacheTest
>     > ===================
>     > With a 1GB CodeCache, this test runs into a 12 min timeout, set by
> our internal test environment. Scaling back to 300MB prevents the test from
> timing out.
>     >
>     > For this test, CodeHeap::mark_segmap_as_used() is not a factor. Of
>     > the 200,000 calls, only a few (less than 3%) had to process a block
>     > consisting of more than 16 segments. Note that during FreeBlock
>     > joining, the segment count is forced to 1 (one).
>     >
>     > Another method pops up as the performance hog instead:
>     > CodeHeap::add_to_freelist(). More than 8 of the 40 seconds of test
>     > runtime (before optimization) are spent in this method, for just
>     > 160,000 calls. The test seems to create a long list of
>     > non-contiguous free blocks (around 5,500 on average), and this
>     > list is scanned linearly to find the insert point for the free
>     > block at hand.
>     >
>     > Suffering as well from the long free block list is
> CodeHeap::search_freelist(). It uses another 2.7 seconds for 270,000
> calls.
>     >
>     >
>     > SPECjvm2008 suite
>     > =================
>     > With respect to the task at hand, this is a well-behaved test
>     > suite. Timing shows some before/after difference, but nothing
>     > spectacular. The measurements do not provide evidence of a
>     > performance bottleneck.
>     >
>     >
>     > There were some minor adjustments to the code. Unused code blocks
> have been removed as well. I have therefore created a new webrev. You can
> find it here:
>     >    http://cr.openjdk.java.net/~lucy/webrevs/8231460.01/
>     >
>     > Thanks for investing your time!
>     > Lutz
>     >
>     >
>     > On 21.10.19, 15:06, "Andrew Dinn" <adinn at redhat.com> wrote:
>     >
>     >     Hi Lutz,
>     >
>     >     On 21/10/2019 13:37, Schmidt, Lutz wrote:
>     >     > I understand what you are interested in. And I was hoping to
> be able
>     >     > to provide some (first) numbers by today. Unfortunately, the
>     >     > measurement code I activated last Friday was buggy and blew
> most of
>     >     > the tests I had hoped to run over the weekend.
>     >     >
>     >     > I will take your modified test and run it with and without my
>     >     > optimization. In parallel, I will try to generate some
> (non-random)
>     >     > numbers for other tests.
>     >     >
>     >     > I'll be back as soon as I have results.
>     >
>     >     Thanks for trying the test and also for deriving some call stats
> from a
>     >     real example. I'm keen to see how much your patch improves
> things.
>     >
>     >     regards,
>     >
>     >
>     >     Andrew Dinn
>     >     -----------
>     >     Senior Principal Software Engineer
>     >     Red Hat UK Ltd
>     >     Registered in England and Wales under Company Registration No.
> 03798903
>     >     Directors: Michael Cunningham, Michael ("Mike") O'Neill
>     >
>     >
>     >
>     >
>
>
>
>

