RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks

Tue Nov 19 14:35:45 UTC 2019

Hi Andrew, 

finally(!) I was able to produce some measurements that show at least some effect on a real-world problem.

I ran the Renaissance benchmark suite (https://renaissance.dev) with my timers added. I am well aware of the limitations. One could argue that this benchmark does not solve a real-world problem. Furthermore, the optimizations have no visible effect on the overall runtime (> 1 hour) of the test. But at least, deep down in the inner mechanics of CodeHeap management, there is a measurable timing difference. For convenience, I have attached a file with some measurement data to this mail. The same file was also uploaded to the bug. The measurements are from runs on linuxppc64. Other platforms show similar results.

Here is what you can see (and my interpretation of it):

CodeHeap::mark_segmap_as_used()
===============================
The number of segment map entries to be processed per call is reduced by a factor of 2.5 to 5. As a consequence, the time spent in the method decreases as well, but not by the same factor. This is due to the added check for fragmentation and to the defragmentation itself, which occurred twice and eliminated roughly 3,500 excess fragments.
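
To illustrate why fewer segment map entries per call can still leave room for a fragmentation check, here is a toy model in Python. It is a sketch under assumptions, not the HotSpot code: the names mark_used, join_lazy and block_start, the hop cap MAX_HOP, and the rule "write a single hop entry when joining two blocks" are all hypothetical simplifications of the idea that each segment map entry tells a lookup how far to hop back towards the start of its block.

```python
MAX_HOP = 254  # assumed cap on a single backward hop (one byte per entry)


def mark_used(segmap, beg, end):
    """Fully (re)mark segments [beg, end) as one block; return entries written."""
    hop = 0
    for i in range(beg, end):
        segmap[i] = hop
        hop += 1
        if hop > MAX_HOP:
            hop = 1  # wrap around: long blocks are reached by chained hops
    return end - beg


def join_lazy(segmap, second_beg):
    """Append the block starting at second_beg to its predecessor.

    Only one entry is touched: the former block start now hops into the
    preceding block, whose own entries lead on to the real start.
    """
    segmap[second_beg] = 1
    return 1


def block_start(segmap, i):
    """Follow backward hops from segment i to the first segment of its block."""
    while segmap[i] != 0:
        i -= segmap[i]
    return i


# Two adjacent blocks: segments [0, 10) and [10, 25).
lazy = [0] * 25
mark_used(lazy, 0, 10)
mark_used(lazy, 10, 25)
eager = list(lazy)

written_eager = mark_used(eager, 0, 25)   # re-mark the whole joined block
written_lazy = join_lazy(lazy, 10)        # touch a single entry instead

start_eager = block_start(eager, 24)
start_lazy = block_start(lazy, 24)        # same answer, but via extra hops

# "Defragmenting" the lazily joined block is just a full re-mark.
mark_used(lazy, 0, 25)
```

In this model the lazy join writes one entry where the eager join writes 25, at the price of a longer hop chain on lookup. That trade-off is why some fragmentation check and an occasional full re-mark are needed, and why the time saved does not scale with the number of entries saved.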

CodeHeap::add_to_freelist()
===========================
Here, the free list length controls the effort spent. Depending on the platform, turning the optimizations on either increases the length by a factor of 2 or decreases it by the same factor. Even with increased free list length, the total time spent in the method decreases. That is evidently the effect of not having to search the free list from the beginning every time.
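
The effect of not searching from the beginning can be shown with a small Python model of an address-ordered free list. This is a sketch under assumptions and not the HotSpot implementation: the classes FreeBlock and FreeList, the last_insert hint and the steps counter are hypothetical, and the merging of adjacent free blocks is left out entirely.

```python
class FreeBlock:
    def __init__(self, addr):
        self.addr = addr
        self.next = None


class FreeList:
    """Address-ordered singly linked free list (toy model, no block merging)."""

    def __init__(self, use_hint):
        self.head = None
        self.use_hint = use_hint
        self.last_insert = None  # hypothetical "last insert point" hint
        self.steps = 0           # links followed while searching

    def add(self, addr):
        block = FreeBlock(addr)
        # The block belongs in front of the head, or the list is empty.
        if self.head is None or addr < self.head.addr:
            block.next = self.head
            self.head = block
            self.last_insert = block
            return
        # Start at the hint if it lies below the new block, else at the head.
        cur = self.head
        hint = self.last_insert
        if self.use_hint and hint is not None and hint.addr < addr:
            cur = hint
        while cur.next is not None and cur.next.addr < addr:
            cur = cur.next
            self.steps += 1
        block.next = cur.next
        cur.next = block
        self.last_insert = block

    def addresses(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.addr)
            cur = cur.next
        return out


def run(use_hint, addrs):
    free_list = FreeList(use_hint)
    for addr in addrs:
        free_list.add(addr)
    return free_list


# Two ascending sweeps of frees: all even addresses, then all odd ones.
addrs = list(range(0, 2000, 2)) + list(range(1, 2000, 2))
plain = run(False, addrs)
hinted = run(True, addrs)
print(plain.steps, hinted.steps)
```

Both variants build the same ordered list. With mostly ascending frees, the hinted variant follows far fewer links, because each search resumes near the previous insert instead of at the head. The worst case (frees in descending or random address order) remains quadratic in the list length.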

I have created a new webrev, mainly to reflect the changes I applied, based on Thomas' comments:
http://cr.openjdk.java.net/~lucy/webrevs/8231460.02/ 

jdk/submit tests pending...

Please let me know if we have reached a state now where this change can be considered reviewed. 

Thanks a lot,
Lutz

On 07.11.19, 22:33, "Schmidt, Lutz" <lutz.schmidt at sap.com> wrote:

    Hi Andrew,

    thanks for giving this matter more thought - and for updating
    your opinion.

    The instrumentation and measurement of other tests will take longer than expected. It got delayed by JDK-8233787. The fix for this bug will enable my timing code to run more smoothly.

    Side note: the timing code I have mentioned several times now is no secret. It's just not suitable for contribution, among other reasons because it's only available for ppc and s390. I can give you more information in case you are interested - no problem if you say "ahhh, never mind...".

    Thanks,
    Lutz

    On 07.11.19, 17:34, "Andrew Dinn" <adinn at redhat.com> wrote:

        On 04/11/2019 15:35, Schmidt, Lutz wrote:
        > thank you for your thoughts. I do not agree with your conclusion,
        > though.
        > 
        > There are two bottlenecks in the CodeHeap management code. One is in
        > CodeHeap::mark_segmap_as_used(), uncovered by
        > OverflowCodeCacheTest.java.  The other is in 
        > CodeHeap::add_to_freelist(), uncovered by StressCodeCacheTest.java.
        > 
        > Both bottlenecks are tackled by the proposed changeset.
          . . .
        > CodeHeap::add_to_freelist() is still O(n*n), with n being the free 
        > list length. But the kick-in point of the non-linearity could be 
        > significantly shifted towards larger n. The time reduction from 
        > approx. 8 seconds to 160 milliseconds supports this statement.

        Ah sorry, it was not clear to me from your original post that the
        proposed change had significantly improved the time spent in free list
        management in the second test by significantly cutting down the free
        list size. As you say, a reduction factor of 1/K in list size will give
        a 1/(K*K) reduction in execution time. Since this test is a lot nearer
        to reality than the overflow test I think the current result is perhaps
        enough to justify its value.

        > I agree it would be helpful to have a "real-world" example showing 
        > some improvement. Providing such evidence is hard, though. I could 
        > instrument the code and print some values from time to time. It's
        > certain this additional output will mess up success/failure decisions
        > in our test environment. Not sure everybody likes that. But I will
        > give it a try and take the hits. This will be a multi-day effort.

        Well, that would be nice to have but not if it stops other work. The one
        thing about the Stress test that I fear may be 'unreal' is the
        potentially over-high probability of generating long(ish) runs of
        adjacent free segments. That might be giving an artificial win that we
        will not in fact see. However, given the current numbers I'd be happy to
        risk that and let this patch go in as is.

        > On a general note, I am always uncomfortable knowing of a O(n*n) 
        > effort, in particular when it could be removed or at least tamed 
        > considerably. Experience tells (at least me) that, at some point
        > in time, n will be large enough to hurt.

        Well, yes, although salesmen do travel /and/ make money ... ;-)

        > I'll be back.

        Sure, thanks for following up. This is all very interesting.

        regards,

        Andrew Dinn
        -----------
        Senior Principal Software Engineer
        Red Hat UK Ltd
        Registered in England and Wales under Company Registration No. 03798903
        Directors: Michael Cunningham, Michael ("Mike") O'Neill

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Renaissance_CodeHeap_timing.txt
URL: <https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20191119/1fdd6ed6/Renaissance_CodeHeap_timing.txt>