RFR(M): 8231460: Performance issue (CodeHeap) with large free blocks

Mon Nov 4 17:02:26 UTC 2019

Hi Lutz,

I like this patch and have a number of remarks:

Segment map splicing:

- If the last byte of the leading block happens to contain 0xFD, you do not
need to increment the fragmentation counter after splicing since the
leading hop would be fully formed.

- If patch complexity is a concern, I would maybe leave out the
segmap_template. That would simplify the coding a bit and I am not sure how
much this actually brings. I really am curious. Is it better to let the
compiler store immediates or to memcpy the template? Would the latter not
mean I pay for the loads too?

- In ::allocate(), I like that you re-use block. This is just bikeshedding,
but you also could reshape this function a tiny bit for a clean single exit
at the end of the function.

- Comment nits:

- * code block (e.g. nmethod) when only a pointer to somewhere inside the
+ * code block (e.g. nmethod) when only a pointer to a location inside the

- *  - Segment addresses should be aligned to be multiples of
CodeCacheSegmentSize.
+ *  - Segment start addresses should be aligned to be multiples of
CodeCacheSegmentSize.

- *  - Allocation in the code cache can only happen at segment boundaries.
+ *  - Allocation in the code cache can only happen at segment start
addresses.

- I would like comments for the newly added functions in heap.hpp, not in
the cpp file, because my IDE expands those comments from the hpp. But thats
just me :=)

- block_at() could be implemented now using address_for and a cast

- fragmentation_limit, freelist_limit: I thought using enums for constants
is not done anymore, and the preferred way is to use static const. See e.g.
markWord.hpp.

- +  // Boundaries of committed space.
  +  // Boundaries of reserved space.
   Thanks for the comments. Out of curiosity, can the lower part be
uncommitted? Or would low() == low_boundary() always be true?

- +  bool contains(const void* p) const             { return low() <= p &&
p < high(); }
  Is comparing with void* legal C++?

- CodeHeap::merge_right():
    - Nits, I would maybe rename "follower" to "beg_follower", or,
alternatively, "beg" to "leader".
    - I wonder whether we can have a "wipe header function" which only
wipes the FreeBlock header of the follower, instead of wiping the whole
segment.

- CodeHeap::add_to_freelist():
   I am astonished that this brings such a performance increase. I would
naively have thought the deallocation to be pretty random, and therefore
this technique to have an improvement of factor 2 in general, but we also
need to find the start of the last_insert_free_block which may take some
hops, and then, the block might not even be free...

--

General remarks, not necessarily for your patch:

- This code could really gain readability if the segment id would be its an
own type, e.g. "typdef int segid_t". APIs like "mark_segmap_as_used(segid_t
beg, segid_t end) are immediately clearer than when size_t is used.

- In CodeHeap::verify(), do we actually check that the segment map is
correctly formed? That from every point in the map, we reach the correct
start of the associated code blob? I could not find this but I may be blind.

Thanks, Thomas

On Thu, Oct 31, 2019 at 5:55 PM Schmidt, Lutz <lutz.schmidt at sap.com> wrote:

> Hi Andrew, (and hi to the interested crowd),
>
> Please accept my apologies for taking so long to get back.
>
> These tests (OverflowCodeCacheTest and StressCodeCacheTest) were causing
> me quite some headaches. Some layer between me and the test prevents the vm
> (in particular: the VMThread) from terminating normally. The final output
> from my time measurements is therefore not generated or thrown away. Adding
> to that were some test machine unavailabilities and a bug in my measurement
> code, causing crashes.
>
> Anyway, I added some on-the-fly output, printing the timer values after
> 10k measurement intervals. This reveals some interesting, additional facts
> about the tests and the CodeHeap management methods. For detailed numbers,
> refer to the files attached to the bug (
> https://bugs.openjdk.java.net/browse/JDK-8231460). For even more detail,
> I can provide the jtr files on request.
>
>
> OverflowCodeCacheTest
> =====================
> This test runs (in my setup) with a 1GB CodeCache.
>
> For this test, CodeHeap::mark_segmap_as_used() is THE performance hog. 40%
> of all calls have to mark more than 16k segment map entries (in the not
> optimized case). Basically all of these calls convert to len=1 calls with
> the optimization turned on. Note that during FreeBlock joining, the segment
> count is forced to 1(one). No wonder the time spent in
> CodeHeap::mark_segmap_as_used() collapses from >80sec (half of the test
> runtime) to <100msec.
>
> CodeHeap::add_to_freelist() on the other hand, is almost not observable.
> Average free list length is at two elements, making even linear search
> really quick.
>
>
> StressCodeCacheTest
> ===================
> With a 1GB CodeCache, this test runs into a 12 min timeout, set by our
> internal test environment. Scaling back to 300MB prevents the test from
> timing out.
>
> For this test, CodeHeap::mark_segmap_as_used() is not a factor. From
> 200,000 calls, only a few (less than 3%) had to process a block consisting
> of more than 16 segments. Note that during FreeBlock joining, the segment
> count is forced to 1(one).
>
> Another method is popping up as performance hog instead:
> CodeHeap::add_to_freelist(). More than 8 out of 40 seconds of test runtime
> (before optimization) are spent in this method, for just 160,000 calls. The
> test seems to create a long list of non-contiguous free blocks (around
> 5,500 on average). This list is linearly scanned to find the insert point
> for the free block at hand.
>
> Suffering as well from the long free block list is
> CodeHeap::search_freelist(). It uses another 2.7 seconds for 270,000
> calls.
>
>
> SPEVjvm2008 suite
> =================
> With respect to the task at hand, this is a well-behaved test suite.
> Timing shows some before/after difference, but nothing spectacular. The
> measurements due not provide evidence of a performance bottleneck.
>
>
> There were some minor adjustments to the code. Unused code blocks have
> been removed as well. I have therefore created a new webrev. You can find
> it here:
>    http://cr.openjdk.java.net/~lucy/webrevs/8231460.01/
>
> Thanks for investing your time!
> Lutz
>
>
> On 21.10.19, 15:06, "Andrew Dinn" <adinn at redhat.com> wrote:
>
>     Hi Lutz,
>
>     On 21/10/2019 13:37, Schmidt, Lutz wrote:
>     > I understand what you are interested in. And I was hoping to be able
>     > to provide some (first) numbers by today. Unfortunately, the
>     > measurement code I activated last Friday was buggy and blew most of
>     > the tests I had hoped to run over the weekend.
>     >
>     > I will take your modified test and run it with and without my
>     > optimization. In parallel, I will try to generate some (non-random)
>     > numbers for other tests.
>     >
>     > I'll be back as soon as I have results.
>
>     Thanks for trying the test and also for deriving some call stats from a
>     real example. I'm keen to see how much your patch improves things.
>
>     regards,
>
>
>     Andrew Dinn
>     -----------
>     Senior Principal Software Engineer
>     Red Hat UK Ltd
>     Registered in England and Wales under Company Registration No. 03798903
>     Directors: Michael Cunningham, Michael ("Mike") O'Neill
>
>
>
>
>