[aarch64-port-dev ] Issue with 64K page size for JDK7 on AArch64
Andrew Dinn
adinn at redhat.com
Fri Nov 7 10:08:41 UTC 2014
I have seen some 'interesting' results from a jtreg test running on my
mustang box which relate to the page size on Linux/AArch64 (n.b. the
x86-aarch64 hybrid build does not manifest these problems because it
uses the Linux-x86 page size).
The salient datum is this: os::vm_page_size() returns 64K on my mustang
whereas it returns 4K on x86 hardware. The problem with this disparity
arises because of the desire to ensure that card tables used to mark
heap region occupancy start and end on an OS page boundary -- this is
detailed in the comment at the head of method
GenCollectorPolicy::compute_max_alignment (see
memory/collectorPolicy.cpp:219).
The card size is 512, i.e. each byte in the card table array marks 512
bytes in the corresponding heap area. So, the size of the byte array
needed for the card table is computed by dividing the heap extent by
this card size. Conversely, if the resulting array is to occupy an
integral number of OS pages then the heap area start and end have to be
aligned to 512 * os::vm_page_size(). On linux-x86 that alignment is 2Mb but
on linux-aarch64 it is 32Mb.
This causes a problem because it means 32Mb becomes the granule size for
any card-marked heap. So, when using the default, parallel collector
this means that whatever value is specified for the perm gen max
(-XX:MaxPermSize=xxx) gets coerced to be a multiple of 32Mb. That's ok
if you use the default (64M) but can cause problems for existing clients
that specify a particular max size.
If you specify a max extent which is not a multiple of 32Mb it gets
rounded down to the greatest lesser multiple (e.g. ask for 48Mb and what
you get is 32Mb). In particular, if you specify a max extent less than
32Mb then it gets rounded down to 0 causing a failure at VM startup.
I have not yet followed up the full implications here but I believe the
same rounding down will apply for the card marked mature spaces of G1
and CMS. That's arguably a lesser problem as no one is likely to want
the mature space to be less than 32Mb -- so startup failures are
unlikely to be seen. However, rounding down the mature space size (e.g.
from 48Mb to 32Mb) might still cause problems for apps which hope to
carefully limit their footprint. Also, while the perm gen issue only
affects JDK7 any effect on G1/CMS may be relevant to JDK8 and JDK9.
I don't think we can really live with enforcing JDK7 heap extents to be
multiples of 32Mb. I saw this problem because a jtreg test which specified
-XX:MaxPermSize=8M was borked when running on mustang hw, but no doubt
there will be real cases where existing configurations attempt to run a
low footprint JVM and rounding down either suffers a startup failure or
results in a much smaller max heap than requested. If AArch64 breaks
cases which work on x86 then I think we need to worry about it --
particularly so for JDK7. So, I think we need to enforce a smaller
granule size which means:
1) we need to identify some algorithm for deciding what it will be
2) we then need to decide how to size and lay out card tables
As regards 1) I suggest we compute the granule size using the current
page-size-based algorithm but then impose a cap of 2Mb on JDK7. This
will practically ensure that we can support Perm Gen heaps (and small
CMS/G1 mature gen heaps) at the same sizings as are supported on x86. We
might perhaps adopt a larger cap on JDK8/9 (4Mb? 8Mb?) if we need to
avoid problems with G1/CMS.
For 2) I can see two options depending upon how important we believe it
is that card tables occupy their own dedicated pages. The current code
ensures that the base of the card table is aligned at
os::vm_page_size(). Option a) is to derive the table extent from the
heap size -- i.e. the card table will simply occupy an extent whose size
is a multiple of 4K bytes. Option b) is to over-allocate space for the
table -- if necessary -- so that its extent is fitted into one or more
dedicated 64K pages.
I don't really understand why it is considered important for card tables
to be aligned to os::vm_page_size(). If the assumption is that this has
some benefit in terms of cache collisions then I don't know whether that
is really justified with a 64K granularity. I would suggest we just
ignore this for now and allocate the card table using the heap extent,
i.e. we follow 1) and 2a).
Anyone have any comments or arguments in favour or against?
regards,
Andrew Dinn