Thoughts about SubstrateVM GC
Christian Wimmer
christian.wimmer at oracle.com
Thu Feb 28 22:09:34 UTC 2019
Hi Roman,
Thanks for your interest in Substrate VM!
On 2/28/19 12:59, Roman Kennke wrote:
> Hello all,
>
> I hope this is the right mailing list to discuss SubstrateVM? If not,
> please redirect me.
There is no better list, at least at this time.
> During the last couple of days, I did have a closer look at
> SubstrateVM's GC, and also did some experiments. I would like to
> summarize what I found (so that you can correct me if I'm wrong), and
> make a case for some improvements that I would like to work on.
>
> Here's my findings so far:
> Substrate GC is a 2-generation, STW, single-threaded GC.
yes
> The young generation is a single space. When collected, all live objects
> get scavenged out, directly into the old generation.
yes
> The old generation is 2 semispaces (actually, 4 with the 2 pinned
> spaces, which I'll ask about later). When collected, live objects get
> scavenged back-and-forth between the two spaces.
yes, in theory. In "reality" there is no contiguous memory for the
spaces, so after a GC all the aligned chunks of the from-space are
either returned to the OS or cached and immediately reused for young
generation allocation.
> Is that correct so far?
>
> It seemed a bit weird at first to write a Java GC in Java language. :-)
It makes some things easier, e.g., the object layout code used by the GC
can immediately be used in other parts of the VM and the compiler. But
in the end there is C-style memory access of course to actually process
the objects, and that code is more verbose in Java.
> I analyzed a bit of generated assembly code, comparing it side-by-side
> with corresponding Java code, and was actually quite impressed by it.
> It's also got room for improvements, but that was not the major
> bottleneck. The single major bottleneck in my experiments was waiting
> for loads of the mark word during scavenging, in other words, it's doing
> way too much of it ;-)
>
> I have noticed a bunch of problems so far:
> - The promotion rate between young-gen and old-gen seems fairly hot.
> This is because there is no notion of tenuring objects or so.
> - This implies that there are relatively many old-gen collections
> happening, which seriously affect application throughput (once they happen)
> - Because of the above, the usual wisdoms from other GCs don't apply: I
> could get significant improvements (i.e. fewer diving into full-GCs) by
> configuring a small young-gen (like 10%) and large old-gen (like 90%).
> But that's not really great either.
Our "design goal" was to start out with the simplest possible GC
implementation that is viable (i.e., generational). That was the
starting point, and we have not had time yet to do something better. But
we know something better is certainly needed.
> - The policy when to start collecting seems a bit unclear to me. In my
> understanding, there is (almost) no use (for STW GCs) in starting a
> collection before allocation space is exhausted. Which means, it seems
> an obvious trigger to start collection on allocation failure. Yet, the
> policies I'm looking at are time-based or time-and-space-based. I said
> 'almost' because the single use for time-based collection would be
> periodic GC that is able to knock out lingering garbage during
> no/little-allocation phase of an application, and then only when the GC
> is also uncommitting the resulting unused pages (which, afaics,
> Substrate GC would do: bravo!). But that doesn't seem to be the point of
> the time-based policies: it looks like the goal of those policies is to
> balance time spent in young-vs-old-gen collections.?!
The young generation size is fixed, and a collection is started when
this space is filled. So from that point of view, the starting trigger
of a GC is always space based.
The policy whether to do an incremental (young) collection or a full
collection is time based (but you can easily plug in any policy you
want). The goal is to balance time between incremental and full
collection. We certainly don't want to fill up the old generation
because that is the maximum heap size, and it is by default provisioned
to be very big.
> With a little bit of distance and squinting of eyes, one can see that
> Substrate GC's young generation is really what is called 'nursery space'
> elsewhere, which aims to reduce the rate at which objects get introduced
> into young generation. And the old generation is really what is usually
> called young generation elsewhere. What's missing is a true old
> generation space.
Not really, because the young generation can be collected independently,
i.e., there are the generational write barriers, remembered sets, ...
So the young generation is reduced to the nursery space, but I argue the
old generation is really an old generation.
> Considering all this, I would like to propose some improvements:
> - Introduce a notion of tenuring objects. I guess we need like 2 age
> bits in the header or elsewhere for this. Do we have that room?
You don't need the age bits in the header. You can easily go from the
object to the aligned chunk that the object is in (we do that all the
time, for example for the write barrier to do the card marking), and
store the age in the chunk header. Requiring all objects in one chunk to
have the same age is not much of a limitation.
Adding tenuring is definitely necessary to achieve reasonable GC
performance.
> - Implement a true old-space (and rename the existing young to nursery
> and old to young?). In my experience, sliding/mark-compact collection
> using a mark bitmap works best for this: it tends to create a 'sediment'
> of permanent/very-long-lived objects at the bottom which would never get
> copied again. Using a bitmap, walking of live objects (e.g. during
> copying, updating etc) would be very fast: much faster than walking
> objects by their size.
A mark-and-compact old generation algorithm definitely makes sense.
Again, the only reason why we don't have it yet is that no one had time
to implement it.
Mark-and-compact is also great to reduce memory footprint. Right now,
during GC the memory footprint can double because of the temporary space
for copying.
> - I am not totally sure about the policies. My current thinking is that
> this needs some cleanup/straightening-out, or maybe I am
> misunderstanding something there. I believe (fairly strongly) that
> allocation failure is the single useful trigger for STW GC, and on top
> of that an (optional) periodic GC trigger that would kick in after X
> (milli)seconds no GC.
As I mentioned above, the GC trigger is allocation failure for the young
generation.
> - Low-hanging-fruit improvement that could be done right now: allocate
> large objects(arrays) straight into old-gen instead of copying them
> around. Those are usually long-lived anyway, and copying them
> back-and-forth just costs CPU time for no benefit. This will become even
> more pronounced with a true old-gen.
Large arrays are allocated separately in unaligned chunks. Such arrays
are never copied, but only logically moved from the young generation
into the old generation. An unaligned chunk contains exactly one large
array.
> Oh and a question: what's this pinned object/chunks/spaces all about?
There are two mechanisms right now to get objects that are never moved
by the GC:
1) A "normal" object can be temporarily pinned using
org.graalvm.nativeimage.PinnedObject. The current implementation then
keeps the whole aligned chunk that contains the object alive, i.e., it
is designed for pinnings that are released quickly so that no objects
are actually ever pinned when the GC runs, unless the GC runs in an
unlucky moments. We use such pinning for example to pass pointers into
byte[] arrays directly to C functions without copying.
2) A PinnedAllocator can be used to get objects that are non-moving for
a long period of time. This is currently used for the metadata of
runtime compiled code. We are actively working to make PinnedAllocator
unnecessary by putting the metadata into C memory, and then hopefully we
can remove PinnedAllocator and all code that is necessary for it in the
GC, i.e., the notion of pinned spaces you mentioned before.
> What do you think about all this? Somebody else might have thought about
> all this already, and have some insights that I don't have in my naive
> understanding? Maybe some of it is already worked on or planned? Maybe
> there are some big obstactles that I don't see yet, that make it less
> feasible?
We certainly have ideas and plans, and they match your observations. If
you are interested in contributing, we can definitely give you some
guidance so that you immediately work into the right direction.
-Christian
More information about the graal-dev
mailing list