Thoughts about SubstrateVM GC

Thu Feb 28 20:59:22 UTC 2019

Hello all,

I hope this is the right mailing list to discuss SubstrateVM? If not,
please redirect me.

During the last couple of days, I took a closer look at
SubstrateVM's GC, and also ran some experiments. I would like to
summarize what I found (so that you can correct me if I'm wrong), and
make a case for some improvements that I would like to work on.

Here are my findings so far:
Substrate GC is a 2-generation, STW, single-threaded GC.

The young generation is a single space. When collected, all live objects
get scavenged out, directly into the old generation.

The old generation is 2 semispaces (actually, 4 with the 2 pinned
spaces, which I'll ask about later). When collected, live objects get
scavenged back-and-forth between the two spaces.

Is that correct so far?

It seemed a bit weird at first to write a Java GC in Java. :-)
I analyzed a bit of generated assembly code, comparing it side-by-side
with corresponding Java code, and was actually quite impressed by it.
It also has room for improvement, but that was not the major
bottleneck. The single major bottleneck in my experiments was waiting
for loads of the mark word during scavenging; in other words, the GC is
doing way too much scavenging ;-)

I have noticed a bunch of problems so far:
- The promotion rate from young-gen to old-gen seems fairly high.
This is because there is no notion of tenuring: every object that
survives a single young collection is promoted.
- This implies that relatively many old-gen collections happen, which
seriously affects application throughput (once they happen).
- Because of the above, the usual wisdom from other GCs doesn't apply: I
could get significant improvements (i.e. fewer full GCs) by
configuring a small young-gen (like 10%) and large old-gen (like 90%).
But that's not really great either.
- The policy for when to start collecting seems a bit unclear to me. In my
understanding, there is (almost) no use (for STW GCs) in starting a
collection before allocation space is exhausted. Which means, it seems
an obvious trigger to start collection on allocation failure. Yet, the
policies I'm looking at are time-based or time-and-space-based. I said
'almost' because the single use for time-based collection would be
periodic GC that is able to knock out lingering garbage during a phase
with little or no allocation, and then only when the GC
is also uncommitting the resulting unused pages (which, afaics,
Substrate GC would do: bravo!). But that doesn't seem to be the point of
the time-based policies: it looks like the goal of those policies is to
balance time spent in young-vs-old-gen collections?!
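
To make the trigger idea concrete, here is a minimal sketch of the policy
I have in mind: allocation failure as the primary trigger, plus an
optional periodic trigger. All class and method names are made up for
illustration; none of this is existing SubstrateVM code, and the
byte/millisecond parameters are assumptions.

```java
// Hypothetical sketch; none of these names exist in SubstrateVM.
final class CollectionTrigger {
    private final long periodicIntervalMillis; // 0 disables the periodic trigger
    private long lastCollectionMillis;

    CollectionTrigger(long periodicIntervalMillis, long nowMillis) {
        this.periodicIntervalMillis = periodicIntervalMillis;
        this.lastCollectionMillis = nowMillis;
    }

    // Primary trigger: the request does not fit into the remaining
    // allocation space, i.e. an allocation failure.
    boolean shouldCollectOnAllocation(long requestedBytes, long freeBytes) {
        return requestedBytes > freeBytes;
    }

    // Optional trigger: no collection has run for the configured interval.
    boolean shouldCollectPeriodically(long nowMillis) {
        return periodicIntervalMillis > 0
                && nowMillis - lastCollectionMillis >= periodicIntervalMillis;
    }

    // Called after every collection, whatever triggered it.
    void recordCollection(long nowMillis) {
        lastCollectionMillis = nowMillis;
    }
}
```

The periodic check only pays off if the GC also uncommits the pages it
frees, as noted above; otherwise the first method alone would do.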

With a little bit of distance and squinting of eyes, one can see that
Substrate GC's young generation is really what is called 'nursery space'
elsewhere, which aims to reduce the rate at which objects get introduced
into the young generation. And the old generation is really what is usually
called young generation elsewhere. What's missing is a true old
generation space.

Considering all this, I would like to propose some improvements:
- Introduce a notion of tenuring objects. I guess we need about 2 age
bits in the header or elsewhere for this. Do we have room for that?
- Implement a true old-space (and rename the existing young to nursery
and old to young?). In my experience, sliding/mark-compact collection
using a mark bitmap works best for this: it tends to create a 'sediment'
of permanent/very-long-lived objects at the bottom which would never get
copied again. Using a bitmap, walking live objects (e.g. during
copying, updating, etc.) would be very fast: much faster than walking
objects by their size.
- I am not totally sure about the policies. My current thinking is that
this needs some cleanup/straightening-out, or maybe I am
misunderstanding something there. I believe (fairly strongly) that
allocation failure is the single useful trigger for STW GC, and on top
of that an (optional) periodic GC trigger that would kick in after X
(milli)seconds without a GC.
- A low-hanging-fruit improvement that could be done right now: allocate
large objects (arrays) straight into old-gen instead of copying them
around. Those are usually long-lived anyway, and copying them
back-and-forth just costs CPU time for no benefit. This will become even
more pronounced with a true old-gen.
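
Two of the proposals above can be sketched in a few lines: age bits for
tenuring, and walking live objects via a mark bitmap. This is only an
illustration under assumptions of my own. I assume the two lowest bits
of a 64-bit header word are free (which is exactly the open question),
a tenuring threshold of 3, and one mark bit per 8-byte granule. None of
the names below are SubstrateVM APIs.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical header layout: the two lowest bits hold the object's age.
final class AgeBits {
    static final long AGE_MASK = 0b11L;
    static final int TENURING_THRESHOLD = 3; // largest age that fits in 2 bits

    static int age(long header) {
        return (int) (header & AGE_MASK);
    }

    // Called when an object survives a young collection; the age saturates.
    static long withIncrementedAge(long header) {
        int age = age(header);
        if (age == AGE_MASK) {
            return header;
        }
        return (header & ~AGE_MASK) | (age + 1);
    }

    static boolean shouldTenure(long header) {
        return age(header) >= TENURING_THRESHOLD;
    }
}

// Hypothetical mark bitmap: one bit per granule, set at a live object's start.
final class MarkBitmap {
    static final int GRANULE_BYTES = 8; // assumed object alignment
    private final long heapBase;
    private final BitSet bits;

    MarkBitmap(long heapBase, int heapBytes) {
        this.heapBase = heapBase;
        this.bits = new BitSet(heapBytes / GRANULE_BYTES);
    }

    void mark(long address) {
        bits.set((int) ((address - heapBase) / GRANULE_BYTES));
    }

    // Visits live objects in address order by jumping from set bit to set
    // bit, without reading any object's size.
    List<Long> liveObjectAddresses() {
        List<Long> result = new ArrayList<>();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            result.add(heapBase + (long) i * GRANULE_BYTES);
        }
        return result;
    }
}
```

With this layout an object would survive three young collections before
being tenured. A real bitmap would be a raw word array rather than a
`BitSet`, but the walk has the same shape.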

Oh, and a question: what are these pinned objects/chunks/spaces all about?

What do you think about all this? Somebody else might have thought about
all this already, and has some insights that I don't have in my naive
understanding? Maybe some of it is already being worked on or planned?
Maybe there are some big obstacles that I don't see yet, that make it
less feasible?

Thanks and best regards!
Roman