Thoughts about SubstrateVM GC
Christian Wimmer
christian.wimmer at oracle.com
Mon Mar 4 18:15:17 UTC 2019
Note that Substrate VM does not have a "mark word" like HotSpot. This
makes mark-and-compact algorithms a bit more complicated because by
default there is no space in an object to store a forwarding pointer
without overwriting some part of the object.
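
For illustration only (this is not how Substrate VM is implemented today), one
common workaround is to keep forwarding pointers in a side table keyed by the
object's old address, at the cost of extra space and a lookup per reference
update. A minimal sketch in Java, with addresses modeled as plain longs:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch only: forwarding information lives outside the objects, because
    // there is no mark word to overwrite during compaction.
    final class ForwardingTable {
        private final Map<Long, Long> oldToNew = new HashMap<>();

        void forward(long oldAddress, long newAddress) {
            oldToNew.put(oldAddress, newAddress);
        }

        long forwardedAddress(long oldAddress) {
            // Objects that were not moved keep their original address.
            return oldToNew.getOrDefault(oldAddress, oldAddress);
        }
    }

(Other options, such as a small table per aligned chunk, trade the lookup cost
differently; the point is only that the forwarding data has to live somewhere
outside the object.)
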
-Christian
On 3/1/19 15:17, Roman Kennke wrote:
> I am working on that ;-) I have users with a case, and we can prototype
> and measure with Serial GC vs EpsilonGC + Aleksey's sliding GC hack
> (https://shipilev.net/jvm/diy-gc/),
> that should give us some ideas :-)
>
> Cheers, Roman
>
>
>
> On 1 March 2019 at 23:13:09 CET, "Peter B. Kessler"
> <Peter.B.Kessler at Oracle.COM> wrote:
>
> I think we need requirements from users, representative benchmarks or real applications, inspired ideas, measurements of alternatives, and time to write garbage collectors. :-)
>
> So far, SubstrateVM has explored one point in the space, and we know that one size does not fit all when it comes to garbage collection.
>
> ... peter
>
> On 03/ 1/19 12:33 PM, Roman Kennke wrote:
>
> Hi Peter,
>
> picking up on our comments here:
>
> https://github.com/oracle/graal/pull/1015
>
> It seems like there is a significant class of applications that would
> be attractive with SubstrateVM and that would be happy to do no GC at
> all. Short-lived microservices come to mind. For those applications,
> we'd realistically still want a GC, but only as a last resort. In
> other words, a single-space heap that triggers a collection only when
> exhausted, and then does sliding compaction, would do it, and arguably
> better than the current GC: it would not incur barrier overhead (i.e.
> pay for something that we bet would never or very rarely happen), it
> would be able to use the full space rather than dividing it up into
> generations and semispaces, and, as you say, even eliminate the
> safepoint check overhead (does SubstrateVM only do safepoints for GC?
> that would be cool!).
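>
> For illustration only (hypothetical names, not the current Substrate VM
> code), the collect-only-on-exhaustion idea is little more than this:
>
>     // Sketch of a single-space heap that collects only on allocation
>     // failure and then does one STW sliding compaction.
>     final class SingleSpaceHeap {
>         private final byte[] space;
>         private int top; // bump-pointer allocation
>
>         SingleSpaceHeap(int capacityBytes) {
>             this.space = new byte[capacityBytes];
>         }
>
>         int allocate(int sizeBytes) {
>             if (top + sizeBytes > space.length) {
>                 slideCompact();                   // last resort
>                 if (top + sizeBytes > space.length) {
>                     throw new OutOfMemoryError(); // still no room
>                 }
>             }
>             int result = top;
>             top += sizeBytes;
>             return result;                        // offset of the new object
>         }
>
>         private void slideCompact() {
>             // Mark live objects, slide them towards offset 0, lower 'top'.
>             // No write barriers and no semispaces are needed for this.
>         }
>     }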
>
> Even if it *does* GC, it might still be better off overall: objects
> would have more time to die, and GCs would be less frequent. With the
> current SubstrateVM I see relatively frequent full GCs anyway, so
> rarer GCs with more relative garbage, combined with increased
> throughput: I guess that might actually work well for certain classes
> of applications, especially those that would want to run on
> SubstrateVM anyway?
>
> You commented about the GraalVM runtime compiler allocating a lot and
> leaving behind lots of garbage: would that be a concern in
> SubstrateVM's closed-world view?
>
> WDYT?
>
> Roman
>
> Two comments inline. And more encouragement to send along
> your ideas.
>
> ... peter
>
> On 03/ 1/19 02:16 AM, Roman Kennke wrote:
>
> Hi Christian,
>
> thanks for your replies. This is very interesting. Some
> additional
> comments below inline:
>
> The old generation is 2 semispaces (actually, 4
> with the 2 pinned
> spaces, which I'll ask about later). When
> collected, live objects get
> scavenged back-and-forth between the two spaces.
>
>
> yes, in theory. In "reality" there is no contiguous memory for the
> spaces, so after a GC all the aligned chunks of the from-space are
> either returned to the OS or cached and immediately reused for young
> generation allocation.
>
>
> Aha, ok. This definitely affects future design decisions.
>
>
> - The policy when to start collecting seems a bit unclear to me. In
> my understanding, there is (almost) no use (for STW GCs) in starting
> a collection before allocation space is exhausted. Which means, it
> seems an obvious trigger to start collection on allocation failure.
> Yet, the policies I'm looking at are time-based or
> time-and-space-based. I said 'almost' because the single use for
> time-based collection would be a periodic GC that is able to knock
> out lingering garbage during a no/little-allocation phase of an
> application, and then only when the GC also uncommits the resulting
> unused pages (which, afaics, the Substrate GC would do: bravo!). But
> that doesn't seem to be the point of the time-based policies: it
> looks like the goal of those policies is to balance time spent in
> young-vs-old-gen collections?!
>
>
> The young generation size is fixed, and a collection is started when
> this space is filled. So from that point of view, the starting
> trigger of a GC is always space based.
>
> The policy for whether to do an incremental (young) collection or a
> full collection is time based (but you can easily plug in any policy
> you want). The goal is to balance time between incremental and full
> collections. We certainly don't want to fill up the old generation,
> because that is the maximum heap size, and it is by default
> provisioned to be very big.
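>
> (For illustration only, not the real interface: such a pluggable policy can
> be as small as a single decision method that balances the observed times.)
>
>     // Sketch: choose between an incremental (young) and a full collection
>     // based on how the GC time has been spent so far. Names are invented.
>     interface CollectionPolicy {
>         boolean collectCompletely(long youngGcNanos, long fullGcNanos);
>     }
>
>     final class BalancedTimePolicy implements CollectionPolicy {
>         private final double maxYoungFraction; // e.g. 0.5
>
>         BalancedTimePolicy(double maxYoungFraction) {
>             this.maxYoungFraction = maxYoungFraction;
>         }
>
>         @Override
>         public boolean collectCompletely(long youngGcNanos, long fullGcNanos) {
>             long total = youngGcNanos + fullGcNanos;
>             // Do a full collection once young collections dominate the budget.
>             return total > 0 && youngGcNanos > maxYoungFraction * total;
>         }
>     }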
>
>
> Ok, I see.
> Also, the way it's currently done, the old-gen needs to be able to
> absorb (worst-case) all of the young-gen in the next cycle, and
> therefore needs *plenty* of headroom. I.e., we need to collect the
> old-gen much earlier than when it's full (like when the remaining
> free space in the old-gen is smaller than the young-gen size).
> Alternatively, we could exhaust the old-gen and change the
> young-gen collection policy to skip the collection if the old-gen
> doesn't have enough space left, and dive into a full GC right away.
> Or, even better, add an intermediate tenuring generation. :-)
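>
> (The headroom rule above, as a sketch with invented names: if the old-gen
> cannot absorb a worst-case promotion of the whole young-gen, go straight to
> a full collection.)
>
>     final class HeadroomPolicy {
>         // Sizes in bytes; illustration only.
>         static boolean shouldDoFullCollection(long oldGenFreeBytes,
>                                               long youngGenSizeBytes) {
>             return oldGenFreeBytes < youngGenSizeBytes;
>         }
>     }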
>
>
> There is no fixed heap size or fixed generation sizes. As long as the
> collector can allocate memory from the OS, it can keep adding chunks
> as needed to the old generation (or the young generation, for that
> matter, e.g., to delay collection until it is "convenient"). If you
> run out of address space or physical memory, then you are in trouble.
>
>
>
> With a little bit of distance and squinting of the eyes, one can see
> that Substrate GC's young generation is really what is called a
> 'nursery space' elsewhere, which aims to reduce the rate at which
> objects get introduced into the young generation. And the old
> generation is really what is usually called the young generation
> elsewhere. What's missing is a true old generation space.
>
>
> Not really, because the young generation can be
> collected independently,
> i.e., there are the generational write barriers,
> remembered sets, ...
>
> So the young generation is reduced to the nursery
> space, but I argue the
> old generation is really an old generation.
>
>
> Ok.
>
> Considering all this, I would like to propose
> some improvements:
> - Introduce a notion of tenuring objects. I
> guess we need like 2 age
> bits in the header or elsewhere for this. Do we
> have that room?
>
>
> You don't need the age bits in the header. You can
> easily go from the
> object to the aligned chunk that the object is in
> (we do that all the
> time, for example for the write barrier to do the
> card marking), and
> store the age in the chunk header. Requiring all
> objects in one chunk to
> have the same age is not much of a limitation.
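>
> (For illustration: because aligned chunks start at an aligned address, the
> chunk header of any object is found by masking, so a per-chunk age is about
> as cheap to read as a header bit would be. The constants are examples.)
>
>     // Sketch: 1 MB aligned chunks; the same masking trick is what makes the
>     // card-marking write barrier cheap.
>     final class AlignedChunks {
>         static final long CHUNK_SIZE = 1L << 20;          // example value
>         static final long CHUNK_MASK = ~(CHUNK_SIZE - 1);
>
>         static long chunkHeaderAddress(long objectAddress) {
>             return objectAddress & CHUNK_MASK;            // enclosing chunk
>         }
>
>         // The age is stored once per chunk, modeled here as a side map.
>         static int objectAge(long objectAddress,
>                              java.util.Map<Long, Integer> agePerChunk) {
>             return agePerChunk.getOrDefault(chunkHeaderAddress(objectAddress), 0);
>         }
>     }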
>
> Right.
>
> Adding tenuring is definitely necessary to achieve
> reasonable GC
> performance.
>
>
> +1
>
> - Implement a true old-space (and rename the existing young to
> nursery and old to young?). In my experience, sliding/mark-compact
> collection using a mark bitmap works best for this: it tends to
> create a 'sediment' of permanent/very-long-lived objects at the
> bottom which would never get copied again. Using a bitmap, walking
> the live objects (e.g. during copying, updating, etc.) would be very
> fast: much faster than walking objects by their size.
>
>
> A mark-and-compact old generation algorithm
> definitely makes sense.
> Again, the only reason why we don't have it yet is
> that no one had time
> to implement it.
>
> Mark-and-compact is also great to reduce memory
> footprint. Right now,
> during GC the memory footprint can double because of
> the temporary space
> for copying.
>
>
> Yeah. However, as Peter noted, having no contiguous memory block
> complicates this. I'd need to see how to deal with it (probably a
> per-chunk bitmap, or maybe a mark bit in the object header, with some
> clever tricks to make scanning the heap fast like the Serial GC
> does).
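>
> (A per-chunk mark bitmap could look roughly like this; illustration only,
> one bit per 8-byte word, so walking the live objects is a scan for set bits
> rather than a walk over object sizes.)
>
>     import java.util.BitSet;
>
>     // Sketch: mark bitmap covering one chunk of the old space.
>     final class ChunkMarkBitmap {
>         private static final int WORD_SIZE = 8;
>         private final long chunkStart;
>         private final BitSet bits;
>
>         ChunkMarkBitmap(long chunkStart, int chunkBytes) {
>             this.chunkStart = chunkStart;
>             this.bits = new BitSet(chunkBytes / WORD_SIZE);
>         }
>
>         void mark(long objectAddress) {
>             bits.set((int) ((objectAddress - chunkStart) / WORD_SIZE));
>         }
>
>         // Visit the marked objects of this chunk in address order.
>         void forEachLiveObject(java.util.function.LongConsumer visitor) {
>             for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
>                 visitor.accept(chunkStart + (long) i * WORD_SIZE);
>             }
>         }
>     }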
>
> - I am not totally sure about the policies. My current thinking is
> that this needs some cleanup/straightening-out, or maybe I am
> misunderstanding something there. I believe (fairly strongly) that
> allocation failure is the single useful trigger for a STW GC, and on
> top of that an (optional) periodic GC trigger that would kick in
> after X (milli)seconds without a GC.
>
>
> As I mentioned above, the GC trigger is allocation
> failure for the young
> generation.
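>
> (Combining that with the optional periodic trigger proposed above would be
> roughly the following; illustration only, invented names.)
>
>     // Sketch: allocation failure is the primary trigger; the periodic
>     // trigger fires only if no GC has happened for a configured interval.
>     final class GcTriggers {
>         private final long periodicIntervalNanos; // <= 0 disables it
>         private long lastGcNanos = System.nanoTime();
>
>         GcTriggers(long periodicIntervalNanos) {
>             this.periodicIntervalNanos = periodicIntervalNanos;
>         }
>
>         boolean shouldCollect(boolean allocationFailed) {
>             boolean periodic = periodicIntervalNanos > 0
>                     && System.nanoTime() - lastGcNanos > periodicIntervalNanos;
>             return allocationFailed || periodic;
>         }
>
>         void gcFinished() {
>             lastGcNanos = System.nanoTime();
>         }
>     }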
>
>
> Ok, good.
>
> - Low-hanging-fruit improvement that could be done right now:
> allocate large objects (arrays) straight into the old-gen instead of
> copying them around. Those are usually long-lived anyway, and copying
> them back and forth just costs CPU time for no benefit. This will
> become even more pronounced with a true old-gen.
>
>
> Large arrays are allocated separately in unaligned
> chunks. Such arrays
> are never copied, but only logically moved from the
> young generation
> into the old generation. An unaligned chunk contains
> exactly one large
> array.
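>
> (As an illustration of "logically moved": only the chunk's list membership
> changes, never the array data. The threshold and names are made up.)
>
>     import java.util.ArrayList;
>     import java.util.List;
>
>     // Sketch: each unaligned chunk holds exactly one large array.
>     final class UnalignedChunks {
>         static final int LARGE_ARRAY_THRESHOLD_BYTES = 128 * 1024; // example
>
>         static final class UnalignedChunk {
>             final byte[] payload; // stands in for the single large array
>             UnalignedChunk(int sizeBytes) { this.payload = new byte[sizeBytes]; }
>         }
>
>         final List<UnalignedChunk> youngChunks = new ArrayList<>();
>         final List<UnalignedChunk> oldChunks = new ArrayList<>();
>
>         void promote(UnalignedChunk chunk) {
>             // "Logically moved": relink the chunk, do not copy the payload.
>             youngChunks.remove(chunk);
>             oldChunks.add(chunk);
>         }
>     }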
>
>
> Ok, good.
>
> Oh and a question: what's this pinned
> object/chunks/spaces all about?
>
>
> There are two mechanisms right now to get objects that are never
> moved by the GC:
> 1) A "normal" object can be temporarily pinned using
> org.graalvm.nativeimage.PinnedObject. The current implementation then
> keeps the whole aligned chunk that contains the object alive, i.e.,
> it is designed for pinnings that are released quickly, so that no
> objects are actually ever pinned when the GC runs, unless the GC runs
> at an unlucky moment. We use such pinning for example to pass
> pointers into byte[] arrays directly to C functions without copying
> (a usage sketch follows after the second mechanism below).
>
> 2) A PinnedAllocator can be used to get objects that
> are non-moving for
> a long period of time. This is currently used for
> the metadata of
> runtime compiled code. We are actively working to
> make PinnedAllocator
> unnecessary by putting the metadata into C memory,
> and then hopefully we
> can remove PinnedAllocator and all code that is
> necessary for it in the
> GC, i.e., the notion of pinned spaces you mentioned
> before.
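>
> (The byte[]-to-C case from mechanism 1) looks roughly like this; treat it as
> a sketch and check the current org.graalvm.nativeimage API for the exact
> signatures.)
>
>     import org.graalvm.nativeimage.PinnedObject;
>     import org.graalvm.word.PointerBase;
>
>     final class PinnedBufferExample {
>         static void passToC(byte[] buffer) {
>             // Pin only for the duration of the call, so nothing is pinned
>             // when a GC is likely to run.
>             try (PinnedObject pin = PinnedObject.create(buffer)) {
>                 PointerBase rawPointer = pin.addressOfArrayElement(0);
>                 // ... hand rawPointer to a C function here; the aligned
>                 // chunk containing 'buffer' stays in place until close() ...
>             }
>         }
>     }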
>
>
> Ok, I guessed so. I mostly wondered about it because
> it's got from-space
> and to-space:
>
> [pinnedFromSpace:
> aligned: 0/0 unaligned: 0/0]
> [pinnedToSpace:
> aligned: 0/0 unaligned: 0/0]]
>
> And would it ever copy between them? I guess not.
>
>
> The collector logically moves a pinned chunk from the pinned
> from-space to the pinned to-space by updating bookkeeping information
> in the chunk. The contents of the pinned chunk are not moved, and
> their addresses do not change. If a pinned chunk is unpinned by the
> application, it is moved to the unpinned from-space, and at the next
> full collection the reachable objects in it are scavenged to the
> unpinned to-space, like any other objects in the unpinned from-space.
> Between collections the pinned to-space is empty. In your example,
> the pinned from-space is also empty. Spaces do not represent address
> ranges, so an empty space is just a few null pointers in the space
> data structure. (Spaces not being address ranges also complicates
> answering questions like: is this object in the young generation?)
>
> ... peter
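>
> (For illustration of the last point: since a space owns chunks rather than
> an address range, "is this object in the young generation?" has to go
> through the object's chunk header instead of an address comparison. Names
> are invented.)
>
>     final class Space {
>         // e.g. the head of a linked list of chunks; details omitted
>     }
>
>     final class ChunkHeader {
>         Space owner; // the space this chunk currently belongs to
>     }
>
>     final class HeapQueries {
>         static boolean isIn(Space space, ChunkHeader chunkOfObject) {
>             // chunkOfObject is derived from the object's address, e.g. by
>             // masking for aligned chunks (see the earlier chunk sketch).
>             return chunkOfObject.owner == space;
>         }
>     }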
>
>
> What do you think about all this? Somebody else might have thought
> about all this already, and have some insights that I don't have in
> my naive understanding? Maybe some of it is already being worked on
> or planned? Maybe there are some big obstacles that I don't see yet
> that make it less feasible?
>
>
> We certainly have ideas and plans, and they match your observations.
> If you are interested in contributing, we can definitely give you
> some guidance so that you immediately work in the right direction.
>
>
> Yes, I am. :-)
>
> Roman
>
>
>
>
>
> --
> This message was sent from my Android device with K-9 Mail.