Thoughts about SubstrateVM GC

Mon Mar 4 18:15:17 UTC 2019

Note that Substrate VM does not have a "mark word" like HotSpot. This 
makes mark-and-compact algorithms a bit more complicated because by 
default there is no space in an object to store a forwarding pointer 
without overwriting some part of the object.

-Christian

On 3/1/19 15:17, Roman Kennke wrote:
> I am working on that ;-) I have users with a case, and we can prototype 
> and measure with Serial GC vs EpsilonGC+ Aleksey's sliding GC hack 
> (https://shipilev.net/jvm/diy-gc/), 
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__shipilev.net_jvm_diy-2Dgc_-29-2C&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=w9vifOOsC2ZsjAx6TFeZGg3A9LGl1ogvmoJFfqxEaCc&m=HAofysyFRXceFWGK5ADd5xstVRFwcRP2z9fAZgjOME0&s=7FE9w5YeWEgNhOfwIDK4-ThsVVln8Xd07cPgbkMEU0w&e=> 
> that should give us some ideas :-)
> 
> Cheers, Roman
> 
> 
> 
> Am 1. März 2019 23:13:09 MEZ schrieb "Peter B. Kessler" 
> <Peter.B.Kessler at Oracle.COM>:
> 
>     I think we need requirements from users, representative benchmarks or real applications, inspired ideas, measurements of alternatives, and time to write garbage collectors. :-)
> 
>     So far, SubstrateVM has explored one point in the space, and we know that one size does not fit all when it comes to garbage collection.
> 
>     			... peter
> 
>     On 03/ 1/19 12:33 PM, Roman Kennke wrote:
> 
>         Hi Peter,
> 
>         picking up on our comments here:
> 
>         https://github.com/oracle/graal/pull/1015
>         <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_oracle_graal_pull_1015&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=w9vifOOsC2ZsjAx6TFeZGg3A9LGl1ogvmoJFfqxEaCc&m=HAofysyFRXceFWGK5ADd5xstVRFwcRP2z9fAZgjOME0&s=HaipLkqibnH29ob1Klr5vO7Q_LhIlvDpo1B_HQC4Gqc&e=>
> 
>         It seems like there is a significant class of applications that
>         would be
>         attractive with SubstrateVM that would be happy to do no GC at all.
>         Shortlived microservices come to mind. For those applications, we'd
>         realistically do want a GC, but only as last resort. In other
>         words, a
>         single-space heap, that would trigger a collection only when
>         exhausted,
>         and then do sliding compaction, would do it, and arguably better
>         than
>         the current GC: it would not require barriers overhead (i.e. pay for
>         something that we bet would never or very rarely happen), it
>         would be
>         able to use the full space, rather than dividing up in
>         generations and
>         semispaces, and as you say, even eliminate the safepoint checks
>         overhead
>         (does SubstrateVM only do safepoints for GC? that would be cool!).
> 
>         Even if it *does* GC, it might still be better off overall: objects
>         would have more time to die, GC frequency would be rarer. With
>         current
>         SubstrateVM I see relatively frequent full-GCs anyway, so rarer
>         GCs with
>         more relative garbage, combined with increased throughput: I
>         guess that
>         might actually work well for certain classes of applications,
>         especially
>         those that would want to run on SubstrateVM anyway?
> 
>         You commented about GraalVM runtime compiler allocating a lot and
>         leaving behind lots of garbage: would that be a concern in
>         SustrateVM's
>         closed world view?
> 
>         WDYT?
> 
>         Roman
> 
>             Two comments inline. And more encouragement to send along
>             your ideas.
> 
>             ... peter
> 
>             On 03/ 1/19 02:16 AM, Roman Kennke wrote:
> 
>                 Hi Christian,
> 
>                 thanks for your replies. This is very interesting. Some
>                 additional
>                 comments below inline:
> 
>                         The old generation is 2 semispaces (actually, 4
>                         with the 2 pinned
>                         spaces, which I'll ask about later). When
>                         collected, live objects get
>                         scavenged back-and-forth between the two spaces.
> 
> 
>                     yes, in theory. In "reality" there is no contiguous
>                     memory for the
>                     spaces, so after a GC all the aligned chunks of the
>                     from-space are
>                     either returned to the OS or cached and immediately
>                     reused for young
>                     generation allocation.
> 
> 
>                 Aha, ok. This definitely affects future design decisions.
> 
> 
>                         - The policy when to start collecting seems a
>                         bit unclear to me. In my
>                         understanding, there is (almost) no use (for STW
>                         GCs) in starting a
>                         collection before allocation space is exhausted.
>                         Which means, it seems
>                         an obvious trigger to start collection on
>                         allocation failure. Yet, the
>                         policies I'm looking at are time-based or
>                         time-and-space-based. I said
>                         'almost' because the single use for time-based
>                         collection would be
>                         periodic GC that is able to knock out lingering
>                         garbage during
>                         no/little-allocation phase of an application,
>                         and then only when the GC
>                         is also uncommitting the resulting unused pages
>                         (which, afaics,
>                         Substrate GC would do: bravo!). But that doesn't
>                         seem to be the
>                         point of
>                         the time-based policies: it looks like the goal
>                         of those policies is to
>                         balance time spent in young-vs-old-gen
>                         collections.?!
> 
> 
>                     The young generation size is fixed, and a collection
>                     is started when
>                     this space is filled. So from that point of view,
>                     the starting trigger
>                     of a GC is always space based.
> 
>                     The policy whether to do an incremental (young)
>                     collection or a full
>                     collection is time based (but you can easily plug in
>                     any policy you
>                     want). The goal is to balance time between
>                     incremental and full
>                     collection. We certainly don't want to fill up the
>                     old generation
>                     because that is the maximum heap size, and it is by
>                     default provisioned
>                     to be very big.
> 
> 
>                 Ok, I see.
>                 Also, the way it's currently done, the old-gen needs to
>                 be able to
>                 absorb (worst-case) all of young-gen in next cycle, and
>                 therefore needs
>                 *plenty* of headroom. I.e. we need to collect old-gen
>                 much earlier than
>                 when it's full (like when remaining free space in
>                 old-gen is smaller
>                 than young-gen size). Alternatively, we could exhaust
>                 old-gen, and
>                 change young-gen-collection-policy to skip collection if
>                 old-gen doesn't
>                 have enough space left, and dive into full-GC right
>                 away. Or, even
>                 better, add an intermediate tenuring generation. :-)
> 
> 
>             There is no fixed heap size, or fixed generation sizes. As
>             long as the
>             collector can allocate memory from the OS it can keep adding
>             chunks as
>             needed to the old generation (or the young generation, for
>             that matter.
>             E.g., to delay collection until it is "convenient".) If you
>             run out of
>             address space, or physical memory, then you are in trouble.
> 
> 
> 
>                         With a little bit of distance and squinting of
>                         eyes, one can see that
>                         Substrate GC's young generation is really what
>                         is called 'nursery
>                         space'
>                         elsewhere, which aims to reduce the rate at
>                         which objects get
>                         introduced
>                         into young generation. And the old generation is
>                         really what is usually
>                         called young generation elsewhere. What's
>                         missing is a true old
>                         generation space.
> 
> 
>                     Not really, because the young generation can be
>                     collected independently,
>                     i.e., there are the generational write barriers,
>                     remembered sets, ...
> 
>                     So the young generation is reduced to the nursery
>                     space, but I argue the
>                     old generation is really an old generation.
> 
> 
>                 Ok.
> 
>                         Considering all this, I would like to propose
>                         some improvements:
>                         - Introduce a notion of tenuring objects. I
>                         guess we need like 2 age
>                         bits in the header or elsewhere for this. Do we
>                         have that room?
> 
> 
>                     You don't need the age bits in the header. You can
>                     easily go from the
>                     object to the aligned chunk that the object is in
>                     (we do that all the
>                     time, for example for the write barrier to do the
>                     card marking), and
>                     store the age in the chunk header. Requiring all
>                     objects in one chunk to
>                     have the same age is not much of a limitation.
> 
>                 Right.
> 
>                     Adding tenuring is definitely necessary to achieve
>                     reasonable GC
>                     performance.
> 
> 
>                 +1
> 
>                         - Implement a true old-space (and rename the
>                         existing young to nursery
>                         and old to young?). In my experience,
>                         sliding/mark-compact collection
>                         using a mark bitmap works best for this: it
>                         tends to create a
>                         'sediment'
>                         of permanent/very-long-lived objects at the
>                         bottom which would never
>                         get
>                         copied again. Using a bitmap, walking of live
>                         objects (e.g. during
>                         copying, updating etc) would be very fast: much
>                         faster than walking
>                         objects by their size.
> 
> 
>                     A mark-and-compact old generation algorithm
>                     definitely makes sense.
>                     Again, the only reason why we don't have it yet is
>                     that no one had time
>                     to implement it.
> 
>                     Mark-and-compact is also great to reduce memory
>                     footprint. Right now,
>                     during GC the memory footprint can double because of
>                     the temporary space
>                     for copying.
> 
> 
>                 Yeah. However, as Peter noted, having no contiguous
>                 memory block
>                 complicates this. I'd need to see how to deal with it
>                 (per-chunk-bitmap
>                 probably, or maybe mark bit in object header, with some
>                 clever tricks to
>                 make scanning the heap fast like serial GC does).
> 
>                         - I am not totally sure about the policies. My
>                         current thinking is that
>                         this needs some cleanup/straightening-out, or
>                         maybe I am
>                         misunderstanding something there. I believe
>                         (fairly strongly) that
>                         allocation failure is the single useful trigger
>                         for STW GC, and on top
>                         of that an (optional) periodic GC trigger that
>                         would kick in after X
>                         (milli)seconds no GC.
> 
> 
>                     As I mentioned above, the GC trigger is allocation
>                     failure for the young
>                     generation.
> 
> 
>                 Ok, good.
> 
>                         - Low-hanging-fruit improvement that could be
>                         done right now: allocate
>                         large objects(arrays) straight into old-gen
>                         instead of copying them
>                         around. Those are usually long-lived anyway, and
>                         copying them
>                         back-and-forth just costs CPU time for no
>                         benefit. This will become
>                         even
>                         more pronounced with a true old-gen.
> 
> 
>                     Large arrays are allocated separately in unaligned
>                     chunks. Such arrays
>                     are never copied, but only logically moved from the
>                     young generation
>                     into the old generation. An unaligned chunk contains
>                     exactly one large
>                     array.
> 
> 
>                 Ok, good.
> 
>                         Oh and a question: what's this pinned
>                         object/chunks/spaces all about?
> 
> 
>                     There are two mechanisms right now to get objects
>                     that are never moved
>                     by the GC:
>                     1) A "normal" object can be temporarily pinned using
>                     org.graalvm.nativeimage.PinnedObject. The current
>                     implementation then
>                     keeps the whole aligned chunk that contains the
>                     object alive, i.e., it
>                     is designed for pinnings that are released quickly
>                     so that no objects
>                     are actually ever pinned when the GC runs, unless
>                     the GC runs in an
>                     unlucky moments. We use such pinning for example to
>                     pass pointers into
>                     byte[] arrays directly to C functions without copying.
> 
>                     2) A PinnedAllocator can be used to get objects that
>                     are non-moving for
>                     a long period of time. This is currently used for
>                     the metadata of
>                     runtime compiled code. We are actively working to
>                     make PinnedAllocator
>                     unnecessary by putting the metadata into C memory,
>                     and then hopefully we
>                     can remove PinnedAllocator and all code that is
>                     necessary for it in the
>                     GC, i.e., the notion of pinned spaces you mentioned
>                     before.
> 
> 
>                 Ok, I guessed so. I mostly wondered about it because
>                 it's got from-space
>                 and to-space:
> 
>                 [pinnedFromSpace:
>                 aligned: 0/0 unaligned: 0/0]
>                 [pinnedToSpace:
>                 aligned: 0/0 unaligned: 0/0]]
> 
>                 And would it ever copy between them? I guess not.
> 
> 
>             The collector logically moves a pinned chunk from pinned
>             from-space to
>             pinned to-space by updating bookkeeping information in the
>             chunk. The
>             contents of the pinned chunk are not moved, and their
>             addresses do not
>             change. If a pinned chunk is unpinned by the application, it
>             is moved
>             to the unpinned from-space and at the next full collection
>             the reachable
>             objects in it are scavenged to the unpinned to-space, like
>             any other
>             objects in unpinned from-space. Between collections the
>             pinned to-space
>             is empty. In your example, the pinned from-space is also
>             empty. Spaces
>             do not represent address ranges, so an empty space is just a
>             few null
>             pointers in the space data structure. (Spaces not being
>             address ranges
>             also complicates answer questions like: Is this object in
>             the young
>             generation.)
> 
>             ... peter
> 
> 
>                         What do you think about all this? Somebody else
>                         might have thought
>                         about
>                         all this already, and have some insights that I
>                         don't have in my naive
>                         understanding? Maybe some of it is already
>                         worked on or planned? Maybe
>                         there are some big obstactles that I don't see
>                         yet, that make it less
>                         feasible?
> 
> 
>                     We certainly have ideas and plans, and they match
>                     your observations. If
>                     you are interested in contributing, we can
>                     definitely give you some
>                     guidance so that you immediately work into the right
>                     direction.
> 
> 
>                 Yes, I am. :-)
> 
>                 Roman
> 
> 
> 
> 
> 
> -- 
> Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.