Thoughts about SubstrateVM GC

Sat Mar 2 02:15:08 UTC 2019

See also https://github.com/oracle/graal/issues/853 for an example of using the `CollectionPolicy$NeverCollect` collection policy.  Though, we did not get any numbers out of that.

			... peter

On 03/ 1/19 03:17 PM, Roman Kennke wrote:
> I am working on that ;-) I have users with a case, and we can prototype and measure with Serial GC vs EpsilonGC+ Aleksey's sliding GC hack (https://shipilev.net/jvm/diy-gc/), <https://urldefense.proofpoint.com/v2/url?u=https-3A__shipilev.net_jvm_diy-2Dgc_-29-2C&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=5PHnD4dL2SCEK_lUGMobCLH5NBctgdkmyl5xLuOZO1A&m=aekfXZLWZQlKVb8Be4--hgZyRCokk_88RWkk345He3I&s=SfSQB0HT77JOaQ35yfT24O8pLV9rkCS9WD6dotXJq1g&e=> that should give us some ideas :-)
> 
> Cheers, Roman
> 
> 
> 
> Am 1. März 2019 23:13:09 MEZ schrieb "Peter B. Kessler" <Peter.B.Kessler at Oracle.COM>:
> 
>     I think we need requirements from users, representative benchmarks or real applications, inspired ideas, measurements of alternatives, and time to write garbage collectors. :-)
> 
>     So far, SubstrateVM has explored one point in the space, and we know that one size does not fit all when it comes to garbage collection.
> 
>     			... peter
> 
>     On 03/ 1/19 12:33 PM, Roman Kennke wrote:
> 
>         Hi Peter,
> 
>         picking up on our comments here:
> 
>         https://github.com/oracle/graal/pull/1015 <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_oracle_graal_pull_1015&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=5PHnD4dL2SCEK_lUGMobCLH5NBctgdkmyl5xLuOZO1A&m=aekfXZLWZQlKVb8Be4--hgZyRCokk_88RWkk345He3I&s=_VwIfmLOnCNwX55vUK8E-l6WJ7rX5cS7zCSGzfgRFFQ&e=>
> 
>         It seems like there is a significant class of applications that would be
>         attractive with SubstrateVM that would be happy to do no GC at all.
>         Shortlived microservices come to mind. For those applications, we'd
>         realistically do want a GC, but only as last resort. In other words, a
>         single-space heap, that would trigger a collection only when exhausted,
>         and then do sliding compaction, would do it, and arguably better than
>         the current GC: it would not require barriers overhead (i.e. pay for
>         something that we bet would never or very rarely happen), it would be
>         able to use the full space, rather than dividing up in generations and
>         semispaces, and as you say, even eliminate the safepoint checks overhead
>         (does SubstrateVM only do safepoints for GC? that would be cool!).
> 
>         Even if it *does* GC, it might still be better off overall: objects
>         would have more time to die, GC frequency would be rarer. With current
>         SubstrateVM I see relatively frequent full-GCs anyway, so rarer GCs with
>         more relative garbage, combined with increased throughput: I guess that
>         might actually work well for certain classes of applications, especially
>         those that would want to run on SubstrateVM anyway?
> 
>         You commented about GraalVM runtime compiler allocating a lot and
>         leaving behind lots of garbage: would that be a concern in SustrateVM's
>         closed world view?
> 
>         WDYT?
> 
>         Roman
> 
>             Two comments inline. And more encouragement to send along your ideas.
> 
>             ... peter
> 
>             On 03/ 1/19 02:16 AM, Roman Kennke wrote:
> 
>                 Hi Christian,
> 
>                 thanks for your replies. This is very interesting. Some additional
>                 comments below inline:
> 
>                         The old generation is 2 semispaces (actually, 4 with the 2 pinned
>                         spaces, which I'll ask about later). When collected, live objects get
>                         scavenged back-and-forth between the two spaces.
> 
> 
>                     yes, in theory. In "reality" there is no contiguous memory for the
>                     spaces, so after a GC all the aligned chunks of the from-space are
>                     either returned to the OS or cached and immediately reused for young
>                     generation allocation.
> 
> 
>                 Aha, ok. This definitely affects future design decisions.
> 
> 
>                         - The policy when to start collecting seems a bit unclear to me. In my
>                         understanding, there is (almost) no use (for STW GCs) in starting a
>                         collection before allocation space is exhausted. Which means, it seems
>                         an obvious trigger to start collection on allocation failure. Yet, the
>                         policies I'm looking at are time-based or time-and-space-based. I said
>                         'almost' because the single use for time-based collection would be
>                         periodic GC that is able to knock out lingering garbage during
>                         no/little-allocation phase of an application, and then only when the GC
>                         is also uncommitting the resulting unused pages (which, afaics,
>                         Substrate GC would do: bravo!). But that doesn't seem to be the
>                         point of
>                         the time-based policies: it looks like the goal of those policies is to
>                         balance time spent in young-vs-old-gen collections.?!
> 
> 
>                     The young generation size is fixed, and a collection is started when
>                     this space is filled. So from that point of view, the starting trigger
>                     of a GC is always space based.
> 
>                     The policy whether to do an incremental (young) collection or a full
>                     collection is time based (but you can easily plug in any policy you
>                     want). The goal is to balance time between incremental and full
>                     collection. We certainly don't want to fill up the old generation
>                     because that is the maximum heap size, and it is by default provisioned
>                     to be very big.
> 
> 
>                 Ok, I see.
>                 Also, the way it's currently done, the old-gen needs to be able to
>                 absorb (worst-case) all of young-gen in next cycle, and therefore needs
>                 *plenty* of headroom. I.e. we need to collect old-gen much earlier than
>                 when it's full (like when remaining free space in old-gen is smaller
>                 than young-gen size). Alternatively, we could exhaust old-gen, and
>                 change young-gen-collection-policy to skip collection if old-gen doesn't
>                 have enough space left, and dive into full-GC right away. Or, even
>                 better, add an intermediate tenuring generation. :-)
> 
> 
>             There is no fixed heap size, or fixed generation sizes. As long as the
>             collector can allocate memory from the OS it can keep adding chunks as
>             needed to the old generation (or the young generation, for that matter.
>             E.g., to delay collection until it is "convenient".) If you run out of
>             address space, or physical memory, then you are in trouble.
> 
> 
> 
>                         With a little bit of distance and squinting of eyes, one can see that
>                         Substrate GC's young generation is really what is called 'nursery
>                         space'
>                         elsewhere, which aims to reduce the rate at which objects get
>                         introduced
>                         into young generation. And the old generation is really what is usually
>                         called young generation elsewhere. What's missing is a true old
>                         generation space.
> 
> 
>                     Not really, because the young generation can be collected independently,
>                     i.e., there are the generational write barriers, remembered sets, ...
> 
>                     So the young generation is reduced to the nursery space, but I argue the
>                     old generation is really an old generation.
> 
> 
>                 Ok.
> 
>                         Considering all this, I would like to propose some improvements:
>                         - Introduce a notion of tenuring objects. I guess we need like 2 age
>                         bits in the header or elsewhere for this. Do we have that room?
> 
> 
>                     You don't need the age bits in the header. You can easily go from the
>                     object to the aligned chunk that the object is in (we do that all the
>                     time, for example for the write barrier to do the card marking), and
>                     store the age in the chunk header. Requiring all objects in one chunk to
>                     have the same age is not much of a limitation.
> 
>                 Right.
> 
>                     Adding tenuring is definitely necessary to achieve reasonable GC
>                     performance.
> 
> 
>                 +1
> 
>                         - Implement a true old-space (and rename the existing young to nursery
>                         and old to young?). In my experience, sliding/mark-compact collection
>                         using a mark bitmap works best for this: it tends to create a
>                         'sediment'
>                         of permanent/very-long-lived objects at the bottom which would never
>                         get
>                         copied again. Using a bitmap, walking of live objects (e.g. during
>                         copying, updating etc) would be very fast: much faster than walking
>                         objects by their size.
> 
> 
>                     A mark-and-compact old generation algorithm definitely makes sense.
>                     Again, the only reason why we don't have it yet is that no one had time
>                     to implement it.
> 
>                     Mark-and-compact is also great to reduce memory footprint. Right now,
>                     during GC the memory footprint can double because of the temporary space
>                     for copying.
> 
> 
>                 Yeah. However, as Peter noted, having no contiguous memory block
>                 complicates this. I'd need to see how to deal with it (per-chunk-bitmap
>                 probably, or maybe mark bit in object header, with some clever tricks to
>                 make scanning the heap fast like serial GC does).
> 
>                         - I am not totally sure about the policies. My current thinking is that
>                         this needs some cleanup/straightening-out, or maybe I am
>                         misunderstanding something there. I believe (fairly strongly) that
>                         allocation failure is the single useful trigger for STW GC, and on top
>                         of that an (optional) periodic GC trigger that would kick in after X
>                         (milli)seconds no GC.
> 
> 
>                     As I mentioned above, the GC trigger is allocation failure for the young
>                     generation.
> 
> 
>                 Ok, good.
> 
>                         - Low-hanging-fruit improvement that could be done right now: allocate
>                         large objects(arrays) straight into old-gen instead of copying them
>                         around. Those are usually long-lived anyway, and copying them
>                         back-and-forth just costs CPU time for no benefit. This will become
>                         even
>                         more pronounced with a true old-gen.
> 
> 
>                     Large arrays are allocated separately in unaligned chunks. Such arrays
>                     are never copied, but only logically moved from the young generation
>                     into the old generation. An unaligned chunk contains exactly one large
>                     array.
> 
> 
>                 Ok, good.
> 
>                         Oh and a question: what's this pinned object/chunks/spaces all about?
> 
> 
>                     There are two mechanisms right now to get objects that are never moved
>                     by the GC:
>                     1) A "normal" object can be temporarily pinned using
>                     org.graalvm.nativeimage.PinnedObject. The current implementation then
>                     keeps the whole aligned chunk that contains the object alive, i.e., it
>                     is designed for pinnings that are released quickly so that no objects
>                     are actually ever pinned when the GC runs, unless the GC runs in an
>                     unlucky moments. We use such pinning for example to pass pointers into
>                     byte[] arrays directly to C functions without copying.
> 
>                     2) A PinnedAllocator can be used to get objects that are non-moving for
>                     a long period of time. This is currently used for the metadata of
>                     runtime compiled code. We are actively working to make PinnedAllocator
>                     unnecessary by putting the metadata into C memory, and then hopefully we
>                     can remove PinnedAllocator and all code that is necessary for it in the
>                     GC, i.e., the notion of pinned spaces you mentioned before.
> 
> 
>                 Ok, I guessed so. I mostly wondered about it because it's got from-space
>                 and to-space:
> 
>                 [pinnedFromSpace:
>                 aligned: 0/0 unaligned: 0/0]
>                 [pinnedToSpace:
>                 aligned: 0/0 unaligned: 0/0]]
> 
>                 And would it ever copy between them? I guess not.
> 
> 
>             The collector logically moves a pinned chunk from pinned from-space to
>             pinned to-space by updating bookkeeping information in the chunk. The
>             contents of the pinned chunk are not moved, and their addresses do not
>             change. If a pinned chunk is unpinned by the application, it is moved
>             to the unpinned from-space and at the next full collection the reachable
>             objects in it are scavenged to the unpinned to-space, like any other
>             objects in unpinned from-space. Between collections the pinned to-space
>             is empty. In your example, the pinned from-space is also empty. Spaces
>             do not represent address ranges, so an empty space is just a few null
>             pointers in the space data structure. (Spaces not being address ranges
>             also complicates answer questions like: Is this object in the young
>             generation.)
> 
>             ... peter
> 
> 
>                         What do you think about all this? Somebody else might have thought
>                         about
>                         all this already, and have some insights that I don't have in my naive
>                         understanding? Maybe some of it is already worked on or planned? Maybe
>                         there are some big obstactles that I don't see yet, that make it less
>                         feasible?
> 
> 
>                     We certainly have ideas and plans, and they match your observations. If
>                     you are interested in contributing, we can definitely give you some
>                     guidance so that you immediately work into the right direction.
> 
> 
>                 Yes, I am. :-)
> 
>                 Roman
> 
> 
> 
> 
> 
> -- 
> Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.