Thoughts about SubstrateVM GC
Peter B. Kessler
Peter.B.Kessler at Oracle.COM
Sat Mar 2 02:15:08 UTC 2019
See also https://github.com/oracle/graal/issues/853 for an example of using the `CollectionPolicy$NeverCollect` collection policy. Though, we did not get any numbers out of that.
... peter
On 03/ 1/19 03:17 PM, Roman Kennke wrote:
> I am working on that ;-) I have users with a case, and we can prototype and measure with Serial GC vs EpsilonGC+ Aleksey's sliding GC hack (https://shipilev.net/jvm/diy-gc/), <https://urldefense.proofpoint.com/v2/url?u=https-3A__shipilev.net_jvm_diy-2Dgc_-29-2C&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=5PHnD4dL2SCEK_lUGMobCLH5NBctgdkmyl5xLuOZO1A&m=aekfXZLWZQlKVb8Be4--hgZyRCokk_88RWkk345He3I&s=SfSQB0HT77JOaQ35yfT24O8pLV9rkCS9WD6dotXJq1g&e=> that should give us some ideas :-)
>
> Cheers, Roman
>
>
>
> Am 1. März 2019 23:13:09 MEZ schrieb "Peter B. Kessler" <Peter.B.Kessler at Oracle.COM>:
>
> I think we need requirements from users, representative benchmarks or real applications, inspired ideas, measurements of alternatives, and time to write garbage collectors. :-)
>
> So far, SubstrateVM has explored one point in the space, and we know that one size does not fit all when it comes to garbage collection.
>
> ... peter
>
> On 03/ 1/19 12:33 PM, Roman Kennke wrote:
>
> Hi Peter,
>
> picking up on our comments here:
>
> https://github.com/oracle/graal/pull/1015 <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_oracle_graal_pull_1015&d=DwMFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=5PHnD4dL2SCEK_lUGMobCLH5NBctgdkmyl5xLuOZO1A&m=aekfXZLWZQlKVb8Be4--hgZyRCokk_88RWkk345He3I&s=_VwIfmLOnCNwX55vUK8E-l6WJ7rX5cS7zCSGzfgRFFQ&e=>
>
> It seems like there is a significant class of applications that would be
> attractive with SubstrateVM that would be happy to do no GC at all.
> Shortlived microservices come to mind. For those applications, we'd
> realistically do want a GC, but only as last resort. In other words, a
> single-space heap, that would trigger a collection only when exhausted,
> and then do sliding compaction, would do it, and arguably better than
> the current GC: it would not require barriers overhead (i.e. pay for
> something that we bet would never or very rarely happen), it would be
> able to use the full space, rather than dividing up in generations and
> semispaces, and as you say, even eliminate the safepoint checks overhead
> (does SubstrateVM only do safepoints for GC? that would be cool!).
>
> Even if it *does* GC, it might still be better off overall: objects
> would have more time to die, GC frequency would be rarer. With current
> SubstrateVM I see relatively frequent full-GCs anyway, so rarer GCs with
> more relative garbage, combined with increased throughput: I guess that
> might actually work well for certain classes of applications, especially
> those that would want to run on SubstrateVM anyway?
>
> You commented about GraalVM runtime compiler allocating a lot and
> leaving behind lots of garbage: would that be a concern in SustrateVM's
> closed world view?
>
> WDYT?
>
> Roman
>
> Two comments inline. And more encouragement to send along your ideas.
>
> ... peter
>
> On 03/ 1/19 02:16 AM, Roman Kennke wrote:
>
> Hi Christian,
>
> thanks for your replies. This is very interesting. Some additional
> comments below inline:
>
> The old generation is 2 semispaces (actually, 4 with the 2 pinned
> spaces, which I'll ask about later). When collected, live objects get
> scavenged back-and-forth between the two spaces.
>
>
> yes, in theory. In "reality" there is no contiguous memory for the
> spaces, so after a GC all the aligned chunks of the from-space are
> either returned to the OS or cached and immediately reused for young
> generation allocation.
>
>
> Aha, ok. This definitely affects future design decisions.
>
>
> - The policy when to start collecting seems a bit unclear to me. In my
> understanding, there is (almost) no use (for STW GCs) in starting a
> collection before allocation space is exhausted. Which means, it seems
> an obvious trigger to start collection on allocation failure. Yet, the
> policies I'm looking at are time-based or time-and-space-based. I said
> 'almost' because the single use for time-based collection would be
> periodic GC that is able to knock out lingering garbage during
> no/little-allocation phase of an application, and then only when the GC
> is also uncommitting the resulting unused pages (which, afaics,
> Substrate GC would do: bravo!). But that doesn't seem to be the
> point of
> the time-based policies: it looks like the goal of those policies is to
> balance time spent in young-vs-old-gen collections.?!
>
>
> The young generation size is fixed, and a collection is started when
> this space is filled. So from that point of view, the starting trigger
> of a GC is always space based.
>
> The policy whether to do an incremental (young) collection or a full
> collection is time based (but you can easily plug in any policy you
> want). The goal is to balance time between incremental and full
> collection. We certainly don't want to fill up the old generation
> because that is the maximum heap size, and it is by default provisioned
> to be very big.
>
>
> Ok, I see.
> Also, the way it's currently done, the old-gen needs to be able to
> absorb (worst-case) all of young-gen in next cycle, and therefore needs
> *plenty* of headroom. I.e. we need to collect old-gen much earlier than
> when it's full (like when remaining free space in old-gen is smaller
> than young-gen size). Alternatively, we could exhaust old-gen, and
> change young-gen-collection-policy to skip collection if old-gen doesn't
> have enough space left, and dive into full-GC right away. Or, even
> better, add an intermediate tenuring generation. :-)
>
>
> There is no fixed heap size, or fixed generation sizes. As long as the
> collector can allocate memory from the OS it can keep adding chunks as
> needed to the old generation (or the young generation, for that matter.
> E.g., to delay collection until it is "convenient".) If you run out of
> address space, or physical memory, then you are in trouble.
>
>
>
> With a little bit of distance and squinting of eyes, one can see that
> Substrate GC's young generation is really what is called 'nursery
> space'
> elsewhere, which aims to reduce the rate at which objects get
> introduced
> into young generation. And the old generation is really what is usually
> called young generation elsewhere. What's missing is a true old
> generation space.
>
>
> Not really, because the young generation can be collected independently,
> i.e., there are the generational write barriers, remembered sets, ...
>
> So the young generation is reduced to the nursery space, but I argue the
> old generation is really an old generation.
>
>
> Ok.
>
> Considering all this, I would like to propose some improvements:
> - Introduce a notion of tenuring objects. I guess we need like 2 age
> bits in the header or elsewhere for this. Do we have that room?
>
>
> You don't need the age bits in the header. You can easily go from the
> object to the aligned chunk that the object is in (we do that all the
> time, for example for the write barrier to do the card marking), and
> store the age in the chunk header. Requiring all objects in one chunk to
> have the same age is not much of a limitation.
>
> Right.
>
> Adding tenuring is definitely necessary to achieve reasonable GC
> performance.
>
>
> +1
>
> - Implement a true old-space (and rename the existing young to nursery
> and old to young?). In my experience, sliding/mark-compact collection
> using a mark bitmap works best for this: it tends to create a
> 'sediment'
> of permanent/very-long-lived objects at the bottom which would never
> get
> copied again. Using a bitmap, walking of live objects (e.g. during
> copying, updating etc) would be very fast: much faster than walking
> objects by their size.
>
>
> A mark-and-compact old generation algorithm definitely makes sense.
> Again, the only reason why we don't have it yet is that no one had time
> to implement it.
>
> Mark-and-compact is also great to reduce memory footprint. Right now,
> during GC the memory footprint can double because of the temporary space
> for copying.
>
>
> Yeah. However, as Peter noted, having no contiguous memory block
> complicates this. I'd need to see how to deal with it (per-chunk-bitmap
> probably, or maybe mark bit in object header, with some clever tricks to
> make scanning the heap fast like serial GC does).
>
> - I am not totally sure about the policies. My current thinking is that
> this needs some cleanup/straightening-out, or maybe I am
> misunderstanding something there. I believe (fairly strongly) that
> allocation failure is the single useful trigger for STW GC, and on top
> of that an (optional) periodic GC trigger that would kick in after X
> (milli)seconds no GC.
>
>
> As I mentioned above, the GC trigger is allocation failure for the young
> generation.
>
>
> Ok, good.
>
> - Low-hanging-fruit improvement that could be done right now: allocate
> large objects(arrays) straight into old-gen instead of copying them
> around. Those are usually long-lived anyway, and copying them
> back-and-forth just costs CPU time for no benefit. This will become
> even
> more pronounced with a true old-gen.
>
>
> Large arrays are allocated separately in unaligned chunks. Such arrays
> are never copied, but only logically moved from the young generation
> into the old generation. An unaligned chunk contains exactly one large
> array.
>
>
> Ok, good.
>
> Oh and a question: what's this pinned object/chunks/spaces all about?
>
>
> There are two mechanisms right now to get objects that are never moved
> by the GC:
> 1) A "normal" object can be temporarily pinned using
> org.graalvm.nativeimage.PinnedObject. The current implementation then
> keeps the whole aligned chunk that contains the object alive, i.e., it
> is designed for pinnings that are released quickly so that no objects
> are actually ever pinned when the GC runs, unless the GC runs in an
> unlucky moments. We use such pinning for example to pass pointers into
> byte[] arrays directly to C functions without copying.
>
> 2) A PinnedAllocator can be used to get objects that are non-moving for
> a long period of time. This is currently used for the metadata of
> runtime compiled code. We are actively working to make PinnedAllocator
> unnecessary by putting the metadata into C memory, and then hopefully we
> can remove PinnedAllocator and all code that is necessary for it in the
> GC, i.e., the notion of pinned spaces you mentioned before.
>
>
> Ok, I guessed so. I mostly wondered about it because it's got from-space
> and to-space:
>
> [pinnedFromSpace:
> aligned: 0/0 unaligned: 0/0]
> [pinnedToSpace:
> aligned: 0/0 unaligned: 0/0]]
>
> And would it ever copy between them? I guess not.
>
>
> The collector logically moves a pinned chunk from pinned from-space to
> pinned to-space by updating bookkeeping information in the chunk. The
> contents of the pinned chunk are not moved, and their addresses do not
> change. If a pinned chunk is unpinned by the application, it is moved
> to the unpinned from-space and at the next full collection the reachable
> objects in it are scavenged to the unpinned to-space, like any other
> objects in unpinned from-space. Between collections the pinned to-space
> is empty. In your example, the pinned from-space is also empty. Spaces
> do not represent address ranges, so an empty space is just a few null
> pointers in the space data structure. (Spaces not being address ranges
> also complicates answer questions like: Is this object in the young
> generation.)
>
> ... peter
>
>
> What do you think about all this? Somebody else might have thought
> about
> all this already, and have some insights that I don't have in my naive
> understanding? Maybe some of it is already worked on or planned? Maybe
> there are some big obstactles that I don't see yet, that make it less
> feasible?
>
>
> We certainly have ideas and plans, and they match your observations. If
> you are interested in contributing, we can definitely give you some
> guidance so that you immediately work into the right direction.
>
>
> Yes, I am. :-)
>
> Roman
>
>
>
>
>
> --
> Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.
More information about the graal-dev
mailing list