Thoughts about SubstrateVM GC

Roman Kennke rkennke at redhat.com
Fri Mar 1 23:17:58 UTC 2019


I am working on that ;-) I have users with a case, and we can prototype and measure with Serial GC vs. Epsilon GC + Aleksey's sliding GC hack (https://shipilev.net/jvm/diy-gc/); that should give us some ideas :-)
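
For the measurement side, a trivial allocation-stress microbenchmark of the
kind one might use for such a comparison (illustrative only; the flags in the
comment are HotSpot's experimental Epsilon GC from JEP 318, not a SubstrateVM
option):

    // Illustrative allocation-stress microbenchmark; run it e.g. with the
    // no-op collector:
    //   java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx4g AllocStress
    // and compare against the default collector, or a native image of the same code.
    public class AllocStress {
        public static void main(String[] args) {
            long start = System.nanoTime();
            long checksum = 0;
            for (int i = 0; i < 10_000_000; i++) {
                byte[] garbage = new byte[128];   // short-lived allocation
                checksum += garbage.length;
            }
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("checksum=" + checksum + ", took " + millis + " ms");
        }
    }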

Cheers, Roman



On 1 March 2019 at 23:13:09 CET, "Peter B. Kessler" <Peter.B.Kessler at Oracle.COM> wrote:
>I think we need requirements from users, representative benchmarks or
>real applications, inspired ideas, measurements of alternatives, and
>time to write garbage collectors. :-)
>
>So far, SubstrateVM has explored one point in the space, and we know
>that one size does not fit all when it comes to garbage collection.
>
>			... peter
>
>On 03/ 1/19 12:33 PM, Roman Kennke wrote:
>> Hi Peter,
>> 
>> picking up on our comments here:
>> 
>> https://github.com/oracle/graal/pull/1015
>> 
>> It seems like there is a significant class of applications that would
>> be attractive with SubstrateVM and that would be happy to do no GC at
>> all. Short-lived microservices come to mind. For those applications,
>> we realistically do want a GC, but only as a last resort. In other
>> words, a single-space heap that triggers a collection only when
>> exhausted, and then does sliding compaction, would do it, and arguably
>> better than the current GC: it would not require barrier overhead
>> (i.e. pay for something that we bet would never or very rarely
>> happen), it would be able to use the full space rather than dividing
>> it up into generations and semispaces, and, as you say, it could even
>> eliminate the safepoint check overhead (does SubstrateVM only do
>> safepoints for GC? that would be cool!).
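
For illustration, a minimal sketch of the Lisp-2 style sliding compaction
meant here, over a toy heap (everything below is made up for illustration and
is not SubstrateVM code):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    // Toy sliding compaction: live objects keep their relative order and
    // slide toward address 0, so long-lived objects settle at the bottom and
    // eventually stop being copied at all.
    public class SlidingCompactionSketch {
        static class Obj {
            int address;        // current start address in the toy heap
            final int size;     // size in words
            boolean marked;     // set by the (omitted) marking phase
            int forward;        // forwarding address computed below
            Obj(int address, int size) { this.address = address; this.size = size; }
        }

        // Phase 1: compute forwarding addresses by sliding live objects down.
        static int computeForwarding(List<Obj> heap) {    // heap sorted by address
            int free = 0;
            for (Obj o : heap) {
                if (o.marked) {
                    o.forward = free;
                    free += o.size;
                }
            }
            return free;                                   // new top of the heap
        }

        // Phase 2 (omitted): rewrite every reference field to point to the
        // referent's forwarding address.

        // Phase 3: move live objects to their forwarding addresses.
        static void relocate(List<Obj> heap) {
            Iterator<Obj> it = heap.iterator();
            while (it.hasNext()) {
                Obj o = it.next();
                if (o.marked) {
                    o.address = o.forward;                 // a memmove in a real GC
                    o.marked = false;
                } else {
                    it.remove();                           // dead object, space reclaimed
                }
            }
        }

        public static void main(String[] args) {
            List<Obj> heap = new ArrayList<>(List.of(
                    new Obj(0, 4), new Obj(4, 2), new Obj(6, 8)));
            heap.get(0).marked = true;                     // pretend marking found
            heap.get(2).marked = true;                     // objects 0 and 2 live
            int newTop = computeForwarding(heap);
            relocate(heap);
            System.out.println("new top = " + newTop);     // 12
            System.out.println("survivors at " + heap.get(0).address
                    + " and " + heap.get(1).address);      // 0 and 4
        }
    }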
>> 
>> Even if it *does* GC, it might still be better off overall: objects
>> would have more time to die, and GC frequency would be lower. With the
>> current SubstrateVM I see relatively frequent full GCs anyway, so
>> rarer GCs with more relative garbage, combined with increased
>> throughput: I guess that might actually work well for certain classes
>> of applications, especially those that would want to run on
>> SubstrateVM anyway?
>> 
>> You commented about the GraalVM runtime compiler allocating a lot and
>> leaving behind lots of garbage: would that be a concern in
>> SubstrateVM's closed-world view?
>> 
>> WDYT?
>> 
>> Roman
>> 
>>> Two comments inline.  And more encouragement to send along your
>>> ideas.
>>>
>>>              ... peter
>>>
>>> On 03/ 1/19 02:16 AM, Roman Kennke wrote:
>>>> Hi Christian,
>>>>
>>>> thanks for your replies. This is very interesting. Some additional
>>>> comments below inline:
>>>>
>>>>>> The old generation is 2 semispaces (actually, 4 with the 2 pinned
>>>>>> spaces, which I'll ask about later). When collected, live objects
>>>>>> get scavenged back-and-forth between the two spaces.
>>>>>
>>>>> yes, in theory. In "reality" there is no contiguous memory for the
>>>>> spaces, so after a GC all the aligned chunks of the from-space are
>>>>> either returned to the OS or cached and immediately reused for
>>>>> young generation allocation.
>>>>
>>>> Aha, ok. This definitely affects future design decisions.
>>>>
>>>>
>>>>>> - The policy when to start collecting seems a bit unclear to me.
>>>>>> In my understanding, there is (almost) no use (for STW GCs) in
>>>>>> starting a collection before allocation space is exhausted. Which
>>>>>> means, it seems an obvious trigger to start collection on
>>>>>> allocation failure. Yet, the policies I'm looking at are time-based
>>>>>> or time-and-space-based. I said 'almost' because the single use for
>>>>>> time-based collection would be a periodic GC that is able to knock
>>>>>> out lingering garbage during a no/little-allocation phase of an
>>>>>> application, and then only when the GC also uncommits the resulting
>>>>>> unused pages (which, afaics, the Substrate GC would do: bravo!).
>>>>>> But that doesn't seem to be the point of the time-based policies:
>>>>>> it looks like the goal of those policies is to balance time spent
>>>>>> in young-vs-old-gen collections?!
>>>>>
>>>>> The young generation size is fixed, and a collection is started
>>>>> when this space is filled. So from that point of view, the
>>>>> starting trigger of a GC is always space based.
>>>>>
>>>>> The policy whether to do an incremental (young) collection or a
>>>>> full collection is time based (but you can easily plug in any
>>>>> policy you want). The goal is to balance time between incremental
>>>>> and full collection. We certainly don't want to fill up the old
>>>>> generation because that is the maximum heap size, and it is by
>>>>> default provisioned to be very big.
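
For illustration, a time-balancing policy of that kind could be sketched like
this (the ratio is a made-up parameter, not the actual SubstrateVM default):

    // Decide between an incremental (young) and a full collection by comparing
    // the time accumulated in each kind of collection so far.
    public class CollectionPolicySketch {
        static final int INCREMENTAL_TO_FULL_RATIO = 7;   // assumed balance factor

        private long incrementalNanos;   // total time spent in young collections
        private long fullNanos;          // total time spent in full collections

        // Called at the start of a collection to pick which kind to run.
        boolean shouldDoFullCollection() {
            return incrementalNanos > INCREMENTAL_TO_FULL_RATIO * fullNanos;
        }

        void recordIncremental(long nanos) { incrementalNanos += nanos; }
        void recordFull(long nanos)        { fullNanos += nanos; }
    }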
>>>>
>>>> Ok, I see.
>>>> Also, the way it's currently done, the old gen needs to be able to
>>>> absorb (worst case) all of the young gen in the next cycle, and
>>>> therefore needs *plenty* of headroom. I.e. we need to collect the
>>>> old gen much earlier than when it's full (like when the remaining
>>>> free space in the old gen is smaller than the young-gen size).
>>>> Alternatively, we could exhaust the old gen and change the
>>>> young-gen collection policy to skip the collection if the old gen
>>>> doesn't have enough space left, and dive into a full GC right away.
>>>> Or, even better, add an intermediate tenuring generation. :-)
>>>
>>> There is no fixed heap size, or fixed generation sizes.  As long as
>>> the collector can allocate memory from the OS it can keep adding
>>> chunks as needed to the old generation (or the young generation, for
>>> that matter.  E.g., to delay collection until it is "convenient".)
>>> If you run out of address space, or physical memory, then you are in
>>> trouble.
>>>
>>>>
>>>>
>>>>>> With a little bit of distance and squinting of eyes, one can see
>>>>>> that Substrate GC's young generation is really what is called
>>>>>> 'nursery space' elsewhere, which aims to reduce the rate at which
>>>>>> objects get introduced into young generation. And the old
>>>>>> generation is really what is usually called young generation
>>>>>> elsewhere. What's missing is a true old generation space.
>>>>>
>>>>> Not really, because the young generation can be collected
>>>>> independently, i.e., there are the generational write barriers,
>>>>> remembered sets, ...
>>>>>
>>>>> So the young generation is reduced to the nursery space, but I
>>>>> argue the old generation is really an old generation.
>>>>
>>>> Ok.
>>>>
>>>>>> Considering all this, I would like to propose some improvements:
>>>>>> - Introduce a notion of tenuring objects. I guess we need like 2
>>>>>> age bits in the header or elsewhere for this. Do we have that
>>>>>> room?
>>>>>
>>>>> You don't need the age bits in the header. You can easily go from
>>>>> the object to the aligned chunk that the object is in (we do that
>>>>> all the time, for example for the write barrier to do the card
>>>>> marking), and store the age in the chunk header. Requiring all
>>>>> objects in one chunk to have the same age is not much of a
>>>>> limitation.
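
For illustration, that object-to-chunk mapping is just an address mask (the
chunk size and header layout below are assumptions, not SubstrateVM's actual
values):

    // Because aligned chunks are aligned to their size, masking an object's
    // address yields the chunk start, where per-chunk metadata such as the
    // objects' age can live instead of per-object header bits.
    public class AlignedChunkSketch {
        static final long CHUNK_SIZE = 1L << 20;            // assumed 1 MB chunks
        static final long CHUNK_MASK = ~(CHUNK_SIZE - 1);

        static long chunkBaseOf(long objectAddress) {
            return objectAddress & CHUNK_MASK;               // start of the chunk header
        }

        public static void main(String[] args) {
            long someObject = 3 * CHUNK_SIZE + 0x1234;       // made-up object address
            System.out.printf("object at 0x%x is in chunk 0x%x%n",
                    someObject, chunkBaseOf(someObject));
        }
    }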
>>>> Right.
>>>>
>>>>> Adding tenuring is definitely necessary to achieve reasonable GC
>>>>> performance.
>>>>
>>>> +1
>>>>
>>>>>> - Implement a true old-space (and rename the existing young to
>>>>>> nursery and old to young?). In my experience, sliding/mark-compact
>>>>>> collection using a mark bitmap works best for this: it tends to
>>>>>> create a 'sediment' of permanent/very-long-lived objects at the
>>>>>> bottom which would never get copied again. Using a bitmap, walking
>>>>>> of live objects (e.g. during copying, updating etc) would be very
>>>>>> fast: much faster than walking objects by their size.
>>>>>
>>>>> A mark-and-compact old generation algorithm definitely makes
>>>>> sense. Again, the only reason why we don't have it yet is that no
>>>>> one had time to implement it.
>>>>>
>>>>> Mark-and-compact is also great to reduce memory footprint. Right
>>>>> now, during GC the memory footprint can double because of the
>>>>> temporary space for copying.
>>>>
>>>> Yeah. However, as Peter noted, having no contiguous memory block
>>>> complicates this. I'd need to see how to deal with it
>>>> (per-chunk-bitmap probably, or maybe mark bit in object header,
>>>> with some clever tricks to make scanning the heap fast like serial
>>>> GC does).
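
For illustration, a per-chunk mark bitmap could look roughly like this
(assuming an 8-byte marking granule; purely a sketch):

    import java.util.BitSet;
    import java.util.function.LongConsumer;

    // One mark bit per 8-byte word of the chunk: walking live objects becomes
    // a scan for set bits instead of parsing every (possibly dead) object's
    // size along the way.
    public class MarkBitmapSketch {
        static final int GRANULE = 8;          // assumed marking granule in bytes
        private final long chunkBase;
        private final BitSet bits;

        MarkBitmapSketch(long chunkBase, int chunkBytes) {
            this.chunkBase = chunkBase;
            this.bits = new BitSet(chunkBytes / GRANULE);
        }

        void mark(long objectAddress) {
            bits.set((int) ((objectAddress - chunkBase) / GRANULE));
        }

        // Visit the start address of every marked object in this chunk.
        void forEachLive(LongConsumer visitor) {
            for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
                visitor.accept(chunkBase + (long) i * GRANULE);
            }
        }
    }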
>>>>
>>>>>> - I am not totally sure about the policies. My current thinking
>>>>>> is that this needs some cleanup/straightening-out, or maybe I am
>>>>>> misunderstanding something there. I believe (fairly strongly)
>>>>>> that allocation failure is the single useful trigger for STW GC,
>>>>>> and on top of that an (optional) periodic GC trigger that would
>>>>>> kick in after X (milli)seconds of no GC.
>>>>>
>>>>> As I mentioned above, the GC trigger is allocation failure for
>>>>> the young generation.
>>>>
>>>> Ok, good.
>>>>
>>>>>> - Low-hanging-fruit improvement that could be done right now:
>>>>>> allocate large objects (arrays) straight into old-gen instead of
>>>>>> copying them around. Those are usually long-lived anyway, and
>>>>>> copying them back-and-forth just costs CPU time for no benefit.
>>>>>> This will become even more pronounced with a true old-gen.
>>>>>
>>>>> Large arrays are allocated separately in unaligned chunks. Such
>>>>> arrays are never copied, but only logically moved from the young
>>>>> generation into the old generation. An unaligned chunk contains
>>>>> exactly one large array.
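
For illustration, "logically moved" could look roughly like this (the types
below are made up; only list links change, no array bytes are copied):

    import java.util.ArrayDeque;
    import java.util.Deque;

    // An unaligned chunk holds exactly one large array; promotion re-links the
    // chunk from the young generation's list into the old generation's list.
    public class UnalignedChunkPromotionSketch {
        static final class UnalignedChunk {
            final Object largeArray;
            UnalignedChunk(Object largeArray) { this.largeArray = largeArray; }
        }

        private final Deque<UnalignedChunk> youngUnaligned = new ArrayDeque<>();
        private final Deque<UnalignedChunk> oldUnaligned = new ArrayDeque<>();

        void allocateLarge(Object array) {
            youngUnaligned.add(new UnalignedChunk(array));
        }

        // Called when the large array survives a young collection.
        void promote(UnalignedChunk chunk) {
            youngUnaligned.remove(chunk);   // unlink from the young generation
            oldUnaligned.add(chunk);        // link into the old generation
        }
    }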
>>>>
>>>> Ok, good.
>>>>
>>>>>> Oh and a question: what's this pinned object/chunks/spaces all
>>>>>> about?
>>>>>
>>>>> There are two mechanisms right now to get objects that are never
>>>>> moved by the GC:
>>>>> 1) A "normal" object can be temporarily pinned using
>>>>> org.graalvm.nativeimage.PinnedObject. The current implementation
>>>>> then keeps the whole aligned chunk that contains the object alive,
>>>>> i.e., it is designed for pinnings that are released quickly so
>>>>> that no objects are actually ever pinned when the GC runs, unless
>>>>> the GC runs at an unlucky moment. We use such pinning for example
>>>>> to pass pointers into byte[] arrays directly to C functions
>>>>> without copying.
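
For illustration, that pinning pattern looks roughly like this (the C function
and its binding are hypothetical; only org.graalvm.nativeimage.PinnedObject
itself is the API mentioned above):

    import org.graalvm.nativeimage.PinnedObject;
    import org.graalvm.nativeimage.c.function.CFunction;
    import org.graalvm.nativeimage.c.type.CCharPointer;

    // Pass a pointer into a Java byte[] to native code without copying.
    public class PinnedObjectExample {
        @CFunction
        private static native void fill_buffer(CCharPointer buffer, int length);  // hypothetical C function

        static void fill(byte[] data) {
            // Keep the pin short-lived so the aligned chunk holding the array
            // is normally unpinned again before any GC runs.
            try (PinnedObject pin = PinnedObject.create(data)) {
                CCharPointer raw = pin.addressOfArrayElement(0);
                fill_buffer(raw, data.length);
            }
        }
    }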
>>>>>
>>>>> 2) A PinnedAllocator can be used to get objects that are
>>>>> non-moving for a long period of time. This is currently used for
>>>>> the metadata of runtime compiled code. We are actively working to
>>>>> make PinnedAllocator unnecessary by putting the metadata into C
>>>>> memory, and then hopefully we can remove PinnedAllocator and all
>>>>> code that is necessary for it in the GC, i.e., the notion of
>>>>> pinned spaces you mentioned before.
>>>>
>>>> Ok, I guessed so. I mostly wondered about it because it's got
>>>> from-space and to-space:
>>>>
>>>>       [pinnedFromSpace:
>>>>         aligned: 0/0 unaligned: 0/0]
>>>>       [pinnedToSpace:
>>>>         aligned: 0/0 unaligned: 0/0]]
>>>>
>>>> And would it ever copy between them? I guess not.
>>>
>>> The collector logically moves a pinned chunk from pinned from-space
>>> to pinned to-space by updating bookkeeping information in the chunk.
>>> The contents of the pinned chunk are not moved, and their addresses
>>> do not change.  If a pinned chunk is unpinned by the application, it
>>> is moved to the unpinned from-space, and at the next full collection
>>> the reachable objects in it are scavenged to the unpinned to-space,
>>> like any other objects in unpinned from-space.  Between collections
>>> the pinned to-space is empty.  In your example, the pinned
>>> from-space is also empty.  Spaces do not represent address ranges,
>>> so an empty space is just a few null pointers in the space data
>>> structure.  (Spaces not being address ranges also complicates
>>> answering questions like: Is this object in the young generation?)
>>>
>>>              ... peter
>>>
>>>>
>>>>>> What do you think about all this? Somebody else might have
>>>>>> thought about all this already, and have some insights that I
>>>>>> don't have in my naive understanding? Maybe some of it is already
>>>>>> worked on or planned? Maybe there are some big obstacles that I
>>>>>> don't see yet that make it less feasible?
>>>>>
>>>>> We certainly have ideas and plans, and they match your
>>>>> observations. If you are interested in contributing, we can
>>>>> definitely give you some guidance so that you immediately work in
>>>>> the right direction.
>>>>
>>>> Yes, I am. :-)
>>>>
>>>> Roman
>>>>
>>>
>> 

-- 
This message was sent from my Android device with K-9 Mail.

