RFR: bug: Timely Reducing Unused Committed Memory

Ruslan Synytsky synytskyy at jelastic.com
Sat Sep 29 07:26:01 UTC 2018


On Tue, Sep 25, 2018 at 17:49 Stefan Johansson <stefan.johansson at oracle.com>
wrote:

> Thanks Ruslan for your input,
>
> On 2018-09-21 15:35, Ruslan Synytsky wrote:
> > Dear Stefan and Rodrigo, thank you for moving this forward.
> >
> > ---------- Forwarded message ---------
> >> From: Stefan Johansson <stefan.johansson at oracle.com>
> >> Date: Wednesday, 19/09/2018 at 10:45
> >> Subject: Re: RFR: bug: Timely Reducing Unused Committed Memory
> >> To: <hotspot-gc-dev at openjdk.java.net>, <rbruno at gsd.inesc-id.pt>
> >>
> >>
> >> Hi Rodrigo,
> >>
> >> I pasted your reply here to keep the discussion in one thread.
> >>
> >> >> I understand that it is hard to define what is idle. However, if we
> >> >> require the user to provide one, I guess that most regular users who
> >> >> suffer from the problem that this patch is trying to solve will
> >> >> simply not do it, because it requires knowledge and effort. If we
> >> >> provide an idle check that we think will benefit most users, then we
> >> >> are probably helping a lot of users. For those for whom the default
> >> >> idle check is not good enough, they can always disable it and
> >> >> implement the idle check logic in an external tool.
> >> >>
> >> > I agree, if we can find a solution that benefits most users, we should
> >> > do it. And this is why I would like to hear from more users whether
> >> > this would benefit their use cases.
> > I believe the default idle definition should be based on the major
> > bottlenecks: RAM, CPU, and IO load, as well as the network. RAM is what
> > we are trying to improve. IO - I'm not sure we can measure IO load
> > properly inside the JVM. If possible, it would be good to add too; if
> > not, we can skip it for now, as it can be measured and acted on by
> > outside logic. The network is not involved in the GC process, correct?
> > So no need for that. CPU looks the most obvious and is already
> > implemented, so it seems like a good option to start from.
>
> I agree that CPU can look obvious, but making decisions in the VM based
> on the system load might be hard. For example, the average load might be
> low while the current process is fairly active.

Hi Stefan, you are right, it might be like this.

> Another question: when running in the cloud, what load does the user
> expect us to compare against, the overall system or the local container?
> I'm actually not entirely sure what the getloadavg() call returns when
> running in a container.

Good question! It depends on the container technology used. In short, if
it's a system container, then it shows the load of the container; if it's
an application container, then the load of the host machine. There is an
article on a related topic:
https://jelastic.com/blog/java-and-memory-limits-in-containers-lxc-docker-and-openvz/

Can we measure the CPU usage of the JVM process itself and use it for
decisions?
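
For illustration, here is a minimal sketch of such a per-process check
from Java, assuming a HotSpot JVM where the com.sun.management extension
of OperatingSystemMXBean is available (getProcessCpuLoad() reports this
JVM's own recent CPU usage):

    import java.lang.management.ManagementFactory;

    public class ProcessCpuCheck {
        public static void main(String[] args) throws Exception {
            com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                    ManagementFactory.getOperatingSystemMXBean();
            while (true) {
                // Load of this JVM process only, in [0.0, 1.0];
                // negative when the value is not (yet) available.
                double load = os.getProcessCpuLoad();
                System.out.printf("JVM process CPU load: %.2f%n", load);
                Thread.sleep(5_000);
            }
        }
    }

Unlike getloadavg(), this value is not affected by other processes on the
host (or in the container), so it avoids the ambiguity discussed above.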


>
> >
> >> > Another thing that I don't fully
> >> > understand is why the flags are manageable if there isn't supposed
> >> to be
> >> > some external logic that sets them?
> > Some advanced users, for example cloud platform or software vendors,
> > will be able to apply additional logic based on their custom needs /
> > specifics. Such flexibility enables more use cases, and it helps to
> > collect more feedback for further improvements of the defaults.
>
> That's how I would expect it to be used as well, thanks for clarifying
> your viewpoint.
>
> >>
> >> >> We can also change the semantics of "idleness". Currently it checks
> >> >> the load. I think that checking the allocation rate might be
> >> >> another good option (instead of load). The only corner case is an
> >> >> application that does not allocate but consumes a lot of CPU. For
> >> >> this case, we might trigger compaction at most once because, as it
> >> >> does not allocate memory, we will not get over-committed memory
> >> >> (i.e., the other checks will prevent it). The opposite is also
> >> >> possible (an almost idle application that allocates a lot of
> >> >> memory), but in this scenario I don't think we want to trigger an
> >> >> idle compaction.
> >> >>
> >>
> >> > This is my main problem when it comes to determining "idleness": for
> >> > some applications the allocation rate will be the correct metric, for
> >> > others it will be the load, and for a third something different. It
> >> > feels like it is always possible to come up with a case that needs
> >> > something different.
> > I would prefer to start with the most obvious one - based on CPU - let
> > more people try it by promoting the fact that the JVM is elastic now,
> > and we will get more feedback that can be converted into additional
> > logic later.
> >
> So basically, the first version would have two flags, one to turn on
> periodic GCs (currently named GCFrequency) and one to control at which
> average load these GCs will kick in (MaxLoadGC)?

I think it’s a good starting point.
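
Since both flags are proposed as manageable, external logic (as discussed
above) could also adjust them at runtime. A rough sketch of what that
could look like - note that GCFrequency and MaxLoadGC are the names
proposed in the patch, not existing HotSpot flags:

    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class TunePeriodicGC {
        public static void main(String[] args) throws Exception {
            HotSpotDiagnosticMXBean diag = ManagementFactory
                .getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // setVMOption() only succeeds for flags that the running JVM
            // marks as manageable; these two names come from the patch.
            diag.setVMOption("GCFrequency", "300"); // periodic GC interval
            diag.setVMOption("MaxLoadGC", "0.3");   // load threshold
        }
    }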


>
> >> >> Having said that, I am open to changing this flag or even removing
> >> >> it, as it is one of the hardest to get right.
> >> >>
> >>
> >> > As I said before, to me it feels like just having a periodic GC
> >> > interval flag that is manageable would be a good start. Maybe have a
> >> > constraint that the periodic GC only occurs if no other GCs have
> >> > happened during the interval.
> >>
> > Making decisions based on previous GC cycles is a very good proposal. I
> > think we need to take it into account somehow, but I'm not that deep
> > into this area. Input from others will be helpful here.
>
> I guess there are corner cases in this area as well, but the simple
> constraint I described might be a good start. But as you say, input from
> others would be very helpful.
>
> >> > Could you explain how your use case would suffer from such
> >> > limitations?
> > In my opinion, CPU load spikes are clearly one of the major use cases
> > eligible for defaults.
>
> This is a clear and good use case where I guess having a load threshold
> should really help.

Thanks
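
As an illustration of the external-logic option mentioned earlier in the
thread, here is a rough sketch (with assumed interval and threshold
values, and shown in-process for simplicity rather than as a true external
tool) of an idle check that forces a GC only when no other collections
happened during the interval and the JVM's own CPU usage is low:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class IdleCompactionCheck {
        public static void main(String[] args) throws Exception {
            final long intervalMs = 300_000;  // assumed check interval
            final double maxLoad = 0.3;       // assumed CPU threshold
            com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                    ManagementFactory.getOperatingSystemMXBean();
            long lastCount = totalGcCount();
            while (true) {
                Thread.sleep(intervalMs);
                long count = totalGcCount();
                double load = os.getProcessCpuLoad();
                // Force a collection only if no GC ran during the
                // interval and process CPU usage is below the threshold.
                if (count == lastCount && load >= 0 && load < maxLoad) {
                    System.gc();
                }
                lastCount = totalGcCount();
            }
        }

        static long totalGcCount() {
            long total = 0;
            for (GarbageCollectorMXBean gc :
                    ManagementFactory.getGarbageCollectorMXBeans()) {
                total += gc.getCollectionCount();
            }
            return total;
        }
    }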


>
> Thanks,
> Stefan
>
> >
> > Thank you
> >
> >>
> >> > Thanks,
> >> > Stefan
> >>
> >> >> cheers,
> >> >> rodrigo
> >>
> >>
> >> On 2018-09-13 14:30, Stefan Johansson wrote:
> >> > Hi Rodrigo,
> >> >
> >> > Sorry for being a bit late to the discussion. We've had some internal
> >> > discussions and realized that there are some questions that I need to
> >> > bring up here.
> >> >
> >> > I'm trying to better understand under what circumstances this feature
> >> > is to be used and how a user should use the different flags to tweak
> >> > it to their use case. To me it feels like GCFrequency would be enough
> >> > to make sure that the VM returns memory on a timely basis. And if the
> >> > flag is managed, it can be controlled to not do periodic GCs during
> >> > high load. With that we get a good way to periodically try to reduce
> >> > the committed heap.
> >> >
> >> > The reason I ask is that I have a hard time seeing how we can
> >> > implement a generic policy for when the system is idle, one that will
> >> > apply well to most use cases. For some cases having the flags you
> >> > propose might be good, but for others there might be a different set
> >> > of options needed. If this is the case then maybe the logic and
> >> > policy of when to do this can live outside the VM, while the code to
> >> > periodically do GCs lives within the VM. What do you think about
> >> > that? I understand the problems you've stated with having the policy
> >> > outside the VM, but at least we have more information to act on
> >> > there.
> >> >
> >> > We know that many have asked for features similar to this one and it
> >> > would be nice to get input from others on this to make sure we
> >> > implement something that benefits the whole user base as much as
> >> > possible. So anyone with a use case that could benefit from this,
> >> > please chime in.
> >> >
> >> > Regards,
> >> > Stefan
> >> >
> >> >
> >> >
> >> > On 2018-09-07 17:37, Rodrigo Bruno wrote:
> >> >> Hi Per and Thomas,
> >> >>
> >> >> thank you for your comments.
> >> >>
> >> >> I think it is possible to implement this feature using the service
> >> >> thread or using a separate thread.
> >> >> I see some pros and cons of having a separate thread:
> >> >>
> >> >> Pros:
> >> >> - Using the service thread exposes something that is G1-specific to
> >> >> the rest of the JVM; using a separate thread hides this feature from
> >> >> the outside.
> >> >>
> >> >> Cons:
> >> >> - Having a manageable timeout is a bit more tricky to implement in a
> >> >> separate/dedicated thread. We need to be able to handle switching it
> >> >> on and off, which might require some variable polling.
> >> >> - It requires some more memory.
> >> >>
> >> >> Regardless of the path taken, I can prepare a new version of the
> >> >> patch whenever we decide on this.
> >> >>
> >> >> cheers,
> >> >> rodrigo
> >> >>
> >> >> Per Liden <per.liden at oracle.com> wrote on Friday, 7/09/2018 at 11:58:
> >> >>
> >> >>     Hi Thomas,
> >> >>
> >> >>     On 09/07/2018 10:10 AM, Thomas Schatzl wrote:
> >> >>     [...]
> >> >>      > overnight I thought a bit about the implementation, and
> >> >>      > given the problem with heap usage of the new thread, and the
> >> >>      > requirement of being able to turn the feature on/off by a
> >> >>      > managed variable, the best change would probably be reusing
> >> >>      > the service thread as you did in the initial change.
> >> >>
> >> >>     I'm not convinced that this should be handled outside of G1. If
> >> >>     there's a need to have the flag manageable at runtime (is that
> >> >>     really the case?), you could just always start the
> >> >>     G1DetectIdleThread and have it check the flag. I wouldn't worry
> >> >>     too much about the memory overhead for the stack.
> >> >>
> >> >>     cheers,
> >> >>     Per
> >> >>
> >
> >
> >
> > --
> > Ruslan
> > CEO @ Jelastic <https://jelastic.com/>
>
-- 
Ruslan
CEO @ Jelastic