On Tue, Sep 25, 2018 at 17:49 Stefan Johansson <stefan.johansson@oracle.com> wrote:

> Thanks Ruslan for your input,
>
> On 2018-09-21 15:35, Ruslan Synytsky wrote:
> > Dear Stefan and Rodrigo, thank you for moving this forward.
> >
> > ---------- Forwarded message ---------
> >> From: *Stefan Johansson* <stefan.johansson@oracle.com>
> >> Date: Wednesday, 19/09/2018 at 10:45
> >> Subject: Re: RFR: bug: Timely Reducing Unused Committed Memory
> >> To: <hotspot-gc-dev@openjdk.java.net>, <rbruno@gsd.inesc-id.pt>
> >>
> >>
> >> Hi Rodrigo,
> >>
> >> I pasted your reply here to keep the discussion in one thread.
> >>
> >> >> I understand that it is hard to define what is idle. However, if we
> >> >> require the user to provide one, I guess that most regular users that
> >> >> suffer from the problem that this patch is trying to solve will simply
> >> >> not do it, because it requires knowledge and effort. If we provide an
> >> >> idle check that we think will benefit most users, then we are probably
> >> >> helping a lot of users. For those for whom the default idle check is
> >> >> not good enough, they can always disable this idle check and implement
> >> >> the idle check logic in an external tool.
> >> >>
> >> > I agree, if we can find a solution that benefits most users, we should
> >> > do it. And this is why I would like to hear from more users if this
> >> > would benefit their use cases.
> > I believe the default idle definition should be based on the major
> > bottlenecks: RAM, CPU and IO load, as well as the network. RAM is what we
> > are trying to improve. IO - I'm not sure we can measure IO load properly
> > inside the JVM. If possible, it would be good to add too; if not, we can
> > skip it for now, as it can be measured and triggered by outside logic.
> > The network is not involved in the GC process, correct? So no need for
> > that. CPU looks the most obvious and is already implemented, so it seems
> > like a good option to start from.
>
> I agree that CPU can look obvious, but making decisions in the VM based
> on the system load might be hard. For example, the avg load might be low
> while the current process is fairly active.

Hi Stefan, you are right, it might be like this.

> Another question, when
> running in the cloud, what load is the user expecting us to compare
> against, the overall system or the local container? I'm actually not
> entirely sure what the getloadavg() call returns in the case of running
> in a container.

Good question! It depends on the container technology used. In short, if it's a system container, then it shows the load of the container; if it's an application container, then the load of the host machine. There is an article on a related topic: https://jelastic.com/blog/java-and-memory-limits-in-containers-lxc-docker-and-openvz/

Can we measure the CPU usage of the JVM process itself and use it for decisions?
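For illustration, from the application side something like this is already possible via the com.sun.management extension of OperatingSystemMXBean (just a sketch to show the idea - the real check in the patch would of course live inside the VM, and whether the process-level value is container-aware is a separate question):

    import java.lang.management.ManagementFactory;

    public class ProcessCpuCheck {
        public static void main(String[] args) {
            com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                    ManagementFactory.getOperatingSystemMXBean();

            // CPU load of this JVM process only, in [0.0, 1.0], or a
            // negative value if the platform cannot report it.
            double processLoad = os.getProcessCpuLoad();

            // System-wide one-minute load average, the same source of
            // truth as the getloadavg() call discussed above.
            double systemLoad = os.getSystemLoadAverage();

            System.out.printf("process=%.2f system=%.2f%n",
                              processLoad, systemLoad);
        }
    }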
>
> >
> >> > Another thing that I don't fully
> >> > understand is why the flags are manageable if there isn't supposed to be
> >> > some external logic that sets them?
> > Some advanced users, for example cloud platform or software vendors,
> > will be able to apply additional logic based on their custom needs /
> > specifics. Such flexibility enables more use cases, and it helps to
> > collect more feedback for further improvements to the defaults.
>
> That's how I would expect it to be used as well, thanks for clarifying
> your viewpoint.
>
> >>
> >> >> We can also change the semantics of "idleness". Currently it checks
> >> >> the load. I think that checking the allocation rate might be another
> >> >> good option (instead of load). The only corner case is an application
> >> >> that does not allocate but consumes a lot of CPU. For this case, we
> >> >> might only trigger compaction at most once because, as it does not
> >> >> allocate memory, we will not get overcommitted memory (i.e., the other
> >> >> checks will prevent it). The opposite is also possible (an almost idle
> >> >> application that allocates a lot of memory), but in this scenario I
> >> >> don't think we want to trigger an idle compaction.
> >> >>
> >>
> >> > This is my main problem when it comes to determining "idleness": for
> >> > some applications the allocation rate will be the correct metric, for
> >> > others it will be the load, and for a third something different. It
> >> > feels like it is always possible to come up with a case that needs
> >> > something different.
> > I would prefer to start with the most obvious one - based on CPU - let
> > more people try it by promoting the fact that the JVM is elastic now,
> > and we will get more feedback that can be converted into additional
> > logic later.
> >
> So basically, the first version would have two flags, one to turn on
> periodic GCs (currently named GCFrequency) and one to control at which
> average load (MaxLoadGC) these GCs will kick in?

I think it's a good starting point.
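To make the intent concrete, my mental model of the trigger is roughly the following (a sketch only - the flag names follow the current patch, but the exact semantics, units and defaults here are my assumptions, not the actual implementation):

    import java.lang.management.ManagementFactory;

    // Rough model of the proposed trigger, not the actual patch.
    public class PeriodicGcSketch {
        static long gcFrequencySec = 300;  // assumed meaning of GCFrequency
        static double maxLoadGC = 0.5;     // assumed meaning of MaxLoadGC
        static long lastGcTimeMs = System.currentTimeMillis();

        static void periodicGcTick() {
            long now = System.currentTimeMillis();
            boolean intervalElapsed =
                now - lastGcTimeMs >= gcFrequencySec * 1000L;
            double load = ManagementFactory.getOperatingSystemMXBean()
                                           .getSystemLoadAverage();
            // Only force a cycle when the interval has elapsed and the
            // system looks idle; a negative load means "unavailable".
            if (intervalElapsed && load >= 0 && load < maxLoadGC) {
                System.gc(); // stands in for a concurrent G1 cycle
                lastGcTimeMs = now;
            }
        }
    }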
>
> >> >> Having said that, I am open to changing this flag or even removing
> >> >> it, as it is one of the hardest to get right.
> >> >>
> >>
> >> > As I said before, to me it feels like just having a periodic GC
> >> > interval flag that is manageable would be a good start. Maybe have a
> >> > constraint that the periodic GC only occurs if no other GCs have
> >> > happened during the interval.
> >>
> > A decision based on the previous GC cycles is a very good proposal. I
> > think we need to take it into account somehow, but I'm not so deep into
> > it. Input from others will be helpful here.
>
> I guess there are corner cases in this area as well, but the simple
> constraint I described might be a good start. But as you say, input
> from others would be very helpful.
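If it helps the discussion: that "no other GCs during the interval" condition can at least be prototyped outside the VM through the collector beans. A hypothetical sketch (the class and helper names are made up; the first call deliberately reports "not quiet" since there is no baseline yet):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Sketch of the constraint described above: only allow a periodic GC
    // if no collection of any kind happened since the last check.
    public class QuietIntervalCheck {
        private long lastSeenCollections = -1;

        boolean noGcSinceLastCheck() {
            long total = 0;
            for (GarbageCollectorMXBean gc :
                     ManagementFactory.getGarbageCollectorMXBeans()) {
                total += gc.getCollectionCount();
            }
            boolean quiet = total == lastSeenCollections;
            lastSeenCollections = total;
            return quiet;
        }
    }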
>
> >> > Could you explain how your use case would suffer from such
> >> > limitations?
> > In my opinion, CPU load spikes are clearly one of the major use cases
> > eligible for defaults.
>
> This is a clear and good use case where I guess having a load threshold
> should really help.

Thanks
>
> Thanks,
> Stefan
>
> >
> > Thank you
> >
> >>
> >> > Thanks,
> >> > Stefan
> >>
> >> >> cheers,
> >> >> rodrigo
> >>
> >>
> >> On 2018-09-13 14:30, Stefan Johansson wrote:
> >> > Hi Rodrigo,
> >> >
> >> > Sorry for being a bit late to the discussion. We've had some internal
> >> > discussions and realized that there are some questions that I need to
> >> > bring up here.
> >> >
> >> > I'm trying to better understand under what circumstances this feature
> >> > is to be used and how a user should use the different flags to tweak
> >> > it to their use case. To me it feels like GCFrequency would be enough
> >> > to make sure that the VM returns memory on a timely basis. And if the
> >> > flag is managed, it can be controlled to not do periodic GCs during
> >> > high load. With that we get a good way to periodically try to reduce
> >> > the committed heap.
> >> >
> >> > The reason I ask is that I have a hard time seeing how we can
> >> > implement a generic policy for when the system is idle - a policy that
> >> > will apply well to most use cases. For some cases having the flags you
> >> > propose might be good, but for others there might be a different set
> >> > of options needed. If this is the case then maybe the logic and policy
> >> > of when to do this can live outside the VM, while the code to
> >> > periodically do GCs lives within the VM. What do you think about that?
> >> > I understand the problems you've stated with having the policy outside
> >> > the VM, but at least we have more information to act on there.
> >> >
> >> > We know that many have asked for features similar to this one and it
> >> > would be nice to get input from others on this to make sure we
> >> > implement something that benefits the whole user base as much as
> >> > possible. So anyone with a use case that could benefit from this,
> >> > please chime in.
> >> >
> >> > Regards,
> >> > Stefan
> >> >
> >> >
> >> >
> >> > On 2018-09-07 17:37, Rodrigo Bruno wrote:
> >> >> Hi Per and Thomas,
> >> >>
> >> >> thank you for your comments.
> >> >>
> >> >> I think it is possible to implement this feature using the service
> >> >> thread or using a separate thread.
> >> >> I see some pros and cons of having a separate thread:
> >> >>
> >> >> Pros:
> >> >> - Using the service thread exposes something that is G1 specific to
> >> >> the rest of the JVM. Thus, using a separate thread hides this feature
> >> >> from the outside.
> >> >>
> >> >> Cons:
> >> >> - Having a manageable timeout is a bit more tricky to implement in a
> >> >> separate/dedicated thread. We need to be able to handle switching it
> >> >> on and off. It might require some variable polling.
> >> >> - It requires some more memory.
> >> >>
> >> >> Regardless of the path taken, I can prepare a new version of the
> >> >> patch whenever we decide on this.
> >> >>
> >> >> cheers,
> >> >> rodrigo
> >> >>
> >> >> Per Liden <per.liden@oracle.com> wrote on Friday, 7/09/2018 at 11:58:
> >> >>
> >> >> Hi Thomas,
> >> >>
> >> >> On 09/07/2018 10:10 AM, Thomas Schatzl wrote:
> >> >> [...]
> >> >> > Overnight I thought a bit about the implementation, and given
> >> >> > the problem with heap usage of the new thread, and the requirement
> >> >> > of being able to turn that feature on/off via a managed variable,
> >> >> > the best change would probably be reusing the service thread as
> >> >> > you did in the initial change.
> >> >>
> >> >> I'm not convinced that this should be handled outside of G1. If
> >> >> there's a need to have the flag manageable at runtime (is that really
> >> >> the case?), you could just always start the G1DetectIdleThread and
> >> >> have it check the flag. I wouldn't worry too much about the memory
> >> >> overhead for the stack.
> >> >>
> >> >> cheers,
> >> >> Per
> >> >>
> >
> >
> >
> > --
> > Ruslan
> > CEO @ Jelastic <https://jelastic.com/>
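P.S. Regarding Per's point above about always starting the G1DetectIdleThread and having it check the flag - as a loose model only (Java here just to show the shape; in HotSpot this would be a dedicated VM thread, and all names below are hypothetical):

    // Loose model of a detect-idle thread that is always started and
    // polls a manageable flag, so enabling/disabling at runtime needs
    // no thread start/stop.
    public class DetectIdleThreadSketch extends Thread {
        static volatile boolean detectIdleEnabled = false; // manageable flag
        static volatile long checkIntervalMs = 1000;

        @Override
        public void run() {
            while (!isInterrupted()) {
                if (detectIdleEnabled) {
                    // idleness checks and the possible GC trigger go here
                }
                try {
                    Thread.sleep(checkIntervalMs);
                } catch (InterruptedException e) {
                    return; // shutting down
                }
            }
        }
    }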
--
Ruslan
CEO @ Jelastic