On Tue, Sep 25, 2018 at 17:49 Stefan Johansson <stefan.johansson@oracle.com> wrote:

> Thanks Ruslan for your input,
>
> On 2018-09-21 15:35, Ruslan Synytsky wrote:
> > Dear Stefan and Rodrigo, thank you for moving this forward.
> >
> > ---------- Forwarded message ---------
> >> From: *Stefan Johansson* <stefan.johansson@oracle.com>
> >> Date: Wednesday, 19/09/2018 at 10:45
> >> Subject: Re: RFR: bug: Timely Reducing Unused Committed Memory
> >> To: <hotspot-gc-dev@openjdk.java.net>, <rbruno@gsd.inesc-id.pt>
> >>
> >>
> >> Hi Rodrigo,
> >>
> >> I pasted your reply here to keep the discussion in one thread.
> >>
> >> >> I understand that it is hard to define what is idle. However, if we
> >> >> require the user to provide one, I guess that most regular users that
> >> >> suffer from the problem that this patch is trying to solve will simply
> >> >> not do it, because it requires knowledge and effort. If we provide an
> >> >> idle check that we think will benefit most users, then we are probably
> >> >> helping a lot of users. For those for whom the default idle check is
> >> >> not good enough, they can always disable this idle check and implement
> >> >> the idle check logic in an external tool.
> >> >>
> >> > I agree, if we can find a solution that benefits most users, we should
> >> > do it. And this is why I would like to hear from more users if this
> >> > would benefit their use cases.
> > I believe the default idle definition should be based on the major
> > bottlenecks: RAM, CPU and IO load, as well as the network. RAM is what we
> > are trying to improve. IO - I'm not sure we can measure IO load properly
> > inside the JVM. If possible, it would be good to add too; if not, we can
> > skip it for now, as it can be measured and triggered by outside logic.
> > The network is not involved in the GC process, correct? So no need for
> > that. CPU looks the most obvious and is already implemented, so it seems
> > like a good option to start from.
>
> I agree that CPU can look obvious, but making decisions in the VM based
> on the system load might be hard. For example, the avg load might be low
> while the current process is fairly active.

Hi Stefan, you are right, it might be like this.

> Another question, when
> running in the cloud, what load is the user expecting us to compare
> against, the overall system or the local container? I'm actually not
> entirely sure what the getloadavg() call returns in the case of running
> in a container.

Good question! It depends on the container technology used. In short, if it's a system container, then it shows the load of the container; if it's an application container, then the load of the host machine. There is an article on a related topic: https://jelastic.com/blog/java-and-memory-limits-in-containers-lxc-docker-and-openvz/

Can we measure the CPU usage of the JVM process itself and use it for decisions?
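For illustration, from the application side something like this is already possible via the com.sun.management extension of OperatingSystemMXBean (just a sketch to show the idea - the real check in the patch would of course live inside the VM, and whether the process-level value is container-aware is a separate question):

    import java.lang.management.ManagementFactory;

    public class ProcessCpuCheck {
        public static void main(String[] args) {
            com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                    ManagementFactory.getOperatingSystemMXBean();

            // CPU load of this JVM process only, in [0.0, 1.0], or a
            // negative value if the platform cannot report it.
            double processLoad = os.getProcessCpuLoad();

            // System-wide one-minute load average, the same source of
            // truth as the getloadavg() call discussed above.
            double systemLoad = os.getSystemLoadAverage();

            System.out.printf("process=%.2f system=%.2f%n",
                              processLoad, systemLoad);
        }
    }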
>
> >
> >> > Another thing that I don't fully
> >> > understand is why the flags are manageable if there isn't supposed to be
> >> > some external logic that sets them?
> > Some advanced users, for example cloud platform or software vendors,
> > will be able to apply additional logic based on their custom needs /
> > specifics. Such flexibility enables more use cases, and it helps to
> > collect more feedback for further improvements to the defaults.
>
> That's how I would expect it to be used as well, thanks for clarifying
> your viewpoint.
>
> >>
> >> >> We can also change the semantics of "idleness". Currently it checks
> >> >> the load. I think that checking the allocation rate might be another
> >> >> good option (instead of load). The only corner case is an application
> >> >> that does not allocate but consumes a lot of CPU. For this case, we
> >> >> might only trigger compaction at most once because, as it does not
> >> >> allocate memory, we will not get overcommitted memory (i.e., the other
> >> >> checks will prevent it). The opposite is also possible (an almost idle
> >> >> application that allocates a lot of memory), but in this scenario I
> >> >> don't think we want to trigger an idle compaction.
> >> >>
> >>
> >> > This is my main problem when it comes to determining "idleness": for
> >> > some applications the allocation rate will be the correct metric, for
> >> > others it will be the load, and for a third something different. It
> >> > feels like it is always possible to come up with a case that needs
> >> > something different.
> > I would prefer to start with the most obvious one - based on CPU - let
> > more people try it by promoting the fact that the JVM is elastic now,
> > and we will get more feedback that can be converted into additional
> > logic later.
> >
> So basically, the first version would have two flags, one to turn on
> periodic GCs (currently named GCFrequency) and one to control at which
> average load (MaxLoadGC) these GCs will kick in?

I think it's a good starting point.
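To make the intent concrete, my mental model of the trigger is roughly the following (a sketch only - the flag names follow the current patch, but the exact semantics, units and defaults here are my assumptions, not the actual implementation):

    import java.lang.management.ManagementFactory;

    // Rough model of the proposed trigger, not the actual patch.
    public class PeriodicGcSketch {
        static long gcFrequencySec = 300;  // assumed meaning of GCFrequency
        static double maxLoadGC = 0.5;     // assumed meaning of MaxLoadGC
        static long lastGcTimeMs = System.currentTimeMillis();

        static void periodicGcTick() {
            long now = System.currentTimeMillis();
            boolean intervalElapsed =
                now - lastGcTimeMs >= gcFrequencySec * 1000L;
            double load = ManagementFactory.getOperatingSystemMXBean()
                                           .getSystemLoadAverage();
            // Only force a cycle when the interval has elapsed and the
            // system looks idle; a negative load means "unavailable".
            if (intervalElapsed && load >= 0 && load < maxLoadGC) {
                System.gc(); // stands in for a concurrent G1 cycle
                lastGcTimeMs = now;
            }
        }
    }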
>
> >> >> Having said that, I am open to changing this flag or even removing
> >> >> it, as it is one of the hardest to get right.
> >> >>
> >>
> >> > As I said before, to me it feels like just having a periodic GC
> >> > interval flag that is manageable would be a good start. Maybe have a
> >> > constraint that the periodic GC only occurs if no other GCs have
> >> > happened during the interval.
> >>
> > A decision based on the previous GC cycles is a very good proposal. I
> > think we need to take it into account somehow, but I'm not so deep into
> > it. Input from others will be helpful here.
>
> I guess there are corner cases in this area as well, but the simple
> constraint I described might be a good start. But as you say, input
> from others would be very helpful.
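If it helps the discussion: that "no other GCs during the interval" condition can at least be prototyped outside the VM through the collector beans. A hypothetical sketch (the class and helper names are made up; the first call deliberately reports "not quiet" since there is no baseline yet):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Sketch of the constraint described above: only allow a periodic GC
    // if no collection of any kind happened since the last check.
    public class QuietIntervalCheck {
        private long lastSeenCollections = -1;

        boolean noGcSinceLastCheck() {
            long total = 0;
            for (GarbageCollectorMXBean gc :
                     ManagementFactory.getGarbageCollectorMXBeans()) {
                total += gc.getCollectionCount();
            }
            boolean quiet = total == lastSeenCollections;
            lastSeenCollections = total;
            return quiet;
        }
    }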
>
> >> > Could you explain how your use case would suffer from such
> >> > limitations?
> > In my opinion, CPU load spikes are clearly one of the major use cases
> > eligible for defaults.
>
> This is a clear and good use case where I guess having a load threshold
> should really help.

Thanks
>
> Thanks,
> Stefan
>
> >
> > Thank you
> >
> >>
> >> > Thanks,
> >> > Stefan
> >>
> >> >> cheers,
> >> >> rodrigo
> >>
> >>
> >> On 2018-09-13 14:30, Stefan Johansson wrote:
> >> > Hi Rodrigo,
> >> >
> >> > Sorry for being a bit late to the discussion. We've had some internal
> >> > discussions and realized that there are some questions that I need to
> >> > bring up here.
> >> >
> >> > I'm trying to better understand under what circumstances this feature
> >> > is to be used and how a user should use the different flags to tweak
> >> > it to their use case. To me it feels like GCFrequency would be enough
> >> > to make sure that the VM returns memory on a timely basis. And if the
> >> > flag is managed, it can be controlled to not do periodic GCs during
> >> > high load. With that we get a good way to periodically try to reduce
> >> > the committed heap.
> >> >
> >> > The reason I ask is that I have a hard time seeing how we can
> >> > implement a generic policy for when the system is idle - a policy that
> >> > will apply well to most use cases. For some cases having the flags you
> >> > propose might be good, but for others there might be a different set
> >> > of options needed. If this is the case then maybe the logic and policy
> >> > of when to do this can live outside the VM, while the code to
> >> > periodically do GCs lives within the VM. What do you think about that?
> >> > I understand the problems you've stated with having the policy outside
> >> > the VM, but at least we have more information to act on there.
> >> >
> >> > We know that many have asked for features similar to this one and it
> >> > would be nice to get input from others on this to make sure we
> >> > implement something that benefits the whole user base as much as
> >> > possible. So anyone with a use case that could benefit from this,
> >> > please chime in.
> >> >
> >> > Regards,
> >> > Stefan
> >> >
> >> >
> >> >
> >> > On 2018-09-07 17:37, Rodrigo Bruno wrote:
> >> >> Hi Per and Thomas,
> >> >>
> >> >> thank you for your comments.
> >> >>
> >> >> I think it is possible to implement this feature using the service
> >> >> thread or using a separate thread.
> >> >> I see some pros and cons of having a separate thread:
> >> >>
> >> >> Pros:
> >> >> - Using the service thread exposes something that is G1 specific to
> >> >> the rest of the JVM. Thus, using a separate thread hides this feature
> >> >> from the outside.
> >> >>
> >> >> Cons:
> >> >> - Having a manageable timeout is a bit more tricky to implement in a
> >> >> separate/dedicated thread. We need to be able to handle switching it
> >> >> on and off. It might require some variable polling.
> >> >> - It requires some more memory.
> >> >>
> >> >> Regardless of the path taken, I can prepare a new version of the
> >> >> patch whenever we decide on this.
> >> >>
> >> >> cheers,
> >> >> rodrigo
> >> >>
> >> >> Per Liden <per.liden@oracle.com> wrote on Friday, 7/09/2018 at 11:58:
> >> >>
> >> >> Hi Thomas,
> >> >>
> >> >> On 09/07/2018 10:10 AM, Thomas Schatzl wrote:
> >> >> [...]
> >> >> > Overnight I thought a bit about the implementation, and given
> >> >> > the problem with heap usage of the new thread, and the requirement
> >> >> > of being able to turn that feature on/off via a managed variable,
> >> >> > the best change would probably be reusing the service thread as
> >> >> > you did in the initial change.
> >> >>
> >> >> I'm not convinced that this should be handled outside of G1. If
> >> >> there's a need to have the flag manageable at runtime (is that really
> >> >> the case?), you could just always start the G1DetectIdleThread and
> >> >> have it check the flag. I wouldn't worry too much about the memory
> >> >> overhead for the stack.
> >> >>
> >> >> cheers,
> >> >> Per
> >> >>
> >
> >
> >
> > --
> > Ruslan
> > CEO @ Jelastic <https://jelastic.com/>
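P.S. Regarding Per's point above about always starting the G1DetectIdleThread and having it check the flag - as a loose model only (Java here just to show the shape; in HotSpot this would be a dedicated VM thread, and all names below are hypothetical):

    // Loose model of a detect-idle thread that is always started and
    // polls a manageable flag, so enabling/disabling at runtime needs
    // no thread start/stop.
    public class DetectIdleThreadSketch extends Thread {
        static volatile boolean detectIdleEnabled = false; // manageable flag
        static volatile long checkIntervalMs = 1000;

        @Override
        public void run() {
            while (!isInterrupted()) {
                if (detectIdleEnabled) {
                    // idleness checks and the possible GC trigger go here
                }
                try {
                    Thread.sleep(checkIntervalMs);
                } catch (InterruptedException e) {
                    return; // shutting down
                }
            }
        }
    }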
--
Ruslan
CEO @ Jelastic