RFR: bug: Timely Reducing Unused Committed Memory

Fri Sep 21 13:35:16 UTC 2018

Dear Stefan and Rodrigo, thank you for moving this forward.

---------- Forwarded message ---------

From: Stefan Johansson <stefan.johansson at oracle.com>
Date: Wednesday, 19/09/2018 at 10:45
Subject: Re: RFR: bug: Timely Reducing Unused Committed Memory
To: <hotspot-gc-dev at openjdk.java.net>, <rbruno at gsd.inesc-id.pt>

Hi Rodrigo,

I pasted your reply here to keep the discussion in one thread.

>> I understand that it is hard to define what is idle. However, if we require the
>> user to provide one, I guess that most regular users who suffer from the problem
>> that this patch is trying to solve will simply not do it, because it requires
>> knowledge and effort. If we provide an idle check that we think will benefit most
>> users, then we are probably helping a lot of users. For those for whom the default
>> idle check is not good enough, they can always disable it and implement the idle
>> check logic in an external tool.
>>
> I agree, if we can find a solution that benefits most users, we should
> do it. And this is why I would like to hear from more users if this
> would benefit their use cases.

I believe the default idle definition should be based on the major
bottlenecks: RAM, CPU and IO load, as well as the network. RAM is what we
are trying to improve. For IO, I'm not sure we can measure IO load properly
inside the JVM; if possible, it would be good to add it too, and if not, we
can skip it for now, since it can be measured and acted on by outside logic.
The network is not involved in the GC process, correct? So there is no need
to consider it. CPU load is the most obvious metric and is already
implemented, so it seems like a good option to start from.
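To make the CPU-based starting point concrete, here is a minimal sketch of such an idle check in Java. It assumes that the one-minute system load average reported by `OperatingSystemMXBean.getSystemLoadAverage()` is an acceptable proxy for "CPU load" (it covers the whole machine, not only this JVM, and is negative where the platform cannot report it). The class name `CpuIdleCheck` and the 10% per-CPU threshold are hypothetical and not part of the patch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical sketch of a CPU-load-based idle check. The class name and the
// threshold are illustrative assumptions, not taken from the patch.
public class CpuIdleCheck {

    // Returns true if the load average per available CPU is below the threshold.
    // A negative load average means the value is unavailable on this platform,
    // so we conservatively report "not idle".
    static boolean isIdle(double loadAverage, int cpus, double threshold) {
        if (loadAverage < 0 || cpus <= 0) {
            return false;
        }
        return loadAverage / cpus < threshold;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage();   // system-wide, last minute
        int cpus = os.getAvailableProcessors();
        double threshold = 0.1;                    // hypothetical: < 10% load per CPU
        System.out.println("load=" + load + " cpus=" + cpus
                + " idle=" + isIdle(load, cpus, threshold));
    }
}
```

Normalizing by the number of CPUs keeps one threshold meaningful on both small and large machines; treating an unavailable load average as "busy" avoids triggering a GC based on missing data.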

> Another thing that I don't fully
> understand is why the flags are manageable if there isn't supposed to be
> some external logic that sets them?

Some advanced users, for example cloud platforms or software vendors, will
be able to apply additional logic based on their specific needs. Such
flexibility enables more use cases and helps to collect more feedback for
further improvements to the defaults.

>> We can also change the semantics of "idleness". Currently it checks the load.
>> I think that checking the allocation rate might be another good option (instead
>> of load). The only corner case is an application that does not allocate but
>> consumes a lot of CPU. For this case, we might trigger compaction at most once
>> because, as it does not allocate memory, we will not get over committed memory
>> (i.e., the other checks will prevent it). The opposite is also possible (an almost
>> idle application that allocates a lot of memory), but in this scenario I don't
>> think we want to trigger an idle compaction.
>>

> This is my main problem when it comes to determine "idleness", for some
> applications allocation rate will be the correct metric, for others it
> will be the load and for a third something different. It feels like it
> is always possible to come up with a case that needs something different.

I would prefer to start with the most obvious one, based on CPU load, and
let more people try it by promoting the fact that the JVM is now elastic.
We will then get more feedback that can be turned into additional logic
later.

>> Having said that, I am open to changing this flag or even removing it, as it
>> is one of the hardest to get right.
>>

> As I said before, to me it feels like just having a periodic GC interval
> flag that is manageable would be a good start. Maybe have a constraint
> that the periodic GC only occurs if no other GCs have happened during
> the interval.

Basing the decision on previous GC cycles is a very good proposal. I think
we need to take it into account somehow, but I'm not deeply familiar with
that area. Input from others would be helpful here.
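Stefan's suggestion, running the periodic GC only if no other GC happened during the interval, could look roughly like the sketch below. It is an illustration under stated assumptions: `PeriodicGcPolicy` and its methods are hypothetical names, GC activity is observed through the standard `GarbageCollectorMXBean.getCollectionCount()` counters, and the caller is assumed to poll the policy on a timer. Inside HotSpot the equivalent logic would live in the G1 C++ code, not in Java.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch of "periodic GC only if no other GC happened during
// the interval". Names and structure are illustrative, not from the patch.
public class PeriodicGcPolicy {
    private final long intervalMs;
    private long lastGcCount = -1;   // -1: no observation yet
    private long lastActivityMs = 0; // time of last observed GC or trigger

    PeriodicGcPolicy(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    // Called periodically with the current time and the total GC count.
    // Returns true only when a full interval has passed without any GC.
    boolean shouldTriggerGc(long nowMs, long totalGcCount) {
        if (totalGcCount != lastGcCount) {
            // A GC happened (or this is the first observation): restart the interval.
            lastGcCount = totalGcCount;
            lastActivityMs = nowMs;
            return false;
        }
        if (nowMs - lastActivityMs >= intervalMs) {
            lastActivityMs = nowMs; // do not re-trigger on every subsequent tick
            return true;
        }
        return false;
    }

    // Sums collection counts over all collectors of this JVM.
    // getCollectionCount() returns -1 when undefined; such values are skipped.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }
}
```

An in-process agent could call `shouldTriggerGc(System.currentTimeMillis(), PeriodicGcPolicy.totalGcCount())` from its own timer and request a collection with `System.gc()` when it returns true; an external tool would have to read the same counters over JMX and could use `jcmd <pid> GC.run` instead. Because the triggered GC itself bumps the collection count, the next poll restarts the interval automatically.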

> Could you explain how your use case would suffer from such
> limitations?

In my opinion, CPU load spikes are clearly one of the major use cases
that the defaults should cover.

Thank you

> Thanks,
> Stefan

>> cheers,
>> rodrigo

On 2018-09-13 14:30, Stefan Johansson wrote:
> Hi Rodrigo,
>
> Sorry for being a bit late to the discussion. We've had some internal
> discussions and realized that there are some questions that I need to
> bring up here.
>
> I'm trying to better understand under what circumstances this feature is
> to be used and how a user should use the different flags to tweak it to
> their use case. To me it feels like GCFrequency would be enough to make
> sure that the VM returns memory on a timely basis. And if the flag is
> manageable, it can be controlled to avoid periodic GCs during high load.
> With that we get a good way to periodically try to reduce the committed
> heap.
>
> The reason I ask is because I have a hard time seeing how we can
> implement a generic policy for when the system is idle. A policy that
> will apply well to most use cases. For some cases having the flags you
> propose might be good, but for others there might be a different set of
> options needed. If this is the case then maybe the logic and policy of
> when to do this can live outside the VM, while the code to periodically
> do GCs lives within the VM. What do you think about that? I understand
> the problems you've stated with having the policy outside the VM, but
> at least we have more information to act on there.
>
> We know that many have asked for features similar to this one and it
> would be nice to get input from others on this to make sure we implement
> something that benefits the whole user base as much as possible. So
> anyone with a use case that could benefit from this, please chime in.
>
> Regards,
> Stefan
>
>
>
> On 2018-09-07 17:37, Rodrigo Bruno wrote:
>> Hi Per and Thomas,
>>
>> thank you for your comments.
>>
>> I think it is possible to implement this feature using the service
>> thread or using a separate thread.
>> I see some pros and cons of having a separate thread:
>>
>> Pros:
>> - using the service thread exposes something that is G1-specific to
>> the rest of the JVM, whereas a separate thread hides this feature from
>> the outside.
>>
>> Cons:
>> - Having a manageable timeout is a bit trickier to implement in a
>> separate/dedicated thread. We need to be able to handle switching it
>> on and off, which might require some variable polling.
>> - It requires some more memory.
>>
>> Regardless of the path taken, I can prepare a new version of the patch
>> whenever we decide on this.
>>
>> cheers,
>> rodrigo
>>
>> Per Liden <per.liden at oracle.com> wrote on Friday, 7/09/2018 at 11:58:
>>
>>     Hi Thomas,
>>
>>     On 09/07/2018 10:10 AM, Thomas Schatzl wrote:
>>     [...]
>>      >    overnight I thought a bit about the implementation, and given the
>>      > problem with heap usage of the new thread, and the requirement of being
>>      > able to turn on/off that feature by a managed variable, the best change
>>      > would probably be reusing the service thread as you did in the initial
>>      > change.
>>
>>     I'm not convinced that this should be handled outside of G1. If there's
>>     a need to have the flag manageable at runtime (is that really the
>>     case?), you could just always start the G1DetectIdleThread and have it
>>     check the flag. I wouldn't worry too much about the memory overhead for
>>     the stack.
>>
>>     cheers,
>>     Per
>>

-- 
Ruslan
CEO @ Jelastic <https://jelastic.com/>