RFR 8012371: Adjust Tiered compile threshold according to available space in codecache

Igor Veresov iggy.veresov at gmail.com
Tue May 14 13:25:54 PDT 2013


Yup, I had something similar in mind, but it didn't occur to me that you can throttle C1 exclusively. Awesome solution!
That way you only throttle C1 compiles, and in the meantime the interpreter keeps doing the profiling.

Looks good!

igor

On May 14, 2013, at 11:21 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:

> http://cr.openjdk.java.net/~kvn/8012371/webrev
> 
> On 5/14/13 6:23 AM, Albert Noll wrote:
> 
> Hi,
> 
> I think I found a solution to the code cache fill-up-problem.
> 
> The problem with the previous solution was that the threshold for
> recompilation was increased equally
> for all tiers. As a result, peak performance was not reached if
> the code cache was rather full, since frequently invoked methods that
> WERE compiled by C2 in a non-tiered configuration WERE NOT compiled by C2
> when using tiered compilation.
> 
> The proposed solution to the problem is to start increasing the
> threshold rather early (e.g., once the code cache is filled up by 50%),
> and to not increase the threshold for C2 compilation at all. As a result,
> we keep enough space for C2 code (we reach peak performance).
> The drawback of this solution, of course, is that tiered compilation
> potentially performs worse than it would with a larger code cache. However,
> this solution should not perform worse than not using tiered
> compilation.
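> 
> Roughly, the idea looks like this (an illustrative sketch only, not the
> actual webrev; the function and parameter names here are made up):
> 
>   #include <cmath>
> 
>   // Illustrative only: scale the C1 compile threshold by how empty the
>   // code cache is, and never scale the threshold for C2 compiles.
>   double c1_threshold_scale(double free_ratio /* fraction of the cache still free */,
>                             bool is_c2_compile) {
>     if (is_c2_compile || free_ratio >= 0.5) {
>       return 1.0;                      // C2 compiles and a half-empty cache: unchanged
>     }
>     // below 50% free, grow the C1 threshold smoothly; exp(0) == 1 at the boundary
>     return std::exp(0.5 / free_ratio - 1.0);
>   }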
> 
> I evaluated the proposed changes using the nashorn benchmarks with
> ReservedCodeCacheSize=80m, letting all benchmarks run in the same JVM
> instance. We start increasing the threshold for recompilation (but not for
> recompiling to C2) when the code cache is filled up by 50%. As a result,
> the warning that the code cache is full and compilation has stopped is
> no longer printed. Furthermore, we achieve peak performance similar to
> non-tiered, but with a faster startup time.
> 
> 
> Many thanks for your comments,
> Albert
> 
> On 07/05/2013 22:38, Vladimir Kozlov wrote:
>> And add product flag for initial ratio value so people can adjust it
>> as they wish.
>> 
>> Vladimir
>> 
>> On 5/7/13 9:40 AM, Vladimir Kozlov wrote:
>>> Albert,
>>> 
>>> You should start using Nashorn/octane for performance testing since
>>> TieredCompilation has a big effect on it. Roland can help you with it.
>>> 
>>> Thanks,
>>> Vladimir
>>> 
>>> On 5/7/13 9:08 AM, Vladimir Kozlov wrote:
>>>> On 5/7/13 6:49 AM, Albert Noll wrote:
>>>>> Hi Vladimir,
>>>>> 
>>>>> I performed a preliminary evaluation of the effects on the size of
>>>>> generated code.
>>>>> I used the eclipse benchmark from the DaCapo benchmarks.
>>>>> In the test, I limited the ReservedCodeCacheSize to 32m.
>>>>> 
>>>>> With the changes in advancedThresholdPolicy, 2 runs generate 76 MB of
>>>>> code.
>>>>> Without the changes, 2 runs generate 116 MB of code.
>>>> 
>>>> This is good, but I am mostly concerned about the effect on performance,
>>>> startup and peak. Also look at codecache usage with the default size at
>>>> the end of execution. Use -XX:+PrintCompilation, which has time stamps
>>>> (the first number) in the output, to see how the behavior changes. Note:
>>>> the third number in the output is the compilation type: 3 - C1 with
>>>> profiling, 4 - C2.
>>>> 
>>>> Sorry about the dexp() suggestion; yes, it needs to be called in the
>>>> correct thread state.
>>>> 
>>>> And I made a mistake in my suggested expression. If we want to scale
>>>> only when 25% or less of the space is free, we need:
>>>> 
>>>>  if (free_reverse_ratio > 4.) {
>>>>    k *= exp(free_reverse_ratio - 4.);
>>>>  }
>>>> 
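>>>> (For example, with 20% of the code cache free the ratio is 1/0.2 = 5, so k
>>>> is multiplied by e^(5-4) ≈ 2.7; with 10% free it is e^(10-4) ≈ 400.)
>>>> 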
>>>> But I will leave it to you to determine the best ratio value by experiments
>>>> to get the best results: the same startup and peak with less codecache.
>>>> Maybe your 50% will be a better value.
>>>> 
>>>> Thanks,
>>>> Vladimir
>>>> 
>>>>> 
>>>>> Best,
>>>>> Albert
>>>>> 
>>>>> On 05/07/2013 02:13 PM, Albert Noll wrote:
>>>>>> Hi Vladimir,
>>>>>> 
>>>>>> thank you very much for your feedback. I made the changes as you
>>>>>> proposed.
>>>>>> I could not use SharedRuntime::dexp(d), since the VM crashed (see
>>>>>> below). Rick
>>>>>> explained to me why: the current thread is in the wrong state.
>>>>>> 
>>>>>> What do you think of the current version? Do you think we need to
>>>>>> evaluate the
>>>>>> performance impact of that change?
>>>>>> 
>>>>>> Best,
>>>>>> Albert
>>>>>> 
>>>>>> P.S.: Is it OK if I ask you for early feedback, or should I just send
>>>>>> out an RFR?
>>>>>> 
>>>>>> # To suppress the following error report, specify this argument
>>>>>> # after -XX: or in .hotspotrc: SuppressErrorAt=/gcLocker.cpp:223
>>>>>> #
>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>> #
>>>>>> #  Internal Error
>>>>>> (/export/anoll/JDK-8012371/src/share/vm/memory/gcLocker.cpp:223),
>>>>>> pid=5663, tid=140406973806336
>>>>>> #  Error: ShouldNotReachHere()
>>>>>> #
>>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b86) (build
>>>>>> 1.8.0-ea-b86)
>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM
>>>>>> (25.0-b32-internal-fastdebug mixed mode linux-amd64 compressed oops)
>>>>>> # Failed to write core dump. Core dumps have been disabled. To enable
>>>>>> core dumping, try "ulimit -c unlimited" before starting Java again
>>>>>> #
>>>>>> # An error report file with more information is saved as:
>>>>>> # /export/anoll/hs_err_pid5663.log
>>>>>> #
>>>>>> # If you would like to submit a bug report, please visit:
>>>>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>>>>> #
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 05/06/2013 11:01 PM, Vladimir Kozlov wrote:
>>>>>>> I don't think this should be C1-specific code. Before tiered compilation
>>>>>>> we had counter-decay code for all compilers, so I think it should be
>>>>>>> the same now.
>>>>>>> 
>>>>>>> Make a new method:
>>>>>>>  double CodeCache::free_space_ratio()
>>>>>>> 
>>>>>>> Also subtract CodeCacheMinimumFreeSpace to get the correct size
>>>>>>> available
>>>>>>> for JIT code.
>>>>>>> 
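>>>>>>> Something along these lines (just an untested sketch; it assumes the
>>>>>>> existing CodeCache::max_capacity() and unallocated_capacity() accessors):
>>>>>>> 
>>>>>>>  double CodeCache::free_space_ratio() {
>>>>>>>    // compute in double so a nearly-full cache cannot underflow, and keep
>>>>>>>    // CodeCacheMinimumFreeSpace out of the budget the JIT can actually use
>>>>>>>    double usable = (double)max_capacity()         - CodeCacheMinimumFreeSpace;
>>>>>>>    double free   = (double)unallocated_capacity() - CodeCacheMinimumFreeSpace;
>>>>>>>    return (free > 0. && usable > 0.) ? free / usable : 0.;
>>>>>>>  }
>>>>>>> 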
>>>>>>> I would prefer to have one scaling expression which starts at *1
>>>>>>> and ends with e**k. But it would be nice to have the switch (from one
>>>>>>> expression to the other) be smooth (a graph without steps). For example,
>>>>>>> the next code will sharply increase the scale by 2 (just below 50% free,
>>>>>>> 1/free_ratio suddenly adds about 2), which is not good:
>>>>>>> +        k += (free_ratio < 0.50) ? 1/free_ratio : 0;
>>>>>>> 
>>>>>>> Also, you use 2 divisions where you could use just one. And 50% empty
>>>>>>> is too early. If we start at 25% and use SharedRuntime::dexp(d), I
>>>>>>> think we can simplify the code:
>>>>>>> 
>>>>>>> double free_reverse_ratio = max_capacity / unallocated_capacity;
>>>>>>> if (free_reverse_ratio > 2.) {
>>>>>>>  k *= SharedRuntime::dexp(free_reverse_ratio - 2.);
>>>>>>> }
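>>>>>>> 
>>>>>>> (Right at the switch point the multiplier is dexp(0) = 1, so the scale
>>>>>>> grows without a step.)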
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Vladimir
>>>>>>> 
>>>>>>> On 5/6/13 6:00 AM, Albert Noll wrote:
>>>>>>>> Hi Vladimir,
>>>>>>>> 
>>>>>>>> I looked at: https://jbs.oracle.com/bugs/browse/JDK-8012371 .
>>>>>>>> I attached a possible solution to this mail. Could I get some
>>>>>>>> early feedback from you?
>>>>>>>> 
>>>>>>>> Many thanks,
>>>>>>>> Albert
>>>>>> 
>>>>> 


