RFR 8012371: Adjust Tiered compile threshold according to available space in codecache

Wed May 15 02:15:31 PDT 2013

Another thought: it might be useful for future debugging to add "TieredStopAtLevel == CompLevel_full_optimization" to the predicate at advancedThresholdPolicy.cpp:211; otherwise the C1-only mode is penalized. That is, if you guys haven't pushed yet. It's not really important.

Thanks!
igor

On May 14, 2013, at 1:45 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:

> Thank you, Igor
> 
> For the record, these changes are based on your suggestion :)
> 
> Thanks,
> Vladimir
> 
> On 5/14/13 1:25 PM, Igor Veresov wrote:
>> Yup, I had something similar in mind. But it didn't occur to me that you can just throttle C1 exclusively. Awesome solution!
>> That way you just throttle C1 compiles, and in the meanwhile you'll just use the interpreter for profiling.
>> 
>> Looks good!
>> 
>> igor
>> 
>> On May 14, 2013, at 11:21 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>> 
>>> http://cr.openjdk.java.net/~kvn/8012371/webrev
>>> 
>>> On 5/14/13 6:23 AM, Albert Noll wrote:
>>> 
>>> Hi,
>>> 
>>> I think I found a solution to the code cache fill-up-problem.
>>> 
>>> The problem with the previous solution was that the threshold for
>>> recompilation was increased equally for all tiers. As a result, peak
>>> performance was not reached if the code cache was "rather full", since
>>> frequently invoked methods that WERE compiled to C2 in a non-tiered
>>> version WERE NOT compiled with C2 when using tiered compilation.
>>> 
>>> The proposed solution to the problem is that we start increasing the
>>> threshold rather early (e.g., if the code cache is filled up by 50%),
>>> and do not increase the threshold for C2 compilation. As a result, we
>>> have enough space for C2 code (we reach peak performance).
>>> The drawback of this solution, of course, is that tiered compilation
>>> potentially performs worse than if we provide more code cache. However,
>>> this solution should not perform worse compared to not using tiered
>>> compilation.
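
The idea in the proposal above (raise the threshold for the C1 tiers as the code cache fills, but leave the C2 threshold alone) can be sketched as follows. This is a minimal illustration, not the code from the webrev: the enum, the function name, and the parameters are hypothetical, and the exponential shape is borrowed from the scaling expression discussed further down in this thread.

```cpp
#include <cmath>

// Hypothetical tier numbering for this sketch: 1-3 are C1 levels, 4 is C2.
enum CompLevel {
  kInterpreter = 0,
  kC1Simple = 1,
  kC1LimitedProfile = 2,
  kC1FullProfile = 3,
  kC2 = 4
};

// Returns the scaled threshold multiplier k for a compile at `level`.
// reverse_free_ratio  = max_capacity / unallocated_capacity (assumed > 0).
// start_reverse_ratio = ratio at which throttling begins (2.0 means 50% free).
double tier_threshold_scale(CompLevel level, double k,
                            double reverse_free_ratio,
                            double start_reverse_ratio) {
  // Only C1 compiles are throttled; C2 keeps its normal threshold so that
  // hot methods can still reach peak performance.
  if (level != kC2 && reverse_free_ratio > start_reverse_ratio) {
    k *= std::exp(reverse_free_ratio - start_reverse_ratio);
  }
  return k;
}
```

With 20% of the cache free (reverse ratio 5) and a start ratio of 2, a C1 threshold would be multiplied by e^3, while the C2 threshold would stay unchanged.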
>>> 
>>> I evaluated the proposed changes using the nashorn benchmarks with
>>> ReservedCodeCacheSize=80m, letting all benchmarks run in the same JVM
>>> instance. We start increasing the threshold for recompilation (not for
>>> recompiling to C2) when the code cache is filled up by 50%. The result
>>> is that the warning that the code cache is filled up and compilation
>>> stops is not printed out. Furthermore, we achieve similar peak
>>> performance compared to non-tiered but a faster startup time.
>>> 
>>> 
>>> Many thanks for your comments,
>>> Albert
>>> 
>>> On 07/05/2013 22:38, Vladimir Kozlov wrote:
>>>> And add product flag for initial ratio value so people can adjust it
>>>> as they wish.
>>>> 
>>>> Vladimir
>>>> 
>>>> On 5/7/13 9:40 AM, Vladimir Kozlov wrote:
>>>>> Albert,
>>>>> 
>>>>> You should start using Nashorn/octane for performance testing since
>>>>> TieredCompilation has a big effect on it. Roland can help you with it.
>>>>> 
>>>>> Thanks,
>>>>> Vladimir
>>>>> 
>>>>> On 5/7/13 9:08 AM, Vladimir Kozlov wrote:
>>>>>> On 5/7/13 6:49 AM, Albert Noll wrote:
>>>>>>> Hi Vladimir,
>>>>>>> 
>>>>>>> I performed a preliminary evaluation of the effects on the size of
>>>>>>> generated code.
>>>>>>> I used the eclipse benchmark from the DaCapo benchmarks.
>>>>>>> In the test, I limited the ReservedCodeCacheSize to 32m
>>>>>>> 
>>>>>>> With the changes in advancedThresholdPolicy, 2 runs generate 76mb
>>>>>>> code.
>>>>>>> Without the changes, 2 runs generate 116mb code.
>>>>>> 
>>>>>> This is good, but I am mostly concerned about the effect on
>>>>>> performance, startup and peak. Also look at codecache usage with the
>>>>>> default size at the end of execution. Use -XX:+PrintCompilation, which
>>>>>> has time stamps (first number) in its output, to see how the behavior
>>>>>> changes. Note, the third number in the output is the
>>>>>> compilation type: 3 - C1 with profiling, 4 - C2.
>>>>>> 
>>>>>> Sorry about dexp() suggestion, yes it needs to be called at correct
>>>>>> thread state.
>>>>>> 
>>>>>> And I made a mistake with my suggested expressions. If we want to scale
>>>>>> only for 25% and less free space we need:
>>>>>> 
>>>>>>  if (free_reverse_ratio > 4.) {
>>>>>>    k *= exp(free_reverse_ratio - 4.);
>>>>>>  }
>>>>>> 
>>>>>> But I will leave it to you to determine the best ratio value by
>>>>>> experiments: get the same startup and peak with less codecache.
>>>>>> Maybe your 50% will be a better value.
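
To make the relation between the free-space percentage and the constant in the expression above concrete, here is a small sketch. The helper names are hypothetical and it assumes the reverse ratio is simply 100 divided by the free percentage, so starting at 25% free corresponds to a constant of 4 and starting at 50% free to a constant of 2.

```cpp
#include <cmath>

// Hypothetical helper: reverse-ratio constant for a given start percentage.
// free_percent is the share of the code cache that is still unallocated.
double start_reverse_ratio(double free_percent) {
  return 100.0 / free_percent;
}

// Multiplier applied to k when free_percent_now of the cache is free and
// scaling is meant to begin once free space drops to free_percent_start.
double scale_factor(double free_percent_now, double free_percent_start) {
  double reverse_ratio = 100.0 / free_percent_now;
  double start = start_reverse_ratio(free_percent_start);
  // Above the start percentage the factor stays at exactly 1.
  return (reverse_ratio > start) ? std::exp(reverse_ratio - start) : 1.0;
}
```

Under these assumptions the factor grows quickly once scaling begins: with a 25% start point, 20% free gives a factor of e, and 10% free gives e^6.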
>>>>>> 
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>> 
>>>>>>> 
>>>>>>> Best,
>>>>>>> Albert
>>>>>>> 
>>>>>>> On 05/07/2013 02:13 PM, Albert Noll wrote:
>>>>>>>> Hi Vladimir,
>>>>>>>> 
>>>>>>>> thank you very much for your feedback. I made the changes as you
>>>>>>>> proposed.
>>>>>>>> I could not use SharedRuntime::dexp(d), since the VM crashed (see
>>>>>>>> below). Rick explained to me why: the current thread is in the
>>>>>>>> wrong state.
>>>>>>>> 
>>>>>>>> What do you think of the current version? Do you think we need to
>>>>>>>> evaluate the
>>>>>>>> performance impact of that change?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Albert
>>>>>>>> 
>>>>>>>> P.S.: Is it OK if I ask you for early feedback, or should I just send
>>>>>>>> out an RFR?
>>>>>>>> 
>>>>>>>> # To suppress the following error report, specify this argument
>>>>>>>> # after -XX: or in .hotspotrc: SuppressErrorAt=/gcLocker.cpp:223
>>>>>>>> #
>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>> #
>>>>>>>> #  Internal Error
>>>>>>>> (/export/anoll/JDK-8012371/src/share/vm/memory/gcLocker.cpp:223),
>>>>>>>> pid=5663, tid=140406973806336
>>>>>>>> #  Error: ShouldNotReachHere()
>>>>>>>> #
>>>>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b86) (build
>>>>>>>> 1.8.0-ea-b86)
>>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM
>>>>>>>> (25.0-b32-internal-fastdebug mixed mode linux-amd64 compressed oops)
>>>>>>>> # Failed to write core dump. Core dumps have been disabled. To enable
>>>>>>>> core dumping, try "ulimit -c unlimited" before starting Java again
>>>>>>>> #
>>>>>>>> # An error report file with more information is saved as:
>>>>>>>> # /export/anoll/hs_err_pid5663.log
>>>>>>>> #
>>>>>>>> # If you would like to submit a bug report, please visit:
>>>>>>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>>>>>>> #
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 05/06/2013 11:01 PM, Vladimir Kozlov wrote:
>>>>>>>>> I don't think it should be only C1 specific code. Before Tiered code
>>>>>>>>> we had decay counters code for all compilers. So I think it should
>>>>>>>>> be the same now.
>>>>>>>>> 
>>>>>>>>> Make new method:
>>>>>>>>>  double CodeCache::free_space_ratio()
>>>>>>>>> 
>>>>>>>>> Also subtract CodeCacheMinimumFreeSpace to get the correct size
>>>>>>>>> available for JIT code.
>>>>>>>>> 
>>>>>>>>> I would prefer to have one scaling expression which starts with *1
>>>>>>>>> and ends with e**k. It would also be nice if the switch from one
>>>>>>>>> expression to another were smooth (a graph without steps). For
>>>>>>>>> example, the next code will sharply increase the scale by 2, which
>>>>>>>>> is not good:
>>>>>>>>> +        k += (free_ratio < 0.50) ? 1/free_ratio : 0;
>>>>>>>>> 
>>>>>>>>> Also, you use 2 divisions when you could use just one. And 50% empty
>>>>>>>>> is too early. If we start at 25% and use SharedRuntime::dexp(d), I
>>>>>>>>> think we can simplify the code:
>>>>>>>>> 
>>>>>>>>> double free_reverse_ratio = max_capacity / unallocated_capacity;
>>>>>>>>> if (free_reverse_ratio > 2.) {
>>>>>>>>>  k *= SharedRuntime::dexp(free_reverse_ratio - 2.);
>>>>>>>>> }
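
The difference between the step-shaped adjustment and the single exponential expression above can be checked with a small standalone sketch. The function names are hypothetical, std::exp stands in for SharedRuntime::dexp, and unallocated_capacity is assumed to be greater than zero.

```cpp
#include <cmath>

// Step-shaped adjustment from the quoted patch line: nothing is added until
// free_ratio drops below 0.50, then 1/free_ratio (about 2) is added at once.
double step_scale(double k, double free_ratio) {
  return k + ((free_ratio < 0.50) ? 1.0 / free_ratio : 0.0);
}

// Smooth alternative: one division, then multiply by e^(ratio - start).
// At ratio == start the factor is exactly 1, so the curve has no jump.
double smooth_scale(double k, double max_capacity,
                    double unallocated_capacity, double start_reverse_ratio) {
  double free_reverse_ratio = max_capacity / unallocated_capacity;
  if (free_reverse_ratio > start_reverse_ratio) {
    k *= std::exp(free_reverse_ratio - start_reverse_ratio);
  }
  return k;
}
```

Starting from k = 1, the step version goes from 1.0 at 50% free to roughly 3.04 at 49% free, while the smooth version with a start ratio of 2 moves from 1.0 to about 1.04 over the same change.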
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Vladimir
>>>>>>>>> 
>>>>>>>>> On 5/6/13 6:00 AM, Albert Noll wrote:
>>>>>>>>>> Hi Vladimir,
>>>>>>>>>> 
>>>>>>>>>> I looked at: https://jbs.oracle.com/bugs/browse/JDK-8012371 .
>>>>>>>>>> I attached a possible solution to this mail. Could I get some
>>>>>>>>>> early feedback from you?
>>>>>>>>>> 
>>>>>>>>>> Many thanks,
>>>>>>>>>> Albert
>>>>>>>> 
>>>>>>> 
>>> 
>>> 
>>> 
>>> 
>>