RFR(S): 8059606: Enable per-method usage of CompileThresholdScaling (per-method compilation thresholds)

Zoltán Majó zoltan.majo at oracle.com
Mon Jan 19 16:56:03 UTC 2015


Hi John,


On 01/16/2015 08:59 PM, John Rose wrote:
> On Jan 16, 2015, at 5:05 AM, Zoltán Majó <zoltan.majo at oracle.com> wrote:
>> Here is the new webrev: http://cr.openjdk.java.net/~zmajo/8059606/webrev.02/
> Reviewed, with some comments.

thank you for your review and for the feedback! Please see detailed 
comments below.

> Overall, I like the cleanups along the way.
>
> The basic idea of replacing a hard-coded 'mask' with an addressable variable is sound and nicely executed.
>
> I suppose that idea by itself is "S" small, but this really is a "M" or even "L" change, as Vladimir says, especially since the enhanced logic is spread all around many files.

I agree that this is not a small change any more, but I would like to 
keep the subject of this discussion unchanged so that we don't move to a 
different thread (unless you or Vladimir think it is better to change it).

> How have you regression tested this?

I used our performance infrastructure. I collected performance data on 6 
different architectures for 41 benchmark programs/suites. The data show 
that per-method compilation thresholds do not result in a statistically 
significant change in performance. One benchmark, Footprint3-Client, 
degrades by ~0.5% on the X86 Client VM, but I think that is negligible.

> Have you verified that the compilation sequence doesn't change for a non-trivial use case?  A slip in the assembly code (for example) might cause a comparison against a garbage mask or limit that could cause compilation decisions to go crazy quietly.  I didn't spot any such bug, but they are hard to spot and sometimes quiet.

I did extensive manual testing on all architectures targeted by the patch.

I checked that a target method (a method for which per-method 
compilation thresholds have been specified) is indeed compiled 
sooner/later than methods for which global compilation thresholds are in 
effect. I ran tests for all combinations of +/-TieredCompilation, 
+/-ProfileInterpreter, and +/-UseOnStackReplacement
on each architecture that we support. While working on this issue I've 
discovered and reported two interpreter-related bugs, 8068652 and 8068505.
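
(For illustration, the per-method thresholds I tested are specified via 
a CompileCommand option, roughly like this; the class/method pattern 
here is just an example:

   java -XX:CompileCommand=option,MyClass::hotMethod,double,CompileThresholdScaling,0.5 ...

With a factor of 0.5, MyClass::hotMethod reaches its compilation 
thresholds after half as many invocations/backedges as methods governed 
by the global thresholds.)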

I cannot, unfortunately, guarantee that my changes are error-free, but I 
did my best to catch every error I could think of and check.

> In the sparc assembly code (more than one file), the live range of Rcounters has increased, since it is used to supply limits as well as to update the counter (which happens early in the code).
>
> To make it easier to maintain the code, I suggest renaming Rcounters to G3_method_counters.
> (As you can see, when a register has a logical name but has a complicated live range, we add the hardware name is to the logical name, to make it easier to spot interfering uses, when manually editing code.)

Thanks for the suggestion, I've changed the register's name.

> If scale==0.0 is a valid input checked specially in compileBroker, perhaps the effect of a zero should be documented?
> Suggest adding to globals.hpp:
>     "; values greater than one delay counter overflow; zero forces counter overflow immediately"
>     "; can be set as a per-method  option."

If CompileThresholdScaling is set to 0.0, all methods are interpreted.

The reason is that CompileThreshold==0 has historically been equivalent 
to running with -Xint. Setting CompileThresholdScaling to 0.0 scales 
CompileThreshold down to 0, and we wanted to keep the VM's behavior 
consistent when we added support for the global CompileThresholdScaling 
flag in JDK-8059604.
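
Concretely, running with -XX:CompileThresholdScaling=0.0 makes the 
application run fully interpreted, just like -Xint would.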

If you think it would be good if we changed the meaning of 
CompileThreshold==0, please let me know. I'll file an RFE for it and 
change it. As CompileThreshold is a product flag, we will need CCC 
approval for that change.

As you've suggested, I added a comment to globals.hpp that precisely 
describes the behavior of CompileThresholdScaling.

> Question:  What if both a global scale and a method option for scale are both set?  Is the global one ignored?  Do they multiply?  It's worth specifying it explicitly (and checking that the logic DTRT).

Global and per-method values multiply. That behavior is now described in 
the comment in globals.hpp.
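
For example, -XX:CompileThresholdScaling=2.0 combined with a per-method 
value of 0.5 yields an effective scaling factor of 1.0 for that method.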

> Question:  How are the log values (like Tier0InvokeNotifyFreqLog or the result of get_scaled_freq_log) constrained to fit in a reasonable range?  (I would suppose that range is 0..31 or less.)  Should we have range clipping code in the scaler functions?

Currently, InvocationCounter::number_of_count_bits = 29 bits are reserved 
for counting. As a result, the log2 of the notification frequency can be 
at most 30. I updated the source code accordingly.
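
The clipping is essentially of the following shape (a simplified, 
standalone sketch; the names are illustrative, not the exact ones from 
the webrev):

   #include <algorithm>

   // Illustration only: with 29 bits reserved for counting, a notification
   // frequency log must stay in [0, 30]; anything larger is clipped.
   static int clip_freq_log(int freq_log) {
     const int max_freq_log = 30;
     return std::min(std::max(freq_log, 0), max_freq_log);
   }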

> It would give notably simpler code, in MethodData::init, to use a branch-free setup of tier_0_invoke_notify_freq_log etc.  Set scale = 1.0 and then update it conditionally.  Special-case scale=1.0 in get_scaled_freq_log to taste.

I did that.
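
The setup now has roughly this shape (a sketch; the per-method lookup 
and the scaled values follow the webrev only loosely):

   double scale = 1.0;  // neutral default, only overwritten if the method has the option
   CompilerOracle::has_option_value(method, "CompileThresholdScaling", scale);

   // scaled unconditionally; scaled_freq_log() special-cases scale == 1.0
   intx invoke_freq_log   = Arguments::scaled_freq_log(Tier0InvokeNotifyFreqLog, scale);
   intx backedge_freq_log = Arguments::scaled_freq_log(Tier0BackedgeNotifyFreqLog, scale);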

> Same comment about less-branchy code for methodCounters.hpp.  (It's better to have a one-way branch that sets up 'scale' than a two-way branch with duplicate setups of one or more variables.)

I changed the code according to your suggestions.

> In MethodCounters, I think the conditional scaling of _interpreter_backward_branch_limit is going to confuse someone, at some point.  It should be scaled, unconditionally, like its sibling variables.  (That would remove another somewhat verbose initialization branch!)

The value of the *global* interpreter backward branch limit 
(InvocationCounter::InterpreterBackwardBranchLimit) is computed based on 
the value of CompileThreshold (in InvocationCounter::reinitialize()). 
The backward branch limit is assigned a different value depending on 
whether interpreter profiling is enabled or not.

The logic of the *per-method* interpreter backward branch limit 
(MethodCounters::_interpreter_backward_branch_limit) is intended to be 
identical to that of the global branch limit. Therefore, I'm afraid that 
we have to keep the if-then-else construct initializing 
_interpreter_backward_branch_limit in methodCounters.hpp. I hope I 
understood your earlier suggestion correctly.
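
For reference, the global logic that the per-method initialization 
mirrors is roughly the following (simplified; the real code in 
InvocationCounter::reinitialize() also shifts the result into the 
counter's count bits):

   if (ProfileInterpreter) {
     InterpreterBackwardBranchLimit =
         (CompileThreshold * (OnStackReplacePercentage - InterpreterProfilePercentage)) / 100;
   } else {
     InterpreterBackwardBranchLimit =
         (CompileThreshold * OnStackReplacePercentage) / 100;
   }

The per-method version does the same, with CompileThreshold replaced by 
the scaled, per-method threshold.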

> Small style nit:  The noise-word "get_" is discouraged in the style doc:
>
>>   • Getter accessor names are noun phrases, with no "get_" noise word. Boolean getters can also begin with "is_" or "has_".
>>    https://wiki.openjdk.java.net/display/HotSpot/StyleGuide
> Arguments.cpp follows this rule partially (in older code maybe?).  It would be better to decrease counter-examples to the rule instead of increase them.
>
> Bigger style nit:  Since the functions are not getting a preset value (from the arguments) but rather normalizing a provided argument value, I suggest naming them "scale_compile_threshold" (i.e., a verb phrase instead of a noun phrase).  Again from the style doc:
>
>>   • Other method names are verb phrases, as if commands to the receiver.

I did not know that; thanks for pointing it out. I changed:

get_scaled_compile_threshold -> scaled_compile_threshold
get_scaled_freq_log -> scaled_freq_log

> Since you are providing overloads of the scaling functions, the header file should either contain inline code for the convenience methods, or else document how the optional argument ('scale') defaults.  I'd prefer inline code, since it is simple.  It's as much text to document with a comment as just to supply the inline overload.

I inlined the convenience methods into the header file.
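
The overloads are simply of this shape (sketch):

   // Convenience overloads: default the scale to the global flag value.
   static intx scaled_compile_threshold(intx threshold) {
     return scaled_compile_threshold(threshold, CompileThresholdScaling);
   }
   static intx scaled_freq_log(intx freq_log) {
     return scaled_freq_log(freq_log, CompileThresholdScaling);
   }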

Here is the new webrev: http://cr.openjdk.java.net/~zmajo/8059606/webrev.03/

>
> As I said before, nice work!

Thank you!

Best regards,


Zoltan

>
> — John


