Detecting range check elimination with PrintAssembly
Vladimir Kozlov
vladimir.kozlov at oracle.com
Sun Jan 15 14:30:12 PST 2012
Manohar,
As Kris pointed you need to fix your example:
void method1(){
...
for(int k = 0; j < cols; k++){
^ 'k' ?
void method2(){
...
for(int j=0 ; i< rows2; j++){
^ 'j' ?
Second, your two test methods are different so you can't directly compare them. method1() iterates over rows using
middle loop index 'j' and method2() uses external loop index 'i'. Unless they are typos again.
Third, PrintOptoAssembly flag is not 'old'. It can be only used in debug VM to print 'pseudo' assembler by JIT compiler
itself and not by hsdis disassembler. PrintAssembly flag can be used with product VM but it needs hsdis library.
If you are using jdk7 there are few flags you can use to print loop optimizations information. They need debug version
of VM but it is not problem for you, I think, since you can use debug PrintOptoAssembly flag.
-XX:+TraceLoopOpts prints all happened loop optimizations and loop tree after each round of loop opts,
-XX:+TraceLoopPredicate prints RC information when it is moved from a loop,
-XX:+TraceRangeLimitCheck prints additional information for RC elimination optimization.
Fourth, range check expression in your example is not what you think. RC expression should be next:
(i*stride+offset) where 'i' is loop variable, 'stride' is constant and 'offset' is loop invariant.
In your example 'offset' is (j * cols) since it is loop invariant, 'k' is loop variable and stride is '1' (one).
In both your methods RC will be moved out of inner loop so the code for it will be the same. The only difference in
these methods will be where and how (j * cols) and (i * cols) expressions are calculated.
Regards,
Vladimir
On 1/15/12 12:22 PM, Krystal Mok wrote:
> Hi,
>
> In your PrintOptoAssembly output snippet, the instruction at 0x13e is a LoadRange, which loads the range from the header
> of an array:
>
> (from x86_64.ad <http://x86_64.ad>)
> // Load Range
> instruct loadRange(rRegI dst, memory mem)
> %{
> match(Set dst (LoadRange mem));
>
> ins_cost(125); // XXX
> format %{ "movl $dst, $mem\t# range" %}
> opcode(0x8B);
> ins_encode(REX_reg_mem(dst, mem), OpcP, reg_mem(dst, mem));
> ins_pipe(ialu_reg_mem);
> %}
>
> That's not a range check just yet; the real check, if any, should come after the null check, in the form of comparing
> something else with RSI. But you didn't show what's after the null check, how RSI is used, so it's hard to say what
> you're seeing in your example.
>
> As for the two test examples, could you paste the entire source code, with the PrintOptoAssembly output of method1() and
> method2() ? The first example looks weird, maybe it's a typo but you're using "j < cols" as the loop condition for the
> inner loop.
>
> I'd guess it's the difference in locality that made the difference in performance in your two tests.
>
> - Kris
>
> On Mon, Jan 16, 2012 at 1:59 AM, Manohar Jonnalagedda <manojo10386 at gmail.com <mailto:manojo10386 at gmail.com>> wrote:
>
> Hello,
>
> following this reference on Range Check Elimination done by the Hotspot compiler [1], I was keen in knowing how I
> can detect whether range checks are taking place in loops by inspecting output using the PrintAssembly flag; with
> the old PrintOptoAssembly flag, I have seen output such as the following, which I assume to be range checks :
>
> B11: # B73 B12 <- B10 Freq: 1.21365
> 139 movq RAX, [rsp + #24] # spill
> 13e movl RSI, [RAX + #12 (8-bit)] # range
> 141 NullCheck RAX
>
> What is the equivalent with the new PrintAssembly flag (using hsdis)?
>
> Moreover, as stated on the wiki page [1], loops are optimized if the stride is a compile-time constant. I performed
> a few tests on a kmeans program, with 3 nested loops, having the following (high-level) structure:
>
> ===
> void method1(){
> //loop 1
> for(int i = 0; i< rows1; i++){
> //...
> for(int j = 0; j< rows2; j++){
> //...
> for(int k = 0; j < cols; k++){ array[j * cols + k] = //...}
> }
> }
> }
>
> void method2(){
> //loop 2
> for(int i =0; i < rows1; i++){
> for(int j=0 ; i< rows2; j++){
> for(int k=0 ; k< cols; k++){
> array[i*cols+k] = //...
> }
> }
> }
> }
>
> void main(){
>
> do{
> method1(); method2();
> }while(!converged)
>
> }
> ====
>
> In the first test, cols is an int whose value is determined at runtime (by reading a file), in the second test, it
> is given as a compile-time constant(3). In the second test, there is a */significant*/ speed-up (around 40%).
> However, when studying the diff of the output of PrintOptoAssembly for both method1 and method2, there is no
> difference (apart from slight value changes in frequency). Would you have any hints as to where I could look for
> differences?
>
> Thanks a lot,
> Manohar
>
> [1] https://wikis.oracle.com/display/HotSpotInternals/RangeCheckElimination
>
>
More information about the hotspot-dev
mailing list