Detecting range check elimination with PrintAssembly
Krystal Mok
rednaxelafx at gmail.com
Sun Jan 15 12:22:59 PST 2012
Hi,
In your PrintOptoAssembly output snippet, the instruction at 0x13e is a
LoadRange, which loads the array length (the "range") from the array header:
(from x86_64.ad)
// Load Range
instruct loadRange(rRegI dst, memory mem)
%{
  match(Set dst (LoadRange mem));
  ins_cost(125); // XXX
  format %{ "movl $dst, $mem\t# range" %}
  opcode(0x8B);
  ins_encode(REX_reg_mem(dst, mem), OpcP, reg_mem(dst, mem));
  ins_pipe(ialu_reg_mem);
%}
That's not a range check yet. The real check, if any, should come after
the null check, as a comparison of some index value against RSI. But you
didn't show what follows the null check or how RSI is used, so it's hard
to say what you're seeing in your example.
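To make the distinction concrete, here is a minimal Java sketch of what a range check amounts to. It is an illustration only, not taken from the original post: the class name RangeCheckSketch and its methods are hypothetical, and the unsigned comparison mirrors how such a check is typically expressed in compiled code (one compare of the index against the loaded length, followed by a branch). Whether the check in sum() is actually eliminated depends on the JVM version and on what the JIT decides.

```java
class RangeCheckSketch {
    // Conceptual form of a range check: a single unsigned comparison
    // rejects both negative indices and indices >= length.
    // (Hypothetical helper, for illustration only.)
    static boolean inBounds(int index, int length) {
        return Integer.compareUnsigned(index, length) < 0;
    }

    // A simple counted loop over a.length with stride 1, the kind of
    // loop in which a per-iteration range check is a candidate for
    // elimination.
    static int sum(int[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i]; // implicit null check and range check happen here
        }
        return s;
    }

    public static void main(String[] args) {
        int[] data = new int[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        long total = 0;
        // Call sum() often enough that the JIT is likely to compile it.
        for (int iter = 0; iter < 20_000; iter++) {
            total += sum(data);
        }
        System.out.println(total);
    }
}
```

Running this with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (with hsdis installed) prints the generated code for the compiled methods. In the loop body of sum(), the thing to look for is a compare of the index register against the register holding the loaded length, followed by a conditional jump out of the loop.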
As for the two test examples, could you paste the entire source code, along
with the PrintOptoAssembly output of method1() and method2()? The first
example looks odd: you're using "j < cols" as the loop condition for the
innermost loop, which may be a typo.
I'd guess it's the difference in locality that made the difference in
performance in your two tests.
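As a rough illustration of how locality alone can change timings, the sketch below sums the same flat array in two traversal orders. It is not the poster's kmeans code; the class name LocalitySketch, the array size, and the warm-up count are all assumptions chosen for illustration, and System.nanoTime() timing of this kind is only indicative.

```java
class LocalitySketch {
    // Visits a flat rows*cols array row by row, so consecutive
    // iterations touch consecutive elements.
    static long rowMajor(int[] a, int rows, int cols) {
        long s = 0;
        for (int i = 0; i < rows; i++) {
            for (int k = 0; k < cols; k++) {
                s += a[i * cols + k];
            }
        }
        return s;
    }

    // Visits the same elements column by column, so consecutive
    // iterations are cols elements apart.
    static long colMajor(int[] a, int rows, int cols) {
        long s = 0;
        for (int k = 0; k < cols; k++) {
            for (int i = 0; i < rows; i++) {
                s += a[i * cols + k];
            }
        }
        return s;
    }

    public static void main(String[] args) {
        int rows = 2048, cols = 2048; // arbitrary sizes for illustration
        int[] a = new int[rows * cols];
        java.util.Arrays.fill(a, 1);

        // Warm up so both methods are likely JIT-compiled before timing.
        long check = 0;
        for (int warm = 0; warm < 10; warm++) {
            check += rowMajor(a, rows, cols) + colMajor(a, rows, cols);
        }

        long t0 = System.nanoTime();
        long r = rowMajor(a, rows, cols);
        long t1 = System.nanoTime();
        long c = colMajor(a, rows, cols);
        long t2 = System.nanoTime();

        System.out.println("row-major ns: " + (t1 - t0) + ", sum " + r);
        System.out.println("col-major ns: " + (t2 - t1) + ", sum " + c);
        System.out.println("warm-up checksum: " + check);
    }
}
```

Both methods do the same arithmetic and are subject to the same kind of bounds checking, so any timing gap between them comes from the memory access pattern rather than from range checks.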
- Kris
On Mon, Jan 16, 2012 at 1:59 AM, Manohar Jonnalagedda <manojo10386 at gmail.com> wrote:
> Hello,
>
> Following this reference on Range Check Elimination done by the HotSpot
> compiler [1], I was keen to know how I can detect whether range checks
> are taking place in loops by inspecting output using the PrintAssembly
> flag. With the old PrintOptoAssembly flag, I have seen output such as the
> following, which I assume to be range checks:
>
> B11: # B73 B12 <- B10 Freq: 1.21365
> 139 movq RAX, [rsp + #24] # spill
> 13e movl RSI, [RAX + #12 (8-bit)] # range
> 141 NullCheck RAX
>
> What is the equivalent with the new PrintAssembly flag (using hsdis)?
>
> Moreover, as stated on the wiki page [1], loops are optimized if the
> stride is a compile-time constant. I performed a few tests on a kmeans
> program, with 3 nested loops, having the following (high-level) structure:
>
> ===
> void method1(){
>   //loop 1
>   for(int i = 0; i< rows1; i++){
>     //...
>     for(int j = 0; j< rows2; j++){
>       //...
>       for(int k = 0; j < cols; k++){ array[j * cols + k] = //...}
>     }
>   }
> }
>
> void method2(){
>   //loop 2
>   for(int i =0; i < rows1; i++){
>     for(int j=0 ; i< rows2; j++){
>       for(int k=0 ; k< cols; k++){
>         array[i*cols+k] = //...
>       }
>     }
>   }
> }
>
> void main(){
>
>   do{
>     method1(); method2();
>   }while(!converged)
>
> }
> ====
>
> In the first test, cols is an int whose value is determined at runtime
> (by reading a file); in the second test, it is given as a compile-time
> constant (3). In the second test, there is a **significant** speed-up
> (around 40%). However, when studying the diff of the output of
> PrintOptoAssembly for both method1 and method2, there is no difference
> (apart from slight value changes in frequency). Would you have any hints as
> to where I could look for differences?
>
> Thanks a lot,
> Manohar
>
> [1]
> https://wikis.oracle.com/display/HotSpotInternals/RangeCheckElimination
>