JRuby/Seph/PHP.reboot/... SwitchPoint usage

Christian Thalinger christian.thalinger at oracle.com
Wed Aug 10 07:26:21 PDT 2011


On Aug 8, 2011, at 8:21 PM, Christian Thalinger wrote:

> 
> On Aug 8, 2011, at 6:39 PM, Charles Oliver Nutter wrote:
> 
>> On Mon, Aug 8, 2011 at 9:51 AM, Christian Thalinger
>> <christian.thalinger at oracle.com> wrote:
>>> Since I have the basic push-notification of CallSites I'm now looking into push-notification of SwitchPoints:
>>> 
>>> 7071709: JSR 292: switchpoint invalidation should be pushed not pulled
>>> 
>>> Basically it should be the same, just needs some additional love in the compiler.
>>> 
>>> I looked into JRuby's usage of SwitchPoints and it seems it has something to do with constants.  Is there an existing benchmark that would benefit from the SwitchPoint optimization?  Seph also seems to use SwitchPoints; PHP.reboot does not (that's what grep tells me).
>> 
>> Yes, currently SwitchPoint is only used for constant lookup, since
>> constant modification invalidates globally. A good benchmark to use
>> would be this one:
>> 
>> bench/language/bench_const_lookup.rb <number of iters>
>> 
>> Here are numbers with a recent openjdk-osx-build, with and without
>> invokedynamic enabled:
>> 
>> WITHOUT:
>> 
>> 100k * 100 nested const get               0.059000   0.000000   0.059000 (  0.059000)
>> 100k * 100 nested const get               0.059000   0.000000   0.059000 (  0.059000)
>> 100k * 100 nested const get               0.058000   0.000000   0.058000 (  0.058000)
>> 100k * 100 nested const get               0.059000   0.000000   0.059000 (  0.059000)
>> 100k * 100 nested const get               0.057000   0.000000   0.057000 (  0.057000)
>> 100k * 100 inherited const get            0.058000   0.000000   0.058000 (  0.058000)
>> 100k * 100 inherited const get            0.059000   0.000000   0.059000 (  0.059000)
>> 100k * 100 inherited const get            0.058000   0.000000   0.058000 (  0.058000)
>> 100k * 100 inherited const get            0.058000   0.000000   0.058000 (  0.058000)
>> 100k * 100 inherited const get            0.063000   0.000000   0.063000 (  0.064000)
>> 100k * 100 both                           0.060000   0.000000   0.060000 (  0.060000)
>> 100k * 100 both                           0.060000   0.000000   0.060000 (  0.060000)
>> 100k * 100 both                           0.059000   0.000000   0.059000 (  0.059000)
>> 100k * 100 both                           0.058000   0.000000   0.058000 (  0.058000)
>> 100k * 100 both                           0.059000   0.000000   0.059000 (  0.059000)
>> 
>> WITH: (specify -Xinvokedynamic.constants=true to JRuby, or
>> -Djruby.invokedynamic.constants=true to JVM)
>> 
>> 100k * 100 nested const get               1.321000   0.000000   1.321000 (  1.321000)
>> 100k * 100 nested const get               1.311000   0.000000   1.311000 (  1.311000)
>> 100k * 100 nested const get               1.305000   0.000000   1.305000 (  1.305000)
>> 100k * 100 nested const get               1.293000   0.000000   1.293000 (  1.294000)
>> 100k * 100 nested const get               1.292000   0.000000   1.292000 (  1.293000)
>> 100k * 100 inherited const get            1.295000   0.000000   1.295000 (  1.295000)
>> 100k * 100 inherited const get            1.241000   0.000000   1.241000 (  1.241000)
>> 100k * 100 inherited const get            1.241000   0.000000   1.241000 (  1.241000)
>> 100k * 100 inherited const get            1.244000   0.000000   1.244000 (  1.244000)
>> 100k * 100 inherited const get            1.236000   0.000000   1.236000 (  1.236000)
>> 100k * 100 both                           1.280000   0.000000   1.280000 (  1.280000)
>> 100k * 100 both                           1.236000   0.000000   1.236000 (  1.236000)
>> 100k * 100 both                           1.229000   0.000000   1.229000 (  1.230000)
>> 100k * 100 both                           1.236000   0.000000   1.236000 (  1.236000)
>> 100k * 100 both                           1.248000   0.000000   1.248000 (  1.248000)
>> 
>> You can see there's some room for improvement :) The numbers should be
>> faster with invokedynamic, since the SwitchPoint form has no active
>> guard.
> 
> That's perfect!  Let's see what numbers I can come up with...

Here are the numbers for JDK 7 b147, 7071307+7071653, and 7071307+7071653+7071709:

7071307: MethodHandle bimorphic inlining should consider the frequency
7071653: JSR 292: call site change notification should be pushed not pulled 
7071709: JSR 292: switchpoint invalidation should be pushed not pulled 

JDK 7 b147:

$ jruby --server -Xinvokedynamic.constants=true bench/language/bench_const_lookup.rb 1
                                              user     system      total        real
100k * 100 nested const get               1.301000   0.000000   1.301000 (  1.176000)
100k * 100 nested const get               1.057000   0.000000   1.057000 (  1.057000)
100k * 100 nested const get               1.052000   0.000000   1.052000 (  1.052000)
100k * 100 nested const get               1.051000   0.000000   1.051000 (  1.052000)
100k * 100 nested const get               1.052000   0.000000   1.052000 (  1.052000)
100k * 100 inherited const get            1.188000   0.000000   1.188000 (  1.188000)
100k * 100 inherited const get            1.126000   0.000000   1.126000 (  1.126000)
100k * 100 inherited const get            1.125000   0.000000   1.125000 (  1.125000)
100k * 100 inherited const get            1.126000   0.000000   1.126000 (  1.126000)
100k * 100 inherited const get            1.130000   0.000000   1.130000 (  1.130000)
100k * 100 both                           1.214000   0.000000   1.214000 (  1.214000)
100k * 100 both                           1.134000   0.000000   1.134000 (  1.134000)
100k * 100 both                           1.134000   0.000000   1.134000 (  1.134000)
100k * 100 both                           1.135000   0.000000   1.135000 (  1.135000)
100k * 100 both                           1.135000   0.000000   1.135000 (  1.135000)

7071307+7071653:

$ jruby --server -Xinvokedynamic.constants=true bench/language/bench_const_lookup.rb 1
                                              user     system      total        real
100k * 100 nested const get               0.552000   0.000000   0.552000 (  0.522000)
100k * 100 nested const get               0.325000   0.000000   0.325000 (  0.325000)
100k * 100 nested const get               0.345000   0.000000   0.345000 (  0.345000)
100k * 100 nested const get               0.339000   0.000000   0.339000 (  0.338000)
100k * 100 nested const get               0.343000   0.000000   0.343000 (  0.343000)
100k * 100 inherited const get            0.477000   0.000000   0.477000 (  0.477000)
100k * 100 inherited const get            0.307000   0.000000   0.307000 (  0.308000)
100k * 100 inherited const get            0.309000   0.000000   0.309000 (  0.309000)
100k * 100 inherited const get            0.309000   0.000000   0.309000 (  0.309000)
100k * 100 inherited const get            0.307000   0.000000   0.307000 (  0.307000)
100k * 100 both                           0.486000   0.000000   0.486000 (  0.486000)
100k * 100 both                           0.346000   0.000000   0.346000 (  0.346000)
100k * 100 both                           0.340000   0.000000   0.340000 (  0.340000)
100k * 100 both                           0.347000   0.000000   0.347000 (  0.347000)
100k * 100 both                           0.340000   0.000000   0.340000 (  0.340000)

7071307+7071653+7071709:

$ jruby --server -Xinvokedynamic.constants=true bench/language/bench_const_lookup.rb 1
                                              user     system      total        real
100k * 100 nested const get               0.468000   0.000000   0.468000 (  0.438000)
100k * 100 nested const get               0.238000   0.000000   0.238000 (  0.238000)
100k * 100 nested const get               0.251000   0.000000   0.251000 (  0.251000)
100k * 100 nested const get               0.242000   0.000000   0.242000 (  0.242000)
100k * 100 nested const get               0.254000   0.000000   0.254000 (  0.254000)
100k * 100 inherited const get            0.403000   0.000000   0.403000 (  0.403000)
100k * 100 inherited const get            0.260000   0.000000   0.260000 (  0.260000)
100k * 100 inherited const get            0.255000   0.000000   0.255000 (  0.255000)
100k * 100 inherited const get            0.252000   0.000000   0.252000 (  0.252000)
100k * 100 inherited const get            0.254000   0.000000   0.254000 (  0.254000)
100k * 100 both                           0.384000   0.000000   0.384000 (  0.384000)
100k * 100 both                           0.227000   0.000000   0.227000 (  0.227000)
100k * 100 both                           0.221000   0.000000   0.221000 (  0.221000)
100k * 100 both                           0.233000   0.000000   0.233000 (  0.233000)
100k * 100 both                           0.238000   0.000000   0.238000 (  0.238000)

That's pretty nice but compared to non-indy it sucks:

JDK 7 b147:

$ jruby --server bench/language/bench_const_lookup.rb 1
                                              user     system      total        real
100k * 100 nested const get               0.271000   0.000000   0.271000 (  0.242000)
100k * 100 nested const get               0.065000   0.000000   0.065000 (  0.065000)
100k * 100 nested const get               0.052000   0.000000   0.052000 (  0.052000)
100k * 100 nested const get               0.052000   0.000000   0.052000 (  0.052000)
100k * 100 nested const get               0.051000   0.000000   0.051000 (  0.051000)
100k * 100 inherited const get            0.224000   0.000000   0.224000 (  0.224000)
100k * 100 inherited const get            0.053000   0.000000   0.053000 (  0.053000)
100k * 100 inherited const get            0.053000   0.000000   0.053000 (  0.053000)
100k * 100 inherited const get            0.054000   0.000000   0.054000 (  0.054000)
100k * 100 inherited const get            0.054000   0.000000   0.054000 (  0.054000)
100k * 100 both                           0.230000   0.000000   0.230000 (  0.230000)
100k * 100 both                           0.058000   0.000000   0.058000 (  0.058000)
100k * 100 both                           0.059000   0.000000   0.059000 (  0.059000)
100k * 100 both                           0.058000   0.000000   0.058000 (  0.058000)
100k * 100 both                           0.059000   0.000000   0.059000 (  0.059000)
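
To put the four 100k * 100 runs above side by side, here is a small sketch of the arithmetic. The class and method names are made up for illustration. The inputs are the third "nested const get" row of each table, taken as a rough steady-state figure, and the per-lookup cost assumes that "100k * 100" means 100,000 iterations of 100 lookups each.

```java
// Hypothetical helper for comparing the benchmark tables; not part of JRuby.
final class ConstLookupRatios {

    /** How many times slower a run is than the baseline (both in seconds). */
    static double slowdown(double seconds, double baselineSeconds) {
        return seconds / baselineSeconds;
    }

    /** Average cost of a single constant lookup, in nanoseconds. */
    static double nanosPerLookup(double seconds, long iterations, int lookupsPerIteration) {
        return seconds * 1e9 / ((double) iterations * lookupsPerIteration);
    }

    public static void main(String[] args) {
        // Third "nested const get" row of each 100k * 100 table, in seconds.
        double nonIndy = 0.052;          // JDK 7 b147, no invokedynamic
        double indyB147 = 1.052;         // JDK 7 b147, indy constants
        double indyCallSite = 0.345;     // 7071307+7071653
        double indySwitchPoint = 0.251;  // 7071307+7071653+7071709

        String[] labels = { "b147 indy", "7071307+7071653", "7071307+7071653+7071709" };
        double[] times = { indyB147, indyCallSite, indySwitchPoint };
        for (int i = 0; i < times.length; i++) {
            System.out.printf("%-26s %5.1fx slower than non-indy, %6.1f ns/lookup%n",
                    labels[i],
                    slowdown(times[i], nonIndy),
                    nanosPerLookup(times[i], 100_000, 100));
        }
        System.out.printf("%-26s %14s %6.1f ns/lookup%n", "non-indy", "",
                nanosPerLookup(nonIndy, 100_000, 100));
    }
}
```

With these inputs the indy runs come out at roughly 20x, 6.6x and 4.8x the non-indy time.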

Some assembly inspection showed that the performance difference between indy and non-indy is mostly due to the out-of-line calls that fall off the threshold cliff (10-15 call sites).  When we rewrite the benchmark to loop more often (10M times) but do only 50 constant lookups per iteration, it gets interesting:

JDK 7 b147 (indy):

$ jruby --server -Xinvokedynamic.constants=true bench_const_lookup.rb 1
                                              user     system      total        real
10M * 50 nested const get                37.918000   0.000000  37.918000 ( 37.844000)
10M * 50 nested const get                37.448000   0.000000  37.448000 ( 37.448000)
10M * 50 nested const get                36.845000   0.000000  36.845000 ( 36.845000)
10M * 50 nested const get                36.841000   0.000000  36.841000 ( 36.841000)
10M * 50 nested const get                36.864000   0.000000  36.864000 ( 36.864000)
10M * 50 inherited const get             37.907000   0.000000  37.907000 ( 37.907000)
10M * 50 inherited const get             37.117000   0.000000  37.117000 ( 37.117000)
10M * 50 inherited const get             37.399000   0.000000  37.399000 ( 37.399000)
10M * 50 inherited const get             37.555000   0.000000  37.555000 ( 37.555000)
10M * 50 inherited const get             37.640000   0.000000  37.640000 ( 37.640000)
10M * 50 both                            37.946000   0.000000  37.946000 ( 37.946000)
10M * 50 both                            37.928000   0.000000  37.928000 ( 37.928000)
10M * 50 both                            38.140000   0.000000  38.140000 ( 38.140000)
10M * 50 both                            38.186000   0.000000  38.186000 ( 38.186000)
10M * 50 both                            37.956000   0.000000  37.956000 ( 37.956000)

JDK 7 b147 (non-indy):

$ jruby --server bench_const_lookup.rb 1
                                              user     system      total        real
10M * 50 nested const get                 2.790000   0.000000   2.790000 (  2.756000)
10M * 50 nested const get                 2.576000   0.000000   2.576000 (  2.576000)
10M * 50 nested const get                 2.499000   0.000000   2.499000 (  2.499000)
10M * 50 nested const get                 2.501000   0.000000   2.501000 (  2.501000)
10M * 50 nested const get                 2.497000   0.000000   2.497000 (  2.497000)
10M * 50 inherited const get              2.556000   0.000000   2.556000 (  2.556000)
10M * 50 inherited const get              2.419000   0.000000   2.419000 (  2.419000)
10M * 50 inherited const get              2.419000   0.000000   2.419000 (  2.419000)
10M * 50 inherited const get              2.414000   0.000000   2.414000 (  2.414000)
10M * 50 inherited const get              2.418000   0.000000   2.418000 (  2.418000)
10M * 50 both                             2.546000   0.000000   2.546000 (  2.546000)
10M * 50 both                             2.419000   0.000000   2.419000 (  2.419000)
10M * 50 both                             2.417000   0.000000   2.417000 (  2.417000)
10M * 50 both                             2.414000   0.000000   2.414000 (  2.415000)
10M * 50 both                             2.421000   0.000000   2.421000 (  2.421000)

7071307+7071653+7071709:

$ jruby --server -Xinvokedynamic.constants=true bench_const_lookup.rb 1 
                                              user     system      total        real
10M * 50 nested const get                 0.590000   0.000000   0.590000 (  0.560000)
10M * 50 nested const get                 0.466000   0.000000   0.466000 (  0.466000)
10M * 50 nested const get                 0.305000   0.000000   0.305000 (  0.305000)
10M * 50 nested const get                 0.310000   0.000000   0.310000 (  0.310000)
10M * 50 nested const get                 0.304000   0.000000   0.304000 (  0.303000)
10M * 50 inherited const get              0.461000   0.000000   0.461000 (  0.461000)
10M * 50 inherited const get              0.426000   0.000000   0.426000 (  0.426000)
10M * 50 inherited const get              0.353000   0.000000   0.353000 (  0.353000)
10M * 50 inherited const get              0.355000   0.000000   0.355000 (  0.355000)
10M * 50 inherited const get              0.356000   0.000000   0.356000 (  0.356000)
10M * 50 both                             0.459000   0.000000   0.459000 (  0.458000)
10M * 50 both                             0.435000   0.000000   0.435000 (  0.435000)
10M * 50 both                             0.363000   0.000000   0.363000 (  0.363000)
10M * 50 both                             0.360000   0.000000   0.360000 (  0.360000)
10M * 50 both                             0.364000   0.000000   0.364000 (  0.364000)

Well, that's really nice!  The compiler is able to optimize away all constant lookups because all guards in between are eliminated and it can prove that the constant's value is never used.  The method is basically empty except for a little JRuby boilerplate.  Now we need a real benchmark ;-)
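
For readers who have not used the API, here is a minimal sketch of a SwitchPoint-guarded constant lookup. It is not JRuby's code. The class, the single global constant and the `slowLookup`/`redefineConstant` names are assumptions made up for illustration; only the `java.lang.invoke` calls are real. The fast path is a constant method handle with no explicit test in front of it, and redefining the constant invalidates the SwitchPoint so that the call site falls back and relinks.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.SwitchPoint;

// Hypothetical sketch, not JRuby's code: one constant, one global SwitchPoint.
final class SwitchPointConstantSketch {
    // Stand-ins for a runtime's constant table and its global validity token.
    private static volatile Object constantValue = "first";
    private static volatile SwitchPoint validity = new SwitchPoint();

    private final MutableCallSite site =
            new MutableCallSite(MethodType.methodType(Object.class));
    private final MethodHandle invoker = site.dynamicInvoker();
    private final MethodHandle fallback;
    int slowLookups = 0; // counts how often the slow path actually ran

    SwitchPointConstantSketch() {
        try {
            fallback = MethodHandles.lookup()
                    .findVirtual(SwitchPointConstantSketch.class, "slowLookup",
                            MethodType.methodType(Object.class))
                    .bindTo(this);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        site.setTarget(fallback); // unlinked: the first call takes the slow path
    }

    // Slow path: look the value up, then relink the site to a guarded constant.
    Object slowLookup() {
        slowLookups++;
        SwitchPoint current = validity; // read the token before the value
        Object value = constantValue;
        MethodHandle fast = MethodHandles.constant(Object.class, value);
        site.setTarget(current.guardWithTest(fast, fallback));
        return value;
    }

    // Any constant redefinition invalidates every site linked to the old token.
    static synchronized void redefineConstant(Object newValue) {
        constantValue = newValue;
        SwitchPoint stale = validity;
        validity = new SwitchPoint();
        SwitchPoint.invalidateAll(new SwitchPoint[] { stale });
    }

    Object get() {
        try {
            return invoker.invoke();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        SwitchPointConstantSketch lookup = new SwitchPointConstantSketch();
        System.out.println(lookup.get() + " " + lookup.get()); // second call uses the linked handle
        redefineConstant("second");
        System.out.println(lookup.get() + ", slow lookups: " + lookup.slowLookups);
    }
}
```

How cheap the guarded path ends up being depends on how the JVM implements SwitchPoint invalidation, which is what the patches measured above change.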

-- Christian

> 
> -- Christian
> 
>> 
>> - Charlie
>> _______________________________________________
>> mlvm-dev mailing list
>> mlvm-dev at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


