Re: Caliper CharMatcher Confusion

Fri Aug 8 09:12:56 UTC 2014

Hello,

Not a full analysis, but two comments:

First of all, you should level the playing field by moving your implementation out into an external utility class (as Guava does) that works on a CharSequence. Secondly, I wonder whether additionally benchmarking with a much bigger haystack would give more insight, i.e. longer search inputs driven by a parameter matrix over length and match type (no match, all match, 10% match; lengths 10, 100, 200, 500, 2k).
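The length/match-ratio matrix suggested above could be fed from generated inputs instead of hand-written strings, for example from a `@Setup` method driven by two `@Param` fields. A minimal sketch, assuming a deterministic layout: the class name `HaystackGenerator`, its method, and the filler character are hypothetical and not part of Guava or JMH. The searched character occupies exactly `matchPercent` percent of any 100 consecutive positions and is spread evenly rather than clustered.

```java
/**
 * Hypothetical helper (not part of Guava or JMH) that builds benchmark inputs
 * for a length x match-ratio matrix.
 */
public final class HaystackGenerator {
    private HaystackGenerator() {}

    /**
     * Returns a string of the given length in which 'needle' fills matchPercent%
     * of every 100 consecutive positions, spread evenly; all other positions
     * hold 'filler'. For lengths below 100 the ratio is only approximate.
     */
    public static String generate(int length, int matchPercent, char needle, char filler) {
        if (length < 0 || matchPercent < 0 || matchPercent > 100 || needle == filler) {
            throw new IllegalArgumentException("invalid arguments");
        }
        char[] chars = new char[length];
        for (int i = 0; i < length; i++) {
            // ((i mod 100) * p) mod 100 < p holds for exactly p of any 100 consecutive i
            boolean match = ((i % 100) * matchPercent) % 100 < matchPercent;
            chars[i] = match ? needle : filler;
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        int[] lengths = {10, 100, 200, 500, 2000};
        int[] percents = {0, 10, 100}; // no match, 10% match, all match
        for (int len : lengths) {
            for (int p : percents) {
                String s = generate(len, p, 'e', 'x');
                long matches = s.chars().filter(c -> c == 'e').count();
                System.out.println("len=" + len + ", match=" + p + "% -> " + matches + " matches");
            }
        }
    }
}
```

A deterministic layout keeps every fork measuring the same input; a seeded `java.util.Random` would be an alternative if clustered matches are also of interest.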

Regards,
Bernd

-- 
http://bernd.eckenfels.net

----- Original Message -----
From: "Eugen Rabii" <eugen.rabii at gmail.com>
Sent: 08.08.2014 10:54
To: "jmh-dev at openjdk.java.net" <jmh-dev at openjdk.java.net>
Subject: Caliper CharMatcher Confusion

I came across a weird piece of code in the famous Guava library, in
CharMatcher.removeFrom. It is a utility method for removing characters
from a String, which does not sound that complicated, until I looked at
the sources:

   public String removeFrom(CharSequence sequence) {
     String string = sequence.toString();
     int pos = indexIn(string);
     if (pos == -1) {
       return string;
     }

     char[] chars = string.toCharArray();
     int spread = 1;

     // This unusual loop comes from extensive benchmarking
     OUT: while (true) {
       pos++;
       while (true) {
         if (pos == chars.length) {
           break OUT;
         }
         if (matches(chars[pos])) {
           break;
         }
         chars[pos - spread] = chars[pos];
         pos++;
       }
       spread++;
     }
     return new String(chars, 0, pos - spread);
   }
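For comparison, the textbook way to remove characters in place is a single loop with separate read and write indices. The sketch below is that conventional approach, not Guava's code; the class name `SimpleRemove` and the `IntPredicate` parameter (standing in for `CharMatcher.matches`) are assumptions for illustration. Guava's loop computes the same result, but derives the write position as `pos - spread` instead of keeping a separate write index, and uses `indexIn` to return early when nothing matches.

```java
import java.util.function.IntPredicate;

/** Hypothetical baseline for comparison; not taken from Guava. */
public final class SimpleRemove {
    private SimpleRemove() {}

    /** Copies every char that does not match towards the front of the buffer. */
    public static String removeFrom(CharSequence sequence, IntPredicate matches) {
        String string = sequence.toString();
        char[] chars = string.toCharArray();
        int write = 0; // next free slot for a kept char
        for (int read = 0; read < chars.length; read++) {
            char c = chars[read];
            if (!matches.test(c)) {
                chars[write++] = c;
            }
        }
        // Like Guava, return the original string when nothing was removed
        return write == chars.length ? string : new String(chars, 0, write);
    }

    public static void main(String[] args) {
        System.out.println(removeFrom("wertyeoiuyeeeeeteee", c -> c == 'e')); // wrtyoiuyt
    }
}
```

This version tests the predicate on every character, including the prefix before the first match, which is exactly the work Guava's `indexIn` call and inner loop are arranged around.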

That comment, "This unusual loop comes from extensive benchmarking",
got me thinking. I do not trust Caliper very much (blame JMH for that),
and as a result I do not trust their benchmarks too much either, so I
decided to write my own version. In the end I arrived at almost, but not
quite, the same logic they have, and decided to test it with JMH.

Could some of you experts tell me whether, *from a JMH standpoint, this
is a correct testing approach*? I am also considering testing cold-start
performance, at least.

package org.madmonky.guava;

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import com.google.common.base.CharMatcher;

@Warmup(iterations=5, time=1, timeUnit=TimeUnit.SECONDS)
@BenchmarkMode(Mode.AverageTime)
@Measurement(iterations=3, time=1, timeUnit=TimeUnit.SECONDS)
@Fork(3)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
public class CharMatcherTest {
     public static void main(String[] args) throws RunnerException {
         Options opt = new OptionsBuilder()
                 .include(".*" + CharMatcherTest.class.getSimpleName() + ".*")
                 .threads(4)
                 .build();

         new Runner(opt).run();
     }

     CharMatcher charMatcher;

     @Param({"e", "eeeee", "efefefefe", "eeeemeeeeseeeeer", "dfgrry", "wertyeoiuyeeeeeteee"})
     String input;

     char searched;

     @Setup
     public void prepare(){
         charMatcher = CharMatcher.is('e');
         searched = 'e';
     }

     @Benchmark
     public String mineFirstVersion(){
         char [] array = input.toCharArray();
         boolean reachedTheEnd = false;
         // matches removed so far = how far each kept char is shifted left
         int totalCount = 0;
         for(int i=0;i<array.length;++i){
             // skip over a run of consecutive matches
             int howManyInIteration = 0;
             while(array[i] == searched){
                 ++i;
                 ++howManyInIteration;
                 if(i == array.length) {
                     reachedTheEnd = true;
                     break;
                 }
             }
             totalCount += howManyInIteration;
             // move the first non-matching char after the run into place
             if(!reachedTheEnd) array[i-totalCount] = array[i];
         }
         return new String(array, 0, (array.length - totalCount));
     }

     @Benchmark
     public String guavaRemoveFrom(){
         return charMatcher.removeFrom(input);
     }
}
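Following the first suggestion in the reply above, the hand-written loop could be moved out of the benchmark class into a static utility method that accepts a `CharSequence`, so both candidates pay the same call and `toString()` costs. A minimal sketch, assuming a hypothetical class name `CharRemoval`; the loop body is `mineFirstVersion` unchanged apart from taking the input and the searched char as parameters. Before trusting any timings it is also worth checking that both sides return identical strings; `main` does that against `String.replace`, used here only as a simple reference, not as Guava.

```java
/** Hypothetical utility class holding the hand-written loop from mineFirstVersion. */
public final class CharRemoval {
    private CharRemoval() {}

    public static String removeFrom(CharSequence input, char searched) {
        char[] array = input.toString().toCharArray();
        boolean reachedTheEnd = false;
        int totalCount = 0; // matches removed so far = left shift of kept chars
        for (int i = 0; i < array.length; ++i) {
            int howManyInIteration = 0;
            // skip over a run of consecutive matches
            while (array[i] == searched) {
                ++i;
                ++howManyInIteration;
                if (i == array.length) {
                    reachedTheEnd = true;
                    break;
                }
            }
            totalCount += howManyInIteration;
            if (!reachedTheEnd) array[i - totalCount] = array[i];
        }
        return new String(array, 0, array.length - totalCount);
    }

    public static void main(String[] args) {
        // the @Param inputs from the benchmark, plus the empty string
        String[] inputs = {"e", "eeeee", "efefefefe", "eeeemeeeeseeeeer",
                           "dfgrry", "wertyeoiuyeeeeeteee", ""};
        for (String in : inputs) {
            String expected = in.replace("e", "");
            String actual = removeFrom(in, 'e');
            if (!expected.equals(actual)) {
                throw new AssertionError("mismatch for \"" + in + "\": " + actual);
            }
        }
        System.out.println("all inputs agree");
    }
}
```

In the benchmark, `mineFirstVersion` would then simply return `CharRemoval.removeFrom(input, searched)`, mirroring how `guavaRemoveFrom` delegates to `charMatcher.removeFrom(input)`.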

P.S. The results (if the approach is correct) show that Guava's
implementation is a bit slower.

Thank you very much for your time,
Eugene.