Proposal: parameters and programmatic API

Mon May 13 08:55:01 PDT 2013

Hi,

Please see below the proposal for introducing parameters and a programmatic
API to JMH. These two things come together: the choices in one area
mandate matching choices in the other. Hence we'd like to resolve both
improvements at the same time.

We would appreciate reviews, comments, and general feedback on this.

I. PROBLEM STATEMENT

a. Parameters. Many of our workloads explore some configuration space.
To iterate over that space, we end up doing something like this:

  @State
  class B {
      private int size;
      private List<Integer> list;

      @Setup
      public void setup() {
          size = Integer.getInteger("size"); // NPE unless run with -Dsize=...
          list = new ArrayList<>(size);
          init(list, size);
      }

      @GenerateMicroBenchmark
      public void test() {
          for (int i = 0; i < size; i++) { ... list.get(i) ... }
      }
  }

...which requires an external script to iterate over the -Dsize=... values.

b. Programmatic API. It turns out some of the use cases for JMH include
embedding JMH as part of a bigger experiment. This also subsumes the
need for an explicit scenario language, as building the scenario with a
pure Java API seems less error-prone than inventing yet another DSL.

However, the API is tightly coupled with the parameters, because the API
should be able to set the parameters declared in the benchmarks.
Conversely, parameters should be able to pick up some of the environment
settings from the JMH launcher.

II. APPROACH

+++ a. Parametrized @Env-s

Since most of the parameters are required to initialize @State-s, it
seems beneficial to start from there. While it is tempting to parametrize
@State-s directly, for various reasons it is better to have a separate
class which bears the settings. The proposed syntax follows:

  @State
  class B {

      @Env
      class Settings {
          @Param("N")
          int size;
      }

      private List<Integer> list;

      @Setup
      public void setup(Settings s) {
          list = new ArrayList<>(s.size);
          init(list, s.size);
      }

      @GenerateMicroBenchmark
      public void test(Settings s) {
          for (int i = 0; i < s.size; i++) { ... list.get(i) ... }
      }
  }

Here, we annotate the field with @Param, which names the label used to
set it externally. The convention is that JMH sets all parameters before
calling any of the fixture methods, which allows the state to be
initialized naturally from the parameter values. Modifying a parameter
field from the microbenchmark should be a semantic error.
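To make the ordering rule concrete, here is a minimal reflection-based sketch of what a harness could do: assign every @Param field first, then run the fixtures. It is not JMH code; the `Param` and `Setup` annotations below are local stand-ins, `ParamInjector.prepare` and `ListState` are hypothetical names, and only `int` parameters are handled.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the proposed @Param annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Param {
    String value(); // external label, e.g. "N"
}

// Hypothetical stand-in for a fixture marker like @Setup.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Setup {}

class ParamInjector {
    // Creates the state, assigns every @Param field from the label -> value
    // map, and only then runs the @Setup fixtures, so fixtures see final values.
    static <T> T prepare(Class<T> type, Map<String, Integer> params) {
        try {
            T state = type.getDeclaredConstructor().newInstance();
            for (Field f : type.getDeclaredFields()) {
                Param p = f.getAnnotation(Param.class);
                if (p == null) continue;
                Integer v = params.get(p.value());
                if (v == null) {
                    throw new IllegalArgumentException("No value for parameter " + p.value());
                }
                f.setAccessible(true);
                f.setInt(state, v); // int-only sketch
            }
            for (Method m : type.getDeclaredMethods()) {
                if (m.isAnnotationPresent(Setup.class)) {
                    m.setAccessible(true);
                    m.invoke(state);
                }
            }
            return state;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Example state: setup() can rely on size already being set.
class ListState {
    @Param("N")
    int size;

    List<Integer> list;

    @Setup
    void setup() {
        list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) list.add(i);
    }
}
```

A missing label fails fast with `IllegalArgumentException` instead of silently running setup with a zero-valued field.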

Naturally, for many usages, we may want to simplify this to:

  // @Env is implicit
  class B {

      @Param("N")
      int size;

      private List<Integer> list;

      @Setup
      public void setup() {
          list = new ArrayList<>(size);
          init(list, size);
      }

      @GenerateMicroBenchmark
      public void test() {
          for (int i = 0; i < size; i++) { ... list.get(i) ... }
      }
  }

Note that @Env on the benchmark class is implicit, and no intervention is
needed to use the parameter field either in fixtures or in the benchmark
method.

This annotation also allows us to set default values for the
parameters:
  @Param(name = "N", value = 100)
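A sketch of how such a default could be resolved, assuming the `@Param(name, value)` shape shown above and an external label-to-value map. The `Defaults.resolve` helper is hypothetical, `Param` is a local stand-in, and only `int` values are covered.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical @Param shape with a default value, as proposed above.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Param {
    String name();
    int value(); // default used when nothing is set externally
}

class Defaults {
    // For every @Param field, prefer the externally supplied value and fall
    // back to the annotation's default otherwise.
    static Map<String, Integer> resolve(Class<?> type, Map<String, Integer> external) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Field f : type.getDeclaredFields()) {
            Param p = f.getAnnotation(Param.class);
            if (p != null) {
                out.put(p.name(), external.getOrDefault(p.name(), p.value()));
            }
        }
        return out;
    }
}

class B {
    @Param(name = "N", value = 100)
    int size;
}
```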

Also, it would be nice to let JMH produce the results for each
parameter configuration, where configurations are defined as the
Cartesian product of all the parameter domains. E.g., if we have the
ability to set the domains:

  class B {

      @Param(name = "N", values = {1, 10, 100, 1000, 10000})
      int size;

      @Param(name = "stride", values = {1, 2, 3, 4})
      int stride;

      private List<Integer> list;

      @Setup
      public void setup() {
          list = new ArrayList<>(size);
          init(list, size);
      }

      @GenerateMicroBenchmark
      public void test() {
          for (int i = 0; i < size; i += stride) { ... list.get(i) ... }
      }
  }

...JMH should treat test() with (N=1, stride=1), (N=1, stride=2), ...,
(N=10000, stride=4), i.e. 5 × 4 = 20 configurations, as separate
benchmarks, and (by default) run them all.
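The enumeration above is a plain Cartesian product. The sketch below assumes integer domains keyed by label; `ParamSpace.configurations` is a hypothetical helper, and insertion order is used so that the first parameter varies slowest, which reproduces the order listed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ParamSpace {
    // Expands label -> domain into every combination (the Cartesian product).
    // Each pass extends all partial configurations by one more parameter.
    static List<Map<String, Integer>> configurations(Map<String, List<Integer>> domains) {
        List<Map<String, Integer>> result = new ArrayList<>();
        result.add(new LinkedHashMap<>()); // start from the empty configuration
        for (Map.Entry<String, List<Integer>> e : domains.entrySet()) {
            List<Map<String, Integer>> next = new ArrayList<>();
            for (Map<String, Integer> partial : result) {
                for (Integer v : e.getValue()) {
                    Map<String, Integer> extended = new LinkedHashMap<>(partial);
                    extended.put(e.getKey(), v);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }
}
```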

Caveat: since Java annotations are not generic, we will end up with
type-specific annotations: @IntParam, @LongParam, @DoubleParam,
@StringParam.
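The caveat comes from the Java language rules: annotation types cannot declare type parameters, and their elements may only be primitives, `String`, `Class`, enums, annotations, or arrays of those. The sketch below declares two of the proposed type-specific annotations and shows that a harness then has to branch on each annotation type to read a domain; `Bench` and `Domains.domainOf` are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// One annotation per value type, since a single generic @Param<T> is not expressible.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IntParam {
    String name();
    int[] values();
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface StringParam {
    String name();
    String[] values();
}

// Hypothetical benchmark mixing both kinds of parameters.
class Bench {
    @IntParam(name = "N", values = {1, 10, 100})
    int size;

    @StringParam(name = "impl", values = {"array", "linked"})
    String impl;
}

class Domains {
    // Reads a field's domain, branching on each annotation type;
    // values are rendered as strings for uniform handling.
    static List<String> domainOf(Class<?> type, String fieldName) {
        Field f;
        try {
            f = type.getDeclaredField(fieldName);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
        List<String> out = new ArrayList<>();
        IntParam ip = f.getAnnotation(IntParam.class);
        if (ip != null) {
            for (int v : ip.values()) out.add(String.valueOf(v));
        }
        StringParam sp = f.getAnnotation(StringParam.class);
        if (sp != null) {
            out.addAll(Arrays.asList(sp.values()));
        }
        return out;
    }
}
```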

+++ b. Parametric Java API

This parameter notation opens the way to executing the benchmark from
the Java API. The proposed syntax is as follows:

  Result execute(Class<?> benchmarkClass,
                 String test,
                 Settings settings);

...where Settings is a special class holding the parameters (and also
environmental parameters, see below). Giving Settings a proper builder
will allow us to execute the example benchmark from the previous
section as follows:

  Result r = execute(B.class, "test",
                     Settings.set("N", 1000)
                             .set("stride", 10));
  // process r, get the metrics, etc.
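One possible shape for that builder is sketched below; every name in it is hypothetical. Note one Java constraint: a class cannot declare a static and an instance method with the same signature, so a chain like `Settings.set(...).set(...)` needs the static entry point to return a separate builder type. Here that is `Settings.Builder`, and `execute` would either accept the builder or receive the result of `build()`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical immutable settings holder with a fluent builder.
final class Settings {
    private final Map<String, Object> params;

    private Settings(Map<String, Object> params) {
        this.params = Collections.unmodifiableMap(new LinkedHashMap<>(params));
    }

    // Static entry point; returns a Builder so that further .set(...) calls
    // resolve to Builder's instance method.
    static Builder set(String label, Object value) {
        return new Builder().set(label, value);
    }

    Object get(String label) {
        return params.get(label);
    }

    // True if the parameter was fixed explicitly (not traversed by JMH).
    boolean isFixed(String label) {
        return params.containsKey(label);
    }

    static final class Builder {
        private final Map<String, Object> params = new LinkedHashMap<>();

        // A later call for the same label overwrites the earlier one.
        Builder set(String label, Object value) {
            params.put(label, value);
            return this;
        }

        Settings build() {
            return new Settings(params);
        }
    }
}
```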

Fixing only some of the parameters will still allow JMH to traverse the
projection of the configuration space, e.g. traverse the strides with N
fixed. We will amend this API as we unfold other parts of the parameter
story.
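A minimal sketch of that projection, assuming integer domains and a map of fixed values (`Projection.restrict` and `Projection.count` are hypothetical helpers): each fixed parameter's domain collapses to a single value, so only the free parameters are traversed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Projection {
    // Narrows each fixed parameter's domain to the single given value;
    // free parameters keep their full domains.
    static Map<String, List<Integer>> restrict(Map<String, List<Integer>> domains,
                                               Map<String, Integer> fixed) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : domains.entrySet()) {
            Integer v = fixed.get(e.getKey());
            out.put(e.getKey(), v != null ? List.of(v) : e.getValue());
        }
        return out;
    }

    // Number of benchmark runs: the product of the domain sizes.
    static int count(Map<String, List<Integer>> domains) {
        int n = 1;
        for (List<Integer> d : domains.values()) n *= d.size();
        return n;
    }
}
```

With the domains from the earlier example and N fixed to 1000, the 20 configurations shrink to the 4 stride values.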

+++ c. Command line parameters

We need a good way to map these parameters to appropriate command-line
options. Our general line of thinking is that command-line executors
should ultimately use the Java API to invoke JMH. That means the mapping
of command-line parameters to API calls is localized in a dedicated CLI
module, which is currently TBD.
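Since the CLI module is TBD, the following is only a sketch of the idea that the command line merely feeds the Java API. It assumes a hypothetical `label=value` argument syntax (not an existing JMH option) and collects the pairs into a map that could then populate Settings; `CliParams.parse` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CliParams {
    // Each argument of the form label=value becomes one parameter setting.
    // Values stay strings; converting them is left to the typed parameter.
    static Map<String, String> parse(String... args) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected label=value, got: " + arg);
            }
            out.put(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return out;
    }
}
```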

+++ d. Environment parameters

At times you need to get the current running modes from JMH to drive
your initialization or even the benchmark itself. We already have a
doorway in the other direction: some annotations, like @Threads, @Fork,
@OperationsPerInvocation, etc., allow requesting a specific running
mode from JMH. It is tempting to use the same doorway the other way
around.

For example:

   class B {
      @OperationsPerInvocation(42) // each operation is 42 times larger
      @GenerateMicroBenchmark
      public void test() {
          // do something
      }
   }

This can be modified so that JMH also injects the value into an
annotated field:

   class B {
      @OperationsPerInvocation(42) // each operation is 42 times larger
      private int ops;

      @GenerateMicroBenchmark
      public void test() {
          // do something
      }
   }

Notice the symmetry between these two cases: if we don't need the
environmental value, it's perfectly fine to omit the field and place
the relevant annotation on the method as usual. There is a difference,
though: an annotation on a field means the environment is shared by all
the @GMB methods in the class:

   class B {
      @OperationsPerInvocation(42)
      private int ops; // shared for both test1 and test2

      @GenerateMicroBenchmark
      public void test1() {
          // do something
      }

      @GenerateMicroBenchmark
      public void test2() {
          // do something
      }
   }
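A minimal sketch of the field form, assuming the harness writes the requested value into the annotated field before running any benchmark method. To avoid confusion with the real JMH annotation, the sketch uses a hypothetical field-targetable `OpsPerInvocation`; `EnvInjector` and `SharedEnvBench` are hypothetical as well.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical field-level stand-in for @OperationsPerInvocation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface OpsPerInvocation {
    int value();
}

class EnvInjector {
    // Writes each annotated field's requested value into the instance once,
    // before any benchmark method runs, so all methods observe the same value.
    static void inject(Object bench) {
        for (Field f : bench.getClass().getDeclaredFields()) {
            OpsPerInvocation a = f.getAnnotation(OpsPerInvocation.class);
            if (a == null) continue;
            try {
                f.setAccessible(true);
                f.setInt(bench, a.value());
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }
}

class SharedEnvBench {
    @OpsPerInvocation(42)
    private int ops; // shared by test1 and test2

    int test1() { return ops; } // stands in for a @GMB method reading ops
    int test2() { return ops; }
}
```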

Luckily, we can still isolate the environments for different methods, as
with States:

   class B {

      @Env
      class E1 {
         @OperationsPerInvocation(42)
         private int ops;
      }

      @Env
      class E2 {
         @OperationsPerInvocation(84)
         private int ops;
      }

      @GenerateMicroBenchmark
      public void test1(E1 e) {
          // do something
      }

      @GenerateMicroBenchmark
      public void test2(E2 e) {
          // do something
      }
   }
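Resolving the per-method environment can be sketched as a lookup over the benchmark method's parameter types. Assumptions: the environment classes are static nested classes for simplicity, `OpsPerInvocation` is a hypothetical field-level stand-in, `EnvResolver.opsFor` is a hypothetical helper, and 1 is used when no environment sets the value.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical field-level stand-in for @OperationsPerInvocation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface OpsPerInvocation {
    int value();
}

class B {
    static class E1 {
        @OpsPerInvocation(42) int ops;
    }

    static class E2 {
        @OpsPerInvocation(84) int ops;
    }

    public void test1(E1 e) { }
    public void test2(E2 e) { }
}

class EnvResolver {
    // Finds the named benchmark method, then reads the ops setting from the
    // environment class among its parameter types (1 if none declares it).
    static int opsFor(Class<?> bench, String methodName) {
        for (Method m : bench.getMethods()) {
            if (!m.getName().equals(methodName)) continue;
            for (Class<?> p : m.getParameterTypes()) {
                for (Field f : p.getDeclaredFields()) {
                    OpsPerInvocation a = f.getAnnotation(OpsPerInvocation.class);
                    if (a != null) return a.value();
                }
            }
            return 1;
        }
        throw new IllegalArgumentException("No method " + methodName);
    }
}
```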

Of course, there is an even better way: allowing these annotations to
accept a range of values:

   class B {
      @OperationsPerInvocation({42, 43})
      private int ops;

      @GenerateMicroBenchmark
      public void test1() {
          // do something
      }
   }

We will see why splitting the environment sometimes is a good idea in
the next section.

The programmatic API invocation for this test can look as follows. We
would like to be refactoring-resistant, so we use class literals for
the environment and the relevant annotation when referencing the
parameter we want to set. This also allows setting an environment
parameter even when no field is bound to it:

  execute(B.class, "test1",
          Settings.set(B.class, OperationsPerInvocation.class, 42));
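The class-literal keying can be sketched as a map keyed by the pair (environment class, annotation type). Everything here is hypothetical: `EnvKey`, `EnvSettings`, and the stand-in `OpsPerInvocation`; the sketch also uses an instance `set` on a fresh object rather than a static entry point. Note that nothing requires `B` to declare an `ops` field; the key alone names the setting.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Stand-in annotation type so the sketch compiles without JMH.
@Retention(RetentionPolicy.RUNTIME)
@interface OpsPerInvocation {
    int value();
}

// Key made of two class literals, so renaming either is caught by the compiler.
final class EnvKey {
    final Class<?> env;
    final Class<? extends Annotation> annotation;

    EnvKey(Class<?> env, Class<? extends Annotation> annotation) {
        this.env = env;
        this.annotation = annotation;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof EnvKey)) return false;
        EnvKey k = (EnvKey) o;
        return env == k.env && annotation == k.annotation;
    }

    @Override
    public int hashCode() {
        return Objects.hash(env, annotation);
    }
}

final class EnvSettings {
    private final Map<EnvKey, Integer> values = new HashMap<>();

    EnvSettings set(Class<?> env, Class<? extends Annotation> annotation, int value) {
        values.put(new EnvKey(env, annotation), value);
        return this;
    }

    // Returns null when the setting was not given explicitly.
    Integer get(Class<?> env, Class<? extends Annotation> annotation) {
        return values.get(new EnvKey(env, annotation));
    }
}

class B { } // benchmark class with no ops field at all
```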

+++ e. Asymmetric benchmarks

The real problem with the programmatic API is embracing asymmetric
benchmarks. There, we sometimes need to configure the environment for
distinct thread types in isolation. This can be achieved by splitting
the environments:

   class B {

      @Env
      class RE {
         @Threads(4) // four readers in the group
         int readers;
      }

      @Env
      class WE {
         @Threads(1) // single writer in the group
         int writers;
      }

      @State
      class G {
         Target t = ...;
      }

      @Groups(1) // single group by default
      int groups;

      @GenerateMicroBenchmark
      @Group("asymmetric")
      public void doReads(RE e, G g) {
          g.t.read();
      }

      @GenerateMicroBenchmark
      @Group("asymmetric")
      public void doWrites(WE e, G g) {
          g.t.write();
      }
   }

@Env-s naturally provide namespaces for environmental parameters.
Hence, we can execute this via the programmatic API with:
  execute(B.class, "asymmetric",
    Settings.set(RE.class, Threads.class, 1)
            .set(WE.class, Threads.class, 3)
            .set(B.class, Groups.class, 4)
  );
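As a worked example of what these settings mean, assuming each group runs the sum of its per-@Env thread counts: each group gets 1 reader and 3 writers, i.e. 4 threads, and 4 groups make 16 threads in total. The `GroupPlan` helper is hypothetical and uses @Env names as strings for brevity; an implementation following the proposal would key on class literals.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupPlan {
    // Lays out one group: each @Env contributes as many threads as its
    // Threads setting, in declaration order; {RE=1, WE=3} -> [RE, WE, WE, WE].
    static List<String> layoutOneGroup(Map<String, Integer> threadsPerEnv) {
        List<String> roles = new ArrayList<>();
        for (Map.Entry<String, Integer> e : threadsPerEnv.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) roles.add(e.getKey());
        }
        return roles;
    }

    // Total threads for the run: threads per group times the number of groups.
    static int totalThreads(int groups, Map<String, Integer> threadsPerEnv) {
        return groups * layoutOneGroup(threadsPerEnv).size();
    }
}
```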

Unfortunately, the example above does not solve the most frequent case:
what if I need *both* the reader and the writer settings to initialize G?
This can be solved by allowing the fixture methods in @State-s to accept
environments:

   class B {

      @Env
      class RE {
         @Threads(4) // four readers in the group
         int readers;
      }

      @Env
      class WE {
         @Threads(1) // single writer in the group
         int writers;
      }

      @State
      class G {
         Target t = ...;

         @Setup
         public void init(RE r, WE w) {
            t = new Target(r.readers + w.writers);
         }
      }

      @Groups(1) // single group by default
      int groups;

      @GenerateMicroBenchmark
      @Group("asymmetric")
      public void doReads(RE e, G g) {
          g.t.read();
      }

      @GenerateMicroBenchmark
      @Group("asymmetric")
      public void doWrites(WE e, G g) {
          g.t.write();
      }
   }
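The harness side of this can be sketched as resolving each fixture argument by type from the environments available to the group. Assumptions: `Target` is a trivial stand-in, the nested classes are static, the environment fields are pre-filled as if injected from @Threads(4) and @Threads(1), and `FixtureRunner.invoke` is a hypothetical helper.

```java
import java.lang.reflect.Method;

class Target {
    final int parties;
    Target(int parties) { this.parties = parties; }
}

class B {
    static class RE { int readers = 4; } // as if injected from @Threads(4)
    static class WE { int writers = 1; } // as if injected from @Threads(1)

    static class G {
        Target t;

        public void init(RE r, WE w) {
            t = new Target(r.readers + w.writers);
        }
    }
}

class FixtureRunner {
    // Calls the named fixture, supplying each argument from the available
    // environment instances, matched by parameter type (order does not matter).
    static void invoke(Object state, String methodName, Object... envs) {
        for (Method m : state.getClass().getMethods()) {
            if (!m.getName().equals(methodName)) continue;
            Class<?>[] types = m.getParameterTypes();
            Object[] args = new Object[types.length];
            for (int i = 0; i < types.length; i++) {
                args[i] = find(types[i], envs);
            }
            try {
                m.invoke(state, args);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
            return;
        }
        throw new IllegalArgumentException("No fixture named " + methodName);
    }

    private static Object find(Class<?> type, Object[] envs) {
        for (Object e : envs) {
            if (type.isInstance(e)) return e;
        }
        throw new IllegalArgumentException("No environment of type " + type.getName());
    }
}
```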

Thus, we will provide customizable parameters with default values and
iteration over their domains, feedback from JMH back to the
microbenchmark, and fold it all into the Java API. Please voice your
concerns about this line of thinking before we start implementing it.

Thanks,
Aleksey.