What can we improve in JSR292 for Java 9?

Rémi Forax forax at univ-mlv.fr
Wed Aug 26 13:06:40 UTC 2015


Hi John,
comment inlined

Le 26 août 2015 00:30:00 CEST, John Rose <john.r.rose at oracle.com> a écrit :
>On Feb 25, 2015, at 6:29 PM, John Rose <john.r.rose at oracle.com> wrote:
>> 
>> Maybe this is general enough:
>> 
>>    MHs.loop(init, predicate, body)(*a)
>>    => { let i = init(*a); while (predicate(i, *a)) { i = body(i, *a);
>} return i; }
>> 
>> ...where the type of i depends on init, and if init returns void then
>you
>> have a classic side-effect-only loop.
>
>FTR, I've been thinking more about this one, with help from Michael
>Haupt.
>
>The above loop is pretty good, except for a couple of things: 'i' ought
>to be tuple of loop-varying values, and the exit path needs a more
>general post-processing step, a "finalizer" that corresponds to the
>initializers that set up the loop variables.
>
>A generalized (linear) loop has two kinds of values flowing within it:
>loop-invariant and loop-varying. (The control predicate has a special
>role, but may be classed as loop-varying. The finalizer is also
>special, and loop-invariant.) Loop-varying values can be inputs,
>control values (array indices, etc.), and outputs, but can all, I
>think, be modeled as "step" functions using a common signature.
>
>This takes us to something like the C for-loop, but with a sprinkle of
>tuples, with a finishing bow around it to tie it up:
>for ((v1,v2,v3,…) = inits; predicate(v1…); (v1…) = step(v1…))  { … }; 
>return finish(v1…);
>
>Loop-local but invariant values can be modeled as non-updated loop
>variables, as in this C idiom:
>  for (int i, limit = computeLimit(); i < limit; i++) …
>
>Pure side effects can be modeled as void-valued step functions, so we
>want to allow void as a corner case for loop variable types.  (I.e.,
>where there is a step but not a value.)  I don't want to make any more
>special provision for pure side effects (such as a unique Runnable-like
>"body" for a loop), both because it isn't necessary, and because it
>encourages sloppy side-effect-driven loops.
>
>If we allow each loop variable to have its own predicate and finalizer,
>we can properly model the effects of ordered guards and special
>early-exit paths.
> 
>loop({init*}, {pred*}, {step*}, {fini*}) := lambda (arg*) {
>  let var* = init*(arg*)
>  for (;;) {
>    for ((v,p,s,f) in (var*, pred*, step*, fini*))
>      if (!p(var*, arg*))
>        return f(var*, arg*)
>      v = s(var*, arg*)
 // all vars help update each var
>    }
>  }
>}
>
>All invocations are exact.
>
>As a bonus we can easily derive post-checked loops (do-while), and
>time-shifted index variables (p++) simply by adding more variables
>and/or steps.

but i think i prefer to have pre and post updates instead of several predicates/finishers.

Like this:
loop({init*}, {prestep*}, {pred}, {poststep*}, {fini}) := lambda (arg*) {
  let var* = init*(arg*)
  for (;;) {
    foreach v in vars:
      v = prestep(var*, args*)
    if (!pred(var*, arg*))
      return fini(var*, arg*)
    foreach v in vars:
      v = poststep(var*, args*)
  }

it's less powerful but it seems you can encode all your examples of loops too.

>Constraints on parameter and return types: 
>- There must be the same number of inits, preds, steps, and finis.
>- There must be at least one input function (e.g., a pred, so that the
>loop has a chance to exit).
>- Parameter lists of the init functions must all be the same.
>- Parameter lists of the pred, step, and fini functions must all be the
>same.
>- Parameter lists of the step functions must be a suffix of the
>parameter lists of the init functions.
>- Parameters of the step functions must be the same as the return types
>of the init functions (with voids removed), followed by the common
>suffix.
>- Return types of the predicates must all be boolean.
>- Return types of the finalizers must all be the same (and can be
>void).
>- The return type of each init function must match the return type of
>the corresponding step function (a "var" type or void).
>
>The result, a looping method handle, has a return type taken from the
>finalizer function(s), and argument types taken from the init
>functions.  The loop has one iteration variable for each non-void
>init/step function pair.  Each step/pred/fini call takes all of the
>iteration variable values, followed by all the external
>(loop-invariant) arguments.  No throws are caught by the loop; throws
>cause the loop to terminate abnormally.
>
>In some loops, such as "find first match", the predicate naturally
>follows the step function, because it needs to evaluate the result of
>the step.  To model post-checks (and the classic do/while shape), a
>second "void-valued" loop variable is required:
>
>loop(...) := lambda (arg*) {
>  let (var, _) = (null, void)
>  for (;;) {
>    if (_) return _ // trivial predicate: exit branch never taken
>    var = s(_, arg*)  // pick up next search key
>    if (!p(var, arg*))  // did we find it?
>      return f(var, arg*)  // extract value from search key as needed
>    _ = _(var, arg*) // void value, no side effect
>  }
>}
>
>This variable-doubling could be compressed out if there were a further
>boolean (per pred/step pair) which says which order to run them in.  To
>me this seems like too much ad hoc structure; such structure tends to
>creep over time as we discover other important patterns.  I think it
>would be better to support optionality for common "no op" loop
>components, as follows:
>
>Optionalities:
>- If an init is null, a constant zero (or null/false/void) function is
>supplied by inspecting the return type of the corresponding step
>function.
>- If a pred is null, a constant "true" function is supplied.
>- If a step is null, a suitable identity function is supplied by
>inspecting the return type of the corresponding init function.
>- If a fini is null, a constant zero function is supplied based on a
>non-null fini; if all are null then void is assumed.
>- Missing parameters (of any function) corresponding to loop-invariant
>arg* values are silently filled in (as if by dropArguments).
>- Other missing parameters (of loop variables passed to non-init
>functions) are also silently filled in (as if by dropArguments).
>- However, all missing parameters must be supplied after the present
>parameters; they may not be "mixed into" present parameters.
>
>The missing-parameter rules are as with GWT and CE; they are easy for
>the runtime to implement and grossly inconvenient for users if absent. 
>Perhaps they can be made a little more one-sided, by forcing the step
>functions (if present) to carry all arguments.
>
>Concrete type:
>
>MethodHandle loop(MethodHandle[] inits, MethodHandle[] preds,
>MethodHandle[] steps, MethodHandle[] finis);
>
>Another possibility would be to allow a free intermixing of any number
>of steps (var updates) with any other number of pred/fini pairs:
>
>MethodHandle loop({MethodHandle init, MethodHandle step  || 
>MethodHandle pred, MethodHandle fini}…);
>
>But, it is trivial to convert such a structure to the above structure
>by injecting nulls for the missing pairs, to make up a full set of
>quadruples.  So I think the fixed-quadruple format is general enough
>without sacrificing expressiveness.
>
>Various convenience specializations are possible, but can all be
>code-generated via the general template.  The bytecodes to be generated
>from the general template are obvious from the formula above, and will
>JIT well.
>
>Here are some examples:
>
>MethodHandle finderLoop(MethodHandle body, MethodHandle finder,
>MethodHandle fini=null);
>// do () { v = body(a*); } while (!finder(v,a*)); return fini(v,a*);
>
>MethodHandle cursorLoop(MethodHandle init, MethodHandle pred,
>MethodHandle body, MethodHandle step, MethodHandle fini);
>// for (v=init(a*); pred(v,a*); v=step(v,a*)) { body(v,a*); }; return
>fini(v,a*);
>
>MethodHandle countedLoop(int start=0, MethodHandle limit, int incr=1,
>MethodHandle init2, MethodHandle pred2, MethodHandle body2,
>MethodHandle fini=null);
>// for (v'=start, vlim=limit(a*), v2=init2(a*), v=v'; (v'++)<vlim &&
>pred2(v,vlim,v2,a*); v=v') { v2=body2(v,vlim,v2,a*); }; return
>fini(v,vlim,v2,a*);
>
>MethodHandle loop(MethodHandle[/*4*/]... quadOfInitPredStepFini);
>
>(What's your favorite loop?  How well does it fit into this scheme?)
>
>Library functions for loop clauses (encoded as quads) might also be
>interesting, but this starts to take us into territory where streams
>can do a better job:
>
>MethodHandle[][] countedLoopClause(int start, MethodHandle limit, int
>incr, MethodHandle[] moreClauses…) {
>MethodHandle[] setLimitQuad = { limit, /*no pred/step/exit*/ null,
>null, null };
>MethodHandle[] checkLimitQuad = { /*no init/pred*/ null, null,
>Integer::compareLT, null };
>  MethodHandle[][] res = new MethodHandle[2+moreClauses.length][];
>  res[0] = setLimitQuad; res[1] = checkLimitQuad;
>  System.arraycopy(moreClauses, 0, res, 2, moreClauses.length);
>  return res;
>}
>
>Not so good, that one.
>
>Anyway, my basic position here is that there ought to be a loop shape
>which is (a) general enough to encode the normal special cases, and (b)
>easy to generate into bytecodes.  If we can build everything from that
>one loop shape, our code generation and optimization will be made
>easier to implement, hence more effective.
>
>And, I claim this is a worthwhile puzzle to puzzle on.
>
>— John

Rémi 



More information about the mlvm-dev mailing list