Variants/case classes/algebraic data types/sums/oh my!

Sun Jun 12 17:30:21 UTC 2016

On Saturday, June 11, 2016, John Rose <john.r.rose at oracle.com> wrote:

> On Jun 11, 2016, at 5:12 PM, forax at univ-mlv.fr <javascript:;> wrote:
> >
> >
> > De: "John Rose" <john.r.rose at oracle.com <javascript:;>>
> > À: "Rémi Forax" <forax at univ-mlv.fr <javascript:;>>
> > Cc: "org openjdk" <org.openjdk at io7m.com <javascript:;>>,
> valhalla-dev at openjdk.java.net <javascript:;>
> > Envoyé: Dimanche 12 Juin 2016 01:02:52
> > Objet: Re: Variants/case classes/algebraic data types/sums/oh my!
> > Hi John,
> >
> >>> On Jun 11, 2016, at 4:18 AM, Remi Forax <forax at univ-mlv.fr
> <javascript:;>> wrote:
> >>>
> >>> because K is reified, there will be two methods eval() at runtime, one
> specialized for K = 0 and an other for K = 1,
> >>> which means that the switch will be constant fold (the real 'switch'
> is done when selecting the specialization).
> >>>
> >> That is an interesting use case for non-type class template parameters!
> >>
> > Here is another one,
> > class ArrayList<any E> {
> >   final E[] array;
> >   int size;
> >
> >   public <Consumer<? super E> U> void forEach(U consumer) {
> >      for(int i = 0; i < size; i++) {
> >        consumer.accept(array[i]);
> >      }
> >   }
> > }
>
> I think there are two simple ways to interpret your suggestion.  One is to
> split and specialize forEach by concrete consumer *type*, and one is to
> specialize by consumer *instance*.
>
>   public <__SpecializePerType U extends Consumer<? super E>> void
> forEach(U consumer) {
>      for(int i = 0; i < size; i++) {
>        consumer.accept(array[i]);
>      }
>   }
>   public <__SpecializePerInstanceOf U extends Consumer<? super E>> void
> forEach(U consumer) {
>      for(int i = 0; i < size; i++) {
>        consumer.accept(array[i]);
>      }
>   }
>
>
> > In the snippet above, forEach is specialized by the value of the lambda
> taken as argument, which is another way to say that he code of forEach and
> the code of the lambda should be inlined together.
>
> Yes; function parameters to templates are an important organizing feature
> for C++ STL algorithms.  The sort method is also a good candidate for such
> treatment.  C++ expresses per-type specialization using type parameters and
> per-instance specialization using non-type parameters.

Goes without saying, but non-type parameters are very useful to propagate
constants, possibly quite "far" with sufficient inlining, and open up more
optimization opportunities.  I'd love something similar in Java.

>
> The general problem here is what I call the loop customization problem,
> which is to code loops once and compile them into enough specialized
> versions to unlock a desirable level of optimization.  The JVM ought to
> create and maintain a library of specializations of forEach (or sort), one
> for each significantly different "loop shape syndrome" determined by a
> specific consumer (or comparator, for sort).
>
> A syndrome must specify all the information needed to produce well
> optimized native code for the algorithm loop, but it should not
> over-specify details not needed to optimize the loop.  For example, the
> consumer may be packing results into an array; the syndrome should express
> that fact, but not over-specify by working only for one specific array
> instance.  Should the syndrome specify the exact type of the array?  That's
> an open question.
>
> What if the consumer is packing away only those elements that match some
> filter predicate?  In that case the filter predicate is part of the
> syndrome too, since the loop needs to open-code it.  What if it's a
> bound-checking predicate; should the exact bounds be part of the syndrome?
> Probably not but that's an open question too.  Or, if the collection is has
> a non-unit stride (into the sink array), should the stride be syndrome or
> not?  Again, more open questions, to be solved by a mixture of static
> guidance from the programmer and online feedback from profile and type
> hierarchy.
>
> You can see how streams intensify the loop customization problem by mixing
> together syndrome and non-syndrome data into a single stream; the terminal
> operation needs to sort it out and select from a repertoire of efficient
> loop bodies.  This is not soluble by lots of inlining (and so it's not
> exactly what some call the "inlining problem"), because the control flow
> might spread across multiple threads (for parallel streams).

Hmm, why would control flow being spread across threads imply this isn't
the "inlining problem"? Seems like two orthogonal issues.

>
> If you use the C++ approach, each function is a distinct syndrome
> triggering a distinct specialization of the algorithm.  I think the STL
> libraries are organized carefully so that template parameters are *always*
> used as syndromes, and normal parameters are *never* used as syndromes.
> This makes sense, but the cost is a static decision between what's shared
> and what's split for optimization, with a bias towards splitting and code
> bloat.  But in the JVM we like to give the runtime environment a share in
> decisions about sharing vs. splitting for optimization, so it's a more
> delicate question.  The JVM likes to start with code sharing (even
> interpretation) and then "ramp up" splitting and optimization where there
> is clear evidence that the investment makes sense.  We get to good loops
> even though there is no static splitting or optimization.

The JVM has the advantage of knowing the hot code paths (well, C++
compilers with PGO are similar).  Splitting/specializing for those and
"incurring code bloat" ought to pay off assuming the
splitting/specialization provides more pertinent info to the optimizer
(fairly safe assumption for most cases, I think).  So what's needed is to
embed enough information into the bytecode/metadata such that the JIT can
exercise its optionality on splitting/specialization.

>
> So in the case of List.forEach either formulation of specialization (by
> type, or by instance) is only an imperfect starting point, if our goal is
> to get the best loops.
>
> In our current setup, we get per-type specialization *if* the provided
> consumer is a value type.  If it's a legacy reference type, we will use
> erasure.  Perhaps the closest we can get in today's draft design space is
> this:
>
>   public <val U extends Consumer<? super E>> void forEach(U consumer) {
>      for(int i = 0; i < size; i++) {
>        consumer.accept(array[i]);
>      }
>   }
>
> This would seem to say, "I need a value type which implements the given
> interface U, please, and I want you to create a specialization for each
> such type."  (To be clear:  Such a use of "val" is only a suggestion, not
> any plan of record.  But it does fall out of some of our intensive
> conversations last month, in which it appears that interfaces may have
> "val", "ref", and "any" specializations, with respect to their "this" type.)

Indicating the need/desire for specialization by using value types seems
subpar as it's mixing two different concerns.  What if my consumer needs to
mutate itself upon visiting the elements? Or has some other requirement
that makes it unsuitable/undesirable to be a value type? Seems like we'd
want a different mechanism to indicate this intent.

>
> In that case, the JVM has to activate some sort of library mechanism for
> maintaining specializations of forEach, which is good.  Initially, there
> will be a separate specialization only per *type* of consumer, which will
> probably capture the presence of a filter predicate (if there is one) but
> not the nature of the predicate.  So this goes a certain way towards
> defining a syndrome that will solve the loop customization problem, but not
> all the way.
>
> >> (The C++ world is full of non-type template parameters; the question
> there is which small slice of that complexity we might want to adopt.)
> >
> > For now, i would say 'try to come with a specialization mechanism as
> general as possible' and see if it's possible,
> > a user defined SpecializeDynamic being the ultimate golden hammer
> (Mjölnir ?).
> > All the specializations by a value can be implemented by doing constant
> pool patching (as this is done by unsafe.defineAnonymousClass),
> > but here we don't want the anonymous part (and we don't need the host
> class access because it's superseded by the nestmate idea).
> >
> >> Even without int-typed or enum-typed template parameters, you could do
> what you need for ADTs by using enums and a sealed top type.
> >>
> >> Here's a variation that works today:
> >> ...
> > yes, but it's less valhalla-ish no :)
>
> Totally.  But it's also a heuristic for better code shapes in the Valhalla
> future.
>
> >> We will want to give users a way to code such things with values also.
> But since values can cannot take part in subtype relations abstract
> classes, the pattern above would have to be reformulated to use wildcards
> or interfaces.  (Abstract classes are pretty old fashioned anyway.)
> > since we have default method in interface now, designing with a public
> abstract class is an error, an abstract class is a class so it leaks
> implementation details.
>
> Yep.  An abstract class is not abstract enough, and interfaces are getting
> concrete enough, by stealing features from abstract classes.  Which is why
> I am toying with the idea of putting pseudo-constructors in interfaces, in
> order to steal away the ability of a supertype to control its subtypes by
> restricting access to its "constructor".  Why can't an interface have a
> nullary no-op constructor, just like Object?  (We can make it abstract to
> avoid questions of behavior.)  Anyway, abstract classes can seal their
> subtypes, and that's what I want to steal away from them.

Abstract classes still optimize better (eg single impl CHA case) and
dispatch quicker (in megamorphic situations).  One could define an abstract
class with nothing but virtual methods, in which case it doesn't leak any
impl details, but just shapes the type topology of implementers.

>
> >> With wildcards, you could have Expr<Value>, Expr<Add>, and Expr<?>,
> where Value and Add would be contained in Expr instead of deriving from
> Expr; all would be values.
> >
> > being an Expr instead of being a subtype of an Expr is important, but i
> don't know if it's not just for type safety, so it can be a problem for the
> compiler and not the runtime.
>
> Expr becomes a box for either a Value or an Add (but not both, and not
> anything else).  That models the categorical mathematics well enough, and
> everything else is sugar.  (From a certain distance.)
>
> >
> >> With interfaces, you'd have Value, Add, and Expr, where Expr is an
> interface and the others are values.
> >> ...
> >> In all of the sample code above, the use of enums and accessors,
> instead of lambdas and visitors, is a old fashioned "retro" style.  It is
> exactly analogous to the external iterators of early Java instead of the
> stream-shaped internal iterators of Java 8.  I like to work with the retro
> external decoding style because it leads to simple bytecode shapes, and can
> be sugared pretty well (cf. Java extended-for).
> >>
> > For a human, in term of Java code, the internal iteration is easier to
> write (at least until we have some ways to define a coroutine or a
> generator), but it's harder for Hotspot.
>
> Yes.  That's why extended-for is such a brilliant bit of sugar.  Codes
> like a catamorphism, JITs like Fortran.  (I also look forward to
> for-comprehensions and "extended-do" some day, but sequential-code monads
> come after streams, values, and ADTs.)
>
> > I want to see that as an implementation issue more than a fundamental
> rule.If Hotspot was able to not share the profiles between different inline
> blobs of the same code by c1, then c2 will have less
> polymorphic/megamorphic callsites to deal with (i beleive that both
> TurboFan(v8/chrome) and FTL(nitro/safari) doesn't have that issue).
>
> That will help, but IMO split profiles are not a full solution, not even
> an 80% solution.  It's too automagic, depending on obscure differences
> between inlining policies of tiers.  The problem is the programmer can't
> easily predict and control the splits.  It will work for demos and
> benchmarks and when it fails there's no recourse.  We need some way to
> (gently, provisionally) distinguish syndrome from non-syndrome, and point
> out the points where syndromes have to be baked into customized loops.

The existing JIT is very automagic as is, why stop now? :) Jokes aside,
having more determinism would be great.

>
> >>  But in the end we might want to use a more modern lambda-driven style
> (unapply or some other "morphism").  The main design problem with internal
> iterators or matchers is they tend to require lots of auxiliary types to
> express the functional APIs.
> > but less functional types means a way to define structural types, no ?
>
> I guess I want to say I think we should reach for both.  A "syndrome" as
> I've been describing it is really a bit of structural typing, and it can
> "hide" under a simple-looking function or stream.  It's like syntax sugar
> for loops that the JIT expands into simple code.
>
> — John

-- 
Sent from my phone