Variants/case classes/algebraic data types/sums/oh my!

Sun Jun 12 22:50:52 UTC 2016

----- Mail original -----
> De: "John Rose" <john.r.rose at oracle.com>
> À: "Rémi Forax" <forax at univ-mlv.fr>
> Cc: "org openjdk" <org.openjdk at io7m.com>, valhalla-dev at openjdk.java.net
> Envoyé: Dimanche 12 Juin 2016 03:34:48
> Objet: Re: Variants/case classes/algebraic data types/sums/oh my!
> 
> On Jun 11, 2016, at 5:12 PM, forax at univ-mlv.fr wrote:
> > 
> > 
> > De: "John Rose" <john.r.rose at oracle.com>
> > À: "Rémi Forax" <forax at univ-mlv.fr>
> > Cc: "org openjdk" <org.openjdk at io7m.com>, valhalla-dev at openjdk.java.net
> > Envoyé: Dimanche 12 Juin 2016 01:02:52
> > Objet: Re: Variants/case classes/algebraic data types/sums/oh my!
> > Hi John,
> > 
> >>> On Jun 11, 2016, at 4:18 AM, Remi Forax <forax at univ-mlv.fr> wrote:
> >>> 
> >>> because K is reified, there will be two methods eval() at runtime, one
> >>> specialized for K = 0 and an other for K = 1,
> >>> which means that the switch will be constant fold (the real 'switch' is
> >>> done when selecting the specialization).
> >>> 
> >> That is an interesting use case for non-type class template parameters!
> >> 
> > Here is another one,
> > class ArrayList<any E> {
> >   final E[] array;
> >   int size;
> > 
> >   public <Consumer<? super E> U> void forEach(U consumer) {
> >      for(int i = 0; i < size; i++) {
> >        consumer.accept(array[i]);
> >      }
> >   }
> > }
> 
> I think there are two simple ways to interpret your suggestion.  One is to
> split and specialize forEach by concrete consumer *type*, and one is to
> specialize by consumer *instance*.
> 
>   public <__SpecializePerType U extends Consumer<? super E>> void forEach(U
>   consumer) {
>      for(int i = 0; i < size; i++) {
>        consumer.accept(array[i]);
>      }
>   }
>   public <__SpecializePerInstanceOf U extends Consumer<? super E>> void
>   forEach(U consumer) {
>      for(int i = 0; i < size; i++) {
>        consumer.accept(array[i]);
>      }
>   }
> 

I was suggesting the later, hence the <Consumer<? super E> U> and not <U extends Consumer<? super E>>.

> 
> > In the snippet above, forEach is specialized by the value of the lambda
> > taken as argument, which is another way to say that he code of forEach and
> > the code of the lambda should be inlined together.

> 
> Yes; function parameters to templates are an important organizing feature for
> C++ STL algorithms.  The sort method is also a good candidate for such
> treatment.  C++ expresses per-type specialization using type parameters and
> per-instance specialization using non-type parameters.
> 
> The general problem here is what I call the loop customization problem, which
> is to code loops once and compile them into enough specialized versions to
> unlock a desirable level of optimization.  The JVM ought to create and
> maintain a library of specializations of forEach (or sort), one for each
> significantly different "loop shape syndrome" determined by a specific
> consumer (or comparator, for sort).

It depends when you do the specialization. If the specialization occurs only at JITing time, not in the interpreter,
you don't have to maintain such library, or said differently, the code cache is your library because your unit is a method that inline possibly several specializations. 

> 
> A syndrome must specify all the information needed to produce well optimized
> native code for the algorithm loop, but it should not over-specify details
> not needed to optimize the loop.  For example, the consumer may be packing
> results into an array; the syndrome should express that fact, but not
> over-specify by working only for one specific array instance.  Should the
> syndrome specify the exact type of the array?  That's an open question.
> 
> What if the consumer is packing away only those elements that match some
> filter predicate?  In that case the filter predicate is part of the syndrome
> too, since the loop needs to open-code it.  What if it's a bound-checking
> predicate; should the exact bounds be part of the syndrome?  Probably not
> but that's an open question too.  Or, if the collection is has a non-unit
> stride (into the sink array), should the stride be syndrome or not?  Again,
> more open questions, to be solved by a mixture of static guidance from the
> programmer and online feedback from profile and type hierarchy.
> 
> You can see how streams intensify the loop customization problem by mixing
> together syndrome and non-syndrome data into a single stream; the terminal
> operation needs to sort it out and select from a repertoire of efficient
> loop bodies.  This is not soluble by lots of inlining (and so it's not
> exactly what some call the "inlining problem"), because the control flow
> might spread across multiple threads (for parallel streams).

I don't think you need to separate between syndrome and non-syndrome if the specialization occurs only at JIT time.

> 
> If you use the C++ approach, each function is a distinct syndrome triggering
> a distinct specialization of the algorithm.  I think the STL libraries are
> organized carefully so that template parameters are *always* used as
> syndromes, and normal parameters are *never* used as syndromes.  This makes
> sense, but the cost is a static decision between what's shared and what's
> split for optimization, with a bias towards splitting and code bloat.  But
> in the JVM we like to give the runtime environment a share in decisions
> about sharing vs. splitting for optimization, so it's a more delicate
> question.  The JVM likes to start with code sharing (even interpretation)
> and then "ramp up" splitting and optimization where there is clear evidence
> that the investment makes sense.  We get to good loops even though there is
> no static splitting or optimization.

yes, i fully agree.

> 
> So in the case of List.forEach either formulation of specialization (by type,
> or by instance) is only an imperfect starting point, if our goal is to get
> the best loops.

The goal is to have a user defined way to ask for specialization. As you said the runtime is free to not do any optimizations at all,
or the runtime is free to consider the lambda instance as a constant and uses that information to merge the code of the loop with the code of the lambda
(at least in the case of the instance specialization). 

> 
> In our current setup, we get per-type specialization *if* the provided
> consumer is a value type.  If it's a legacy reference type, we will use
> erasure.  Perhaps the closest we can get in today's draft design space is
> this:
> 
>   public <val U extends Consumer<? super E>> void forEach(U consumer) {
>      for(int i = 0; i < size; i++) {
>        consumer.accept(array[i]);
>      }
>   }
> 
> This would seem to say, "I need a value type which implements the given
> interface U, please, and I want you to create a specialization for each such
> type."  (To be clear:  Such a use of "val" is only a suggestion, not any
> plan of record.  But it does fall out of some of our intensive conversations
> last month, in which it appears that interfaces may have "val", "ref", and
> "any" specializations, with respect to their "this" type.)

Ok, here we are talking about specialization per type not specialization per instance.
I'm not sure we need that, the VM already does a good job in this area so i fail to see why we need that.

> 
> In that case, the JVM has to activate some sort of library mechanism for
> maintaining specializations of forEach, which is good.  Initially, there
> will be a separate specialization only per *type* of consumer, which will
> probably capture the presence of a filter predicate (if there is one) but
> not the nature of the predicate.  So this goes a certain way towards
> defining a syndrome that will solve the loop customization problem, but not
> all the way.

I really looks like lambda forms to me. I still think that the VM should not try cache several specialization in a kind of template form, it seems an unnecessary step for me. 

> 
> >> (The C++ world is full of non-type template parameters; the question there
> >> is which small slice of that complexity we might want to adopt.)
> > 
> > For now, i would say 'try to come with a specialization mechanism as
> > general as possible' and see if it's possible,
> > a user defined SpecializeDynamic being the ultimate golden hammer (Mjölnir
> > ?).
> > All the specializations by a value can be implemented by doing constant
> > pool patching (as this is done by unsafe.defineAnonymousClass),
> > but here we don't want the anonymous part (and we don't need the host class
> > access because it's superseded by the nestmate idea).
> > 
> >> Even without int-typed or enum-typed template parameters, you could do
> >> what you need for ADTs by using enums and a sealed top type.
> >> 
> >> Here's a variation that works today:
> >> ...
> > yes, but it's less valhalla-ish no :)
> 
> Totally.  But it's also a heuristic for better code shapes in the Valhalla
> future.
> 
> >> We will want to give users a way to code such things with values also.
> >> But since values can cannot take part in subtype relations abstract
> >> classes, the pattern above would have to be reformulated to use wildcards
> >> or interfaces.  (Abstract classes are pretty old fashioned anyway.)
> > since we have default method in interface now, designing with a public
> > abstract class is an error, an abstract class is a class so it leaks
> > implementation details.
> 
> Yep.  An abstract class is not abstract enough, and interfaces are getting
> concrete enough, by stealing features from abstract classes.  Which is why I
> am toying with the idea of putting pseudo-constructors in interfaces, in
> order to steal away the ability of a supertype to control its subtypes by
> restricting access to its "constructor".  Why can't an interface have a
> nullary no-op constructor, just like Object?  (We can make it abstract to
> avoid questions of behavior.)  Anyway, abstract classes can seal their
> subtypes, and that's what I want to steal away from them.

That's true that restricting the access to the constructor of an abstract class is a way to control the hierarchy but i prefer the proposal of Brian, just a supplementary bit in the nestmate description.
Adding a constructor to an interface will require to add at least a way to specify package level accessibility and do not solve the problem of finding all the 'subtypes' of a type. 

> 
> >> With wildcards, you could have Expr<Value>, Expr<Add>, and Expr<?>, where
> >> Value and Add would be contained in Expr instead of deriving from Expr;
> >> all would be values.
> > 
> > being an Expr instead of being a subtype of an Expr is important, but i
> > don't know if it's not just for type safety, so it can be a problem for
> > the compiler and not the runtime.
> 
> Expr becomes a box for either a Value or an Add (but not both, and not
> anything else).  That models the categorical mathematics well enough, and
> everything else is sugar.  (From a certain distance.)
> 
> > 
> >> With interfaces, you'd have Value, Add, and Expr, where Expr is an
> >> interface and the others are values.
> >> ...
> >> In all of the sample code above, the use of enums and accessors, instead
> >> of lambdas and visitors, is a old fashioned "retro" style.  It is exactly
> >> analogous to the external iterators of early Java instead of the
> >> stream-shaped internal iterators of Java 8.  I like to work with the
> >> retro external decoding style because it leads to simple bytecode shapes,
> >> and can be sugared pretty well (cf. Java extended-for).
> >> 
> > For a human, in term of Java code, the internal iteration is easier to
> > write (at least until we have some ways to define a coroutine or a
> > generator), but it's harder for Hotspot.
> 
> Yes.  That's why extended-for is such a brilliant bit of sugar.  Codes like a
> catamorphism, JITs like Fortran.  (I also look forward to for-comprehensions
> and "extended-do" some day, but sequential-code monads come after streams,
> values, and ADTs.)
> 
> > I want to see that as an implementation issue more than a fundamental
> > rule.If Hotspot was able to not share the profiles between different
> > inline blobs of the same code by c1, then c2 will have less
> > polymorphic/megamorphic callsites to deal with (i beleive that both
> > TurboFan(v8/chrome) and FTL(nitro/safari) doesn't have that issue).
> 
> That will help, but IMO split profiles are not a full solution, not even an
> 80% solution.  It's too automagic, depending on obscure differences between
> inlining policies of tiers.  The problem is the programmer can't easily
> predict and control the splits.  It will work for demos and benchmarks and
> when it fails there's no recourse.  We need some way to (gently,
> provisionally) distinguish syndrome from non-syndrome, and point out the
> points where syndromes have to be baked into customized loops.

yes, maybe.

> 
> >>  But in the end we might want to use a more modern lambda-driven style
> >>  (unapply or some other "morphism").  The main design problem with
> >>  internal iterators or matchers is they tend to require lots of auxiliary
> >>  types to express the functional APIs.
> > but less functional types means a way to define structural types, no ?
> 
> I guess I want to say I think we should reach for both.  A "syndrome" as I've
> been describing it is really a bit of structural typing, and it can "hide"
> under a simple-looking function or stream.  It's like syntax sugar for loops
> that the JIT expands into simple code.

I still think we do not need a way to describe partially instantiated code at runtime, we obviously need something like this at compile time and i like the typed/TypeVar solution, if we can add something like an InstanceVar and a way to specify arguments of TypeVar/InstanceVar not only for a class but also for a method, i will declare victory :)

> 
> — John

Rémi