Typed variants of primitives

Wed Dec 2 14:17:24 UTC 2020

This generally goes under the name of "restriction types" in the 
literature, where you take a type (e.g., int), restrict its value set 
("zero or one"), and give it a new name ("bit").  This is a form of 
subtyping; the two have the same representation and meaning, but one has 
a value set that is a subset of the other.  (Subtyping implies you can 
freely assign a bit to an int, but would require a conversion (with 
value checking) in the other direction.)
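
Java has no restriction types today, so the asymmetry can only be approximated with a wrapper. A minimal sketch, assuming a hypothetical `Bit` class (the names `Bit`, `of`, and `toInt` are illustrative, not an existing API):

```java
/** Hypothetical restriction of int to the value set {0, 1}. */
final class Bit {
    private final int value;

    private Bit(int value) { this.value = value; }

    /** int -> Bit: the narrowing direction, so the value must be checked. */
    static Bit of(int value) {
        if (value != 0 && value != 1) {
            throw new IllegalArgumentException("not a bit: " + value);
        }
        return new Bit(value);
    }

    /** Bit -> int: the widening direction, which can never fail. */
    int toInt() { return value; }
}

class BitDemo {
    public static void main(String[] args) {
        int widened = Bit.of(1).toInt();   // always safe
        System.out.println(widened);
        try {
            Bit.of(2);                     // outside the value set
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With true subtyping the `toInt()` call would be implicit; here it has to be spelled out, which is part of the friction discussed next.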

In practice, a problem with restriction types is that they are usually 
domain-specific, and domain-specific types cause friction at boundaries 
between domains.  Lots of libraries deal out ints, but if your library 
wants to deal in non-negative ints, you usually can't just do a straight 
assignment, which means code at the boundary to convert and deal with 
domain failures.  But, this just says that you have to use them in the 
places where they add enough value.
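
To illustrate the boundary cost, here is a rough sketch built around `String.indexOf`, which returns an int and uses -1 for "not found". The `NonNegativeInt` type and the `position` helper are hypothetical names introduced only for this example:

```java
import java.util.Optional;

/** Hypothetical restriction of int to values >= 0. */
final class NonNegativeInt {
    private final int value;

    private NonNegativeInt(int value) { this.value = value; }

    /** Boundary conversion: empty when the int falls outside the domain. */
    static Optional<NonNegativeInt> tryOf(int value) {
        return value >= 0
                ? Optional.of(new NonNegativeInt(value))
                : Optional.empty();
    }

    int toInt() { return value; }
}

class BoundaryDemo {
    /** Wraps String.indexOf, which reports "not found" as -1. */
    static Optional<NonNegativeInt> position(String text, char c) {
        return NonNegativeInt.tryOf(text.indexOf(c));
    }

    public static void main(String[] args) {
        System.out.println(position("valhalla", 'h').map(NonNegativeInt::toInt));
        System.out.println(position("valhalla", 'z').isPresent());
    }
}
```

Every caller at the boundary now has to decide what an empty result means, which is the friction in question.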

<digression>

Another issue in practice is that in many languages with restriction 
types, there is an element of "trying to have it both ways."  It is 
often harder to reason about the difference between the base type and a 
restriction, because they share some things and not others.  A good 
example here is the Haskell `type` declaration:

     type Name = String

At first, this looks like C's typedef; it defines the type Name to mean 
String, and you can use Name where String is needed, and vice versa.  
But there's a subtlety here that is not obvious, which is that an 
aliased type has a _different dictionary for type class witnesses_.  
(Type classes are like Java's interfaces; witnesses are the 
implementations for a given set of types, which are declared separately 
from the types, and can be thought of as a vtable.)

To make this concrete, suppose we have a type class Printable:

     class Printable t where
         print :: t -> String

This is like:

     interface Printable<T> {
         String print();
     }

Instead of declaring "String implements Printable" as we do in Java, we 
declare a witness:

     instance Printable String where
         print s = ... implementation of print for String ...

Then, in any context that requires printing, when a string is presented, 
this implementation is used.  But ... when we declare an alias, we can 
declare a _different_ witness for Printable Name:

     instance Printable Name where
         print n = ... different implementation of print for Name ...

So, sometimes Name and String are interchangeable, and sometimes they 
are not!  This looseness on the part of the language requires more 
vigilance on the part of the programmer.  (With great power comes great 
responsibility.)
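
For Java readers, the idea of a witness that lives apart from the type can be approximated by passing the implementation explicitly, much as one passes a `Comparator`. This is only a loose analogy; the interface shape below (taking the value as an argument) and the names `Witnesses`, `PLAIN`, `QUOTED`, and `describe` are assumptions made for the sketch:

```java
/** Hypothetical witness interface: the operation takes the value as an argument. */
interface Printable<T> {
    String print(T value);
}

final class Witnesses {
    /** One witness for String. */
    static final Printable<String> PLAIN = s -> s;

    /** A different witness for the same carrier type. */
    static final Printable<String> QUOTED = s -> "\"" + s + "\"";

    /** Generic code receives the witness explicitly, like a vtable argument. */
    static <T> String describe(T value, Printable<T> witness) {
        return "value: " + witness.print(value);
    }

    public static void main(String[] args) {
        System.out.println(describe("Ada", PLAIN));
        System.out.println(describe("Ada", QUOTED));
    }
}
```

The same string prints differently depending on which witness is in play, so the reader has to track the witness as well as the value.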

</digression>

OK, let's come back to Valhalla.  First, let me point out that "I want 
it to be an int, but just with a restricted value set" may well be 
premature optimization.  You don't actually care that it shares a 
carrier type with 32-bit ints; what you care about is that it can be 
easily and cheaply used where an int is needed.

Suppose we define:

     inline class PositiveInt {
         private int v;

         public PositiveInt(int x) {
             // zero is not positive, so reject it as well
             if (x <= 0)
                 throw new IllegalArgumentException(...);
             this.v = x;
         }

         int val() { return v; }
     }

This provides part of the puzzle; while it looks like it would be 
expensive, `new PositiveInt(x)` will JIT down to a value check and a 
32-bit move.  So the "cheap to convert" box is ticked.  But, the other 
weight of boxing is the syntactic weight; `p = new PositiveInt(x)` is a 
mouthful compared to an ordinary assignment.

Let's ignore the existing getYear method for a moment, and pretend that 
you had instead published:

     Year getYear()

where Year is an inline class that encapsulates an int.  Then if the 
user wants an int, they could say:

     int y = date.getYear().val()

But, even that `.val()` might feel like noise; what you're appealing to 
is a _primitive widening conversion_ (JLS 5.1.2) from Year to int.  
Since the range of Year is (isomorphic to) a restriction of int, it is 
reasonable to want Year to be able to define such a conversion.  
This would get you to:

     int y = date.getYear()  // implicit primitive widening conversion

We are indeed investigating how we might generalize JLS 5.x to support 
such things.
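
For comparison, this is roughly what the explicit unwrap looks like with today's `java.time.Year`, where `getValue()` plays the role of `.val()`. The class name `YearUnwrapDemo` and the helper `yearOf` are made up for this sketch; the `java.time` calls are the existing ones:

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.Year;

class YearUnwrapDemo {
    /** Goes through the Year type and unwraps it explicitly. */
    static int yearOf(LocalDate date) {
        Year year = Year.from(date);
        return year.getValue();
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2020, 12, 2);

        // Today's API: LocalDate.getYear() already returns a plain int.
        int direct = date.getYear();
        System.out.println(direct == yearOf(date));

        // Year.of checks the restricted range on the way in.
        try {
            Year.of(1_000_000_000);
        } catch (DateTimeException e) {
            System.out.println("out of range");
        }
    }
}
```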

The other half of your question is whether you can compatibly change the 
already-published method signature.  As John mentioned, we've 
investigated how bridging might be used profitably here.  Java doesn't 
let you overload on return types, and changing this is a big lift (for 
various mostly-accidental reasons.)  But there might be an intermediate 
ground that is still helpful, which is to allow certain changes in 
return type to be source- and binary-compatible.  Ignoring Valhalla for 
a moment, let's say you wanted to change the return type from `long` to 
`int`.  Conveniently there is a primitive widening conversion from int 
to long!  Then it would be source-compatible to change a long-returning 
method to an int-returning one; on recompilation, existing uses would 
just do the widening silently.  To make this binary-compatible, we'd 
also need a way of inserting a bridge method to intercept linkage to the 
old signature and do the int->long adaptation.  (This technique is 
considerably easier for static or final methods than for those that 
might have overrides, but even the latter is theoretically possible, 
just at much higher cost.)  So, such a thing is workable, but clearly 
there's work to do.
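
The int->long adaptation itself is easy to show in today's Java, even though a real bridge cannot be written in source: it would share the method's name and differ only in its return descriptor. In the sketch below the bridge is therefore given a separate, hypothetical name (`sizeBridge`); `Counter` and `MigrationDemo` are likewise illustrative:

```java
final class Counter {
    /** Returns an int. */
    static int size() { return 42; }

    /**
     * Hand-written stand-in for a bridge: a long-returning entry point that
     * forwards to size(). The distinct name is needed only because Java
     * source cannot overload on return type.
     */
    static long sizeBridge() {
        return size();   // the int -> long widening happens here
    }
}

class MigrationDemo {
    public static void main(String[] args) {
        // A caller that declares a long accepts the int result through
        // primitive widening, with no cast and no source change.
        long compiledAgainstInt = Counter.size();

        // A caller linked against the long-returning signature goes
        // through the forwarding method instead.
        long linkedAgainstLong = Counter.sizeBridge();

        System.out.println(compiledAgainstInt == linkedAgainstLong);
    }
}
```

This covers only the static case; as noted, methods that may be overridden are considerably harder.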

To recap the above: this would be a compatible change when the new 
return type can be widened to the old one (and ideally, when the method 
is static or final).  If we could define a primitive widening conversion 
from Year to int, then yes, we could migrate the int-returning method to 
return Year, in a compatible manner, and existing int-consumers wouldn't 
notice the difference.

Very short answer: Yes, this is all being considered!

On 11/29/2020 7:13 PM, Stephen Colebourne wrote:
> I wanted to raise a concept that I don't remember seeing as part of
> the valhalla work so far, and I'll do so via a java.time.* example.
>
>
> `java.time.*` contains a `Year` value-based class that effectively
> acts as a "typed int" with two key purposes:
> - to provide additional type safety if desired for the concept of "year"
> - to restrict the valid int values to -999_999_999 to 999_999_999.
>
> `LocalDate` has a method `getYear()`, but it returns an `int`, rather
> than the `Year` class. Was this a mistake? Not really, it was a
> pragmatic decision to say that most users of the API would want the
> int, not the `Year` value type (and the performance hit of an
> additional object).
>
> In an ideal valhalla world, `LocalDate.getYear()` would be changed to
> return `Year`, not `int`, and this change would be entirely backwards
> compatible. The implication is that a valhalla `Year` value type could
> be freely unboxed to an `int`.
>
>
> Now of course, it is almost certainly pie-in-the-sky to try and make
> something this backwards compatible. But what about new types? In API
> design terms, there is appeal in defining a type that restricts the
> valid set of ints, for example a `PositiveInt` value type. But without
> the associated boxing/unboxing to `int` and maths operator-overloading
> it is generally more pain than it is worth to design an API that way.
> Has this concept been considered?
>
> Stephen