Typed variants of primitives
Brian Goetz
brian.goetz at oracle.com
Wed Dec 2 14:17:24 UTC 2020
This generally goes under the name of "restriction types" in the
literature, where you take a type (e.g., int), restrict its value set
("zero or one"), and give it a new name ("bit"). This is a form of
subtyping; the two have the same representation and meaning, but one has
a value set that is a subset of the other. (Subtyping implies you can
freely assign a bit to an int, but would require a conversion (with
value checking) in the other direction.)
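As a rough illustration in present-day Java (which has no restriction types), the asymmetry can be sketched with a wrapper class: going from bit to int never fails, while going from int to bit needs a value check. The names `Bit`, `of`, and `toInt` are made up for this sketch, and a wrapper is only an approximation of a true restriction type.

```java
// Hypothetical sketch: a "bit" as a restriction of int (value set {0, 1}).
// Java has no restriction types, so a wrapper class stands in for one here.
final class Bit {
    private final int value;

    private Bit(int value) {
        this.value = value;
    }

    // int -> Bit: the narrowing direction, so the value must be checked.
    static Bit of(int x) {
        if (x != 0 && x != 1) {
            throw new IllegalArgumentException("not a bit: " + x);
        }
        return new Bit(x);
    }

    // Bit -> int: the widening direction, which can never fail.
    int toInt() {
        return value;
    }
}
```

With a real restriction type the `toInt()` call would disappear, since a bit would simply be assignable to an int.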
In practice, a problem with restriction types is that they are usually
domain-specific, and domain-specific types cause friction at boundaries
between domains. Lots of libraries deal out ints, but if your library
wants to deal in non-negative ints, you usually can't just do a straight
assignment, which means code at the boundary to convert and deal with
domain failures. But, this just says that you have to use them in the
places where they add enough value.
<digression>
Another issue in practice is that in many languages with restriction
types, there is an element of "trying to have it both ways." It is
often harder to reason about the difference between the base type and a
restriction, because they share some things and not others. A good
example here is the Haskell `type` declaration:
    type Name = String
At first, this looks like C's typedef; it defines the type Name to mean
String, and you can use Name where String is needed, and vice versa.
But there's a subtlety here that is not obvious, which is that an
aliased type has a _different dictionary for type class witnesses_.
(Type classes are like Java's interfaces; witnesses are the
implementations for a given set of types, which are declared separately
from the types, and can be thought of as a vtable.)
To make this concrete, suppose we have a type class Printable:
    class Printable t where
        print :: t -> String
This is like:
    interface Printable<T> {
        String print();
    }
Instead of declaring "String implements Printable" as we do in Java, we
declare a witness:
    instance Printable String where
        print s = ... implementation of print for String ...
Then, in any context that requires printing, when a string is presented,
this implementation is used. But ... when we declare an alias, we can
declare a _different_ witness for Printable Name:
    instance Printable Name where
        print n = ... different implementation of print for Name ...
so, sometimes Name and String are interchangeable, and sometimes they
are not! This looseness on the part of the language requires more
vigilance on the part of the programmer. (With great power comes great
responsibility.)
</digression>
OK, let's come back to Valhalla. First, let me point out that the "I
want it to be an int, but just with a restricted value set" requirement
may well be premature optimization. You don't actually care that it
shares a carrier type with 32-bit ints; what you care about is that it
can be easily and cheaply used where an int is needed.
Suppose we define:
    inline class PositiveInt {
        private int v;

        public PositiveInt(int x) {
            // "Positive" excludes zero, so reject zero as well as negatives
            if (x <= 0)
                throw new IllegalArgumentException(...);
            this.v = x;
        }

        int val() { return v; }
    }
This solves part of the puzzle; while it looks like it would be
expensive, `new PositiveInt(x)` will JIT down to a value check and a
32-bit move. So the "cheap to convert" box is ticked. But, the other
weight of boxing is the syntactic weight; `p = new PositiveInt(x)` is a
mouthful compared to an ordinary assignment.
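For readers who want to try this on a current JDK, here is a minimal sketch that uses an ordinary final class as a stand-in for the inline class. It is only an approximation: it does not get the flattening or the JIT behavior described above, and the class name `PositiveIntSketch` is made up to avoid suggesting it is the real thing. The sketch takes "positive" literally and rejects zero.

```java
// Stand-in for the inline class, written as a plain final class so that it
// compiles today. Assumption: "positive" means strictly greater than zero.
final class PositiveIntSketch {
    private final int v;

    PositiveIntSketch(int x) {
        if (x <= 0) {
            throw new IllegalArgumentException("not positive: " + x);
        }
        this.v = x;
    }

    // Explicit unwrapping; this is the call a widening conversion would hide.
    int val() {
        return v;
    }
}
```

The syntactic weight shows up at every boundary with int-based code: wrapping needs `new PositiveIntSketch(n)` and unwrapping needs `.val()`, where a plain int would need neither.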
Let's ignore the existing getYear method for a moment, and pretend that
you had instead published:
    Year getYear()
where Year is an inline class that encapsulates an int. Then if the
user wants an int, they could say:
    int y = date.getYear().val()
But even that `.val()` might feel like noise; what you're appealing to
is a _primitive widening conversion_ (JLS 5.1.2) from Year to int.
Since the range of Year is (isomorphic to) a restriction of int, it is
reasonable to want Year to be able to define such a conversion.
This would get you to:
    int y = date.getYear() // implicit primitive widening conversion
We are indeed investigating how we might generalize JLS 5.x to support
such things.
The other half of your question is whether you can compatibly change the
already-published method signature. As John mentioned, we've
investigated how bridging might be used profitably here. Java doesn't
let you overload on return types, and changing this is a big lift (for
various mostly-accidental reasons). But there might be an intermediate
ground that is still helpful, which is to allow certain changes in
return type to be source- and binary-compatible. Ignoring Valhalla for
a moment, let's say you wanted to change the return type from `int` to
`long`. Conveniently there is a primitive widening conversion from int
to long! Then it would be source-compatible to change an int-returning
method to a long-returning one; on recompilation, existing uses would
just do the widening silently. To make this binary-compatible, we'd
also need a way of inserting a bridge method to intercept linkage to the
old signature and do the int->long adaptation. (This technique is
considerably easier for static or final methods than for those that
might have overrides, but even the latter is theoretically possible,
just at much higher cost.) So, such a thing is workable, but clearly
there's work to do.
To recap the above: this would be a compatible change when the new
return type is "wider" than the old one (and, ideally, when the method
is static or final). If we could define a
primitive widening conversion from Year to int, then yes, we could
migrate the int-returning method to return Year, in a compatible manner,
and existing int-consumers wouldn't notice the difference.
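To make the Year case concrete, here is a hand-written sketch of what such a bridge would do, in today's Java. Everything in it is an assumption for illustration: `YearValue` and `DateSketch` are made-up stand-ins (a final class instead of an inline class), and because Java source cannot overload on return type, the bridge is given its own name, `getYearBridge`. A real bridge would be generated by the compiler or VM and would keep the old name and descriptor.

```java
// Hypothetical sketch of the migration; names are illustrative only.
final class YearValue {
    private final int v;

    YearValue(int v) {
        this.v = v;
    }

    // Stand-in for the Year -> int conversion discussed above.
    int val() {
        return v;
    }
}

final class DateSketch {
    private final int year;

    DateSketch(int year) {
        this.year = year;
    }

    // New signature: returns the typed wrapper instead of a bare int.
    YearValue getYear() {
        return new YearValue(year);
    }

    // Stand-in for the bridge that old, int-expecting callers would link to.
    // It calls the new method and applies the conversion on their behalf.
    int getYearBridge() {
        return getYear().val();
    }
}
```

The class is final here, which matches the easier case mentioned above: with no overrides in play, the bridge only has to forward to one known implementation.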
Very short answer: Yes, this is all being considered!
On 11/29/2020 7:13 PM, Stephen Colebourne wrote:
> I wanted to raise a concept that I don't remember seeing as part of
> the valhalla work so far, and I'll do so via a java.time.* example.
>
>
> `java.time.*` contains a `Year` value-based class that effectively
> acts as a "typed int" with two key purposes:
> - to provide additional type safety if desired for the concept of "year"
> - to restrict the valid int values to -999_999_999 to 999_999_999.
>
> `LocalDate` has a method `getYear()`, but it returns an `int`, rather
> than the `Year` class. Was this a mistake? Not really, it was a
> pragmatic decision to say that most users of the API would want the
> int, not the `Year` value type (and the performance hit of an
> additional object).
>
> In an ideal valhalla world, `LocalDate.getYear()` would be changed to
> return `Year`, not `int`, and this change would be entirely backwards
> compatible. The implication is that a valhalla `Year` value type could
> be freely unboxed to an `int`.
>
>
> Now of course, it is almost certainly pie-in-the-sky to try and make
> something this backwards compatible. But what about new types? In API
> design terms, there is appeal in defining a type that restricts the
> valid set of ints, for example a `PositiveInt` value type. But without
> the associated boxing/unboxing to `int` and maths operator-overloading
> it is generally more pain than it is worth to design an API that way.
> Has this concept been considered?
>
> Stephen