Revisiting field references

Mon Jun 3 22:19:10 UTC 2019

Hello, amber-spec-experts. I understand that "field references" is an
idea that was considered when other member references were being
implemented, and it seems to have been a "well, maybe someday"
feature: nothing fundamentally wrong with it, just not worth delaying
method references for. Google is interested in reopening that
discussion, and in working on the implementation if a satisfactory
design can be found.

For the remainder of this message, we will use this class definition
for examples:

public class Holder {
    private int field;
}

This class contains only an instance field, but everything in this
document applies equally in the case of static fields, except of
course that they can’t be bound and won’t expect a receiver argument.

Additionally, most of this document will assume that Holder::field is
the syntax used for creating an unbound reference to that field. This
feels very natural, of course, but the Open Questions section below
discusses the tradeoffs of reusing the :: token for fields.

Getter as Supplier

The most obvious thing you can do with a field is read it.
Holder::field could be a ToIntFunction<Holder>, and this::field an
IntSupplier (assuming this is a Holder). I suspect that a feature that
does this and no more would actually cover a majority of use cases:
most people who today want a field reference probably just want a
shorter version of a lambda that reads a field. On its own, though,
this is not a very compelling addition, simply because it doesn’t buy
us much: the “workaround” of writing the lambda by hand is neither
painful nor error-prone, so permitting a reference instead, while nice,
is not transformative. There are, however, other things we could do
with field references that may make the feature more worthwhile.
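For concreteness, here is a sketch of what the getter forms might desugar to, written as the hand-written lambdas available today. The names GetterLambdas, count, boundGetter, UNBOUND_GETTER and STATIC_GETTER are illustrative only, and field is package-private rather than private just to keep the example short.

```java
import java.util.function.IntSupplier;
import java.util.function.ToIntFunction;

public class GetterLambdas {
    static class Holder {
        int field;        // package-private rather than private, for brevity
        static int count; // a static field, for comparison

        // Inside Holder, this::field would mean roughly this bound getter.
        IntSupplier boundGetter() {
            return () -> this.field;
        }
    }

    // Holder::field as a getter: an unbound function that takes the receiver.
    static final ToIntFunction<Holder> UNBOUND_GETTER = h -> h.field;

    // A static field has no receiver, so only the supplier shape applies.
    static final IntSupplier STATIC_GETTER = () -> Holder.count;

    public static void main(String[] args) {
        Holder h = new Holder();
        h.field = 7;
        Holder.count = 3;
        System.out.println(UNBOUND_GETTER.applyAsInt(h)); // 7
        System.out.println(h.boundGetter().getAsInt());   // 7
        System.out.println(STATIC_GETTER.getAsInt());     // 3
    }
}
```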

Setter as Consumer

The most obvious difference between fields and methods is that while
there's only one thing to do with a method (invoke it), you can either
read a field or write it. So, while we’ve already established that
this::field could be an IntSupplier, it could, depending on context,
be an IntConsumer instead, setting the field when invoked.
Likewise Holder::field could be an ObjIntConsumer<Holder> instead of a
ToIntFunction<Holder>.
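The write forms would likewise correspond to lambdas like the ones below. This is a sketch assuming a straightforward desugaring; the frozen field and the names boundSetter and UNBOUND_SETTER are hypothetical, with frozen added only to show that a final field has no legal setter.

```java
import java.util.function.IntConsumer;
import java.util.function.ObjIntConsumer;

public class SetterLambdas {
    static class Holder {
        int field;
        final int frozen = 1;

        // In a write context, this::field would mean roughly this bound setter.
        IntConsumer boundSetter() {
            return v -> this.field = v;
        }
    }

    // Holder::field in a write context: an unbound setter taking receiver and value.
    static final ObjIntConsumer<Holder> UNBOUND_SETTER = (h, v) -> h.field = v;

    // No setter can exist for a final field; this line would not compile:
    // static final ObjIntConsumer<Holder> BAD = (h, v) -> h.frozen = v;

    public static void main(String[] args) {
        Holder h = new Holder();
        UNBOUND_SETTER.accept(h, 5);
        System.out.println(h.field); // 5
        h.boundSetter().accept(9);
        System.out.println(h.field); // 9
    }
}
```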

This seems natural enough, but merits discussion instead of just being
included in the feature because it is “obvious”. Setters are more
complicated than getters. The most obvious complication is that they
should be illegal if the field is final. More subtly, source code may
become harder to understand when the expression this::field may mean
two very different things, either a read or a write. The compiler
should have enough type information in context to disambiguate, or
give an appropriate diagnostic when a use site is ambiguous, e.g. due
to overloading, but this information can be difficult for a developer
to sort out manually, turning every use site into a puzzle: a reader
must work out the target type of the reference to determine whether it
is a read or a write.
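Today a lambda's shape makes that choice visible. In this sketch the overloads named use are hypothetical; with explicit lambdas, arity alone picks the read or the write overload, whereas a field reference h::field would look identical at both call sites and leave the reader to work out the target type.

```java
import java.util.function.IntConsumer;
import java.util.function.IntSupplier;

public class ReadOrWrite {
    static class Holder { int field; }

    // Hypothetical overloads: one reads through its argument, one writes.
    static String use(IntSupplier s) { return "read " + s.getAsInt(); }
    static String use(IntConsumer c) { c.accept(42); return "wrote"; }

    public static void main(String[] args) {
        Holder h = new Holder();
        // Zero lambda parameters: only IntSupplier fits, so this is a read...
        System.out.println(use(() -> h.field));    // read 0
        // ...one parameter: only IntConsumer fits, so this is a write.
        System.out.println(use(v -> h.field = v)); // wrote
        System.out.println(use(() -> h.field));    // read 42
        // A hypothetical h::field would read the same in both positions.
    }
}
```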

Increased Transparency

Another appealing thing to do with a field reference is to make it
more transparent than a simple lambda. We could have some sort of
FieldReference object describing the class in which the field lives,
the name and type of the field, and which object (if any) is bound as
the receiver. This FieldReference object would expose get, and
possibly set, methods for the referred-to field. Of course this looks
a lot like java.lang.reflect.Field; but instead of one final class
using reflection to handle all fields of all classes, we can use the
lambda meta-factory (or something like it) to generate specialized
subclasses, which can conveniently be bound to receivers as well as
being faster.
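A minimal sketch of such a descriptor, assuming Java 16+ for records. FieldReference and LambdaFieldReference are hypothetical names, not JDK types; the getter and setter lambdas stand in for whatever specialized code a metafactory would actually generate, and boxing of the int value is accepted here for simplicity.

```java
import java.util.function.BiConsumer;
import java.util.function.Function;

public class FieldReferenceSketch {
    // Hypothetical descriptor type; nothing like this exists in the JDK today.
    interface FieldReference<T, V> {
        Class<T> declaringClass();
        String name();
        Class<V> type();
        V get(T receiver);
        void set(T receiver, V value);
    }

    // One possible implementation: metadata plus direct accessor lambdas,
    // so get and set involve no reflection at call time.
    record LambdaFieldReference<T, V>(
            Class<T> declaringClass, String name, Class<V> type,
            Function<T, V> getter, BiConsumer<T, V> setter)
            implements FieldReference<T, V> {
        @Override public V get(T receiver) { return getter.apply(receiver); }
        @Override public void set(T receiver, V value) { setter.accept(receiver, value); }
    }

    static class Holder { int field; }

    // Roughly what a reified Holder::field might carry (int.class is a Class<Integer>).
    static final FieldReference<Holder, Integer> HOLDER_FIELD =
            new LambdaFieldReference<>(Holder.class, "field", int.class,
                    h -> h.field, (h, v) -> h.field = v);

    public static void main(String[] args) {
        Holder h = new Holder();
        HOLDER_FIELD.set(h, 12);
        // prints "Holder.field : int = 12"
        System.out.println(HOLDER_FIELD.declaringClass().getSimpleName() + "."
                + HOLDER_FIELD.name() + " : " + HOLDER_FIELD.type() + " = "
                + HOLDER_FIELD.get(h));
    }
}
```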

An advantage of supporting this is that it could enable libraries that
currently accept lambdas to generate more efficient code. For example,
consider

Comparator<Animal> c =
  Comparator.comparing((Animal a) -> a.name)
            .thenComparing(a -> a.species)
            .thenComparingInt(a -> a.mass);

A perfectly reasonable Comparator, and much more readable than a nest
of if-conditions written by hand. But if used in a tight loop to
compare many animals, this is quite expensive compared to the
hand-written version, because each comparison may dispatch through
many lambdas, and this is not easy for the JIT to inline. If we really
wanted to allow Comparator combinators to be used in
performance-sensitive situations, Comparator could have an optimize()
method that attempts to generate bytecode for an efficient comparator
in a way similar to what the lambda meta-factory does.

Even without field references, that optimize() method could eliminate
some lambda calls: instead of a chain of lambdas for each
.thenComparing call, it could be unrolled into 3 if statements. But
we’d still have 3 lambdas left, to compute the values to compare to
each other. If we could pass in a field reference, the optimize()
method could introspect on those, allowing it to emit getfield
bytecodes directly, saving more indirection, and resulting in the same
bytecode you could get by writing this comparator by hand.
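To make the three levels concrete, the sketch below shows the combinator chain, an unrolled version that still reads each key through a lambda, and the hand-written version that an optimize() fed with field references could aim to match. The Animal class and the helper names are assumptions for illustration; only the third version reads fields directly.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.Function;
import java.util.function.ToIntFunction;

public class AnimalComparators {
    static final class Animal {
        final String name;
        final String species;
        final int mass;

        Animal(String name, String species, int mass) {
            this.name = name;
            this.species = species;
            this.mass = mass;
        }

        @Override public String toString() { return name + "/" + species + "/" + mass; }
    }

    // 1. The combinator chain: nested lambdas, dispatched on every comparison.
    static final Comparator<Animal> COMBINED =
            Comparator.comparing((Animal a) -> a.name)
                      .thenComparing(a -> a.species)
                      .thenComparingInt(a -> a.mass);

    // 2. Unrolled into if statements, but each key is still read through a lambda.
    static final Function<Animal, String> NAME = a -> a.name;
    static final Function<Animal, String> SPECIES = a -> a.species;
    static final ToIntFunction<Animal> MASS = a -> a.mass;

    static int compareUnrolled(Animal x, Animal y) {
        int c = NAME.apply(x).compareTo(NAME.apply(y));
        if (c != 0) return c;
        c = SPECIES.apply(x).compareTo(SPECIES.apply(y));
        if (c != 0) return c;
        return Integer.compare(MASS.applyAsInt(x), MASS.applyAsInt(y));
    }

    // 3. Fully hand-written: direct field reads (getfield), no lambda calls.
    static int compareByHand(Animal x, Animal y) {
        int c = x.name.compareTo(y.name);
        if (c != 0) return c;
        c = x.species.compareTo(y.species);
        if (c != 0) return c;
        return Integer.compare(x.mass, y.mass);
    }

    public static void main(String[] args) {
        Animal[] zoo = {
            new Animal("Rex", "dog", 30),
            new Animal("Ada", "cat", 4),
            new Animal("Rex", "dog", 12),
        };
        Animal[] combined = zoo.clone();
        Arrays.sort(combined, COMBINED);
        Animal[] byHand = zoo.clone();
        Arrays.sort(byHand, AnimalComparators::compareByHand);
        System.out.println(Arrays.toString(combined)); // [Ada/cat/4, Rex/dog/12, Rex/dog/30]
        System.out.println(Arrays.equals(combined, byHand)); // true
    }
}
```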

I hope it goes without saying that I am not proposing to actually
implement Comparator.optimize any time soon: it’s just a convenient,
well-known example of the kind of library that could be gradually
improved by promoting field references from “sugar for a lambda” to
reified objects.

Note that if we reify field references, there will surely be some
people who ask, “why not method references?” I think it is much more
difficult to do this, because methods can be overloaded. Which
overload of String::valueOf did you want to reify as a MethodReference
object? When we use these as lambdas, context can give us a hint; when
crystallizing them as a descriptor object we will have no context. So,
there seems to me to be good reason to push back against this request,
but it is a choice we should make deliberately.
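The String::valueOf problem is easy to see with today's method references: the same expression selects a different overload, with observably different behavior, depending on the target type. The class and constant names here are illustrative.

```java
import java.util.function.Function;
import java.util.function.IntFunction;

public class ValueOfOverloads {
    // Same source text, three different overloads chosen by the target type.
    static final Function<Object, String> AS_OBJECT = String::valueOf; // valueOf(Object)
    static final Function<char[], String> AS_CHARS = String::valueOf;  // valueOf(char[])
    static final IntFunction<String> AS_INT = String::valueOf;         // valueOf(int)

    public static void main(String[] args) {
        char[] hi = {'h', 'i'};
        System.out.println(AS_CHARS.apply(hi));  // hi
        System.out.println(AS_OBJECT.apply(hi)); // an identity string starting with [C@
        System.out.println(AS_INT.apply(42));    // 42
    }
}
```

A reified MethodReference for String::valueOf, created without a target type, would have no principled way to pick among these.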

Annotation parameters

Last, if we had such FieldReference descriptors, we might like to be able
to use them as annotation parameters, making it possible to be more
formal about annotations like

class Stream {
  private Lock lock;
  @GuardedBy(Stream::lock) // next() only called while holding lock
  public int next() {...}
}

Probably this would mean having FieldReference implement Constable, so
that Holder::field could be put in the constant pool, along with other
annotation parameters. This also suggests that a FieldReference object
should not directly store the bound receiver, since that could not be
put in the constant pool; instead we would want a FieldReference to
always be unbound, and then some sort of decorator or wrapper that
holds a FieldReference tied to a captured receiver.
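The JDK's nominal descriptor API (java.lang.constant, Java 12+) already has a receiver-free description of a field, VarHandle.VarHandleDesc, which hints at how a Constable FieldReference might be modeled. This is only a sketch, not a proposal: annotations cannot take such values today, and BoundField and bind are hypothetical names for the receiver-carrying wrapper described above. Records require Java 16+.

```java
import java.lang.constant.ClassDesc;
import java.lang.constant.ConstantDescs;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class FieldDescSketch {
    static class Holder { int field; }

    // A nominal, receiver-free description of Holder.field: symbolic names only,
    // the kind of value a constant pool can hold.
    static VarHandle.VarHandleDesc holderFieldDesc() {
        ClassDesc holder = Holder.class.describeConstable().orElseThrow();
        return VarHandle.VarHandleDesc.ofField(holder, "field", ConstantDescs.CD_int);
    }

    // Hypothetical bound wrapper: the unbound, describable part plus a receiver
    // captured at run time, which has no constant-pool form.
    record BoundField(VarHandle handle, Object receiver) {
        int getInt() { return (int) handle.get(receiver); }
        void setInt(int value) { handle.set(receiver, value); }
    }

    // Resolve the description into a live VarHandle and bind it to a receiver.
    static BoundField bind(Holder receiver) {
        try {
            VarHandle vh = holderFieldDesc().resolveConstantDesc(MethodHandles.lookup());
            return new BoundField(vh, receiver);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        BoundField bound = bind(h);
        bound.setInt(21);
        System.out.println(bound.getInt()); // 21
        System.out.println(h.field);        // 21
    }
}
```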

Open Questions

The first set of questions is: are these all reasonable, useful
features? Am I missing any pitfalls that they imply?

One looming design question is unfortunately syntax: is Foo::x really
the best syntax? It's very natural, but it will be ambiguous if Foo
also has a method named x. To preserve backwards compatibility with
code written before the introduction of field references, we would
obviously need to resolve this ambiguity in favor of any applicable
method reference over any applicable field reference. It would surely
be too extreme to say that it's impossible to get a field reference
when a method with the same name exists. So if you really want the
field reference in a context like this, we could introduce some
alternate syntax to clarify that, such as Foo:::x or Foo..x. The
details need not be settled now; what we do need to decide is whether
to introduce any new token at all or simply reuse the :: token.
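The ambiguity is already latent in today's language, since a class may declare a field and a method with the same name. In this sketch (Foo and its members are illustrative), Foo::x resolves to the method, which is the resolution a field-reference feature reusing :: would have to preserve.

```java
import java.util.function.ToIntFunction;

public class SameName {
    static class Foo {
        int x = 1;                  // a field named x...
        int x() { return x + 100; } // ...and a method named x: both legal together
    }

    // Today Foo::x can only mean the method.
    static final ToIntFunction<Foo> VIA_METHOD_REF = Foo::x;
    // Reading the field itself still takes a hand-written lambda.
    static final ToIntFunction<Foo> VIA_FIELD = f -> f.x;

    public static void main(String[] args) {
        Foo foo = new Foo();
        System.out.println(VIA_METHOD_REF.applyAsInt(foo)); // 101
        System.out.println(VIA_FIELD.applyAsInt(foo));      // 1
    }
}
```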

But this tie-breaker strategy has a problem: it solves backwards
compatibility, while leaving a subtle forward-compatibility pitfall.
Holder::field currently resolves to a field reference, but suppose in
the future someone adds a method with the same name. As discussed
before, we must resolve conflicts in favor of methods, and so
Holder::field suddenly becomes a method reference next time you
compile the client code. Now class authors can change which member is
being accessed by adding a new member, which seems dangerous. But
maybe it's fine: adding new overloads of an existing method can
already do that, if clients were relying on autoboxing or other type
coercions.
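That existing hazard can be seen in current Java. In this sketch, LibV1 and LibV2 are hypothetical stand-ins for two releases of a library; the client call is identical, yet recompiling against the second release silently changes which method runs, because overload resolution considers widening (int to long) before boxing (int to Integer).

```java
public class OverloadDrift {
    // Release 1 of a library: only a boxed overload exists.
    static class LibV1 {
        static String m(Integer x) { return "m(Integer)"; }
    }

    // Release 2 adds an overload; the original method is unchanged.
    static class LibV2 {
        static String m(Integer x) { return "m(Integer)"; }
        static String m(long x) { return "m(long)"; }
    }

    public static void main(String[] args) {
        // Identical client code, compiled against each release:
        System.out.println(LibV1.m(1)); // m(Integer), reached via autoboxing
        System.out.println(LibV2.m(1)); // m(long), since widening is tried before boxing
    }
}
```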

We could avoid the difficulty by having no syntactic overlap between
field and method references: Holder::toString for methods only,
Holder:::field for fields only. That's unlikely to be popular, and
indeed it is a bit ugly. Is it better to accept the small danger of
ambiguity?

Finally, if anyone has implementation tips I would be happy to hear
them. I am pretty new to javac, and while I've thrown together an
implementation that desugars field references into getter lambdas, it’s
far from a finished feature, and I’m sure what I’ve already done
wasn't done the best way. Finding all the places that would need to
change is no small task.