[patch] 6746458 support for exotic identifiers (identifier superquote)
John Rose
John.Rose at Sun.COM
Wed Sep 10 13:52:10 PDT 2008
On Sep 9, 2008, at 8:19 PM, John Rose wrote:
> On Sep 9, 2008, at 6:12 PM, Per Bothner wrote:
>
>> Saying that you can use #"+" but not #"/" seems hard to justify.
>
> Except for the fact that javac is a compiler for the JVM. That
> makes it much easier to justify.
>
> And the mangling I proposed (or any similar mangling) is very
> lightweight and can even be intuitive (kind of like learning a
> regex language).
So I showed how the cost of not auto-mangling names like "/" is
non-zero but tolerable.
The other missing bit is the cost of auto-mangling. It is painful,
even (this is a big "even") if everyone were to use my mangling
scheme. Two reasons:
1. My mangling scheme disallows dollar '$' (at your suggestion) as a
dangerous character, so the set of extended identifiers would not be
a superset of regular identifiers. This feels very surprising.
2. My mangling scheme reserves the colon ':' as a delimiter for
building compound names, more complex than a simple sequence of
Unicode characters. If we auto-mangled, those compound names would
be inaccessible to Java code. This again feels surprising.
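To make the two surprises concrete, here is a minimal Java sketch. The class and method names are hypothetical, and the check is only one reading of the two points above (that '$' is disallowed and ':' is reserved as a delimiter), not the actual mangling code from the patch.

```java
final class ReservedCharCheck {
    // Assumption: per the two points above, '$' is disallowed and ':' is a
    // delimiter, so neither can appear in a plain (non-compound) spelling.
    static boolean isPlainSpelling(String name) {
        return name.indexOf('$') < 0 && name.indexOf(':') < 0;
    }

    // True if every character is legal in an ordinary Java identifier.
    // (Keywords are not rejected; this only checks the characters.)
    static boolean isRegularJavaIdentifier(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String name = "this$0";
        System.out.println(isRegularJavaIdentifier(name)); // true
        System.out.println(isPlainSpelling(name));         // false
    }
}
```

A name such as this$0 is a legal regular identifier yet fails the plain-spelling check, which is the "not a superset" surprise in point 1.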
On balance, let's take the small legibility hit in the Java source
and keep the channel down to the JVM as simple as possible, allowing
the possibility of '$' and compound names.
-- John
P.S. More about compound names:
Most complicated languages have use cases for compound names, names
which express not only a simple spelling, but something modal about
how the name is being used or defined. Java has a field-vs-method
distinction baked into the bytecodes. But a field-vs-property
distinction would have to be expressed with a compound name. Also,
languages which allow abbreviated or curried method references which
look like field references (mylist.size or let f = mylist.get in f
(10)) need a way to express the choice between a field and a method,
in a context where the JVM doesn't supply a baked-in distinction.
Compound names can sometimes help with this.
Or, Common Lisp features package-prefixed names and SETF-names;
encoding them requires compound names, either in the generic mangling
scheme, or in another language-specific mangling built on top of the
Unicode-to-JVM mangling. In either case, an implementor might
consider using #"CL:CONS" and #"CL:CAR:SETF" as bytecode names. (A
loadable module could be a class which defines a bunch of names like
that, and they could be imported, bean-like, into the Common Lisp
runtime without further information.)
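As an illustration only, a runtime that imports such names might split them on the colon delimiter. The sketch below is a hypothetical helper, not part of the proposed patch, and it assumes the components themselves contain no escaped colons; what each component means (package, symbol, modifier) is left to the language implementor.

```java
import java.util.Arrays;
import java.util.List;

final class CompoundNameSketch {
    // Hypothetical helper: split a compound name on the ':' delimiter.
    // The limit of -1 keeps empty trailing components instead of dropping them.
    static List<String> components(String name) {
        return Arrays.asList(name.split(":", -1));
    }

    public static void main(String[] args) {
        System.out.println(components("CL:CONS"));     // [CL, CONS]
        System.out.println(components("CL:CAR:SETF")); // [CL, CAR, SETF]
    }
}
```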
It's slightly better for the Common Lisp implementor if the shared
mangling package supports compound names. It's decisively better on
the whole, if every language implementor gets that same benefit.