[patch] 6746458 support for exotic identifiers (identifier superquote)

Wed Sep 10 13:52:10 PDT 2008

On Sep 9, 2008, at 8:19 PM, John Rose wrote:
> On Sep 9, 2008, at 6:12 PM, Per Bothner wrote:
>
>> Saying that you can use #"+" but not #"/" seems hard to justify.
>
> Except for the fact that javac is a compiler for the JVM.  That  
> makes it much easier to justify.
>
> And the mangling I proposed (or any similar mangling) is very  
> lightweight and can even be intuitive (kind of like learning a  
> regex language).

So I showed how the cost of not auto-mangling names like "/" is  
nonzero but tolerable.

The other missing bit is the cost of auto-mangling.  It is painful,  
even (this is a big "even") if everyone were to use my mangling  
scheme.  Two reasons:

1. My mangling scheme disallows dollar '$' (at your suggestion) as a  
dangerous character, so the set of extended identifiers would not be  
a superset of regular identifiers.  This feels very surprising.

2. My mangling scheme reserves the colon ':' as a delimiter for  
building compound names, more complex than a simple sequence of  
unicode characters.  If we auto-mangled, those compound names would  
be inaccessible to Java code.  This again feels surprising.
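
To make the two surprises concrete, here is a toy sketch in Java.  The
class name ToyMangler and the specific escape codes are assumptions made
up for illustration, not the scheme as proposed; the only things taken
from the discussion above are that '$' counts as dangerous and that ':'
is reserved as a delimiter.

```java
public class ToyMangler {
    // Hypothetical auto-mangler: every "dangerous" character in the
    // source spelling is replaced by a two-character escape.
    // The escape codes are illustrative assumptions only.
    static String mangle(String spelling) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < spelling.length(); i++) {
            char c = spelling.charAt(i);
            switch (c) {
                case '/':  sb.append("\\|"); break;
                case '$':  sb.append("\\%"); break;
                case ':':  sb.append("\\!"); break;
                case '\\': sb.append("\\-"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Reason 1: under auto-mangling, #"a$b" would no longer denote
        // the same bytecode name as the ordinary identifier a$b.
        System.out.println(mangle("a$b").equals("a$b"));  // prints false

        // Reason 2: the ':' is escaped away, so a bytecode name that
        // contains a raw ':' delimiter can never be produced.
        System.out.println(mangle("CL:CONS"));            // prints CL\!CONS
    }
}
```

Without auto-mangling, the quoted spelling is passed down unchanged, so
both a$b and a raw CL:CONS stay reachable from Java source.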

On balance, let's take the small legibility hit in the Java source  
and keep the channel down to the JVM as simple as possible, allowing  
the possibility of '$' and compound names.

-- John

P.S.  More about compound names:

Most complicated languages have use cases for compound names, names  
which express not only a simple spelling, but something modal about  
how the name is being used or defined.  Java has a field-vs-method  
distinction baked into the bytecodes.  But a field-vs-property  
distinction would have to be expressed with a compound name.  Also,  
languages which allow abbreviated or curried method references which  
look like field references (mylist.size or let f = mylist.get in  
f(10)) need a way to express the choice between a field and a method,  
in a context where the JVM doesn't supply a baked-in distinction.  
Compound names can sometimes help with this.

Or, Common Lisp features package-prefixed names and SETF-names;  
encoding them requires compound names, either in the generic mangling  
scheme, or in another language-specific mangling built on top of the  
Unicode-to-JVM mangling.  In either case, an implementor might  
consider using #"CL:CONS" and #"CL:CAR:SETF" as bytecode names.  (A  
loadable module could be a class which defines a bunch of names like  
that, and they could be imported, bean-like, into the Common Lisp  
runtime without further information.)
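
As a rough sketch of what a runtime might do with such names, the Java
below splits a bytecode-level name on the reserved ':' delimiter.  The
class CompoundName and its parts method are hypothetical, and the sketch
assumes the components themselves contain no unescaped ':'.  How each
component is interpreted is left to the language implementor.

```java
import java.util.Arrays;
import java.util.List;

public class CompoundName {
    // Split a bytecode-level name into its components on ':'.
    // The limit of -1 keeps trailing empty components, if any.
    static List<String> parts(String bytecodeName) {
        return Arrays.asList(bytecodeName.split(":", -1));
    }

    public static void main(String[] args) {
        System.out.println(parts("CL:CONS"));      // prints [CL, CONS]
        System.out.println(parts("CL:CAR:SETF"));  // prints [CL, CAR, SETF]
        System.out.println(parts("size"));         // prints [size]
    }
}
```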

It's slightly better for the Common Lisp implementor if the shared  
mangling package supports compound names.  It's decisively better on  
the whole, if every language implementor gets that same benefit.