Updated: Language Escape Operator

Bruce Chapman brucechapman at paradise.net.nz
Mon Mar 30 01:27:32 PDT 2009


Following feedback, I have changed the proposal to use \ (was `) for the 
operator and expanded some of the background on that choice. 
Unfortunately there is some added complexity in the spec tiptoeing around 
the Unicode escape - and a little more might still be needed.

Bruce


Title Language Escape Operator

latest html version at http://docs.google.com/Doc?docid=dcvp3mkv_9gdg3jfhg

AUTHOR(S): Bruce Chapman

OVERVIEW


FEATURE SUMMARY:

There is great potential for syntactic sugars to be implemented in tools 
rather than the Java language. One requirement of such a mechanism is 
that the sugared form is NOT valid java source code (otherwise the 
meaning is ambiguous - is a piece of code intended as is, or intended to 
be desugared). In order to future proof such mechanisms it is also 
desirable that such a syntactic sugar form will never become a valid 
java form due to some unanticipated future language change.


A simple way to ensure this is to reserve a character as a language 
escape. Effectively it indicates that the source code is, at that point, 
escaping from the Java language.


Such a character should be a printable ascii character to enable it to 
be typed and read, and it should have no existing defined meaning in the 
java language within the contexts in which it may be used.


Inside comments, String literals and character literals all printable 
ascii characters have meaning and therefore there is no way we can 
escape from these. This is not thought to be an overly burdensome 
restriction.


The candidate characters for the escape operator are therefore # (hash), 
\ (backslash) and ` (back quote). Neither # nor ` has any currently 
defined meaning in the Java language specification. \ does have defined 
meanings, but only as a Unicode escape and inside String and character 
literals, and so is also suitable.


The intention of this proposal is to reserve one of these three 
characters for a range of uses that are outside the language. Although 
the syntactic sugar example is mentioned above it is but one example. A 
crystal ball would be advantageous here - alas we are limited to our 
ability to dream.


Several proposals and ideas are circulating which suggest uses for each 
of these few remaining characters which are becoming a precious resource 
for language developers due to their scarcity. It is the author's belief 
that use of any one of these characters as a language escape operator 
offers far more long term value than use in a single new syntactic form. 
Let us reserve one of the three characters as a language escape and let 
all other language proposals squabble over the remaining two. Any of 
the three characters could be used equally well from a technical point 
of view.


Of the three characters # has a legitimate claim against it for JSR 292 
Support, and worthy proposed uses for method, field and possibly type 
references. It is also aesthetically least attractive of the three as an 
escape operator since the other two already have escape like meaning in 
the language or other contexts.


It has been pointed out that ` is hard to type on many European language 
keyboards so \ would be the preferred language escape character.


\ has both an advantage and a disadvantage in being THE Java escape 
character. The conceptual advantage comes from its existing use as the 
String and character escape. The disadvantage is its existing use as a 
Unicode escape: a single \ followed by 'u' must either be a Unicode 
escape (the subsequent 4 characters are all hex digits) or a compiler 
error is raised. This implies that it would be unwise to have a language 
escape where the escaped text starts with u. To prevent the compiler 
error in that case, the \ must itself be escaped. That is probably a 
price worth paying compared with the difficulty of typing ` on European 
keyboards.
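
To make the Unicode escape interaction concrete, here is a minimal 
sketch in Java of a check a tool might run on escaped text. The class 
and method names (UnicodeEscapeCheck, startsUnicodeEscape, 
isWellFormedUnicodeEscape) are hypothetical and not part of this 
proposal. It also assumes the backslash at the given position is not 
itself preceded by another backslash.

```java
final class UnicodeEscapeCheck {

    /** True when the text at pos is a backslash followed by 'u'. */
    static boolean startsUnicodeEscape(String src, int pos) {
        return pos + 1 < src.length()
                && src.charAt(pos) == '\\'
                && src.charAt(pos + 1) == 'u';
    }

    /** True for a complete escape: backslash, one or more 'u', four hex digits. */
    static boolean isWellFormedUnicodeEscape(String src, int pos) {
        if (!startsUnicodeEscape(src, pos)) {
            return false;
        }
        int i = pos + 1;
        while (i < src.length() && src.charAt(i) == 'u') {
            i++;
        }
        if (i + 4 > src.length()) {
            return false;
        }
        for (int k = i; k < i + 4; k++) {
            if (!isHexDigit(src.charAt(k))) {
                return false;
            }
        }
        return true;
    }

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
                || (c >= 'a' && c <= 'f')
                || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        // Each doubled backslash in these literals is one backslash in the data.
        System.out.println(startsUnicodeEscape("\\Property String name;", 0)); // false
        System.out.println(startsUnicodeEscape("\\unless x", 0));              // true
        System.out.println(isWellFormedUnicodeEscape("\\unless x", 0));        // false
    }
}
```

The last case is the problematic one: escaped text that starts with u 
but is not a well formed Unicode escape is what triggers the compiler 
error described above.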


For the rest of this proposal it is assumed the back-slash character is 
the language escape operator.



MAJOR ADVANTAGE:
Shuts the gate before the horse has bolted. Once all these characters 
are assigned meaning by the language, then there is no single character 
that can be used as a language escape.


MAJOR DISADVANTAGE:

Prevents the back-slash character from being used in other language 
features, other than its already defined meaning as a unicode escape, 
and String and character literal escape. However any anticipated other 
use would have the same unicode escaping issue as mentioned for this use 
above. Further, any other use would tend to fight against the "\ is 
escape" concept. So back-slash is probably undesirable for other uses 
unrelated to escaping.

ALTERNATIVES:

Alternative characters have been discussed above. Another alternative is 
to do nothing and let language tools develop further (using any of the 
three available characters) in order to get a slightly better idea of 
where this might be heading before committing to a language change. The 
risk with this approach is that all three characters might get defined 
uses by other language changes before one can be reserved as a language 
escape. At that point it would be too late and would restrict language 
tools.

EXAMPLES
SIMPLE EXAMPLE:
\Property String name;

is not valid Java source code, and can be automatically desugared by an 
IDE. See references section.
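
As an illustration only, the sketch below shows one way a tool could 
desugar such a line. The expansion into a private field plus getter and 
setter is an assumption made for this example: the proposal does not 
define what \Property means, and the names PropertyDesugarer and desugar 
are hypothetical, not the API of the referenced tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PropertyDesugarer {

    // Matches a line of the form: <backslash>Property <type> <name>;
    private static final Pattern PROPERTY = Pattern.compile(
            "^\\s*\\\\Property\\s+([\\w.<>\\[\\]]+)\\s+(\\w+)\\s*;\\s*$");

    /** Returns plain Java for a Property line, or the line unchanged otherwise. */
    static String desugar(String line) {
        Matcher m = PROPERTY.matcher(line);
        if (!m.matches()) {
            return line;
        }
        String type = m.group(1);
        String name = m.group(2);
        String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
        // Assumed expansion: field, getter and setter.
        return "private " + type + " " + name + ";\n"
                + "public " + type + " get" + cap + "() { return " + name + "; }\n"
                + "public void set" + cap + "(" + type + " " + name + ") { this."
                + name + " = " + name + "; }";
    }

    public static void main(String[] args) {
        // The doubled backslash is a single backslash in the data.
        System.out.println(desugar("\\Property String name;"));
    }
}
```

The output contains no backslash, so it can be handed to the Java 
compiler as ordinary source.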


ADVANCED EXAMPLE:


N/A



DETAILS
SPECIFICATION:


Add new section 3.13 after 3.12 Operators.


3.13 Language Escape

When used outside of comments, String literals and character literals, 
the backslash character \ is reserved for use by tools that process 
mixed language content. It indicates to their parsers, while parsing 
Java source code, the start of something that is not Java source code. 
It is the responsibility of such tools to pass only Java source to the 
Java compiler. Note that \ also has meaning as a Unicode escape. Uses of 
the language escape should avoid following the \ with a u (otherwise the 
\ will need to be escaped to \\).


It is a compile time error for a \ character to appear in other than a 
comment, a String literal or a Character literal.

COMPILATION:
Generate the error if the \ character is encountered in other than a 
comment, String or Character literal.

While the prototype does this, a more specific error message might be useful.
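
The rule above can be sketched as a small scanner that reports every 
backslash found outside comments, String literals and character 
literals. This is a simplified illustration, not javac's lexer: the 
names LanguageEscapeScanner and findEscapes are hypothetical, and the 
sketch does not translate Unicode escapes first, so it would also report 
a Unicode escape that appears outside a literal or comment.

```java
import java.util.ArrayList;
import java.util.List;

final class LanguageEscapeScanner {

    /** Offsets of backslashes outside comments, String and character literals. */
    static List<Integer> findEscapes(String src) {
        List<Integer> found = new ArrayList<>();
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // End-of-line comment: skip to the line terminator.
                while (i < n && src.charAt(i) != '\n' && src.charAt(i) != '\r') {
                    i++;
                }
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // Traditional comment: skip past the closing delimiter.
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else if (c == '"' || c == '\'') {
                i = skipLiteral(src, i);
            } else {
                if (c == '\\') {
                    found.add(i);
                }
                i++;
            }
        }
        return found;
    }

    /** Returns the index just past the literal that opens at start. */
    private static int skipLiteral(String src, int start) {
        char quote = src.charAt(start);
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                i += 2;              // step over the escaped character
            } else if (c == quote || c == '\n') {
                return i + 1;        // closing quote, or unterminated literal
            } else {
                i++;
            }
        }
        return src.length();
    }

    public static void main(String[] args) {
        String src = "class A {\n"
                + "  // a \\ in a comment\n"
                + "  String s = \"a\\\\b\";\n"
                + "  \\Property String name;\n"
                + "}\n";
        // Only the backslash before Property is reported.
        System.out.println(findEscapes(src).size());
    }
}
```

A compiler would raise the error at each reported offset, while a mixed 
language tool would treat each offset as the start of non-Java content.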

TESTING:

Test for the compiler error.
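
One possible shape for such a test, sketched with the standard 
javax.tools compiler API. The class name BackslashErrorTest and the 
helper names are hypothetical, and the sketch only checks that some 
error is reported, not the text of the message. It needs to run on a 
JDK, since a plain JRE provides no system compiler.

```java
import java.net.URI;
import java.util.Collections;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class BackslashErrorTest {

    /** Holds one compilation unit in memory. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** True if javac rejects the source and reports at least one error. */
    static boolean failsToCompile(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK");
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        boolean ok = compiler.getTask(null, null, diagnostics, null, null,
                Collections.singletonList(new StringSource(className, code))).call();
        boolean sawError = false;
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                sawError = true;
            }
        }
        return !ok && sawError;
    }

    public static void main(String[] args) {
        // The doubled backslash is a single backslash in the compiled source.
        String src = "class Sample {\n  \\Property String name;\n}\n";
        System.out.println(failsToCompile("Sample", src));
    }
}
```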

LIBRARY SUPPORT:

No library support required.

REFLECTIVE APIS:

N/A

OTHER CHANGES:

N/A

MIGRATION:

N/A

COMPATIBILITY
BREAKING CHANGES:

N/A

EXISTING PROGRAMS:

N/A

REFERENCES


http://docs.google.com/Doc?docid=dcvp3mkv_8sn3ccbkk describes a working 
proof of concept tool which uses ` as a language escape.

http://weblogs.java.net/blog/brucechapman/archive/JUG%20Aug%202008%20Mark%20II%20Pecha%20Kucha.ppt 
gives background on and justifies one possible use case


EXISTING BUGS:

?

URL FOR PROTOTYPE (optional):




JDK6's javac is the prototype implementation for this language change.

http://java.sun.com/javase/downloads




