Updated: Language Escape Operator
Bruce Chapman
brucechapman at paradise.net.nz
Mon Mar 30 01:27:32 PDT 2009
Following feedback, I have changed the proposal to use \ (was `) for the
operator and expanded some of the background on that choice.
Unfortunately there is some added complexity in the spec tiptoeing around
the unicode escape - and a little more might still be needed.
Bruce
TITLE: Language Escape Operator
latest html version at http://docs.google.com/Doc?docid=dcvp3mkv_9gdg3jfhg
AUTHOR(S): Bruce Chapman
OVERVIEW
FEATURE SUMMARY:
There is great potential for syntactic sugars to be implemented in tools
rather than the Java language. One requirement of such a mechanism is
that the sugared form is NOT valid java source code (otherwise the
meaning is ambiguous - is a piece of code intended as is, or intended to
be desugared). In order to future proof such mechanisms it is also
desirable that such a syntactic sugar form will never become a valid
java form due to some unanticipated future language change.
A simple way to ensure this is to reserve a character as a language
escape. Effectively it indicates that source code is at that point
escaping from the java language.
Such a character should be a printable ascii character to enable it to
be typed and read, and it should have no existing defined meaning in the
java language within the contexts in which it may be used.
Inside comments, String literals and character literals all printable
ascii characters have meaning and therefore there is no way we can
escape from these. This is not thought to be an overly burdensome
restriction.
The candidate characters for the escape operator are therefore # (hash),
\ (backslash) and ` (back quote). Neither # nor ` has any currently
defined meaning in the Java language specification. \ does have defined
meanings but only as a unicode escape and inside String and character
literals and so is also suitable.
The intention of this proposal is to reserve one of these three
characters for a range of uses that are outside the language. Although
the syntactic sugar example is mentioned above it is but one example. A
crystal ball would be advantageous here - alas we are limited to our
ability to dream.
Several proposals and ideas are circulating which suggest uses for each
of these few remaining characters which are becoming a precious resource
for language developers due to their scarcity. It is the author's belief
that use of any one of these characters as a language escape operator
offers far more long term value than use in a single new syntactic form.
Let us reserve one of the three characters as a language escape and let
all other language proposals squabble over the remaining two. Any of
the three characters could be used equally well from a technical point of view.
Of the three characters # has a legitimate claim against it for JSR 292
Support, and worthy proposed uses for method, field and possibly type
references. It is also aesthetically least attractive of the three as an
escape operator since the other two already have escape like meaning in
the language or other contexts.
It has been pointed out that ` is hard to type on many European language
keyboards so \ would be the preferred language escape character.
\ has both an advantage and disadvantage as being THE java escape
character. There is a conceptual advantage from its use as the String
and character escape code. Its existing use as a unicode escape is
slightly problematic because a single \ followed by 'u' must either be a
unicode escape (the subsequent 4 characters are all hex digits) or a
compiler error is raised. This implies that it would be unwise to have a
language escape where the escaped text started with u. To prevent the
compiler error in this case, the \ must be escaped. That is probably a
price worth paying compared with ` being consistently difficult to type
on European keyboards.
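As a rough illustration of the unicode escape issue, the sketch below shows how a tool might check whether a given \ would be read by the compiler as the start of a unicode escape: the \ must be preceded by an even number of contiguous backslashes and followed by u. The class and method names are hypothetical and are not part of this proposal or of any existing tool.

```java
// Hypothetical helper, for illustration only.
class UnicodeEscapeCheck {
    /**
     * Returns true if the backslash at index i would be treated by a Java
     * compiler as the start of a unicode escape, i.e. it is preceded by an
     * even number of contiguous backslashes and is followed by the letter u.
     * Whether the following characters are valid hex digits is not checked;
     * if they are not, the compiler reports an error.
     */
    static boolean startsUnicodeEscape(String src, int i) {
        if (src.charAt(i) != '\\') {
            return false;
        }
        // Count the contiguous backslashes immediately before index i.
        int preceding = 0;
        for (int j = i - 1; j >= 0 && src.charAt(j) == '\\'; j--) {
            preceding++;
        }
        boolean followedByU = i + 1 < src.length() && src.charAt(i + 1) == 'u';
        return preceding % 2 == 0 && followedByU;
    }
}
```

Under these assumptions, a language escape written as a single \ followed by text starting with u is flagged, while the doubled form \\ followed by the same text is not.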
For the rest of this proposal it is assumed the back-slash character is
the language escape operator.
MAJOR ADVANTAGE:
Shuts the gate before the horse has bolted. Once all these characters
are assigned meaning by the language, then there is no single character
that can be used as a language escape.
MAJOR DISADVANTAGE:
Prevents the back-slash character from being used in other language
features, other than its already defined meaning as a unicode escape,
and String and character literal escape. However any anticipated other
use would have the same unicode escaping issue as mentioned for this use
above. Further, any other use would tend to fight against the "\ is
escape" concept. So back-slash is probably undesirable for other uses
unrelated to escaping.
ALTERNATIVES:
Alternative characters have been discussed above. Another alternative is
to do nothing and let language tools develop further (using any of the
three available characters) in order to get a slightly better idea of
where this might be heading before committing to a language change. The
risk with this approach is that all three characters might get defined
uses by other language changes before one can be reserved as a language
escape. At that point it would be too late and would restrict language
tools.
EXAMPLES
SIMPLE EXAMPLE:
\Property String name;
is not valid Java source code, and can be automatically desugared by an
IDE. See references section.
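To make the example concrete, here is a minimal sketch of how a tool might desugar it before handing the result to the compiler. The expansion chosen (a private field plus a getter and setter) is an assumption made for illustration; this proposal does not define what \Property means, and the class name and regular expression are hypothetical. The sketch also does not skip comments or literals, which a real tool would need to do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical desugaring tool, for illustration only.
class PropertyDesugarer {
    // Matches a backslash, the word Property, a simple type name,
    // a field name and a terminating semicolon.
    private static final Pattern PROPERTY =
        Pattern.compile("\\\\Property\\s+(\\w+)\\s+(\\w+)\\s*;");

    /** Replaces each escaped Property declaration with plain Java source. */
    static String desugar(String src) {
        Matcher m = PROPERTY.matcher(src);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String type = m.group(1);
            String name = m.group(2);
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            // Assumed expansion: field, getter and setter.
            String expansion =
                "private " + type + " " + name + ";\n"
                + "public " + type + " get" + cap + "() { return " + name + "; }\n"
                + "public void set" + cap + "(" + type + " " + name + ") { this."
                + name + " = " + name + "; }";
            m.appendReplacement(out, Matcher.quoteReplacement(expansion));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The output contains no \ outside literals, so it satisfies the requirement that only java source reaches the compiler.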
ADVANCED EXAMPLE:
N/A
DETAILS
SPECIFICATION:
Add new section 3.13 after 3.12 Operators.
3.13 Language Escape
When used outside of comments, String and character literals, the
backslash character \ is reserved for use by tools that process mixed
language content to indicate to their parsers, while parsing java source
code, the start of something that is not Java source code. It is the
responsibility of such tools to pass only java source to the java
compiler. Note that \ has meaning as a unicode escape. Uses of language
escape should attempt to not follow the \ with a u (otherwise the \ will
need to be escaped to \\).
It is a compile time error for a \ character to appear anywhere other
than in a comment, a String literal or a character literal.
COMPILATION:
Generate the error if the \ character is encountered in other than a
comment, String or Character literal.
While the prototype does this, a more specific error message might be useful.
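The check described above can be sketched as a small state machine over the source text. This is only an illustration of the rule, not how javac implements it; the class and method names are hypothetical, and the sketch assumes unicode escapes have already been translated, as happens before tokenization.

```java
// Hypothetical scanner, for illustration only.
class LanguageEscapeScanner {
    private static final int CODE = 0;
    private static final int LINE_COMMENT = 1;
    private static final int BLOCK_COMMENT = 2;
    private static final int STRING = 3;
    private static final int CHAR = 4;

    /**
     * Returns the index of the first backslash that appears outside a
     * comment, String literal or character literal, or -1 if there is none.
     */
    static int firstLanguageEscape(String src) {
        int state = CODE;
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            char next = i + 1 < src.length() ? src.charAt(i + 1) : '\0';
            switch (state) {
                case CODE:
                    if (c == '\\') {
                        return i; // this is where the error would be reported
                    } else if (c == '/' && next == '/') {
                        state = LINE_COMMENT;
                        i++;
                    } else if (c == '/' && next == '*') {
                        state = BLOCK_COMMENT;
                        i++;
                    } else if (c == '"') {
                        state = STRING;
                    } else if (c == '\'') {
                        state = CHAR;
                    }
                    break;
                case LINE_COMMENT:
                    if (c == '\n' || c == '\r') {
                        state = CODE;
                    }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') {
                        state = CODE;
                        i++;
                    }
                    break;
                case STRING:
                    if (c == '\\') {
                        i++; // skip the escaped character
                    } else if (c == '"') {
                        state = CODE;
                    }
                    break;
                case CHAR:
                    if (c == '\\') {
                        i++; // skip the escaped character
                    } else if (c == '\'') {
                        state = CODE;
                    }
                    break;
            }
        }
        return -1;
    }
}
```

A more specific error message could be produced by reporting the returned index along with a note that \ is reserved as the language escape.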
TESTING:
Test for the compiler error.
LIBRARY SUPPORT:
No library support required.
REFLECTIVE APIS:
N/A
OTHER CHANGES:
N/A
MIGRATION:
N/A
COMPATIBILITY
BREAKING CHANGES:
N/A
EXISTING PROGRAMS:
N/A
REFERENCES
http://docs.google.com/Doc?docid=dcvp3mkv_8sn3ccbkk describes a working
proof of concept tool which uses ` as a language escape.
http://weblogs.java.net/blog/brucechapman/archive/JUG%20Aug%202008%20Mark%20II%20Pecha%20Kucha.ppt
gives background on and justifies one possible use case
EXISTING BUGS:
?
URL FOR PROTOTYPE (optional):
JDK6's javac is the prototype implementation for this language change.
http://java.sun.com/javase/downloads