-encoding and -source in module-info.java

Mon Jan 30 08:53:17 PST 2012

+1 on this excellent idea.

With regard to Neal's hammering on how 'source' is not particularly
conducive to inclusion in a language spec:

There's an easy answer to this problem: the same answer used to skirt around
the classpath issue. The JLS would simply say that the format of this
directive is something along the lines of:

source 1.8;

where the exact format of the 'parameter' to source is:

Sequence(ZeroOrMore(Sequence(DIGITS, '.')), DIGITS)

and exactly 0 or 1 source directives may exist in module-info.java,
probably with some further restrictions on where this directive is legally
allowed to appear. I'll leave the exact details to be hashed out later; the
point is: The JLS does not have to convey legal values for this directive,
it merely needs to define the format for it. This parameter will be
interpreted by the compiler, and the compiler is then entirely free to
figure out what this means all by itself. At best, the spec can (and
probably should) declare that 1.8 (or 8) _MUST_ be a legal value, with
nothing said about any other value.
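
To make the split between 'format' and 'legal values' concrete, here is a
minimal sketch of the format check only. The class and method names are made
up for illustration, and the regex is simply my reading of the
Sequence(ZeroOrMore(Sequence(DIGITS, '.')), DIGITS) rule above; nothing here
is taken from javac.

```java
import java.util.regex.Pattern;

public class SourceDirectiveFormat {
    // Assumed regex form of Sequence(ZeroOrMore(Sequence(DIGITS, '.')), DIGITS).
    private static final Pattern VERSION =
            Pattern.compile("(?:[0-9]+\\.)*[0-9]+");

    /**
     * Checks only that the parameter is well-formed. Whether the version
     * is actually supported is left to the compiler, not to this check.
     */
    static boolean isWellFormed(String parameter) {
        return VERSION.matcher(parameter).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"1.8", "8", "1.8.0", "1.", ".8", "eight"};
        for (String p : samples) {
            System.out.println("source " + p + "; well-formed: " + isWellFormed(p));
        }
    }
}
```

The point of the sketch is that a spec-level check like this can accept
'1.9' or '42' as well-formed without saying anything about what they mean.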

This is 100% analogous to how classpath is handled: The language spec is
quite clear on what "import java.util.Arrays;" means, and the language spec
is quite clear on how one should resolve the statement
"Arrays.asList(someInteger, someDouble, someInteger);" _once the compiler
knows the signatures present in java.util.Arrays_, but it gives absolutely
no hint whatsoever as to how the compiler is supposed to figure out what
those signatures are, when provided with only an import statement. Without
the spec defining how to get these signatures, the correct answer to 'how
should I translate Arrays.asList(a, b, c) to bytecode' is dependent on
external factors too, i.e. the spec alone cannot give a definitive answer.

The JLS does not even mention the classpath, nor the JVMS that is needed to
parse the class files one would find there in order to find the signatures
of Arrays, which is necessary to correctly resolve method calls, let alone
specify any of these things.

Thus, we arrive at a perhaps somewhat uncomfortable fact: Given just the
JLS, it is impossible to write a compiler that can compile anything except
the simplest source files (namely: Ones with 0 dependencies in them, not
even a dependency on java.lang.String, as the JLS does not specify the
signatures present in the String class). It can't even give you an AST,
because it cannot resolve method calls.

Given that, I don't see any problem with being similarly unspecified for
parameters to the 'source' directive that aren't "1.8". Specifying 'source'
that way (1.8 is legal, everything else - who knows?) thus doesn't change
the fact that the JLS by itself is not actually enough: you need a
'meta-spec' a level above it to end up with a practically usable compiler.

The encoding issue can be solved similarly (do not actually define in the
JLS what to do with this directive, just define what it should look like
and add a note if needed to explain its general intent, making it clear
that it's up to the compiler to do sensible things with this
meta-information), but the case here is not quite as strong as for 'source'.

It might be a good idea to write a sister specification which lists the
minimum set of legal compiler switches and defines what a compiler is
supposed to do with various meta-information. This specification should
cover classpath, sourcepath, and warning levels for -Xlint (pushing -Xlint
out of -X territory; in fact, this spec should list every non -X switch and
say nothing about -X switches except that -X itself is
implementation-specific). It should also cover how the lightweight encoding
directive is supposed to be found, link to all the different JLS versions,
specify how to handle -source and -target (and the source directive), and
require that any -encoding parameter (or directive) in a given list MUST be
parsed correctly in order to warrant the title 'Java compiler'. This spec
can use 'MAY' and 'MUST' as appropriate. For example, a Java 1.8 compatible
compiler MUST understand 'source 1.8;' and MAY treat 'source 1.7;' as a
directive to compile according to JLS 1.7. If it does not do so, then it
MUST emit a 'not compatible with that source version' error.
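
As a rough illustration of that MUST/MAY rule, here is a small sketch. The
SourceLevelPolicy class, its constructor and the exception type are all
hypothetical names I made up; a real compiler would report a diagnostic
rather than throw.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SourceLevelPolicy {
    // The one level the hypothetical sister spec says MUST be understood.
    static final String REQUIRED = "1.8";

    // Levels this particular compiler MAY additionally choose to support.
    private final Set<String> optional;

    SourceLevelPolicy(String... optionalLevels) {
        this.optional = new HashSet<>(Arrays.asList(optionalLevels));
    }

    /** Returns the level to compile with, or fails with the mandated error. */
    String resolve(String requested) {
        if (REQUIRED.equals(requested) || optional.contains(requested)) {
            return requested;
        }
        throw new IllegalArgumentException(
                "not compatible with source version " + requested);
    }

    public static void main(String[] args) {
        SourceLevelPolicy compiler = new SourceLevelPolicy("1.7");
        System.out.println(compiler.resolve("1.8"));
        System.out.println(compiler.resolve("1.7"));
        try {
            compiler.resolve("1.9");
        } catch (IllegalArgumentException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

A compiler built with no optional levels would still be conforming under
this reading, as long as it accepts 1.8 and rejects everything else with the
error.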

 --Reinier Zwitserloot

On Tue, Jan 24, 2012 at 22:35, Neal Gafter <neal at gafter.com> wrote:

> Overall, I agree it would be nice to place these options together in a file
> for the compiler to consume.  I just don't think it should be a "Java
> programming language source" file.
>
> The idea of using module-info.java to specify compilation options seems
> most tempting, but as long as it is a source file in the language, it must
> be subject to those options as well.  Which means that specifying them
> inside the file itself is pretty pointless.  As to the source level, what
> can the language specification say other than that "8" is the only allowed
> value?  And what can the next version of the language specification say
> other than that "9" is the only allowed value?
>
> On Tue, Jan 24, 2012 at 4:07 AM, Jesse Glick <jesse.glick at oracle.com>
> wrote:
>
> > The encoding and source level of a module are fundamental attributes of
> > its sources, without which you cannot reliably even parse a syntax tree, so
> > I think they should be declared in module-info.java. Otherwise it is left
> > up to someone calling javac by hand, or a build script, to specify these
> > options; that is potentially error-prone, and means that tools which
> > inspect sources (including but not limited to IDEs) need to have some
> > separate mechanism for configuration of these attributes: you cannot just
> > hand them the sourcepath and let them run.
> >
> > I am assuming that all files in the sourcepath use the same encoding and
> > source level, which seems a reasonable restriction.
> >
> >
> > As to the source level, obviously given that JDK 8 will introduce
> > module-info.java, "8" (or "1.8") seems like the right default value; but a
> > syntax ought to be defined for specifying a newer level, e.g.
> >
> >  source 1.9; // or 9?
> >
> > Furthermore I think that JDK 9+ versions of javac should keep the same
> > default source level - you should need to explicitly mark what version of
> > the Java language your module expects. Otherwise a module might compile
> > differently according to which version of javac was used, which is
> > undesirable, and tools cannot guess what version you meant. A little more
> > verbosity here seems to be justified.
> >
> > Whether the bytecode target (-target) should be specified in
> > module-info.java is another question. I have seen projects built using
> > -target 5 for JDK 5 compatibility but also in a separate artifact using
> > -target 6 for speed on JDK 6+ (split verifier). Probably the target level
> > should default to the source level, and in the rare case that you need to
> > override this, you can do so using a javac command option - this has no
> > impact on tools which just need to parse and analyze source files.
> >
> >
> > As to the encoding, something like
> >
> >  encoding ISO-8859-2;
> >
> > would suffice. The obvious problems for encoding are
> >
> > 1. What should the default value be? javac currently uses the platform
> > default encoding, which IMHO is a horrible choice because it means that
> > two people running javac with the same parameters on the same files may be
> > producing different classes and/or warning messages. I would suggest
> > making UTF-8 be the default when compiling in module mode (leaving the old
> > behavior intact for legacy mode). For developers who want to keep sources
> > in a different character set, adding one line per module-info.java does
> > not seem like much of a burden.
> >
> > 2. What is module-info.java itself encoded in? If not UTF-8, then you
> > need to be able to reliably find the encoding declaration and then rescan
> > the file in that encoding. That is easy for most encodings (just do an
> > initial scan in ISO-8859-1), including everything commonly used by
> > developers AFAIK; a little trickier for UTF-16/32-type encodings but
> > possible by ignoring 0x00/0xFE/0xFF; and only fails on some mainframe
> > charsets, old JIS variants, and dingbats (*). Even those rare cases are
> > probably guessable. [1]
> >
> >
> > (*) Demo program:
> >
> > import java.nio.charset.Charset;
> > import java.util.Arrays;
> > public class CharsetTest {
> >     public static void main(String[] args) {
> >         // ISO-8859-1 maps every byte to one char, so it serves as a raw view.
> >         Charset raw = Charset.forName("ISO-8859-1");
> >         for (Charset c : Charset.availableCharsets().values()) {
> >             String text = "/* leading comment */\nmodule test {\n  encoding "
> >                     + c.name() + ";\n}\n";
> >             byte[] encoded;
> >             try {
> >                 encoded = text.getBytes(c);
> >             } catch (UnsupportedOperationException x) {
> >                 // Some charsets are decode-only and cannot encode.
> >                 System.out.println("cannot encode using " + c.name());
> >                 continue;
> >             }
> >             String needle = "  encoding " + c.name() + ";";
> >             String rawView = new String(encoded, raw);
> >             if (Arrays.equals(encoded, text.getBytes(raw))) {
> >                 System.out.println("OK in " + c.name());
> >             } else if (rawView.contains(needle)) {
> >                 System.out.println("substring match in " + c.name());
> >                 dump(encoded);
> >             } else if (rawView.replace("\u0000", "").contains(needle)) {
> >                 System.out.println("NUL-stripped match in " + c.name());
> >                 dump(encoded);
> >             } else {
> >                 System.out.println("garbled in " + c.name());
> >                 dump(encoded);
> >             }
> >         }
> >     }
> >     // Prints printable ASCII as-is, NUL as '@', and other bytes as \XX hex.
> >     private static void dump(byte[] encoded) {
> >         for (byte b : encoded) {
> >             if (b >= 32 && b <= 126 || b == '\n' || b == '\r') {
> >                 System.out.write(b);
> >             } else if (b == 0) {
> >                 System.out.print('@');
> >             } else {
> >                 System.out.printf("\\%02X", b);
> >             }
> >         }
> >         System.out.println();
> >     }
> >     private CharsetTest() {}
> > }
> >
> >
> > [1] http://jchardet.sourceforge.net/
> >
> >
>