-encoding and -source in module-info.java

Tue Jan 24 13:35:32 PST 2012

Overall, I agree it would be nice to place these options together in a file
for the compiler to consume.  I just don't think it should be a "Java
programming language source" file.

The idea of using module-info.java to specify compilation options is
tempting, but as long as it is a source file in the language, it is itself
subject to those options: the compiler has to know the encoding and source
level before it can read the file that declares them.  So specifying them
inside the file itself is pretty pointless.  As to the source level, what
can the language specification say other than that "8" is the only allowed
value?  And what can the next version of the language specification say
other than that "9" is the only allowed value?
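
To make the circularity concrete, here is a minimal sketch.  The class
name and the sample module text are only my illustration, and the
"encoding" clause is just the syntax proposed in the quoted message, not
something javac accepts.  The same declaration, written out as UTF-16
bytes, is invisible to a reader that assumes ISO-8859-1:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingChickenAndEgg {
    // Hypothetical module declaration using the proposed "encoding" clause.
    static final String TEXT = "module test {\n  encoding UTF-16;\n}\n";

    // Returns true if a reader that assumes charset `assumed` can find the
    // declaration in bytes that were actually written using charset `actual`.
    static boolean declarationVisible(Charset actual, Charset assumed) {
        byte[] bytes = TEXT.getBytes(actual);
        return new String(bytes, assumed).contains("encoding UTF-16;");
    }

    public static void main(String[] args) {
        // Reader already knows the encoding: prints true
        System.out.println(declarationVisible(
                StandardCharsets.UTF_16, StandardCharsets.UTF_16));
        // Reader guesses ISO-8859-1: prints false
        System.out.println(declarationVisible(
                StandardCharsets.UTF_16, StandardCharsets.ISO_8859_1));
    }
}
```

The declaration only becomes readable once the reader has already been
told, by some other means, what it says.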

On Tue, Jan 24, 2012 at 4:07 AM, Jesse Glick <jesse.glick at oracle.com> wrote:

> The encoding and source level of a module are fundamental attributes of
> its sources, without which you cannot reliably even parse a syntax tree, so
> I think they should be declared in module-info.java. Otherwise it is left
> up to someone calling javac by hand, or a build script, to specify these
> options; that is potentially error-prone, and means that tools which
> inspect sources (including but not limited to IDEs) need to have some
> separate mechanism for configuration of these attributes: you cannot just
> hand them the sourcepath and let them run.
>
> I am assuming that all files in the sourcepath use the same encoding and
> source level, which seems a reasonable restriction.
>
>
> As to the source level, obviously given that JDK 8 will introduce
> module-info.java, "8" (or "1.8") seems like the right default value; but a
> syntax ought to be defined for specifying a newer level, e.g.
>
>  source 1.9; // or 9?
>
> Furthermore I think that JDK 9+ versions of javac should keep the same
> default source level - you should need to explicitly mark what version of
> the Java language your module expects. Otherwise a module might compile
> differently according to which version of javac was used, which is
> undesirable, and tools cannot guess what version you meant. A little more
> verbosity here seems to be justified.
>
> Whether the bytecode target (-target) should be specified in
> module-info.java is another question. I have seen projects built using
> -target 5 for JDK 5 compatibility but also in a separate artifact using
> -target 6 for speed on JDK 6+ (split verifier). Probably the target level
> should default to the source level, and in the rare case that you need to
> override this, you can do so using a javac command option - this has no
> impact on tools which just need to parse and analyze source files.
>
>
> As to the encoding, something like
>
>  encoding ISO-8859-2;
>
> would suffice. The obvious problems for encoding are
>
> 1. What should the default value be? javac currently uses the platform
> default encoding, which IMHO is a horrible choice because it means that two
> people running javac with the same parameters on the same files may be
> producing different classes and/or warning messages. I would suggest making
> UTF-8 be the default when compiling in module mode (leaving the old
> behavior intact for legacy mode). For developers who want to keep sources
> in a different character set, adding one line per module-info.java does not
> seem like much of a burden.
>
> 2. What is module-info.java itself encoded in? If not UTF-8, then you need
> to be able to reliably find the encoding declaration and then rescan the
> file in that encoding. That is easy for most encodings (just do an initial
> scan in ISO-8859-1), including everything commonly used by developers
> AFAIK; a little trickier for UTF-16/32-type encodings but possible by
> ignoring 0x00/0xFE/0xFF; and only fails on some mainframe charsets, old JIS
> variants, and dingbats (*). Even those rare cases are probably guessable.
> [1]
>
>
> (*) Demo program:
>
> import java.io.UnsupportedEncodingException;
> import java.nio.charset.Charset;
> import java.util.Arrays;
> public class CharsetTest {
>    public static void main(String[] args)
>            throws UnsupportedEncodingException {
>        Charset raw = Charset.forName("ISO-8859-1");
>        for (Charset c : Charset.availableCharsets().values()) {
>            String text = "/* leading comment */\nmodule test {\n  encoding "
>                    + c.name() + ";\n}\n";
>            byte[] encoded;
>            try {
>                encoded = text.getBytes(c);
>            } catch (UnsupportedOperationException x) {
>                System.out.println("cannot encode using " + c.name());
>                continue;
>            }
>            if (Arrays.equals(encoded, text.getBytes(raw))) {
>                System.out.println("OK in " + c.name());
>            } else if (new String(encoded, raw).contains("  encoding " +
> c.name() + ";")) {
>                System.out.println("substring match in " + c.name());
>                dump(encoded);
>            } else if (new String(encoded, raw).replace("\u0000",
> "").contains("  encoding " + c.name() + ";")) {
>                System.out.println("NUL-stripped match in " + c.name());
>                dump(encoded);
>            } else {
>                System.out.println("garbled in " + c.name());
>                dump(encoded);
>            }
>        }
>    }
>    private static void dump(byte[] encoded) {
>        for (byte b : encoded) {
>            if (b >= 32 && b <= 126 || b == '\n' || b == '\r') {
>                System.out.write(b);
>            } else if (b == 0) {
>                System.out.print('@');
>            } else {
>                System.out.printf("\\%02X", b);
>            }
>        }
>        System.out.println();
>    }
>    private CharsetTest() {}
> }
>
>
> [1] http://jchardet.sourceforge.net/
>
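
For reference, the two-pass scheme described in point 2 of the quoted
message could look roughly like the sketch below.  Everything in it is an
assumption made for illustration: the "encoding NAME;" clause is only the
proposed syntax, sniffEncoding is a hypothetical helper, and the regular
expression is far cruder than a real lexer (it would happily match inside
a comment, for instance).  It decodes the bytes as ISO-8859-1, drops
0x00/0xFE/0xFF to cope with UTF-16/32, looks for the clause, and falls
back to UTF-8 when none is found.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodingSniffer {
    // Proposed (hypothetical) clause: "encoding <charset-name>;"
    private static final Pattern DECL =
            Pattern.compile("\\bencoding\\s+([A-Za-z0-9][A-Za-z0-9._:+-]*)\\s*;");

    // Pass 1: find the declared charset without knowing the real encoding.
    static Charset sniffEncoding(byte[] moduleInfoBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so this never fails.
        String raw = new String(moduleInfoBytes, StandardCharsets.ISO_8859_1);
        // Drop NUL padding and BOM bytes so UTF-16/32 text collapses to ASCII.
        String stripped = raw.replace("\u0000", "")
                             .replace("\u00FE", "")
                             .replace("\u00FF", "");
        Matcher m = DECL.matcher(stripped);
        // Assumed default when no clause is present: UTF-8.
        // Charset.forName throws an unchecked exception for unknown names.
        return m.find() ? Charset.forName(m.group(1)) : StandardCharsets.UTF_8;
    }

    // Pass 2: rescan the same bytes using the declared charset.
    static String decode(byte[] moduleInfoBytes) {
        return new String(moduleInfoBytes, sniffEncoding(moduleInfoBytes));
    }

    public static void main(String[] args) {
        String text = "module test {\n  encoding UTF-16;\n}\n";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_16);
        System.out.println(sniffEncoding(bytes));        // prints UTF-16
        System.out.println(decode(bytes).equals(text));  // prints true
    }
}
```

Note that this only works for charsets whose encoding of ASCII letters
survives the first pass, which is exactly the limitation the demo program
above is probing.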