PRE-PROPOSAL: Source and Encoding keyword
Stefan Schulz
schulz at e-spirit.de
Sat Mar 7 15:02:30 PST 2009
It all depends on what one defines as "result" of a compilation. I am
thinking of all the @SuppressWarnings Annotations in my code, which also
influence the result of the compilation _process_. I'd say that
Annotations should not influence the _binary output_ of a compilation.
In the end, @Override and @SuppressWarnings as well as @Deprecated and
@Retention are local compiler flags on how to treat specific code, i.e.,
meta-information being read and handled by the compiler.
Stefan
Roel Spilker wrote:
> Good one :-) Javac won't even create a class file if the @Override
> annotation is present but shouldn't be there.
>
>
> On Sat, Mar 7, 2009 at 7:22 PM, Igor Karp <igor.v.karp at gmail.com> wrote:
>
> > Roel,
> >
> > well, these were not my ideas anyway ;-). I would be equally unhappy
> > using the javadoc approach.
> > And as a side note: @Override does influence the result of the compiler
> > already.
> >
> > Igor
> >
> > On Sat, Mar 7, 2009 at 9:55 AM, Roel Spilker <r.spilker at gmail.com> wrote:
> > > I'd say javadoc, as well as annotation, should never influence the
> > > result of the compiler. That's just not the right vehicle.
> > >
> > > Roel
> > >
> > > On Sat, Mar 7, 2009 at 6:27 PM, Igor Karp <igor.v.karp at gmail.com> wrote:
> > >>
> > >> Reinier,
> > >>
> > >> please see the comments inline.
> > >>
> > >> On Fri, Mar 6, 2009 at 11:39 PM, Reinier Zwitserloot
> > >> <reinier at zwitserloot.com> wrote:
> > >> > Igor,
> > >> >
> > >> > how could the command line options be expanded? Allow -encoding to
> > >> > specify a separate encoding for each file? I don't see how that can
> > >> > work.
> > >> For example: allow multiple -encoding options and add an optional path
> > >> to each one: -encoding <encoding>[,<path>]
> > >> Where path can be either a package (settings applied to the package
> > >> and every package under it) or a single file for maximum precision.
> > >> So one can have:
> > >> -encoding X -encoding Y,a.b -encoding Z,a.b.c -encoding X,a.b.c.d.IAmSpecial
> > >> IAmSpecial.java will get encoding X,
> > >> everything else under a.b.c will get encoding Z,
> > >> everything else under a.b will get encoding Y
> > >> and the rest will get encoding X.
> > >> The same approach can be applied to -source.
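
The precedence rule described above (the most specific path wins) could be sketched as a longest-prefix lookup. This is only an illustration under stated assumptions: javac has no `-encoding <encoding>[,<path>]` syntax, and the class `EncodingRules` with its `add` and `resolve` methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: javac does not accept "-encoding <encoding>[,<path>]".
class EncodingRules {
    // Maps a package or type prefix ("" means "everything") to an encoding name.
    private final Map<String, String> rules = new LinkedHashMap<>();

    // Accepts the value of one proposed option, e.g. "Y,a.b" or just "X".
    void add(String optionValue) {
        int comma = optionValue.indexOf(',');
        if (comma < 0) {
            rules.put("", optionValue);
        } else {
            rules.put(optionValue.substring(comma + 1), optionValue.substring(0, comma));
        }
    }

    // Returns the encoding of the longest matching prefix, or null if none matches.
    String resolve(String qualifiedName) {
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            String prefix = rule.getKey();
            boolean matches = prefix.isEmpty()
                    || qualifiedName.equals(prefix)
                    || qualifiedName.startsWith(prefix + ".");
            if (matches && prefix.length() > bestLength) {
                best = rule.getValue();
                bestLength = prefix.length();
            }
        }
        return best;
    }
}
```

With the four options from the example, `resolve("a.b.c.d.IAmSpecial")` yields X, `resolve("a.b.c.Foo")` yields Z, `resolve("a.b.Bar")` yields Y, and any name outside a.b falls back to X.
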
> > >>
> > >> > There's no way I or anyone else is going to edit a build script (be it
> > >> > just javac, a home-rolled thing, ant, rake, make, maven, ivy, etcetera)
> > >> > to carefully enumerate every file's source compatibility level.
> > >> Sure, that's what argfiles are for: store the settings in a file and
> > >> use javac @argfile.
> > >>
> > >> And doing it as proposed above on a package level would make it more
> > >> manageable.
> > >> Remember, in your proposal the only option is to specify it on a file
> > >> level (this is fixable, I guess).
> > >>
> > >> > Changing the command line options also incurs the necessary wrath of
> > >> > all those build tool developers, as they'd have to update their
> > >> > software to handle the new option (adding an option is a change too!)
> > >> Not more than changing the language itself.
> > >>
> > >> >
> > >> > Could you also elaborate on why you don't like it? For example, how can
> > >> > the benefits of having (more) portable source files, easier migration,
> > >> > and a much cleaner solution to e.g. the assert-in-javac1.4 problem be
> > >> > achieved with e.g. command line options, or do you not consider any of
> > >> > those worthwhile?
> > >> I fully support the goal. I even see it as a bit too narrow (see
> > >> below). But I do not see a need to change the language to achieve that
> > >> goal.
> > >>
> > >> On a conceptual level I see these options as metadata of the source
> > >> files and I don't like the idea of coupling it with the file.
> > >> One can avoid all this complexity of extra parsing by specifying the
> > >> encoding in an external file. This external file does not have
> > >> itself to be in that encoding. In fact it can be restricted to be
> > >> always in ASCII.
> > >>
> > >> I think the approach of adding an optional path and allowing multiple
> > >> uses of the same option is much more scalable: it could be extended
> > >> to the other existing options (like -deprecation, -Xlint, etc.) and to
> > >> the options that might appear in the future.
> > >>
> > >> I wish I could concentrate on deprecations in a certain package and
> > >> ignore them everywhere else for now:
> > >> javac -deprecation,really.rusty.one ...
> > >> Finished with (or gave up on ;) that one and want to switch to the
> > >> next one:
> > >> javac -deprecation,another.old.one
> > >>
> > >> Igor Karp
> > >>
> > >> >
> > >> > As an aside, how do people approach project coin submissions? I tend to
> > >> > look at a proposal's value, which is its benefit divided by the
> > >> > disadvantages (end-programmer complexity to learn, amount of changes
> > >> > needed to javac and/or JVM, and restrictions on potential future
> > >> > expansions). One of the reasons I'm writing this up with Roel is because
> > >> > the disadvantages seemed to be almost nonexistent at the outset (the
> > >> > encoding stuff made it more complicated, but at least the complication
> > >> > is entirely hidden from java developers' eyes, so its value proposition
> > >> > is still aces in my book). If there's a goal to keep the total language
> > >> > changes, no matter how simple they are, down to a small set, then
> > >> > benefit regardless of disadvantages is the better yardstick.
> > >> >
> > >> > --Reinier Zwitserloot
> > >> >
> > >> >
> > >> >
> > >> > On Mar 7, 2009, at 08:15, Igor Karp wrote:
> > >> >
> > >> >> On Fri, Mar 6, 2009 at 10:03 PM, Reinier Zwitserloot
> > >> >> <reinier at zwitserloot.com> wrote:
> > >> >>>
> > >> >>> We have written up a proposal for adding a 'source' and 'encoding'
> > >> >>> keyword (alternatives to the -source and -encoding options on the
> > >> >>> command line; they work pretty much just as you expect). The keywords
> > >> >>> are context sensitive and must both appear before anything else other
> > >> >>> than comments to be parsed. In case the benefit isn't obvious: It is a
> > >> >>> great help when you are trying to port a big project to a new source
> > >> >>> language compatibility level. Leaving half your sourcebase in v1.6 and
> > >> >>> the other half in v1.7 is pretty much impossible today; it's
> > >> >>> all-or-nothing. It should also be a much nicer solution to the 'assert
> > >> >>> in v1.4' dilemma, which I guess is going to happen to v1.7 as well,
> > >> >>> given that 'module' is most likely going to become a keyword. Finally,
> > >> >>> it makes java files a lot more portable; you no longer run into your
> > >> >>> strings looking weird when you move your Windows-1252 encoded java
> > >> >>> source file to a mac, for example.
> > >> >>>
> > >> >>> Before we finish it though, some open questions we'd like some
> > >> >>> feedback on:
> > >> >>>
> > >> >>> A) Technically, starting a file with "source 1.4" is obviously silly;
> > >> >>> javac v1.4 doesn't know about the source keyword and would thus fail
> > >> >>> immediately. However, practically, it's still useful. Example: if
> > >> >>> you've mostly converted a GWT project to GWT 1.5 (which uses java 1.5
> > >> >>> syntax), but have a few files remaining on GWT v1.4 (which uses java
> > >> >>> 1.4 syntax), then tossing a "source 1.4;" in those older files
> > >> >>> eliminates all the generics warnings and serves as a reminder that you
> > >> >>> should still convert those at some point. However, it isn't -actually-
> > >> >>> compatible with a real javac 1.4. We're leaning towards making "source
> > >> >>> 1.6;" (and below) legal even when using a javac v1.7 or above, but
> > >> >>> perhaps that's a bridge too far? We could go with magic comments but
> > >> >>> that seems like a very bad solution.
> > >> >>>
> > >> >>> also:
> > >> >>>
> > >> >>> Encoding is rather a hairy issue; javac will need to read the file to
> > >> >>> find the encoding, but to read a file, it needs to know about
> > >> >>> encoding! Fortunately, *every single* popular encoding on wikipedia's
> > >> >>> popular encoding list at:
> > >> >>>
> > >> >>> http://en.wikipedia.org/wiki/Character_encoding#Popular_character_encodings
> > >> >>>
> > >> >>> will encode "encoding own-name-in-that-encoding;" the same as ASCII
> > >> >>> would, except for KOI-7 and UTF-7 (both 7 bit encodings that I doubt
> > >> >>> anyone ever uses to program java).
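
The byte-level claim can be checked for the charsets that every Java platform ships. A minimal sketch, assuming a hypothetical class name `AsciiCompatCheck`; it only covers a few charsets and says nothing about the rest of the Wikipedia list:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical check: does a charset encode the statement exactly as US-ASCII does?
class AsciiCompatCheck {
    static final String STATEMENT = "encoding UTF-8;";

    static boolean sameAsAscii(Charset charset) {
        byte[] ascii = STATEMENT.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(ascii, STATEMENT.getBytes(charset));
    }
}
```

`sameAsAscii` returns true for ISO-8859-1 and UTF-8, and false for UTF-16BE, where every character takes two bytes. That difference is why the strategy described next tries UTF-16 and UTF-32 as separate passes.
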
> > >> >>>
> > >> >>> Therefore, the proposal includes the following strategy to find the
> > >> >>> encoding statement in a java source file without knowing the encoding
> > >> >>> beforehand:
> > >> >>>
> > >> >>> An entirely separate parser (the encoding parser) is run repeatedly
> > >> >>> until the right encoding is found. First it'll decode the input with
> > >> >>> ISO-8859-1. If that doesn't work, UTF-16 (assume BE if no BOM, as per
> > >> >>> the java standard), then UTF-32 (BE if no BOM), then the current
> > >> >>> behaviour (-encoding parameter's value if any, otherwise platform
> > >> >>> default encoding). This separate parser works as follows:
> > >> >>>
> > >> >>> 1. Ignore any comments and whitespace.
> > >> >>> 3. Ignore the pattern (regexp-like syntax): source\s+[^\s]+\s*; - if
> > >> >>> that pattern matches partially but is not correctly completed, that
> > >> >>> parser run exits without finding an encoding, immediately.
> > >> >>> 4. Find the pattern: encoding\s+([^\s]+)\s*; - if that pattern matches
> > >> >>> partially but is not correctly completed, that parser run exits
> > >> >>> without finding an encoding, immediately. If it does complete, the
> > >> >>> parser also exits immediately and returns the captured value.
> > >> >>> 5. If it finds anything else, stop immediately, returning no encoding
> > >> >>> found.
> > >> >>>
> > >> >>> Once it's found something, the 'real' java parser will run using the
> > >> >>> found encoding (this overrides any -encoding on the command line).
> > >> >>> Note that the encoding parser stops quickly; for example, if it finds
> > >> >>> a stray \0 or e.g. the letter 'i' (perhaps the first letter of an
> > >> >>> import statement), it'll stop immediately.
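
The encoding parser described above could be sketched roughly as follows. This is an illustration, not the proposal's implementation: the class `EncodingSniffer` and its methods are hypothetical, the patterns use `[^\s;]+` instead of `[^\s]+` so the captured name never includes the semicolon, and the check that a declared encoding matches the charset it was read with is left out.

```java
import java.nio.charset.Charset;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed "encoding parser".
class EncodingSniffer {
    private static final Pattern SOURCE = Pattern.compile("source\\s+[^\\s;]+\\s*;");
    private static final Pattern ENCODING = Pattern.compile("encoding\\s+([^\\s;]+)\\s*;");

    // One parser run: decode with the candidate charset, then look for the statement.
    static Optional<String> sniff(byte[] bytes, Charset candidate) {
        String text = new String(bytes, candidate);
        int pos = skipCommentsAndWhitespace(text, 0);
        Matcher source = SOURCE.matcher(text).region(pos, text.length());
        if (source.lookingAt()) {
            // An optional source statement may precede the encoding statement.
            pos = skipCommentsAndWhitespace(text, source.end());
        }
        Matcher encoding = ENCODING.matcher(text).region(pos, text.length());
        // Anything else (a stray \0, the 'i' of "import", ...) ends the run at once.
        return encoding.lookingAt() ? Optional.of(encoding.group(1)) : Optional.empty();
    }

    // Repeats the run for each candidate charset, in order, until one succeeds.
    static Optional<String> detect(byte[] bytes, List<Charset> candidates) {
        for (Charset candidate : candidates) {
            Optional<String> found = sniff(bytes, candidate);
            if (found.isPresent()) {
                return found;
            }
        }
        return Optional.empty();
    }

    private static int skipCommentsAndWhitespace(String text, int from) {
        int i = from;
        while (i < text.length()) {
            if (Character.isWhitespace(text.charAt(i))) {
                i++;
            } else if (text.startsWith("//", i)) {
                int newline = text.indexOf('\n', i);
                i = newline < 0 ? text.length() : newline + 1;
            } else if (text.startsWith("/*", i)) {
                int end = text.indexOf("*/", i + 2);
                if (end < 0) {
                    return i; // unterminated comment: leave it for the caller to reject
                }
                i = end + 2;
            } else {
                break;
            }
        }
        return i;
    }
}
```

Passing the candidates ISO-8859-1 and then UTF-16 to `detect` mirrors the first two steps of the strategy. A UTF-16 file fails the ISO-8859-1 run on its first \0 byte and succeeds on the second run.
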
> > >> >>>
> > >> >>> If an encoding is encountered that was not found during the standard
> > >> >>> decoding strategy (ISO-8859-1, UTF-16, UTF-32), but worked only due to
> > >> >>> a platform default/command line encoding param (e.g. a platform that
> > >> >>> defaults to UTF-16LE without a byte order mark), a warning explaining
> > >> >>> that the encoding statement isn't doing anything is generated. Of
> > >> >>> course, if the encoding doesn't match itself, you get an error
> > >> >>> (putting "encoding UTF-16;" into a UTF-8 encoded file for example). If
> > >> >>> there is no encoding statement, the 'real' java parser does what it
> > >> >>> does now: Use the -encoding parameter of javac, and if that wasn't
> > >> >>> present, the platform default.
> > >> >>>
> > >> >>> However, there is 1 major and 1 minor problem with this approach:
> > >> >>>
> > >> >>> B) This means javac will need to read every source file many times to
> > >> >>> compile it.
> > >> >>>
> > >> >>> Worst case (no encoding keyword): 5 times.
> > >> >>> Standard case if an encoding keyword: 2 times (3 times if UTF-16).
> > >> >>>
> > >> >>> Fortunately all runs should stop quickly, due to the encoding parser's
> > >> >>> penchant to quit very early. Javacs out there will either stuff the
> > >> >>> entire source file into memory, or if not, disk cache should take care
> > >> >>> of it, but we can't prove beyond a doubt that this repeated parsing
> > >> >>> will have no significant impact on compile time. Is this a
> > >> >>> showstopper? Is the need to include a new (but small) parser into
> > >> >>> javac a showstopper?
> > >> >>>
> > >> >>> C) Certain character sets, such as ISO-2022, can make the encoding
> > >> >>> statement unreadable with the standard strategy if a comment including
> > >> >>> non-ASCII characters precedes the encoding statement. These situations
> > >> >>> are very rare (in fact, I haven't managed to find an example), so is
> > >> >>> it okay to just ignore this issue? If you add the encoding statement
> > >> >>> after a bunch of comments that make it invisible, and then compile it
> > >> >>> with the right -encoding parameter, you WILL get a warning that the
> > >> >>> encoding statement isn't going to help a javac on another platform /
> > >> >>> without that encoding parameter to figure it out, so you just get the
> > >> >>> current status quo: your source file won't compile without an explicit
> > >> >>> -encoding parameter (unless that happens to be the platform default).
> > >> >>> Should this be mentioned in the proposal? Should the compiler (and the
> > >> >>> proposal) put effort into generating a useful warning message, such as
> > >> >>> figuring out if it WOULD parse correctly if the encoding statement is
> > >> >>> at the very top of the source file, vs. suggesting to recode in UTF-8?
> > >> >>>
> > >> >>> and a final dilemma:
> > >> >>>
> > >> >>> D) Should we separate the proposals for source and encoding keywords?
> > >> >>> The source keyword is more useful and a lot simpler overall than the
> > >> >>> encoding keyword, but they do sort of go together.
> > >> >>
> > >> >> Separate. Another reason is: the argument of applying different
> > >> >> settings to different parts of the project is much less valid with
> > >> >> encoding than with source.
> > >> >>
> > >> >>>
> > >> >>> --Reinier Zwitserloot and Roel Spilker
> > >> >>>
> > >> >>>
> > >> >> Overall: I would prefer command line options enhanced to handle the
> > >> >> situation rather than a language change.
> > >> >>
> > >> >> Igor Karp
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>
>