PRE-PROPOSAL: Source and Encoding keyword

Stefan Schulz schulz at e-spirit.de
Sat Mar 7 15:02:30 PST 2009


It all depends on what one defines as "result" of a compilation. I am 
thinking of all the @SuppressWarnings Annotations in my code, which also 
influence the result of the compilation _process_. I'd say that 
Annotations should not influence the _binary output_ of a compilation.

In the end, @Override and @SuppressWarnings as well as @Deprecated and 
@Retention are local compiler flags on how to treat specific code, i.e., 
meta-information being read and handled by the compiler.
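
To make the distinction concrete, here is a minimal sketch; the class and
method names are made up for illustration. Both @Override and
@SuppressWarnings have source retention, so neither ends up in the class
file: one can make the compilation fail, the other only changes which
diagnostics are reported.

```java
import java.util.ArrayList;
import java.util.List;

public class AnnotationDemo {
    static class Base {
        String name() { return "base"; }
    }

    static class Derived extends Base {
        // If Base had no name() method, javac would report an error here
        // and write no class file at all.
        @Override
        String name() { return "derived"; }
    }

    // Only silences the lint warnings for this method; the annotation has
    // source retention and is not recorded in the class file.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String> legacy() {
        List raw = new ArrayList();
        raw.add("x");
        return raw;
    }

    public static void main(String[] args) {
        System.out.println(new Derived().name() + " " + legacy()); // derived [x]
    }
}
```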

Stefan

Roel Spilker wrote:
> Good one :-) Javac won't even create a class file if the @Override
> annotation is present but shouldn't be there.
> 
> 
> On Sat, Mar 7, 2009 at 7:22 PM, Igor Karp <igor.v.karp at gmail.com> wrote:
> 
>  > Roel,
>  >
>  > well, these were not my ideas anyway ;-). I would be equally unhappy
>  > using the javadoc approach.
>  > And as a side note: @Override does influence the result of the compiler
>  > already.
>  >
>  > Igor
>  >
>  > On Sat, Mar 7, 2009 at 9:55 AM, Roel Spilker <r.spilker at gmail.com> wrote:
>  > > I'd say javadoc, as well as annotations, should never influence the
>  > > result of the compiler. That's just not the right vehicle.
>  > >
>  > > Roel
>  > >
>  > > On Sat, Mar 7, 2009 at 6:27 PM, Igor Karp <igor.v.karp at gmail.com> wrote:
>  > >>
>  > >> Reinier,
>  > >>
>  > >> please see the comments inline.
>  > >>
>  > >> On Fri, Mar 6, 2009 at 11:39 PM, Reinier Zwitserloot
>  > >> <reinier at zwitserloot.com> wrote:
>  > >> > Igor,
>  > >> >
>  > >> > how could the command line options be expanded? Allow -encoding to
>  > >> > specify a
>  > >> > separate encoding for each file? I don't see how that can work.
>  > >> For example: allow multiple -encoding options and add an optional path
>  > >> to each: -encoding <encoding>[,<path>]
>  > >> where the path can be either a package (the setting applies to that
>  > >> package and every package under it) or a single file for maximum
>  > >> precision.
>  > >> So one can have:
>  > >> -encoding X -encoding Y,a.b -encoding Z,a.b.c
>  > >> -encoding X,a.b.c.d.IAmSpecial
>  > >> IAmSpecial.java will get encoding X,
>  > >> everything else under a.b.c will get encoding Z,
>  > >> everything else under a.b will get encoding Y
>  > >> and the rest will get encoding X.
>  > >> The same approach can be applied to -source.
>  > >>
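
The "most specific path wins" rule described above can be sketched as a
walk from the fully qualified type name up through its enclosing packages.
This is only an illustration: the <value>[,<path>] syntax is the one
suggested in the quoted mail, javac has no such option today, and the class
ScopedOption and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class ScopedOption {
    // Scope (package or fully qualified type name) -> value; "" is the global default.
    private final Map<String, String> byScope = new HashMap<>();

    // Parses one argument of the hypothetical form <value>[,<path>].
    public void add(String arg) {
        int comma = arg.indexOf(',');
        if (comma < 0) {
            byScope.put("", arg);
        } else {
            byScope.put(arg.substring(comma + 1), arg.substring(0, comma));
        }
    }

    // Returns the value of the most specific scope containing the type, or null.
    public String resolve(String qualifiedName) {
        String scope = qualifiedName;
        while (true) {
            String value = byScope.get(scope);
            if (value != null) return value;
            if (scope.isEmpty()) return null;          // no global default given
            int dot = scope.lastIndexOf('.');
            scope = dot < 0 ? "" : scope.substring(0, dot);
        }
    }

    public static void main(String[] args) {
        ScopedOption encoding = new ScopedOption();
        for (String a : new String[] {"X", "Y,a.b", "Z,a.b.c", "X,a.b.c.d.IAmSpecial"}) {
            encoding.add(a);
        }
        System.out.println(encoding.resolve("a.b.c.d.IAmSpecial")); // X
        System.out.println(encoding.resolve("a.b.c.d.Other"));      // Z
        System.out.println(encoding.resolve("a.b.Foo"));            // Y
        System.out.println(encoding.resolve("q.Main"));             // X
    }
}
```

The same lookup would work unchanged for a scoped -source or -deprecation
setting, since it only depends on the path part of the argument.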
>  > >> > There's no
>  > >> > way I or anyone else is going to edit a build script (be it just
>  > >> > javac, a home-rolled thing, ant, rake, make, maven, ivy, etcetera)
>  > >> > to carefully enumerate every file's source compatibility level.
>  > >> Sure, that's what argfiles are for: store the settings in a file and
>  > >> use javac @argfile.
>  > >>
>  > >> And doing it as proposed above on a package level would make it more
>  > >> manageable.
>  > >> Remember in your proposal the only option is to specify it on a file
>  > >> level (this is fixable I guess).
>  > >>
>  > >> > Changing the command line
>  > >> > options also incurs the necessary wrath of all those build tool
>  > >> > developers as they'd have to update their software to handle the
>  > >> > new option (adding an option is a change too!)
>  > >> Not more than changing the language itself.
>  > >>
>  > >> >
>  > >> > Could you also elaborate on why you don't like it? For example, how
>  > >> > can the benefits of having (more) portable source files, easier
>  > >> > migration, and a much cleaner solution to e.g. the
>  > >> > assert-in-javac1.4 dilemma be achieved with e.g. command line
>  > >> > options, or do you not consider any of those worthwhile?
>  > >> I fully support the goal. I even see it as a bit too narrow (see
>  > >> below). But I do not see a need to change the language to achieve that
>  > >> goal.
>  > >>
>  > >> On a conceptual level I see these options as metadata of the source
>  > >> files and I don't like the idea of coupling it with the file.
>  > >> One can avoid all this complexity of extra parsing by specifying the
>  > >> encoding in an external file. This external file does not have
>  > >> itself to be in that encoding. In fact it can be restricted to be
>  > >> always in ASCII.
>  > >>
>  > >> I think adding an optional path and allowing the same option to be
>  > >> used multiple times is much more scalable: it could be extended
>  > >> to the other existing options (like -deprecation, -Xlint, etc.) and to
>  > >> the options that might appear in the future.
>  > >>
>  > >> I wish I could concentrate on deprecations in a certain package and
>  > >> ignore them everywhere else for now:
>  > >> javac -deprecation,really.rusty.one ...
>  > >> Finished with (or gave up on ;) that one and want to switch to the
>  > >> next one:
>  > >> javac -deprecation,another.old.one
>  > >>
>  > >> Igor Karp
>  > >>
>  > >> >
>  > >> > As an aside, how do people approach project coin submissions? I
>  > >> > tend to look at a proposal's value, which is its benefit divided by
>  > >> > the disadvantages (end-programmer complexity to learn, amount of
>  > >> > changes needed to javac and/or JVM, and restrictions on potential
>  > >> > future expansions). One of the reasons I'm writing this up with
>  > >> > Roel is because the disadvantages seemed to be almost nonexistent
>  > >> > at the outset (the encoding stuff made it more complicated, but at
>  > >> > least the complication is entirely hidden from java developers'
>  > >> > eyes, so its value proposition is still aces in my book). If
>  > >> > there's a goal to keep the total language changes, no matter how
>  > >> > simple they are, down to a small set, then benefit regardless of
>  > >> > disadvantages is the better yardstick.
>  > >> >
>  > >> >  --Reinier Zwitserloot
>  > >> >
>  > >> >
>  > >> >
>  > >> > On Mar 7, 2009, at 08:15, Igor Karp wrote:
>  > >> >
>  > >> >> On Fri, Mar 6, 2009 at 10:03 PM, Reinier Zwitserloot
>  > >> >> <reinier at zwitserloot.com> wrote:
>  > >> >>>
>  > >> >>> We have written up a proposal for adding a 'source' and 'encoding'
>  > >> >>> keyword (alternatives to the -source and -encoding options on the
>  > >> >>> command line; they work pretty much just as you expect). The
>  > >> >>> keywords are context sensitive and must both appear before
>  > >> >>> anything else other than comments to be parsed. In case the
>  > >> >>> benefit isn't obvious: It is a great help when you are trying to
>  > >> >>> port a big project to a new source language compatibility level.
>  > >> >>> Leaving half your sourcebase in v1.6 and the other half in v1.7
>  > >> >>> is pretty much impossible today, it's all-or-nothing. It should
>  > >> >>> also be a much nicer solution to the 'assert in v1.4' dilemma,
>  > >> >>> which I guess is going to happen to v1.7 as well, given that
>  > >> >>> 'module' is most likely going to become a keyword. Finally, it
>  > >> >>> makes java files a lot more portable; you no longer run into your
>  > >> >>> strings looking weird when you move your Windows-1252 java source
>  > >> >>> file to a mac, for example.
>  > >> >>>
>  > >> >>> Before we finish it though, some open questions we'd like some
>  > >> >>> feedback on:
>  > >> >>>
>  > >> >>> A) Technically, starting a file with "source 1.4" is obviously
>  > >> >>> silly; javac v1.4 doesn't know about the source keyword and would
>  > >> >>> thus fail immediately. However, practically, it's still useful.
>  > >> >>> Example: if you've mostly converted a GWT project to GWT 1.5
>  > >> >>> (which uses java 1.5 syntax), but have a few files remaining on
>  > >> >>> GWT v1.4 (which uses java 1.4 syntax), then tossing a
>  > >> >>> "source 1.4;" in those older files eliminates all the generics
>  > >> >>> warnings and serves as a reminder that you should still convert
>  > >> >>> those at some point. However, it isn't -actually- compatible with
>  > >> >>> a real javac 1.4. We're leaning towards making "source 1.6;" (and
>  > >> >>> below) legal even when using a javac v1.7 or above, but perhaps
>  > >> >>> that's a bridge too far? We could go with magic comments but
>  > >> >>> that seems like a very bad solution.
>  > >> >>>
>  > >> >>> also:
>  > >> >>>
>  > >> >>> Encoding is rather a hairy issue; javac will need to read the file
>  > >> >>> to find the encoding, but to read a file, it needs to know about
>  > >> >>> encoding! Fortunately, *every single* popular encoding on
>  > >> >>> wikipedia's popular encoding list at:
>  > >> >>>
>  > >> >>> http://en.wikipedia.org/wiki/Character_encoding#Popular_character_encodings
>  > >> >>>
>  > >> >>> will encode "encoding own-name-in-that-encoding;" the same as
>  > >> >>> ASCII would, except for KOI-7 and UTF-7, (both 7 bit encodings
>  > >> >>> that I doubt anyone ever uses to program java).
>  > >> >>>
>  > >> >>> Therefore, the proposal includes the following strategy to find
>  > >> >>> the encoding statement in a java source file without knowing the
>  > >> >>> encoding beforehand:
>  > >> >>>
>  > >> >>> An entirely separate parser (the encoding parser) is run
>  > >> >>> repeatedly until the right encoding is found. First it'll decode
>  > >> >>> the input with ISO-8859-1. If that doesn't work, UTF-16 (assume BE
>  > >> >>> if no BOM, as per the java standard), then as UTF-32 (BE if no
>  > >> >>> BOM), then the current behaviour (-encoding parameter's value if
>  > >> >>> any, otherwise platform default encoding). This separate parser
>  > >> >>> works as follows:
>  > >> >>>
>  > >> >>> 1. Ignore any comments and whitespace.
>  > >> >>> 2. Ignore the pattern (regexp-like syntax): source\s+[^\s]+\s*; -
>  > >> >>> if that pattern matches partially but is not correctly completed,
>  > >> >>> that parser run exits without finding an encoding, immediately.
>  > >> >>> 3. Find the pattern: encoding\s+([^\s]+)\s*; - if that pattern
>  > >> >>> matches partially but is not correctly completed, that parser run
>  > >> >>> exits without finding an encoding, immediately. If it does
>  > >> >>> complete, the parser also exits immediately and returns the
>  > >> >>> captured value.
>  > >> >>> 4. If it finds anything else, stop immediately, returning no
>  > >> >>> encoding found.
>  > >> >>>
>  > >> >>> Once it's found something, the 'real' java parser will run using
>  > >> >>> the found encoding (this overrides any -encoding on the command
>  > >> >>> line). Note that the encoding parser stops quickly; for example,
>  > >> >>> if it finds a stray \0 or e.g. the letter 'i' (perhaps the first
>  > >> >>> letter of an import statement), it'll stop immediately.
>  > >> >>>
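
As a rough illustration of the encoding parser described above, here is a
minimal sketch. It is not the proposal's implementation: the class and
method names are hypothetical, only the ISO-8859-1 and UTF-16 candidates
are tried in the example, the value pattern excludes ';' to avoid
backtracking, and the check that the declared name matches the candidate
charset is left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodingSniffer {
    private static final Pattern SOURCE = Pattern.compile("source\\s+[^\\s;]+\\s*;");
    private static final Pattern ENCODING = Pattern.compile("encoding\\s+([^\\s;]+)\\s*;");

    // Skips whitespace, // line comments and /* block comments */ starting at index i.
    static int skipTrivia(String s, int i) {
        while (i < s.length()) {
            if (Character.isWhitespace(s.charAt(i))) {
                i++;
            } else if (s.startsWith("//", i)) {
                int nl = s.indexOf('\n', i);
                i = nl < 0 ? s.length() : nl + 1;
            } else if (s.startsWith("/*", i)) {
                int end = s.indexOf("*/", i + 2);
                if (end < 0) return s.length();   // unterminated comment: nothing follows
                i = end + 2;
            } else {
                break;
            }
        }
        return i;
    }

    // One parser run: decode with the candidate charset, return the declared name or null.
    static String sniff(byte[] file, Charset candidate) {
        String text = new String(file, candidate);
        int pos = skipTrivia(text, 0);
        Matcher src = SOURCE.matcher(text).region(pos, text.length());
        if (src.lookingAt()) pos = skipTrivia(text, src.end());
        Matcher enc = ENCODING.matcher(text).region(pos, text.length());
        return enc.lookingAt() ? enc.group(1) : null; // anything else: stop, no encoding
    }

    // Repeats the run for each candidate charset, in order, until one succeeds.
    static String detect(byte[] file, Charset... candidates) {
        for (Charset c : candidates) {
            String found = sniff(file, c);
            if (found != null) return found;
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] latin = "// header\nsource 1.7;\nencoding ISO-8859-1;\nclass A {}"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf16 = "encoding UTF-16;\nclass B {}".getBytes(StandardCharsets.UTF_16);
        byte[] none = "import java.util.List;\nclass C {}".getBytes(StandardCharsets.ISO_8859_1);
        Charset[] order = {StandardCharsets.ISO_8859_1, StandardCharsets.UTF_16};
        System.out.println(detect(latin, order)); // ISO-8859-1
        System.out.println(detect(utf16, order)); // UTF-16
        System.out.println(detect(none, order));  // null
    }
}
```

In the third case each run stops at the 'i' of "import", which is the
early exit the quoted text relies on to keep repeated runs cheap.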
>  > >> >>> If an encoding is encountered that was not found during the
>  > >> >>> standard decoding strategy (ISO-8859-1, UTF-16, UTF-32), but
>  > >> >>> worked only due to a platform default/command line encoding param,
>  > >> >>> (e.g. a platform that defaults to UTF-16LE without a byte order
>  > >> >>> mark) a warning explaining that the encoding statement isn't doing
>  > >> >>> anything is generated. Of course, if the encoding doesn't match
>  > >> >>> itself, you get an error (putting "encoding UTF-16;" into a UTF-8
>  > >> >>> encoded file for example). If there is no encoding statement, the
>  > >> >>> 'real' java parser does what it does now: Use the -encoding
>  > >> >>> parameter of javac, and if that wasn't present, the platform
>  > >> >>> default.
>  > >> >>>
>  > >> >>> However, there is 1 major and 1 minor problem with this approach:
>  > >> >>>
>  > >> >>> B) This means javac will need to read every source file many times
>  > >> >>> to compile it.
>  > >> >>>
>  > >> >>> Worst case (no encoding keyword): 5 times.
>  > >> >>> Standard case if an encoding keyword: 2 times (3 times if UTF-16).
>  > >> >>>
>  > >> >>> Fortunately all runs should stop quickly, due to the encoding
>  > >> >>> parser's penchant to quit very early. Javacs out there will either
>  > >> >>> stuff the entire source file into memory, or if not, disk cache
>  > >> >>> should take care of it, but we can't prove beyond a doubt that
>  > >> >>> this repeated parsing will have no significant impact on compile
>  > >> >>> time. Is this a showstopper? Is the need to include a new (but
>  > >> >>> small) parser into javac a showstopper?
>  > >> >>>
>  > >> >>> C) Certain character sets, such as ISO-2022, can make the encoding
>  > >> >>> statement unreadable with the standard strategy if a comment
>  > >> >>> including non-ASCII characters precedes the encoding statement.
>  > >> >>> These situations are very rare (in fact, I haven't managed to find
>  > >> >>> an example), so is it okay to just ignore this issue? If you add
>  > >> >>> the encoding statement after a bunch of comments that make it
>  > >> >>> invisible, and then compile it with the right -encoding parameter,
>  > >> >>> you WILL get a warning that the encoding statement isn't going to
>  > >> >>> help a javac on another platform / without that encoding parameter
>  > >> >>> to figure it out, so you just get the current status quo: your
>  > >> >>> source file won't compile without an explicit -encoding parameter
>  > >> >>> (or if that happens to be the platform default). Should this be
>  > >> >>> mentioned in the proposal? Should the compiler (and the proposal)
>  > >> >>> put effort into generating a useful warning message, such as
>  > >> >>> figuring out if it WOULD parse correctly if the encoding statement
>  > >> >>> is at the very top of the source file, vs. suggesting to recode in
>  > >> >>> UTF-8?
>  > >> >>>
>  > >> >>> and a final dilemma:
>  > >> >>>
>  > >> >>> D) Should we separate the proposals for source and encoding
>  > >> >>> keywords? The source keyword is more useful and a lot simpler
>  > >> >>> overall than the encoding keyword, but they do sort of go
>  > >> >>> together.
>  > >> >>
>  > >> >> Separate. Another reason is: the argument of applying different
>  > >> >> settings to different parts of the project is much less valid with
>  > >> >> encoding than with source.
>  > >> >>
>  > >> >>>
>  > >> >>> --Reinier Zwitserloot and Roel Spilker
>  > >> >>>
>  > >> >>>
>  > >> >> Overall: I would prefer command line options enhanced to handle the
>  > >> >> situation
>  > >> >> rather than language change.
>  > >> >>
>  > >> >> Igor Karp
>  > >> >
>  > >> >
>  > >>
>  > >
>  > >
>  >
> 
> 


