PRE-PROPOSAL: Source and Encoding keyword
Roel Spilker
r.spilker at gmail.com
Sat Mar 7 15:58:52 PST 2009
No, for @Override it's not.
If you compile both
class Foo {
@Override
void bar() {}
}
and
class Foo {
void bar() {}
}
you will notice that the binary output is different, since the first one
won't create Foo.class and the second will. But maybe that's just
semantics...
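
For anyone who wants to check this from code rather than a shell, here is a minimal sketch using the standard javax.tools API. It assumes it runs on a JDK (a bare JRE has no system compiler) and that Foo has no supertype declaring bar(). OverrideDemo and producesClassFile are names made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper: compiles a source as Foo.java and reports whether
// javac wrote Foo.class.
final class OverrideDemo {
    static boolean producesClassFile(String source) {
        try {
            // Fresh directory per call so an earlier Foo.class cannot leak in.
            // The directory is not cleaned up in this sketch.
            Path dir = Files.createTempDirectory("override-demo");
            Path file = dir.resolve("Foo.java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));

            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            if (javac == null) {
                throw new IllegalStateException("no system compiler; run this on a JDK");
            }
            // Swallow compiler output; only the presence of the class file matters.
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            javac.run(null, sink, sink, "-d", dir.toString(), file.toString());
            return Files.exists(dir.resolve("Foo.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(producesClassFile("class Foo {\n @Override\n void bar() {}\n}\n"));
        System.out.println(producesClassFile("class Foo {\n void bar() {}\n}\n"));
    }
}
```

The first call should report that no class file was written, the second that one was.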
Roel
On Sun, Mar 8, 2009 at 12:02 AM, Stefan Schulz <schulz at e-spirit.de> wrote:
> It all depends on what one defines as "result" of a compilation. I am
> thinking of all the @SuppressWarnings Annotations in my code, which also
> influence the result of the compilation _process_. I'd say that
> Annotations should not influence the _binary output_ of a compilation.
>
> In the end, @Override and @SuppressWarnings as well as @Deprecated and
> @Retention are local compiler flags on how to treat specific code, i.e.,
> meta-information being read and handled by the compiler.
>
> Stefan
>
> Roel Spilker wrote:
> > Good one :-) Javac won't even create a class file if the @Override
> > annotation is present but shouldn't be there.
> >
> >
> > On Sat, Mar 7, 2009 at 7:22 PM, Igor Karp <igor.v.karp at gmail.com> wrote:
> >
> > > Roel,
> > >
> > > well, these were not my ideas anyway ;-). I would be equally unhappy
> > > using the javadoc approach.
> > > And as a side note: @Override does influence the result of the compiler
> > > already.
> > >
> > > Igor
> > >
> > > On Sat, Mar 7, 2009 at 9:55 AM, Roel Spilker <r.spilker at gmail.com> wrote:
> > > > I'd say javadoc, as well as annotations, should never influence the
> > > > result of the compiler. That's just not the right vehicle.
> > > >
> > > > Roel
> > > >
> > > > On Sat, Mar 7, 2009 at 6:27 PM, Igor Karp <igor.v.karp at gmail.com>
> > wrote:
> > > >>
> > > > >> Reinier,
> > > >>
> > > >> please see the comments inline.
> > > >>
> > > >> On Fri, Mar 6, 2009 at 11:39 PM, Reinier Zwitserloot
> > > >> <reinier at zwitserloot.com> wrote:
> > > >> > Igor,
> > > >> >
> > > >> > how could the command line options be expanded? Allow -encoding
> to
> > > >> > specify a
> > > >> > separate encoding for each file? I don't see how that can work.
> > > > >> For example: allow multiple -encoding options, each with an optional
> > > > >> path: -encoding <encoding>[,<path>]
> > > > >> Where path can be either a package (settings applied to the package
> > > > >> and every package under it) or a single file for maximum precision.
> > > > >> So one can have:
> > > > >> -encoding X -encoding Y,a.b -encoding Z,a.b.c -encoding
> > > > >> X,a.b.c.d.IAmSpecial
> > > > >> IAmSpecial.java will get encoding X,
> > > > >> everything else under a.b.c will get encoding Z,
> > > > >> everything else under a.b will get encoding Y
> > > > >> and the rest will get encoding X.
> > > > >> The same approach can be applied to -source.
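
A minimal sketch of how such path-scoped options might be resolved, assuming the most specific path wins as in the example above. ScopedOption, add and resolve are hypothetical names for illustration; javac has no such option syntax today.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "-encoding <value>[,<path>]" style options.
final class ScopedOption {
    private final Map<String, String> byPath = new HashMap<>();
    private String fallback; // value given without a path, if any

    // Accepts "<value>" or "<value>,<path>"; a later entry for the same path wins.
    void add(String arg) {
        int comma = arg.indexOf(',');
        if (comma < 0) {
            fallback = arg;
        } else {
            byPath.put(arg.substring(comma + 1), arg.substring(0, comma));
        }
    }

    // Looks up a fully qualified name such as a.b.c.d.IAmSpecial, then each
    // enclosing package in turn, and finally the path-less fallback.
    String resolve(String qualifiedName) {
        String name = qualifiedName;
        while (true) {
            String value = byPath.get(name);
            if (value != null) {
                return value;
            }
            int dot = name.lastIndexOf('.');
            if (dot < 0) {
                return fallback; // may be null if no path-less option was given
            }
            name = name.substring(0, dot);
        }
    }

    public static void main(String[] args) {
        ScopedOption encoding = new ScopedOption();
        for (String arg : new String[] {"X", "Y,a.b", "Z,a.b.c", "X,a.b.c.d.IAmSpecial"}) {
            encoding.add(arg);
        }
        System.out.println(encoding.resolve("a.b.c.d.IAmSpecial"));
        System.out.println(encoding.resolve("a.b.c.Other"));
        System.out.println(encoding.resolve("a.b.Thing"));
        System.out.println(encoding.resolve("q.Main"));
    }
}
```

Walking up from the most specific name keeps the lookup independent of the order in which the options were given.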
> > > >>
> > > >> > There's no
> > > >> > way I or anyone else is going to edit a build script (be it just
> > > javac,
> > > >> > a
> > > >> > home-rolled thing, ant, rake, make, maven, ivy, etcetera) to
> > carefully
> > > >> > enumerate every file's source compatibility level.
> > > > >> Sure, that's what argfiles are for: store the settings in a file and
> > > > >> use javac @argfile.
> > > > >>
> > > > >> And doing it as proposed above on a package level would make it more
> > > > >> manageable.
> > > > >> Remember, in your proposal the only option is to specify it on a file
> > > > >> level (this is fixable, I guess).
> > > >>
> > > > >> > Changing the command line options also incurs the necessary wrath of
> > > > >> > all those build tool developers, as they'd have to update their
> > > > >> > software to handle the new option (adding an option is a change too!)
> > > >> Not more than changing the language itself.
> > > >>
> > > >> >
> > > >> > Could you also elaborate on why you don't like it? For example,
> how
> > > can
> > > >> > the
> > > >> > benefits of having (more) portable source files, easier
> > migration, and
> > > a
> > > >> > much cleaner solution to e.g. the assert-in-javac1.4 be achieved
> > with
> > > >> > e.g.
> > > >> > command line options, or do you not consider any of those
> > worthwhile?
> > > > >> I fully support the goal. I even see it as a bit too narrow (see
> > > > >> below). But I do not see a need to change the language to achieve that
> > > > >> goal.
> > > > >>
> > > > >> On a conceptual level I see these options as metadata of the source
> > > > >> files and I don't like the idea of coupling it with the file.
> > > > >> One can avoid all this complexity of extra parsing by specifying the
> > > > >> encoding in an external file. This external file does not itself have
> > > > >> to be in that encoding. In fact it can be restricted to be always in
> > > > >> ASCII.
> > > > >>
> > > > >> I think the approach of adding an optional path and allowing multiple
> > > > >> uses of the same option is much more scalable: it could be extended
> > > > >> to the other existing options (like -deprecation, -Xlint, etc.) and to
> > > > >> the options that might appear in the future.
> > > >>
> > > >> I wish I could concentrate on deprecations in a certain package and
> > > >> ignore them everywhere else for now:
> > > >> javac -deprecation,really.rusty.one ...
> > > >> Finished with (or gave up on ;) that one and want to switch to the
> > next
> > > >> one:
> > > >> javac -deprecation,another.old.one
> > > >>
> > > >> Igor Karp
> > > >>
> > > >> >
> > > > >> > As an aside, how do people approach project coin submissions? I tend
> > > > >> > to look at a proposal's value, which is its benefit divided by the
> > > > >> > disadvantages (end-programmer complexity to learn, amount of changes
> > > > >> > needed to javac and/or JVM, and restrictions on potential future
> > > > >> > expansions). One of the reasons I'm writing this up with Roel is
> > > > >> > because the disadvantages seemed to be almost nonexistent at the
> > > > >> > outset (the encoding stuff made it more complicated, but at least the
> > > > >> > complication is entirely hidden from java developers' eyes, so its
> > > > >> > value proposition is still aces in my book). If there's a goal to keep
> > > > >> > the total language changes, no matter how simple they are, down to a
> > > > >> > small set, then benefit regardless of disadvantages is the better
> > > > >> > yardstick.
> > > >> >
> > > >> > --Reinier Zwitserloot
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Mar 7, 2009, at 08:15, Igor Karp wrote:
> > > >> >
> > > >> >> On Fri, Mar 6, 2009 at 10:03 PM, Reinier Zwitserloot
> > > >> >> <reinier at zwitserloot.com> wrote:
> > > >> >>>
> > > >> >>> We have written up a proposal for adding a 'source' and
> 'encoding'
> > > >> >>> keyword (alternatives to the -source and -encoding keywords on
> the
> > > >> >>> command line; they work pretty much just as you expect). The
> > > keywords
> > > >> >>> are context sensitive and must both appear before anything else
> > > other
> > > >> >>> than comments to be parsed. In case the benefit isn't obvious:
> > It is
> > > a
> > > >> >>> great help when you are trying to port a big project to a new
> > source
> > > >> >>> language compatibility. Leaving half your sourcebase in v1.6
> > and the
> > > >> >>> other half in v1.7 is pretty much impossible today, it's
> all-or-
> > > >> >>> nothing. It should also be a much nicer solution to the 'assert
> in
> > > >> >>> v1.4' dilemma, which I guess is going to happen to v1.7 as
> well,
> > > given
> > > >> >>> that 'module' is most likely going to become a keyword.
> > Finally, it
> > > >> >>> makes java files a lot more portable; you no longer run into
> your
> > > >> >>> strings looking weird when you move your Windows-1252 codefile
> > java
> > > >> >>> source to a mac, for example.
> > > >> >>>
> > > >> >>> Before we finish it though, some open questions we'd like some
> > > >> >>> feedback on:
> > > >> >>>
> > > >> >>> A) Technically, starting a file with "source 1.4" is obviously
> > > silly;
> > > >> >>> javac v1.4 doesn't know about the source keyword and would
> > thus fail
> > > > >> >>> immediately. However, practically, it's still useful. Example: if
> > > >> >>> you've mostly converted a GWT project to GWT 1.5 (which uses
> java
> > > 1.5
> > > >> >>> syntax), but have a few files remaining on GWT v1.4 (which
> > uses java
> > > >> >>> 1.4 syntax), then tossing a "source 1.4;" in those older files
> > > >> >>> eliminates all the generics warnings and serves as a reminder
> that
> > > you
> > > >> >>> should still convert those at some point. However, it isn't
> > > -actually-
> > > >> >>> compatible with a real javac 1.4. We're leaning to making
> "source
> > > >> >>> 1.6;" (and below) legal even when using a javac v1.7 or
> > above, but
> > > >> >>> perhaps that's a bridge too far? We could go with magic
> > comments but
> > > >> >>> that seems like a very bad solution.
> > > >> >>>
> > > >> >>> also:
> > > >> >>>
> > > >> >>> Encoding is rather a hairy issue; javac will need to read the
> file
> > > to
> > > >> >>> find the encoding, but to read a file, it needs to know about
> > > >> >>> encoding! Fortunately, *every single* popular encoding on
> > > wikipedia's
> > > >> >>> popular encoding list at:
> > > >> >>>
> > > >> >>>
> > > >> >>>
> > > >> >>>
> > >
> >
> http://en.wikipedia.org/wiki/Character_encoding#Popular_character_encodings
> > > >> >>>
> > > >> >>> will encode "encoding own-name-in-that-encoding;" the same as
> > ASCII
> > > >> >>> would, except for KOI-7 and UTF-7, (both 7 bit encodings that I
> > > doubt
> > > >> >>> anyone ever uses to program java).
> > > >> >>>
> > > >> >>> Therefore, the proposal includes the following strategy to
> > find the
> > > >> >>> encoding statement in a java source file without knowing the
> > > encoding
> > > >> >>> beforehand:
> > > >> >>>
> > > >> >>> An entirely separate parser (the encoding parser) is run
> > repeatedly
> > > >> >>> until the right encoding is found. First it'll decode the
> > input with
> > > >> >>> ISO-8859-1. If that doesn't work, UTF-16 (assume BE if no BOM,
> as
> > > per
> > > >> >>> the java standard), then as UTF-32 (BE if no BOM), then the
> > current
> > > >> >>> behaviour (-encoding parameter's value if any, otherwise
> platform
> > > >> >>> default encoding). This separate parser works as follows:
> > > >> >>>
> > > > >> >>> 1. Ignore any comments and whitespace.
> > > > >> >>> 2. Ignore the pattern (regexp-like syntax): source\s+[^\s]+\s*; - if
> > > > >> >>> that pattern matches partially but is not correctly completed, that
> > > > >> >>> parser run exits without finding an encoding, immediately.
> > > > >> >>> 3. Find the pattern: encoding\s+([^\s]+)\s*; - if that pattern matches
> > > > >> >>> partially but is not correctly completed, that parser run exits
> > > > >> >>> without finding an encoding, immediately. If it does complete, the
> > > > >> >>> parser also exits immediately and returns the captured value.
> > > > >> >>> 4. If it finds anything else, stop immediately, returning no encoding
> > > > >> >>> found.
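
A rough sketch of the encoding parser and the trial-decoding loop described above, not the proposal's actual implementation. EncodingSniffer, sniff and detect are made-up names. Two deviations are assumptions of this sketch: the value pattern excludes ';' so that a statement written without trailing whitespace cannot swallow the next one, and the check that the declared encoding actually matches the file is left out. It also assumes the runtime ships a UTF-32 charset, which common JDKs do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-parser that looks for "encoding <name>;" at the top of a file.
final class EncodingSniffer {
    // Any run of whitespace, /* ... */ comments and // comments.
    private static final Pattern SKIP =
            Pattern.compile("(?:\\s+|/\\*.*?\\*/|//[^\\r\\n]*)*", Pattern.DOTALL);
    private static final Pattern SOURCE = Pattern.compile("source\\s+[^\\s;]+\\s*;");
    private static final Pattern ENCODING = Pattern.compile("encoding\\s+([^\\s;]+)\\s*;");

    // Returns the end of a match of p starting exactly at pos, or pos if none.
    private static int skip(Pattern p, CharSequence text, int pos) {
        Matcher m = p.matcher(text).region(pos, text.length());
        return m.lookingAt() ? m.end() : pos;
    }

    // One parser run over already-decoded text.
    static Optional<String> sniff(CharSequence text) {
        int pos = skip(SKIP, text, 0);   // comments and whitespace
        pos = skip(SOURCE, text, pos);   // optional, complete source statement
        pos = skip(SKIP, text, pos);
        Matcher m = ENCODING.matcher(text).region(pos, text.length());
        // Anything else at this position, including a half-finished statement,
        // ends the run without a result.
        return m.lookingAt() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Tries the fixed decodings in order; an empty result means the caller
    // falls back to -encoding or the platform default.
    static Optional<String> detect(byte[] bytes) {
        Charset[] attempts = {
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_16,      // big-endian when there is no BOM
            Charset.forName("UTF-32"),    // assumed available on this runtime
        };
        for (Charset cs : attempts) {
            Optional<String> found = sniff(new String(bytes, cs));
            if (found.isPresent()) {
                return found;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] file = "// header\nsource 1.7;\nencoding UTF-8;\nimport x;"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(file));
    }
}
```

This decodes the whole byte array on every attempt for simplicity; a real implementation would presumably decode only as far as the parser needs to read.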
> > > >> >>>
> > > >> >>> Once it's found something, the 'real' java parser will run
> > using the
> > > >> >>> found encoding (this overrides any -encoding on the command
> line).
> > > >> >>> Note that the encoding parser stops quickly; For example, if it
> > > finds
> > > >> >>> a stray \0 or e.g. the letter 'i' (perhaps the first letter of
> an
> > > >> >>> import statement), it'll stop immediately.
> > > >> >>>
> > > >> >>> If an encoding is encountered that was not found during the
> > standard
> > > >> >>> decoding strategy (ISO-8859-1, UTF-16, UTF-32), but worked
> > only due
> > > to
> > > >> >>> a platform default/command line encoding param, (e.g. a
> platform
> > > that
> > > >> >>> defaults to UTF-16LE without a byte order mark) a warning
> > explaining
> > > >> >>> that the encoding statement isn't doing anything is generated.
> Of
> > > >> >>> course, if the encoding doesn't match itself, you get an error
> > > >> >>> (putting "encoding UTF-16;" into a UTF-8 encoded file for
> > example).
> > > If
> > > >> >>> there is no encoding statement, the 'real' java parser does
> > what it
> > > >> >>> does now: Use the -encoding parameter of javac, and if that
> wasn't
> > > >> >>> present, the platform default.
> > > >> >>>
> > > >> >>> However, there is 1 major and 1 minor problem with this
> approach:
> > > >> >>>
> > > >> >>> B) This means javac will need to read every source file many
> times
> > > to
> > > >> >>> compile it.
> > > >> >>>
> > > > >> >>> Worst case (no encoding keyword): 5 times.
> > > > >> >>> Standard case with an encoding keyword: 2 times (3 times if UTF-16).
> > > >> >>>
> > > >> >>> Fortunately all runs should stop quickly, due to the encoding
> > > parser's
> > > >> >>> penchant to quit very early. Javacs out there will either
> > stuff the
> > > >> >>> entire source file into memory, or if not, disk cache should
> take
> > > care
> > > >> >>> of it, but we can't prove beyond a doubt that this repeated
> > parsing
> > > >> >>> will have no significant impact on compile time. Is this a
> > > >> >>> showstopper? Is the need to include a new (but small) parser
> into
> > > >> >>> javac a showstopper?
> > > >> >>>
> > > >> >>> C) Certain character sets, such as ISO-2022, can make the
> encoding
> > > >> >>> statement unreadable with the standard strategy if a comment
> > > including
> > > >> >>> non-ASCII characters precedes the encoding statement. These
> > > situations
> > > >> >>> are very rare (in fact, I haven't managed to find an example),
> > so is
> > > >> >>> it okay to just ignore this issue? If you add the encoding
> > statement
> > > >> >>> after a bunch of comments that make it invisible, and then
> compile
> > > it
> > > >> >>> with the right -encoding parameter, you WILL get a warning
> > that the
> > > >> >>> encoding statement isn't going to help a javac on another
> > platform /
> > > >> >>> without that encoding parameter to figure it out, so you just
> get
> > > the
> > > >> >>> current status quo: your source file won't compile without an
> > > explicit
> > > >> >>> -encoding parameter (or if that happens to be the platform
> > default).
> > > >> >>> Should this be mentioned in the proposal? Should the compiler
> (and
> > > the
> > > >> >>> proposal) put effort into generating a useful warning message,
> > such
> > > as
> > > >> >>> figuring out if it WOULD parse correctly if the encoding
> statement
> > > is
> > > >> >>> at the very top of the source file, vs. suggesting to recode in
> > > UTF-8?
> > > >> >>>
> > > >> >>> and a final dilemma:
> > > >> >>>
> > > >> >>> D) Should we separate the proposals for source and encoding
> > > keywords?
> > > >> >>> The source keyword is more useful and a lot simpler overall
> > than the
> > > >> >>> encoding keyword, but they do sort of go together.
> > > >> >>
> > > >> >> Separate. Another reason is: the argument of applying different
> > > >> >> settings
> > > >> >> to
> > > >> >> different parts of the project is much less valid with encoding
> > than
> > > >> >> with source.
> > > >> >>
> > > >> >>>
> > > >> >>> --Reinier Zwitserloot and Roel Spilker
> > > >> >>>
> > > >> >>>
> > > >> >> Overall: I would prefer command line options enhanced to handle
> the
> > > >> >> situation
> > > >> >> rather than language change.
> > > >> >>
> > > >> >> Igor Karp
> > > >> >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
> >
>
>