on javac flags

Remi Forax forax at univ-mlv.fr
Wed Sep 26 15:12:21 UTC 2018



----- Mail original -----
> De: "Maurizio Cimadamore" <maurizio.cimadamore at oracle.com>
> À: "compiler-dev" <compiler-dev at openjdk.java.net>
> Envoyé: Mercredi 26 Septembre 2018 16:16:29
> Objet: on javac flags

> Hi,
> In javac we have a Flags class which is used to define a list of all the
> flags that are used, both internally (i.e. javac private flags) and
> externally (i.e ACC_ flags). The assumption is that external flags are
> 16-bit wide, while internal flags are all values above 2^16 (_which can
> fit in a long_).
> 
> There are several issues with the current handling of flags:
> 
> * since both internal and external flags are mapped onto the same
> abstraction (a numeric constant), it becomes confusing to
> marshal/unmarshal public flags to private flags. For instance, the
> internal BRIDGE flag must be mapped to the public ACC_BRIDGE flag when
> we are about to write a classfile - and the reverse operation must be
> performed when we read from the classfile (if we used ACC_BRIDGE
> internally javac would think that the method is VOLATILE). For the
> records - there are 4 separate methods for mapping public flags onto
> private and viceversa (spread between ClassReader and ClassWriter).
> 
> * The code for checking well-formedness of flags has become very
> convoluted. This is caused by the fact that this code used to work on
> the assumption that the well-formedness check only needed to work on
> public flags (i.e. X < 2^16). But when we added default methods, this
> code needed to be upgraded to work on any flag (even internal ones -
> such as DEFAULT, an internal flag), and that turned the code even more
> obscure.
> 
> * checking presence/absence of flags is very tedious; we have many many
> occurrences (600+) of C-like code like:
> 
> if ((sym.flags() & STATIC) == 0 && ...
> 
> or
> 
> (sym.flags() & (ABSTRACT|DEFAULT|PRIVATE)) == ABSTRACT) { ... }
> 
> etc.
> 
> * this is kind of related to the point above - since checking flags is
> so darn unhandy - helper methods popped up; i.e. Symbol.isStatic will
> essentially do (Symbol.flags() & STATIC) != 0 - etc. So there are more
> ways to get to the same answer - and of the codebase doesn't adhere to
> any strict guideline on whether to use one way or the other (although
> generally speaking, the low level variant is preferred when symbol
> completion needs to be avoided).
> 
> * last and not least - we are running out of internal flags (we only
> have 6 available slots in jdk/jdk [2]). While we can be more
> parsimonious w.r.t. our internal flags (and indeed we have, see [3]), I
> think - and garbage collect the unused ones over times, I also think
> that the very nature of an internal flag is such that it can be used to
> describe something very specific (i.e. a compiler invariant - such as
> UNCOMPLETED or ACYCLIC) - and there are many many occasions where we'd
> like to be able to do so; so while 16 bits might be enough for public
> flags (cough), I don't think that 64 bits are necessarily adequate for
> representing the internal flags space. I'd be wary of starting to play
> the same dance we do for public flags - e.g. UNCOMPLETED and ACYCLIC can
> never occur on the same symbols, so let's reuse the same bit for both.
> 
> * Duplication with the Flags.Flag enum; each flag constant is
> represented as an enum, as sometimes client code need to access flags
> that way; needless to say, over time this has led to inconsistencies -
> the last of which has been described in this thread [1]
> 
> All this got me thinking; it seems to me that most of the problems above
> are caused by the fact that we want to shoehorn private flags onto the
> same bits we use for public flags. So, what if we used different
> abstractions for public vs private flags? More specifically, we could
> keep using ints for mapping well-known public flag values (whose value
> is specified in the JVMS after all). But we could use something fancier
> like an EnumSet<Flag> for internal flags. What does this mean?
> 
> First of all, Symbol would no longer have a 'long' field for flags - it
> will have an Set<Flag>, where Flag is an enum (stay with me on this - if
> you are concerned about memory footprint, please note that I'll address
> that later on). This simple change of representation has already many
> advantages:
> 
> * as Flag is an enum, the number of internal flags is virtually unbounded.
> * since we have a Set<Flag>, testing for flag presence/absence becomes
> much nicer (and if the underlying set is an EnumSet, we get good
> performances too!)
> * we could centralize the logic for mapping internal flags to external
> ones onto the Flag itself! This means that each private flag would know
> how to map itself onto an external one (and viceversa) - and
> ClassReader/Writer will simply take advantage of that.
> 
> Ok, the obvious catch is that what used to be a 'value' (a long) is now
> a reference pointing into the heap. So, with a naive implementation, we
> would have one more heap-allocated object per Symbol. This is of course
> not a very desirable property. Can we improve? I think we can - for
> instance, the Flag enum can define immutable enum sets for very common
> flags combos:
> 
> public static
> public
> protected
> public abstract
> 
> and so forth. Since these flag masks are shared, they will need to be
> made immutable (Collections.immutableSet) to make sure that nobody will
> try to add stuff to them. Flags could also define a bunch of methods to
> union/intersect flags, so that you get a new EnumSet with the right bits
> (this overcomes the problem that some flag sets are immutable).
> 
> At the same time, Flags could also define a factory method which takes a
> bunch of Modifiers (as read by the parser) and turns them into an
> internal flag set - the factory would check for common idioms and return
> the shared objects where possible.
> 
> Another, even more extreme way, would be to pool all flags into a
> Set<Set<Flags>> - and then, if the flag set you are about to create is
> already contained in the set, you return it, otherwise you add the new
> flag combo into the set. This means taking a slight hit on flag creation
> (i.e. the new flag combo will need to be looked up in the shared set).
> But it has the advantage of not needing to 'guess' which flag combos are
> likely - and, since all sets created in this way will be immutable, it
> would provide more uniformity - i.e. to chain flags together you have to
> go through the Flag helper methods.
> 
> Where does this leave us? I think with these optimizations the memory
> footprint should be relatively contained - we are essentially trading a
> 64-bit value (long flag) with a 64 bit pointer which might or might not
> point to a fresh new object (if not, no extra cost). Of course this all
> needed to be validated with some real world profiling/JMH benchmark, but
> seems like a direction worth exploring.
> 
> Thoughts?

Three years ago, i've done the opposite transformation Set<Flat> to long for a compiler far less complex than javac,
the main issues when you have a Set<Flags> are:
- the JumboEnumSet (more than 64 flags) is quite slow because it uses an array internally,
- it's very easy to have more than two representations for a callsite, and at that point your flag check code goes megamorphic.
- even using an interface makes the code slow because an invokevirtual means there is a check in the assembly code, each additional checks in a tight loop is visible in term of perf.

The last two points can be easily hidden when doing JMH tests. 

For dealing with the fact that there are more than 64 flags, i've used 2 longs, one for the public API, one for the internal code, but since another developer change the code to go back to use a 64 bits by reusing the bits (because of the memory consumption).

What you are describing is the kind of perfect use case for a value type. So we know the perfect solution, it's just that you can not using it yet.

> 
> Maurizio

Rémi

> 
> [1] -
> http://mail.openjdk.java.net/pipermail/compiler-dev/2018-September/012470.html
> [2] -
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/jdk.compiler/share/classes/com/sun/tools/javac/code/Flags.java#l314
> [3] -
> http://hg.openjdk.java.net/jdk/jdk/file/tip/src/jdk.compiler/share/classes/com/sun/tools/javac/code/Flags.java#l320


More information about the compiler-dev mailing list