on javac flags

Thu Sep 27 04:25:11 UTC 2018

Having two different mechanisms for the two sets of flags doesn’t sound great either. However, if the problem is that we’re worried about overflowing RegularEnumSet, we can have a FlagSet class that implements Set<Flag> and uses two longs instead of an array. Then we can put the public and private flags into one enum, and wrap a flag-spec specific API around all of it. Then, when we have value types, we can turn FlagSet into a value.
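
A minimal sketch of what such a FlagSet could look like, assuming a single Flag enum with at most 128 constants. The enum constants and the class layout here are hypothetical, not existing javac code:

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical flag enum; a real one would list every public and private javac flag.
enum Flag { PUBLIC, PRIVATE, STATIC, ABSTRACT, BRIDGE, DEFAULT }

// Sketch of a mutable Set<Flag> backed by two longs (room for 128 constants).
final class FlagSet extends AbstractSet<Flag> {
    private static final Flag[] ALL = Flag.values();
    private long lo; // bits for ordinals 0..63
    private long hi; // bits for ordinals 64..127

    static FlagSet of(Flag... flags) {
        FlagSet s = new FlagSet();
        for (Flag f : flags) s.add(f);
        return s;
    }

    @Override public boolean add(Flag f) {
        int o = f.ordinal();
        if (o >= 128) throw new IllegalArgumentException("more than 128 flags");
        long bit = 1L << (o & 63);
        boolean absent;
        if (o < 64) { absent = (lo & bit) == 0; lo |= bit; }
        else        { absent = (hi & bit) == 0; hi |= bit; }
        return absent;
    }

    @Override public boolean contains(Object x) {
        if (!(x instanceof Flag)) return false;
        int o = ((Flag) x).ordinal();
        if (o >= 128) return false;
        long bit = 1L << (o & 63);
        return ((o < 64 ? lo : hi) & bit) != 0;
    }

    @Override public boolean remove(Object x) {
        if (!contains(x)) return false;
        int o = ((Flag) x).ordinal();
        long bit = 1L << (o & 63);
        if (o < 64) lo &= ~bit; else hi &= ~bit;
        return true;
    }

    @Override public int size() { return Long.bitCount(lo) + Long.bitCount(hi); }

    @Override public Iterator<Flag> iterator() {
        return new Iterator<Flag>() {
            private int cursor = advance(0); // index of the next present flag
            private int last = -1;           // index returned by the previous next()

            private int advance(int from) {
                for (int i = from; i < ALL.length; i++) if (contains(ALL[i])) return i;
                return ALL.length;
            }
            @Override public boolean hasNext() { return cursor < ALL.length; }
            @Override public Flag next() {
                if (!hasNext()) throw new NoSuchElementException();
                last = cursor;
                cursor = advance(cursor + 1);
                return ALL[last];
            }
            @Override public void remove() {
                if (last < 0) throw new IllegalStateException();
                FlagSet.this.remove(ALL[last]);
                last = -1;
            }
        };
    }
}
```

Because it extends AbstractSet, equals and hashCode agree with any other Set<Flag>, so FlagSet.of(Flag.STATIC).equals(EnumSet.of(Flag.STATIC)) holds. The sketch is mutable; turning it into a value would presumably mean making lo and hi final and returning new instances from the mutators.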

> On Sep 26, 2018, at 7:16 AM, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
> 
> Hi,
> In javac we have a Flags class which is used to define a list of all the flags that are used, both internally (i.e. javac private flags) and externally (i.e. ACC_ flags). The assumption is that external flags are 16 bits wide, while internal flags are all values above 2^16 (_which can fit in a long_).
> 
> There are several issues with the current handling of flags:
> 
> * since both internal and external flags are mapped onto the same abstraction (a numeric constant), it becomes confusing to marshal/unmarshal public flags to private flags. For instance, the internal BRIDGE flag must be mapped to the public ACC_BRIDGE flag when we are about to write a classfile - and the reverse operation must be performed when we read from the classfile (if we used ACC_BRIDGE internally javac would think that the method is VOLATILE). For the record - there are 4 separate methods for mapping public flags onto private and vice versa (spread between ClassReader and ClassWriter).
> 
> * The code for checking well-formedness of flags has become very convoluted. This is caused by the fact that this code used to work on the assumption that the well-formedness check only needed to work on public flags (i.e. X < 2^16). But when we added default methods, this code needed to be upgraded to work on any flag (even internal ones - such as DEFAULT, an internal flag), and that made the code even more obscure.
> 
> * checking presence/absence of flags is very tedious; we have many many occurrences (600+) of C-like code like:
> 
> if ((sym.flags() & STATIC) == 0 && ...
> 
> or
> 
> if ((sym.flags() & (ABSTRACT|DEFAULT|PRIVATE)) == ABSTRACT) { ... }
> 
> etc.
> 
> * this is kind of related to the point above - since checking flags is so darn unhandy - helper methods popped up; i.e. Symbol.isStatic will essentially do (Symbol.flags() & STATIC) != 0 - etc. So there are more ways to get to the same answer - and the codebase doesn't adhere to any strict guideline on whether to use one way or the other (although generally speaking, the low level variant is preferred when symbol completion needs to be avoided).
> 
> * last but not least - we are running out of internal flags (we only have 6 available slots in jdk/jdk [2]). While we can be more parsimonious w.r.t. our internal flags (and indeed we have, see [3]) and garbage collect the unused ones over time, I also think that the very nature of an internal flag is such that it can be used to describe something very specific (i.e. a compiler invariant - such as UNCOMPLETED or ACYCLIC) - and there are many many occasions where we'd like to be able to do so; so while 16 bits might be enough for public flags (cough), I don't think that 64 bits are necessarily adequate for representing the internal flags space. I'd be wary of starting to play the same dance we do for public flags - e.g. UNCOMPLETED and ACYCLIC can never occur on the same symbols, so let's reuse the same bit for both.
> 
> * Duplication with the Flags.Flag enum; each flag constant is represented as an enum, as sometimes client code needs to access flags that way; needless to say, over time this has led to inconsistencies - the last of which has been described in this thread [1]
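
For illustration, the BRIDGE/VOLATILE clash from the first bullet and the mask idioms from the third can be reproduced with a few constants. This is only a sketch: ACC_BRIDGE uses the classfile value from the JVMS and the public bits come from java.lang.reflect.Modifier, but the DEFAULT bit and the helper names are assumptions, not javac's actual definitions:

```java
import java.lang.reflect.Modifier;

final class MaskDemo {
    // JVMS value: ACC_BRIDGE (methods) shares 0x0040 with ACC_VOLATILE (fields).
    static final int ACC_BRIDGE = 0x0040;

    // Public flags reuse their classfile bits.
    static final long PRIVATE  = Modifier.PRIVATE;   // 0x0002
    static final long STATIC   = Modifier.STATIC;    // 0x0008
    static final long ABSTRACT = Modifier.ABSTRACT;  // 0x0400
    // Hypothetical position for an internal flag, above the 16 public bits.
    static final long DEFAULT  = 1L << 43;

    // "flag is absent" check, as in (sym.flags() & STATIC) == 0
    static boolean isInstance(long flags) {
        return (flags & STATIC) == 0;
    }

    // "exactly ABSTRACT out of this group" check
    static boolean isPlainAbstract(long flags) {
        return (flags & (ABSTRACT | DEFAULT | PRIVATE)) == ABSTRACT;
    }

    public static void main(String[] args) {
        System.out.println(ACC_BRIDGE == Modifier.VOLATILE);      // true
        System.out.println(Modifier.toString(ACC_BRIDGE));        // volatile
        System.out.println(isInstance(ABSTRACT));                 // true
        System.out.println(isPlainAbstract(ABSTRACT | DEFAULT));  // false
    }
}
```

The first two lines of output show why a bare numeric constant cannot tell a bridge method from a volatile field without extra context.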
> 
> All this got me thinking; it seems to me that most of the problems above are caused by the fact that we want to shoehorn private flags onto the same bits we use for public flags. So, what if we used different abstractions for public vs private flags? More specifically, we could keep using ints for mapping well-known public flag values (whose value is specified in the JVMS after all). But we could use something fancier like an EnumSet<Flag> for internal flags. What does this mean?
> 
> First of all, Symbol would no longer have a 'long' field for flags - it will have a Set<Flag>, where Flag is an enum (stay with me on this - if you are concerned about memory footprint, please note that I'll address that later on). This simple change of representation has already many advantages:
> 
> * as Flag is an enum, the number of internal flags is virtually unbounded.
> * since we have a Set<Flag>, testing for flag presence/absence becomes much nicer (and if the underlying set is an EnumSet, we get good performance too!)
> * we could centralize the logic for mapping internal flags to external ones onto the Flag itself! This means that each private flag would know how to map itself onto an external one (and vice versa) - and ClassReader/Writer will simply take advantage of that.
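
A minimal sketch of the three points above, assuming a hypothetical Flag enum in which each constant carries its classfile bit (0 for purely internal flags). The ACC_ values are the ones from the JVMS; the enum shape and method names are made up for the example:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical enum: accBit is the classfile bit, or 0 if the flag is internal only.
enum Flag {
    PUBLIC(0x0001), PRIVATE(0x0002), STATIC(0x0008), ABSTRACT(0x0400),
    BRIDGE(0x0040),   // written as ACC_BRIDGE on methods
    DEFAULT(0),       // internal only, never written to the classfile
    UNCOMPLETED(0);   // internal only

    final int accBit;
    Flag(int accBit) { this.accBit = accBit; }

    // Internal -> external: OR together the classfile bits of every flag in the set.
    static int toAccessFlags(Set<Flag> flags) {
        int acc = 0;
        for (Flag f : flags) acc |= f.accBit;
        return acc;
    }

    // External -> internal for a method's access_flags. Fields would need their
    // own table, since 0x0040 means ACC_VOLATILE there.
    static Set<Flag> fromMethodAccessFlags(int acc) {
        Set<Flag> result = EnumSet.noneOf(Flag.class);
        for (Flag f : values()) {
            if (f.accBit != 0 && (acc & f.accBit) != 0) result.add(f);
        }
        return result;
    }
}

final class FlagDemo {
    public static void main(String[] args) {
        Set<Flag> flags = EnumSet.of(Flag.PUBLIC, Flag.BRIDGE, Flag.UNCOMPLETED);
        // Presence/absence tests read as plain set operations.
        System.out.println(!flags.contains(Flag.STATIC));
        // Internal-only flags simply drop out when writing the classfile.
        System.out.println(Integer.toHexString(Flag.toAccessFlags(flags)));
    }
}
```

With this shape, adding an internal flag is a one-line change to the enum and does not consume a bit in any shared numeric space.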
> 
> Ok, the obvious catch is that what used to be a 'value' (a long) is now a reference pointing into the heap. So, with a naive implementation, we would have one more heap-allocated object per Symbol. This is of course not a very desirable property. Can we improve? I think we can - for instance, the Flag enum can define immutable enum sets for very common flags combos:
> 
> public static
> public
> protected
> public abstract
> 
> and so forth. Since these flag masks are shared, they will need to be made immutable (Collections.unmodifiableSet) to make sure that nobody will try to add stuff to them. Flags could also define a bunch of methods to union/intersect flags, so that you get a new EnumSet with the right bits (this overcomes the problem that some flag sets are immutable).
> 
> At the same time, Flags could also define a factory method which takes a bunch of Modifiers (as read by the parser) and turns them into an internal flag set - the factory would check for common idioms and return the shared objects where possible.
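
A rough sketch of the shared-combo idea, assuming a hypothetical Flag enum and a FlagSets holder class (neither exists in javac). The shared sets are wrapped with Collections.unmodifiableSet, union builds a fresh EnumSet, and the factory hands back a shared instance when the combo is a known one:

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

enum Flag { PUBLIC, PROTECTED, PRIVATE, STATIC, ABSTRACT, DEFAULT }

final class FlagSets {
    // Shared, unmodifiable sets for very common combos.
    static final Set<Flag> PUBLIC          = shared(Flag.PUBLIC);
    static final Set<Flag> PROTECTED       = shared(Flag.PROTECTED);
    static final Set<Flag> PUBLIC_STATIC   = shared(Flag.PUBLIC, Flag.STATIC);
    static final Set<Flag> PUBLIC_ABSTRACT = shared(Flag.PUBLIC, Flag.ABSTRACT);

    private static final List<Set<Flag>> COMMON =
        List.of(PUBLIC, PROTECTED, PUBLIC_STATIC, PUBLIC_ABSTRACT);

    private static Set<Flag> shared(Flag first, Flag... rest) {
        return Collections.unmodifiableSet(EnumSet.of(first, rest));
    }

    // Factory: return the shared instance for a common combo, else the set as given.
    static Set<Flag> canonical(Set<Flag> flags) {
        for (Set<Flag> c : COMMON) {
            if (c.equals(flags)) return c;
        }
        return flags;
    }

    // Union never mutates its arguments, so it is safe to call on shared sets.
    static Set<Flag> union(Set<Flag> a, Set<Flag> b) {
        EnumSet<Flag> r = EnumSet.noneOf(Flag.class);
        r.addAll(a);
        r.addAll(b);
        return canonical(r);
    }
}
```

Here FlagSets.union(FlagSets.PUBLIC, EnumSet.of(Flag.STATIC)) returns the PUBLIC_STATIC instance itself, so no extra object survives for that symbol. Uncommon combos still come back as fresh mutable sets, which is the non-uniformity the pooling variant tries to remove.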
> 
> Another, even more extreme way, would be to pool all flags into a Set<Set<Flag>> - and then, if the flag set you are about to create is already contained in the set, you return it, otherwise you add the new flag combo into the set. This means taking a slight hit on flag creation (i.e. the new flag combo will need to be looked up in the shared set). But it has the advantage of not needing to 'guess' which flag combos are likely - and, since all sets created in this way will be immutable, it would provide more uniformity - i.e. to chain flags together you have to go through the Flag helper methods.
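
The pooling variant could be sketched as below. The FlagPool class and its method names are hypothetical. One deviation from the description: the sketch uses a Map from each set to itself rather than a Set<Set<Flag>>, because a plain Set can report that an equal element is present but cannot hand back the instance it holds:

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

enum Flag { PUBLIC, PRIVATE, STATIC, ABSTRACT, UNCOMPLETED, ACYCLIC }

// Sketch of an interning pool; not thread-safe and never evicts entries.
final class FlagPool {
    private final Map<Set<Flag>, Set<Flag>> pool = new HashMap<>();

    // Returns the one shared, unmodifiable instance equal to the given combo.
    Set<Flag> intern(Set<Flag> flags) {
        Set<Flag> hit = pool.get(flags);
        if (hit != null) return hit;
        // Defensive copy so later changes to the argument cannot corrupt the pool.
        EnumSet<Flag> copy = EnumSet.noneOf(Flag.class);
        copy.addAll(flags);
        Set<Flag> canonical = Collections.unmodifiableSet(copy);
        pool.put(canonical, canonical);
        return canonical;
    }

    // Helper for chaining: every derived combo goes back through the pool.
    Set<Flag> with(Set<Flag> base, Flag extra) {
        EnumSet<Flag> r = EnumSet.noneOf(Flag.class);
        r.addAll(base);
        r.add(extra);
        return intern(r);
    }

    int size() { return pool.size(); }
}
```

Two symbols that end up with the same combo by different routes then share one object: pool.with(pool.intern(EnumSet.of(Flag.PUBLIC)), Flag.STATIC) and pool.intern(EnumSet.of(Flag.PUBLIC, Flag.STATIC)) are the same reference. The lookup cost on every creation is the "slight hit" mentioned above, and it would need measuring.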
> 
> Where does this leave us? I think with these optimizations the memory footprint should be relatively contained - we are essentially trading a 64-bit value (long flag) for a 64-bit pointer which might or might not point to a fresh new object (if not, no extra cost). Of course this all needs to be validated with some real world profiling/JMH benchmark, but seems like a direction worth exploring.
> 
> Thoughts?
> 
> Maurizio
> 
> [1] - http://mail.openjdk.java.net/pipermail/compiler-dev/2018-September/012470.html
> [2] - http://hg.openjdk.java.net/jdk/jdk/file/tip/src/jdk.compiler/share/classes/com/sun/tools/javac/code/Flags.java#l314
> [3] - http://hg.openjdk.java.net/jdk/jdk/file/tip/src/jdk.compiler/share/classes/com/sun/tools/javac/code/Flags.java#l320
>