on javac flags
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Wed Sep 26 14:16:29 UTC 2018
Hi,
In javac we have a Flags class which is used to define a list of all the
flags that are used, both internally (i.e. javac private flags) and
externally (i.e ACC_ flags). The assumption is that external flags are
16-bit wide, while internal flags are all values above 2^16 (_which can
fit in a long_).
There are several issues with the current handling of flags:
* since both internal and external flags are mapped onto the same
abstraction (a numeric constant), it becomes confusing to
marshal/unmarshal public flags to private flags. For instance, the
internal BRIDGE flag must be mapped to the public ACC_BRIDGE flag when
we are about to write a classfile - and the reverse operation must be
performed when we read from the classfile (if we used ACC_BRIDGE
internally javac would think that the method is VOLATILE). For the
records - there are 4 separate methods for mapping public flags onto
private and viceversa (spread between ClassReader and ClassWriter).
* The code for checking well-formedness of flags has become very
convoluted. This is caused by the fact that this code used to work on
the assumption that the well-formedness check only needed to work on
public flags (i.e. X < 2^16). But when we added default methods, this
code needed to be upgraded to work on any flag (even internal ones -
such as DEFAULT, an internal flag), and that turned the code even more
obscure.
* checking presence/absence of flags is very tedious; we have many many
occurrences (600+) of C-like code like:
if ((sym.flags() & STATIC) == 0 && ...
or
(sym.flags() & (ABSTRACT|DEFAULT|PRIVATE)) == ABSTRACT) { ... }
etc.
* this is kind of related to the point above - since checking flags is
so darn unhandy - helper methods popped up; i.e. Symbol.isStatic will
essentially do (Symbol.flags() & STATIC) != 0 - etc. So there are more
ways to get to the same answer - and of the codebase doesn't adhere to
any strict guideline on whether to use one way or the other (although
generally speaking, the low level variant is preferred when symbol
completion needs to be avoided).
* last and not least - we are running out of internal flags (we only
have 6 available slots in jdk/jdk [2]). While we can be more
parsimonious w.r.t. our internal flags (and indeed we have, see [3]), I
think - and garbage collect the unused ones over times, I also think
that the very nature of an internal flag is such that it can be used to
describe something very specific (i.e. a compiler invariant - such as
UNCOMPLETED or ACYCLIC) - and there are many many occasions where we'd
like to be able to do so; so while 16 bits might be enough for public
flags (cough), I don't think that 64 bits are necessarily adequate for
representing the internal flags space. I'd be wary of starting to play
the same dance we do for public flags - e.g. UNCOMPLETED and ACYCLIC can
never occur on the same symbols, so let's reuse the same bit for both.
* Duplication with the Flags.Flag enum; each flag constant is
represented as an enum, as sometimes client code need to access flags
that way; needless to say, over time this has led to inconsistencies -
the last of which has been described in this thread [1]
All this got me thinking; it seems to me that most of the problems above
are caused by the fact that we want to shoehorn private flags onto the
same bits we use for public flags. So, what if we used different
abstractions for public vs private flags? More specifically, we could
keep using ints for mapping well-known public flag values (whose value
is specified in the JVMS after all). But we could use something fancier
like an EnumSet<Flag> for internal flags. What does this mean?
First of all, Symbol would no longer have a 'long' field for flags - it
will have an Set<Flag>, where Flag is an enum (stay with me on this - if
you are concerned about memory footprint, please note that I'll address
that later on). This simple change of representation has already many
advantages:
* as Flag is an enum, the number of internal flags is virtually unbounded.
* since we have a Set<Flag>, testing for flag presence/absence becomes
much nicer (and if the underlying set is an EnumSet, we get good
performances too!)
* we could centralize the logic for mapping internal flags to external
ones onto the Flag itself! This means that each private flag would know
how to map itself onto an external one (and viceversa) - and
ClassReader/Writer will simply take advantage of that.
Ok, the obvious catch is that what used to be a 'value' (a long) is now
a reference pointing into the heap. So, with a naive implementation, we
would have one more heap-allocated object per Symbol. This is of course
not a very desirable property. Can we improve? I think we can - for
instance, the Flag enum can define immutable enum sets for very common
flags combos:
public static
public
protected
public abstract
and so forth. Since these flag masks are shared, they will need to be
made immutable (Collections.immutableSet) to make sure that nobody will
try to add stuff to them. Flags could also define a bunch of methods to
union/intersect flags, so that you get a new EnumSet with the right bits
(this overcomes the problem that some flag sets are immutable).
At the same time, Flags could also define a factory method which takes a
bunch of Modifiers (as read by the parser) and turns them into an
internal flag set - the factory would check for common idioms and return
the shared objects where possible.
Another, even more extreme way, would be to pool all flags into a
Set<Set<Flags>> - and then, if the flag set you are about to create is
already contained in the set, you return it, otherwise you add the new
flag combo into the set. This means taking a slight hit on flag creation
(i.e. the new flag combo will need to be looked up in the shared set).
But it has the advantage of not needing to 'guess' which flag combos are
likely - and, since all sets created in this way will be immutable, it
would provide more uniformity - i.e. to chain flags together you have to
go through the Flag helper methods.
Where does this leave us? I think with these optimizations the memory
footprint should be relatively contained - we are essentially trading a
64-bit value (long flag) with a 64 bit pointer which might or might not
point to a fresh new object (if not, no extra cost). Of course this all
needed to be validated with some real world profiling/JMH benchmark, but
seems like a direction worth exploring.
Thoughts?
Maurizio
[1] -
http://mail.openjdk.java.net/pipermail/compiler-dev/2018-September/012470.html
[2] -
http://hg.openjdk.java.net/jdk/jdk/file/tip/src/jdk.compiler/share/classes/com/sun/tools/javac/code/Flags.java#l314
[3] -
http://hg.openjdk.java.net/jdk/jdk/file/tip/src/jdk.compiler/share/classes/com/sun/tools/javac/code/Flags.java#l320
More information about the compiler-dev
mailing list