Improving AppCDS for Custom Loaders
Volker Simonis
volker.simonis at gmail.com
Sat May 5 20:22:44 UTC 2018
Hi Jiangli,
thanks a lot for explaining your plans more precisely. I think the approach
makes much more sense now.
While I’m all for it in general, I still have some questions and remarks :)
1. Will it be possible to use the two archives independently? I.e. will it
be possible to choose just one of the two archives as well as a combination
of both?
2. We should make sure not to introduce any performance regressions if we
have two archives (and potentially three SymbolTables and three
“Metaspaces”).
3. In general I think we’re moving away from scenarios where we have one
single, central Java installation on a system which is used by the
“tens/hundreds different applications” you mentioned in your mail towards a
world where each of these “tens/hundreds different applications” comes
with its own, bundled JDK/JRE version (and as far as I understand Oracle
is promoting this new distribution model after the deprecation of
Applets/WebStart). In such a world it would not be easy to have a static,
common system layer archive. We could try to keep the format of the
archived classes as compatible between Java versions as possible, although
I understand that this will be quite complicated. Do you already have any
thoughts on this specific problem?
4. Finally we should design the new model in a way which allows it to
easily integrate AOT compiled code into the same one or two archives in the
future. Because with AOT we face exactly the same problems and having to
maintain four different archives will not be very user-friendly. OpenJ9
already uses a single archive for CDS/AOT/ProfilingData. I don’t say we
have to unify the AOT and CDS archives in HotSpot now, I just want to
suggest that, if we’re redesigning the CDS archive anyway, we think
twice before we settle on a new model, so that we at least don’t exclude the
possibility of a unification with the AOT archive in the future.
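
To make the concern in point 2 a bit more concrete, here is a minimal sketch of what a lookup across layered tables could look like. This is not HotSpot code: the class name LayeredSymbolLookup, the Layer type, the probe counter and the symbols and ids are all made up for illustration. It only models the idea that a lookup may have to consult one table after the other (static archive, dynamic archive, runtime), so a miss costs one probe per layer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models symbol lookup across layered tables, not HotSpot's SymbolTable.
public class LayeredSymbolLookup {

    // Hypothetical stand-in for one table: static archive, dynamic archive, or runtime.
    static final class Layer {
        final String name;
        final Map<String, Integer> symbols = new HashMap<>();

        Layer(String name) { this.name = name; }
    }

    private final List<Layer> layers = new ArrayList<>();
    private long probes = 0; // number of individual table lookups performed so far

    void addLayer(Layer layer) { layers.add(layer); }

    // Probe the layers in order; a miss in every layer costs one probe per layer.
    Integer lookup(String symbol) {
        for (Layer layer : layers) {
            probes++;
            Integer id = layer.symbols.get(symbol);
            if (id != null) {
                return id;
            }
        }
        return null;
    }

    long probes() { return probes; }

    // Builds a three-layer example with made-up symbols and ids.
    static LayeredSymbolLookup demo() {
        Layer staticArchive = new Layer("static archive");
        staticArchive.symbols.put("java/lang/Object", 1);
        Layer dynamicArchive = new Layer("dynamic archive");
        dynamicArchive.symbols.put("com/example/App", 2);
        Layer runtime = new Layer("runtime");
        runtime.symbols.put("com/example/Generated$1", 3);

        LayeredSymbolLookup tables = new LayeredSymbolLookup();
        tables.addLayer(staticArchive);
        tables.addLayer(dynamicArchive);
        tables.addLayer(runtime);
        return tables;
    }

    public static void main(String[] args) {
        LayeredSymbolLookup tables = demo();
        String[] names = {"java/lang/Object", "com/example/Generated$1", "no/such/Symbol"};
        for (String s : names) {
            long before = tables.probes();
            Integer id = tables.lookup(s);
            System.out.println(s + " -> " + id + " (" + (tables.probes() - before) + " probes)");
        }
    }
}
```

In this toy model a symbol found in the first layer needs a single probe, while a symbol that only exists at runtime (or not at all) needs three. Whether a real implementation would behave like this depends entirely on how the tables are organized, which is exactly why I think it is worth measuring.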
Thanks,
Volker
Jiangli Zhou <jiangli.zhou at oracle.com> wrote on Fri, May 4, 2018 at 19:42:
> Hi Volker,
>
> Thank you so much for the feedback!
>
> Your comments made me realize that some clarifications are needed for the
> top-layer dynamic archiving. The top-layer would include all application
> classes, middleware classes, system library classes that are not in the
> bottom layer. Apologies for not stating that clearly in the proposal.
> With the top layer including all necessary classes (excluding the classes
> in bottom layer) for a specific application, multiple instances running the
> same application can achieve maximum memory sharing. Using your example,
> the extra AWT/Swing classes are archived along with the application classes
> in the top layer.
>
> With the dynamic archiving ability, it’s natural to think there is no need
> for the two-layer scheme. In fact, I originally had that thought. However,
> after exploring different possible use cases, I agree two-layer archiving
> does bring benefits that are lacking in the single archive design. One
> example of such a use case is tens/hundreds different applications running on
> the same machine. Without a common system layer archive, it is difficult to
> achieve memory sharing for common system classes, String objects, etc.
> between different applications.
>
> Another reason why we should go with two-layer & static/dynamic
> archiving is that some runtime information may be sensitive and should not be
> archived. Strings can be a good example in this case. Currently, we archive
> interned String objects without running the application. The Strings include
> resolved CONSTANT_strings, class names, method names, etc, which are all
> static information from class files. During execution of an application,
> interned Strings might contain sensitive user information, which is not
> safe to be included in the archive. It’s difficult to distinguish between
> the static information and runtime information if we go with the single
> layer dynamic-only archiving.
>
> We had many discussions internally over a long period of time on how to
> improve usability with different approaches. In the end, dynamic-only or
> static-only two-layer archiving all have their own disadvantages and fail
> to meet requirements in some use cases. The hybrid archiving combines
> benefits/advantages of different approaches and seems to be flexible enough
> to fit most usages.
>
> Further comments and feedback are much appreciated.
>
> Best regards,
>
> Jiangli
>
> > On May 4, 2018, at 3:01 AM, Volker Simonis <volker.simonis at gmail.com>
> wrote:
> >
> > Hi Jiangli,
> >
> > thanks for sharing the hybrid archiving proposal. I think it is very
> > interesting and useful!
> >
> > One issue I see is that the "system library classes" which should go
> > into the "static" archive are inherently application specific (i.e. if
> > my application uses Swing, it will use most of the AWT/Swing classes).
> > So how will you cope with this problem? Either you put ALL the system
> > classes into the static archive which will be a waste for most
> > applications. Or you just put a small commonly used subset of classes
> > into the static archive which may miss many classes for specific
> > applications.
> >
> > If we would add the possibility to create a dynamic archive at runtime
> > / program end (which I think would be great from a usability
> > perspective) I don't see a big need for two different archives. Two
> > archives will further complicate (and slow down) things like Symbol
> > lookup (we already have two SymbolTables now and we'd probably need a
> > third one if we would have two archives).
> >
> > I don't think that running a few different Java applications on one
> > host is the most important use case for CDS. In such a scenario the
> > current, build time generated archive is probably the best we can do
> > anyway. Instead, I think the most important use case is if we have
> > many instances of the same Java application running on the same host.
> > And this is becoming more common with the rise of containerization.
> > For the latter use case, a dynamically generated, application specific
> > archive would be the optimal solution.
> >
> > Thank you and best regards,
> > Volker
> >
> >
> >
> > On Fri, May 4, 2018 at 3:42 AM, Jiangli Zhou <jiangli.zhou at oracle.com>
> wrote:
> >> Hi Volker,
> >>
> >> Here are some details about the hybrid archiving. The goal is to
> harvest the benefit of archiving by default and improve its usability. The
> hybrid approach combines a two-layer archiving (proposed by Ioi internally)
> and static & dynamic archiving techniques:
> >>
> >> - Statically archive system library classes from a provided classlist
> using the existing method. The archiving includes class metadata, interned
> string objects, constant pool resolved_references arrays, class mirror
> objects, etc. Static archiving can be done at the JDK image build time and
> shipped together with JDK binary. The following need to be addressed:
> >> *Relaxing the runtime CDS/AppCDS boot path check, so the
> packaged archive can be used after the JDK binary is installed on the
> target device. JDK-8199807 was created to address this issue and is
> targeted for JDK 11.
> >> *Add the static archiving generation in JDK build steps and
> package the generated archive with JDK image. The archive can only be
> generated for the same target (both OS and CPU architecture) as the build
> platform. I will create a RFE.
> >>
> >> - Dynamic archiving can be done for application classes at the first
> execution of a specific application
> >> * The archive is created on top of the default system archive
> shipped with the JDK image. A separate top-layer archive file is generated
> for each different application.
> >> * Archiving is done at the end of the application execution
> before VM exits by relocating the class metadata to the archive spaces.
> Cleanup also needs to be done for copied class metadata to remove any
> runtime information. Most of the required functionality already exists
> today. For example, class metadata relocation was implemented by Ioi in JDK
> 10.
> >> * Only archive class metadata for application in the top layer
> initially. Archiving java heap objects in the top-layer requires more
> investigations.
> >>
> >> Benefits of the hybrid archiving:
> >> * The system archive can be shared by different applications and
> provides memory saving.
> >> * Archive for application is created and used transparently. No more
> profiling step and class list are required!
> >> * Separating the system archiving from application archiving reduces
> the cost of archiving at application execution time. The overhead added to
> the first execution time is reduced.
> >>
> >> Thanks,
> >>
> >> Jiangli
> >>
> >>
> >>> On May 3, 2018, at 10:34 AM, Jiangli Zhou <jiangli.zhou at Oracle.COM>
> wrote:
> >>>
> >>>
> >>>> On May 3, 2018, at 2:14 AM, Volker Simonis <volker.simonis at gmail.com>
> wrote:
> >>>>
> >>>> On Thu, May 3, 2018 at 11:01 AM, David Holmes <
> david.holmes at oracle.com> wrote:
> >>>>> On 3/05/2018 5:16 PM, Volker Simonis wrote:
> >>>>>>
> >>>>>> On Thu, May 3, 2018 at 8:55 AM, David Holmes <
> david.holmes at oracle.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Just lurking here but ...
> >>>>>>>
> >>>>>>>> But is this really a relevant use case? Why would I like to
> create ONE
> >>>>>>>> archive for several apps? This would actually increase the
> footprint
> >>>>>>>> of a single instance which uses this archive. If I have several
> apps I
> >>>>>>>> would expect that users create a specific archive for each app to
> get
> >>>>>>>> the best out of CDS.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> One app instance may get increased footprint but you presumably
> use CDS
> >>>>>>> because you have multiple apps running (whether the same or not).
> These
> >>>>>>> apps
> >>>>>>> all share the core JDK classes from the archive so the overall
> footprint
> >>>>>>> per
> >>>>>>> instance is less.
> >>>>>>>
> >>>>>>
> >>>>>> If we just want to share the core JDK classes that's easy. For that
> we
> >>>>>> could mostly use the default class list (or a slightly extended one)
> >>>>>> which is generated at JDK build time (at JAVA_HOME/lib/classlist).
> >>>>>
> >>>>>
> >>>>> The point is that you are presumably running multiple instances of
> multiple
> >>>>> apps, hence you want to share one set of core classes across all,
> and share
> >>>>> the app classes across each app instance.
> >>>>>
> >>>>
> >>>> But that would require two archives: a general one with the core
> >>>> classes and an application specific one for each application.
> >>>> Combining the core classes and the application of various applications
> >>>> will not be optimal because the application classes will be all mixed
> >>>> in the same archive. The archive is being mapped page-wise into the
> >>>> java process so you'll probably end up mapping the whole archive into
> >>>> each process although you'll only use a fraction of the classes in the
> >>>> archive.
> >>>>
> >>>>>> If we want to use ONE archive for several applications and we can
> >>>>>> accept to have a bigger footprint if running a single (or just a
> few)
> >>>>>> applications in parallel I suppose the overhead of simply dumping
> all
> >>>>>> the classes from the classpaths of the various applications
> compared
> >>>>>> to an accurate solution where we only dump the actually used classes
> >>>>>> of all applications would be not that big.
> >>>>>
> >>>>>
> >>>>> But those "accurate" solutions duplicate the core classes and that's
> a waste
> >>>>> of footprint.
> >>>>>
> >>>>
> >>>> By "accurate" I meant one "fat" archive which contains all the classes
> >>>> USED by several applications plus the core classes. My argument was
> >>>> that such an "accurate" "fat" archive won't be much smaller compared
> >>>> to a "fat" archive which simply contains all the core classes plus all
> >>>> the application classes (i.e. from the application class paths, no
> >>>> matter if they are ever used or not). But the latter would be much
> >>>> simpler to implement.
> >>>
> >>> The above discussion and an internal proposal for hybrid archiving
> seem to converge on a few points. If there is no objection to the hybrid
> archiving proposal internally, maybe we can share the details of the
> proposal on openjdk soon.
> >>>
> >>> Thanks,
> >>>
> >>> Jiangli
> >>>
> >>>
> >>>>
> >>>>> David
> >>>>> -----
> >>>>>
> >>>>>
> >>>>>>> David
> >>>>>>> -----
> >>>>>>>
> >>>>>>>
> >>>>>>> On 3/05/2018 4:48 PM, Volker Simonis wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, May 3, 2018 at 6:52 AM, Ioi Lam <ioi.lam at oracle.com>
> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 5/2/18 10:00 AM, Volker Simonis wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Tue, May 1, 2018 at 8:32 PM, Ioi Lam <ioi.lam at oracle.com>
> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> PROBLEM:
> >>>>>>>>>>>
> >>>>>>>>>>> As discussed with Volker and Yumin in previous e-mails, AppCDS
> has
> >>>>>>>>>>> some
> >>>>>>>>>>> experimental support for custom class loaders. However, it's
> not very
> >>>>>>>>>>> easy
> >>>>>>>>>>> to use.
> >>>>>>>>>>>
> >>>>>>>>>>> For example, you can write a classlist like this:
> >>>>>>>>>>>
> >>>>>>>>>>> java/lang/Object id: 1
> >>>>>>>>>>> CustomLoadee id: 2 super: 1 source: /tmp/foo.jar
> >>>>>>>>>>>
> >>>>>>>>>>> The CustomLoadee class will be stored in the shared archive
> with a
> >>>>>>>>>>> CRC
> >>>>>>>>>>> code.
> >>>>>>>>>>> During runtime, if a custom loader wants to load a class of
> the
> >>>>>>>>>>> same
> >>>>>>>>>>> name,
> >>>>>>>>>>> and its classfile has the same size and CRC as the archived
> class,
> >>>>>>>>>>> the
> >>>>>>>>>>> archived version will be loaded. This speeds up class loading
> by
> >>>>>>>>>>> avoiding
> >>>>>>>>>>> parsing the class file, and saves space by sharing the mmap'ed
> class
> >>>>>>>>>>> metadata across processes.
> >>>>>>>>>>>
> >>>>>>>>>>> You can see an example test at:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> http://hg.openjdk.java.net/jdk/hs/file/46dc568d6804/test/hotspot/jtreg/runtime/appcds/customLoader/HelloCustom.java
> >>>>>>>>>>>
> >>>>>>>>>>> However, the current scheme requires you to specify all the
> super
> >>>>>>>>>>> classes
> >>>>>>>>>>> and interfaces. There's no support provided by the
> >>>>>>>>>>> -XX:DumpLoadedClassList
> >>>>>>>>>>> option. It can be helped somewhat with Volker's tool:
> >>>>>>>>>>> https://github.com/simonis/cl4cds
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> POSSIBLE SOLUTIONS:
> >>>>>>>>>>>
> >>>>>>>>>>> 1. "Dump-as-you-go". As suggested by Yumin, we can provide a
> jcmd to
> >>>>>>>>>>> ask
> >>>>>>>>>>> a
> >>>>>>>>>>> running JVM process to dump all of its loaded classes,
> including
> >>>>>>>>>>> those
> >>>>>>>>>>> loaded by custom loaders, into an archive. An alternative is
> to dump
> >>>>>>>>>>> the
> >>>>>>>>>>> archive at JVM exit time (or when you press Ctrl-C, etc.).
> >>>>>>>>>>>
> >>>>>>>>>>> 2. Add information about the custom classes for
> >>>>>>>>>>> -XX:DumpLoadedClassList.
> >>>>>>>>>>> The
> >>>>>>>>>>> trouble is some class loaders don't specify a code source that
> can be
> >>>>>>>>>>> understood by the built-in class loaders. For example, the
> "Fat Jars"
> >>>>>>>>>>> would
> >>>>>>>>>>> have a code source like
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> jar:file:/jdk/tmp/test-1.0-SNAPSHOT.jar!/BOOT-INF/lib/validation-api-2.0.1.Final.jar!/
> >>>>>>>>>>>
> >>>>>>>>>>> also, many custom loaders would pre-process the classfile data
> before
> >>>>>>>>>>> defining the class, so we can't simply archive the version of
> the
> >>>>>>>>>>> class
> >>>>>>>>>>> on
> >>>>>>>>>>> disk.
> >>>>>>>>>>>
> >>>>>>>>>>> One possible solution for #2 is to include the class file data
> in the
> >>>>>>>>>>> -XX:DumpLoadedClassList output:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> java/lang/Object id: 1
> >>>>>>>>>>> CustomLoadee id: 2 super: 1 source: base64
> >>>>>>>>>>>
> yv66vgAAADQAFwoABAAQCQAFABEHABIHABMHABQBAAJIaQEADElubmVyQ2xhc3NlcwEABjxpbml0
> PgEAAygpVgEABENvZGUBAA9MaW5lTnVtYmVyVGFibGUBAAFmAQADKClJAQAKU291cmNlRmlsZQEA
> CkhlbGxvLmphdmEMAAgACQwAFQAWAQAFSGVsbG8BABBqYXZhL2xhbmcvT2JqZWN0AQAISGVsbG8k
> SGkBAAFYAQABSQAhAAMABAAAAAAAAgABAAgACQABAAoAAAAdAAEAAQAAAAUqtwABsQAAAAEACwAA
> AAYAAQAAAC8ACAAMAA0AAQAKAAAAHAABAAAAAAAEsgACrAAAAAEACwAAAAYAAQAAADEAAgAOAAAA
> >>>>>>>>>>> AgAPAAcAAAAKAAEABQADAAYACA==
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Of the 2 solutions:
> >>>>>>>>>>>
> >>>>>>>>>>> #1 seems easier to use, but may require more invasive
> modifications
> >>>>>>>>>>> in
> >>>>>>>>>>> the
> >>>>>>>>>>> VM, especially if you want to be able to continue execution
> after
> >>>>>>>>>>> dumping.
> >>>>>>>>>>>
> >>>>>>>>>> Not sure what #1 really proposes: dumping the complete .jsa
> archive at
> >>>>>>>>>> runtime or dumping just the loaded classes.
> >>>>>>>>>>
> >>>>>>>>>> If it's just about dumping the loaded class without generating
> the
> >>>>>>>>>> .jsa archive there's the problem that by default the VM doesn't
> store
> >>>>>>>>>> the exact bytes of a class after the class was loaded (except
> when
> >>>>>>>>>> class transformers are registered). So the class files would
> have to
> >>>>>>>>>> be re-assembled from the internal VM structures (in the same
> way this
> >>>>>>>>>> is done for class redefinition) and the resulting class-file
> may be
> >>>>>>>>>> different from the original bytes (i.e. some attributes may be
> >>>>>>>>>> missing).
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> #1 is for creating the JSA file, not just dumping the class
> files.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> If #1 is about creating the whole .jsa archive at runtime (or
> at VM
> >>>>>>>>>> exit) I think that would be the most attractive solution from a
> >>>>>>>>>> usability point of view although I understand that #2 will be
> easier
> >>>>>>>>>> to implement in the short term. Regarding the argument that #1
> will
> >>>>>>>>>> produce a "binary blob" that's true, but that's already true
> now when
> >>>>>>>>>> we use "Xshare:dump". I think it should not be too hard to
> implement a
> >>>>>>>>>> tool based on SA which could introspect a .jsa archive.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> The argument about the binary blob is to compare it against the
> text
> >>>>>>>>> file
> >>>>>>>>> produced by -XX:DumpLoadedClassList.
> >>>>>>>>>
> >>>>>>>>> One use case to consider is when you have a JAR file that
> contains
> >>>>>>>>> several
> >>>>>>>>> apps that each load a unique set of classes. Today, (assuming
> that
> >>>>>>>>> custom
> >>>>>>>>> class loaders are not used), you can run each app once with
> >>>>>>>>> -XX:DumpLoadedClassList, and then do an
> >>>>>>>>>
> >>>>>>>>> cat *.classlist | sort | uniq > combined.classlist
> >>>>>>>>>
> >>>>>>>>> and then create an archive that would work for all these apps.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> But is this really a relevant use case? Why would I like to
> create ONE
> >>>>>>>> archive for several apps? This would actually increase the
> footprint
> >>>>>>>> of a single instance which uses this archive. If I have several
> apps I
> >>>>>>>> would expect that users create a specific archive for each app to
> get
> >>>>>>>> the best out of CDS.
> >>>>>>>>
> >>>>>>>>> With the binary blob, there's no easy way of doing this. It will
> be
> >>>>>>>>> very
> >>>>>>>>> difficult to write a tool to decipher each blob and then somehow
> >>>>>>>>> combine
> >>>>>>>>> them into a single one.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> But if users really want such a "fat" archive, there's a much
> easier
> >>>>>>>> way: just dump ALL the classes from the .jar file into the
> archive. A
> >>>>>>>> class list for this could easily be assembled either with an
> external
> >>>>>>>> tool like cl4cds (or even a simple shell script which converts
> the
> >>>>>>>> output of `unzip -l <jar-file>` into the correct format). Or, even
> >>>>>>>> simpler, by adding a new option to the VM similar to
> >>>>>>>> -XX:DumpLoadedClassList which dumps all the classes it can find
> on the
> >>>>>>>> class path (and potentially other, configurable locations).
> >>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> - Ioi
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>> #2 would be easier to implement, but the classlist might be
> huge.
> >>>>>>>>>>>
> >>>>>>>>>>> Also, #2 would allow post-processing tools to remove unneeded
> >>>>>>>>>>> classes,
> >>>>>>>>>>> or
> >>>>>>>>>>> merge two runs into a single list. The output of #1 is
> essentially a
> >>>>>>>>>>> binary
> >>>>>>>>>>> blob that's impossible for off-line analysis/optimizations.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Any comments, or suggestions for alternatives?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>> - Ioi
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
>
>