RFC: regarding metaspace(metadata?) dump

Ioi Lam ioi.lam at oracle.com
Thu Jan 12 06:00:18 UTC 2023



On 1/11/2023 6:52 PM, Yi Yang wrote:
> Hi Ioi,
>
> > I think there are overlaps between your proposal and existing tools.
> > For example, there are jcmd options such as VM.class_hierarchy and
> > VM.classes, etc.
> > The Serviceability Agent can also be used to analyze the contents of
> > the class metadata.
>
> Of course, we can keep adding jcmd commands such as jcmd
> VM.method_counter and jcmd VM.aggregate_by_class_package to help with
> diagnosis, but another, once-and-for-all solution is to implement a
> rich and well-formed metadata dump as this proposal describes.
> Third-party parsers and platforms can then analyze the well-formed
> dump file and provide many grouping/filtering options
> (grouping_by_package, filter_linked, filter_force_inline, etc.;
> essentially, VM.class_hierarchy is an aggregation of VM.classes).
>
> I'd like to describe a real use case to illustrate the benefits of a
> well-formed metaspace dump: on our internal DevOps platform, I
> observed that the Metaspace utilization of my application had been
> high, and several full GCs occurred during this period. So I
> generated a well-formed metaspace dump through the DevOps platform;
> the dump file was automatically uploaded to another internal Java
> troubleshooting platform, which further analyzed it and displayed it
> with many grouping and filtering options.
>
> > I'd be interested in seeing your implementation and comparing it with
> > the existing tools.
>
> I'm starting to work on this, and it may take several months to
> implement, since it looks more like a JEP-level feature. I'd like to
> hear some general discussion before coding, e.g.: Is it acceptable to
> use the JSON format? Should it be a metadata dump, or should it keep
> the current metaspace scope? Do you think a basic+extended output for
> internal structures is acceptable?
>

Before discussing the output of this tool, I think it's better to first
discuss the goals and intended use:

- For Java app developers, I am not sure if they care about the
representation of the classes inside HotSpot. They may want to know which
classes are loaded by which class loaders, or want to troubleshoot
memory leaks (why aren't my classes unloaded, etc.). For these, we
already have existing tools.

- For HotSpot developers, it would be nice to have a dump of all the 
metadata, but I am not sure how important this is, as people seem to be 
able to get by with their own debugging methods.

By the way, there may be multiple ways of creating such a dump. The 
least intrusive way would be to program the Serviceability Agent, which 
already has a lot of Java APIs to access HotSpot internals. That way, 
you can write the dumper without modifying the HotSpot C++ code. It 
could even be maintained as a project outside of the JDK repo.

Also you mentioned that "Internally we implemented a metaspace dump that 
generates human-readable text". Can you share how this tool was implemented?

Thanks
- Ioi


> > This may be quite difficult, because the metadata contains rewritten
> > Java bytecodes. The rewriting format may be dependent on the JDK
> > version. Also, the class linkage (the resolution of constant pool
> > information) will vary vastly from one JDK version to another. So
> > writing a third-party tool that can work with multiple JDK versions
> > will be quite hard.
>
> Thanks for your input! Maybe we could display the rewritten bytecodes?
> Anyway, I'll take a close look at this, and I'll prepare a POC along
> with a dump parser and a simple web UI for diagnosis once it's ready.
>
> > Also, defining a "portable" format for the dump will be difficult,
> > since we don't know how the internal data structure will evolve in
> > the future.
>
> Yes. Since we don't know how the internal data structures will change
> in the future, I propose that we agree the dump should at least allow
> reconstructing the Java (rewritten?) source code as much as possible.
> For example, the dumped JSON object for an InstanceKlass would contain
> two parts: the first part contains the information necessary to
> reconstruct the source code as much as possible, and the second part
> contains extended information, like this:
> {
>   name: ...,
>   super: ...,
>   flags: ...,
>   method: [],
>   interface: [],
>   fields: [],
>   annotation: [],
>   bytecode: [],
>   constantpool: [],
>   // extended part
>   init_state: ...,
>   init_thread: ...
> }
> The first part stays basically unchanged (new fields may only be
> added), while the extended part is subject to change; a visualization
> client checks whether each field of a JSON object is defined before
> displaying it.
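One way a client could follow this "basic + extended" rule is to treat the basic keys as required, look up extended keys only if they are present, and silently ignore keys it does not recognize (for example, fields added by a newer JDK). The sketch below is a hypothetical Python client: the key names follow the InstanceKlass object sketched in the message above, while the concrete values (string flags instead of HotSpot's numeric access flags, the init_state text) and the helper name describe are illustrative assumptions.

```python
import json

# Key names taken from the sketched InstanceKlass object; values are made up.
BASIC_KEYS = ("name", "super", "flags", "method", "interface", "fields")
EXTENDED_KEYS = ("init_state", "init_thread")

RECORD = json.loads("""
{
  "name": "com/example/Foo",
  "super": "java/lang/Object",
  "flags": ["public", "final"],
  "interface": ["java/lang/Runnable"],
  "fields": [],
  "method": [{"name": "run"}],
  "init_state": "fully_initialized",
  "future_field": 42
}
""")

def dotted(internal_name):
    """Convert a JVM internal name 'a/b/C' to source form 'a.b.C'."""
    return internal_name.replace("/", ".")

def describe(klass):
    """Render a class declaration from the basic part, then any extended
    fields this client knows about and the dump actually contains.
    Unknown keys (e.g. "future_field") are ignored rather than rejected."""
    missing = [k for k in BASIC_KEYS if k not in klass]
    if missing:
        raise ValueError(f"not a valid dump record, missing {missing}")
    decl = " ".join(klass["flags"] + ["class", dotted(klass["name"])])
    decl += " extends " + dotted(klass["super"])
    if klass["interface"]:
        decl += " implements " + ", ".join(dotted(i) for i in klass["interface"])
    lines = [decl]
    for key in EXTENDED_KEYS:
        if key in klass:  # extended part: display only if defined
            lines.append(f"  [{key}] {klass[key]}")
    return lines

for line in describe(RECORD):
    print(line)
```

Because the client never fails on extra keys and only requires the stable basic part, a dump produced by a newer JDK with more extended fields would still render.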
>
>     ------------------------------------------------------------------
>     From:Ioi Lam <ioi.lam at oracle.com>
>     Send Time:2023 Jan. 12 (Thu.) 08:15
>     To:hotspot-runtime-dev <hotspot-runtime-dev at openjdk.org>;
>     serviceability-dev at openjdk.java.net
>     <serviceability-dev at openjdk.java.net>
>     Subject:Re: RFC: regarding metaspace(metadata?) dump
>
>     CC-ing serviceability.
>
>     Hi Yi,
>
>     In general, I think it's good to have tools for understanding the
>     internal layout of the class metadata.
>
>     I think there are overlaps between your proposal and existing
>     tools. For example, there are jcmd options such as
>     VM.class_hierarchy and VM.classes, etc.
>
>     The Serviceability Agent can also be used to analyze the contents
>     of the class metadata.
>
>     Did you look at the existing tools and see how they match up with
>     your requirements?
>
>     I'd be interested in seeing your implementation and comparing it
>     with the existing tools.
>
>
>     On 1/11/2023 4:56 AM, Yi Yang wrote:
>     Hi,
>
>     Internally, users often come to us for help with metaspace-related
>     issues, for example:
>     1. Users are eager to know which classes each GroovyClassLoader
>     loads, why they are not unloaded, and why they lead to a
>     Metaspace OOME.
>     2. They want to know the class structure of dynamically generated
>     classes in some scenarios, such as deserialization.
>     3. Finding memory leaks caused by duplicated classes.
>     ...
>     Internally, we implemented a metaspace dump that generates
>     human-readable text; it looks something like this:
>
>     [Basic Information]
>     Dump Reason : JCMD
>     MaxMetaspaceSize : 18446744073709547520 B
>     CompressedClassSpaceSize : 1073741824 B
>     Class Space Used : 309992 B
>     Class Space Capacity : 395264 B
>     ...
>     [Class Loader Data]
>     ClassLoaderData : loader = 0x000000008024f928, loader_klass = 0x0000000800010098, loader_klass_name = sun/misc/Launcher$AppClassLoader, label = N/A
>       Class Used Chunks :
>         * Chunk : [0x0000000800060000, 0x0000000800060230, 0x0000000800060800)
>       NonClass Used Chunks :
>         * Chunk : [0x00007fd8379c1000, 0x00007fd8379c1350, 0x00007fd8379c2000)
>       Klasses :
>         Klass : 0x0000000800060028, name = Test, size = 520 B
>           ConstantPool : 0x00007fd8379c1050, size = 296 B
>     ...
>
>     It has been working effectively for several years and has helped
>     many users solve metaspace-related problems.
>     But it would be more user-friendly if the JDK supported this
>     capability natively. We hope that the format of the metaspace
>     dump file can take both flexibility and compatibility into
>     account, and that the content of the dump file is detailed
>     enough to meet the needs of both application developers and
>     lower-level developers.
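To see why the file format matters for tools, consider what a consumer of the text format shown above has to do: scrape it. The sketch below is a minimal, hypothetical example that sums the reported Klass sizes per class loader. It assumes every ClassLoaderData and Klass entry sits on a single line exactly as in the sample; the regular expressions and the helper klass_bytes_per_loader are illustrative, and any change in the wording or layout of the text output would silently break them.

```python
import re
from collections import defaultdict

# Excerpt of the text dump shown above, with each entry on one line.
SAMPLE = """\
[Class Loader Data]
ClassLoaderData : loader = 0x000000008024f928, loader_klass = 0x0000000800010098, loader_klass_name = sun/misc/Launcher$AppClassLoader, label = N/A
  Klasses :
    Klass : 0x0000000800060028, name = Test, size = 520 B
      ConstantPool : 0x00007fd8379c1050, size = 296 B
"""

# Patterns tied to the exact wording of the text format (fragile by design).
CLD_RE = re.compile(r"^ClassLoaderData : .*loader_klass_name = ([^,]+),")
KLASS_RE = re.compile(r"^\s*Klass : (0x[0-9a-f]+), name = ([^,]+), size = (\d+) B")

def klass_bytes_per_loader(text):
    """Sum the reported Klass sizes under each ClassLoaderData entry."""
    totals = defaultdict(int)
    current = None  # loader_klass_name of the most recent ClassLoaderData line
    for line in text.splitlines():
        m = CLD_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = KLASS_RE.match(line)
        if m and current is not None:
            totals[current] += int(m.group(3))
    return dict(totals)

print(klass_bytes_per_loader(SAMPLE))
```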
>
>     Based on the above considerations, I think JSON is an appropriate
>     file format (though XML or a binary format is not excluded as a
>     candidate). Specifically, in my earlier thinking, the format of
>     the metaspace dump file could be as follows (pretty-printed):
>
>     https://gist.github.com/y1yang0/ab3034b6381b8a9d215602c89af4e9c3
>
>     Using the JSON format, we can flexibly add new fields without
>     breaking compatibility. Which data to write is debatable, but I
>     hope we can reach a consensus that third-party parsers (e.g., a
>     Metaspace Analyzer Tool) can at least reconstruct Java source code
>     from the dump file.
>
>     This may be quite difficult, because the metadata contains
>     rewritten Java bytecodes. The rewriting format may be dependent on
>     the JDK version. Also, the class linkage (the resolution of
>     constant pool information) will vary vastly from one JDK version
>     to another. So writing a third-party tool that can work with
>     multiple JDK versions will be quite hard. Also, defining a
>     "portable" format for the dump will be difficult, since we don't
>     know how the internal data structure will evolve in the future.
>
>     Thanks
>     - Ioi
>
>
>     Based on this, we can also write more information useful for
>     low-level troubleshooting or debugging (e.g., the init_state of
>     InstanceKlass). In addition, we could even output the native code
>     and associated information for each Method, so that a third-party
>     parser can reconstruct a human-readable assembly representation of
>     the compiled method from the dump file. To some extent, that would
>     amount to implementing a code cache dump as well. For this reason,
>     I'm not sure whether this RFC should be called a metaspace dump;
>     maybe metadata dump? It looks more like a metadata-dump framework.
>
>     Do you have any thoughts about a metaspace/metadata dump? I look
>     forward to hearing your feedback; any comments are invaluable!
>
>     Best regards,
>     Yi Yang
>