File Format Investigation: jimage
Hi,
We have been looking at lib/modules (a.k.a. jimage) as a potential candidate format for Project Leyden. Along the way, I've documented how the file is structured[1] and how it is used in its current form. The document discussing jimage in the context of Leyden is here:
https://cr.openjdk.org/~sgehwolf/leyden/jimage_file_format_investigation_ley...
Looking forward to your feedback!
Thanks, Severin
[1] https://cr.openjdk.org/~sgehwolf/leyden/jimage_visualization_1.png
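As a rough illustration of the kind of structure the linked document describes, here is a minimal sketch that reads the fixed-size header at the start of lib/modules. The layout is an assumption based on OpenJDK's libjimage sources (imageFile.hpp) and the internal jdk.internal.jimage.ImageHeader class: seven 32-bit fields (magic 0xCAFEDADA, version, flags, resource count, table length, location-attribute bytes, string bytes), written in the target platform's byte order (jlink's --endian option selects it). `JImageHeaderSketch` is a hypothetical name, this is not a supported API, and the layout may change between releases.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical reader for the fixed-size header at the start of lib/modules.
 * Layout assumed from OpenJDK's libjimage (imageFile.hpp) and the internal
 * jdk.internal.jimage.ImageHeader class; not a supported API.
 */
final class JImageHeaderSketch {
    static final int MAGIC = 0xCAFEDADA;                          // assumed file marker
    static final int HEADER_SLOTS = 7;                            // seven 32-bit fields
    static final int HEADER_SIZE = HEADER_SLOTS * Integer.BYTES;  // 28 bytes

    record Header(int majorVersion, int minorVersion, int flags, int resourceCount,
                  int tableLength, int locationsSize, int stringsSize) {

        /** Size of the index (header, redirect table, offsets table, location
         *  attributes, strings); resource data is assumed to follow it. */
        long indexSize() {
            return HEADER_SIZE
                    + 4L * Integer.toUnsignedLong(tableLength)   // redirect table
                    + 4L * Integer.toUnsignedLong(tableLength)   // offsets table
                    + Integer.toUnsignedLong(locationsSize)      // location attributes
                    + Integer.toUnsignedLong(stringsSize);       // string table
        }
    }

    /** Parses the header using the byte order the image was written in. */
    static Header parse(ByteBuffer buf, ByteOrder order) {
        ByteBuffer b = buf.duplicate().order(order);
        if (b.remaining() < HEADER_SIZE) {
            throw new IllegalArgumentException("truncated header: " + b.remaining() + " bytes");
        }
        int magic = b.getInt();
        if (magic != MAGIC) {
            throw new IllegalArgumentException("bad magic: 0x" + Integer.toHexString(magic));
        }
        int version = b.getInt(); // major in the high 16 bits, minor in the low 16 bits
        return new Header(version >>> 16, version & 0xFFFF,
                b.getInt(), b.getInt(), b.getInt(), b.getInt(), b.getInt());
    }
}
```

To try it against a JDK, one could read the first 28 bytes of `$JAVA_HOME/lib/modules` into a `ByteBuffer` (for example with `FileChannel.read`), flip it, and pass it to `parse` with `ByteOrder.nativeOrder()`; under these assumptions, `indexSize()` is the offset at which resource data begins.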
Nice analysis and reverse-engineering. (Also, nice monospace font; is that Roboto?)
I agree with your list of "useful properties"; storing existing JVM artifacts (such as CDS archives) as well as new JVM artifacts (heap object snapshots, AOT) are going to be required, and jimage is a candidate here (though not the only one). Similarly, there's an evaluation to be done, for example, whether to update the CDS format to store more things (AOT, more general heap object storage), or to have separate stores for these, or to overhaul the format. But whichever is chosen, embedding in jimage seems a possibility.
The one assumption that I would quibble with is the suggestion for using jimage as an intermediate format in the condenser pipeline. It seems credible that separate intermediate and final formats will be preferable to a one-size-fits-all, since the needs of the next condenser in the pipeline at build time are not likely to be the same as those of the VM at runtime, and formats like CDS are optimized for the VM to consume them, not to be, for example, easily queried and updated.
Cheers, -Brian
On 2/14/2023 1:50 PM, Brian Goetz wrote:
Nice analysis and reverse-engineering. (Also, nice monospace font; is that Roboto?)
I agree with your list of "useful properties"; storing existing JVM artifacts (such as CDS archives) as well as new JVM artifacts (heap object snapshots, AOT) are going to be required, and jimage is a candidate here (though not the only one.) Similarly, there's an evaluation to be done, for example, whether to update the CDS format to store more things (AOT, more general heap object storage), or to have separate stores for these, or to overhaul the format. But whichever is chosen, embedding in jimage seems a possibility.
The one assumption that I would quibble with is the suggestion for using jimage as an intermediate format in the condenser pipeline. It seems credible that separate intermediate and final formats will be preferable to a one-size-fits-all, since the needs of the next condenser in the pipeline at build time are not likely to be the same as those of the VM at runtime, and formats like CDS are optimized for the VM to consume them, not to be, for example, easily queried and updated.
FYI, for debugging purposes, a CDS archive can be generated together with a map file that describes its contents in detail. One thing on my to-do list is to generate an hprof file as part of the CDS dump; you could then load it into an hprof viewer to examine the archived Java objects. But I agree that trying to extract individual elements from the CDS archive is not easy, and probably not worth doing.
Thanks - Ioi
On Tue, Feb 14, 2023 at 4:50 PM Brian Goetz <brian.goetz@oracle.com> wrote:
Nice analysis and reverse-engineering. (Also, nice monospace font; is that Roboto?)
I agree with your list of "useful properties"; storing existing JVM artifacts (such as CDS archives) as well as new JVM artifacts (heap object snapshots, AOT) are going to be required, and jimage is a candidate here (though not the only one.) Similarly, there's an evaluation to be done, for example, whether to update the CDS format to store more things (AOT, more general heap object storage), or to have separate stores for these, or to overhaul the format. But whichever is chosen, embedding in jimage seems a possibility.
The one assumption that I would quibble with is the suggestion for using jimage as an intermediate format in the condenser pipeline. It seems credible that separate intermediate and final formats will be preferable to a one-size-fits-all, since the needs of the next condenser in the pipeline at build time are not likely to be the same as those of the VM at runtime, and formats like CDS are optimized for the VM to consume them, not to be, for example, easily queried and updated.
"Intermediate format" doesn't quite capture our thinking here. The assumption was that starting from a JVM and a set of modules, we might apply some condensers (jlink being the tool for such activities today), and produce a resulting image. This image should be both runnable as is and usable as input for further condensation passes. Mark's and my discussion [0] indicated this was a goal, at least for non-terminal condensers (i.e., those that haven't "condensed all the way down to a platform-specific executable").
Does that mean that jimage (or whatever the chosen format ends up being) must be the intermediate form? Probably not, but I think it does put a requirement on the system: condensed images need to be considered valid inputs to the system.
Were you thinking of requiring a separate step that converts condensed images to runnable ones, so that the second step below is always required to get a runnable result?
{ JDK + modules | condensed image } --> condensers --> condensed image
condensed image --> runnable image
--Dan
[0] https://mail.openjdk.org/pipermail/leyden-dev/2022-October/000085.html
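The composability requirement in the two-line diagram can be made concrete with a small, purely hypothetical Java model: an image is a set of named files, and a condenser is any function from image to image. Because input and output share one type, a condensed image can be executed or fed into another pass, which models the variant without a separate "condensed image --> runnable image" step. The `terminal` flag stands in for condensers that go all the way down to a platform-specific executable. `Image`, `Condenser`, and `run` are illustrative names, not jlink or Leyden APIs, and modeling file contents as strings is a simplification.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

/**
 * Hypothetical model of the condenser pipeline shape discussed in this thread.
 * None of these types exist in jlink or Project Leyden.
 */
final class CondenserPipelineSketch {

    /** An image modeled as named files (e.g. lib/modules plus extra artifacts). */
    record Image(Map<String, String> files, boolean terminal) {
        Image {
            files = Map.copyOf(files); // defensive, immutable copy
        }

        /** Returns a new image with one file added or replaced. */
        Image with(String path, String contents) {
            Map<String, String> copy = new TreeMap<>(files);
            copy.put(path, contents);
            return new Image(copy, terminal);
        }
    }

    /** A condenser maps an image to an image, so passes compose and can be repeated. */
    interface Condenser extends UnaryOperator<Image> {}

    /** Runs condensers in order, refusing to condense an image marked terminal. */
    static Image run(Image input, List<Condenser> condensers) {
        Image current = input;
        for (Condenser c : condensers) {
            if (current.terminal()) {
                throw new IllegalStateException("terminal image cannot be condensed further");
            }
            current = c.apply(current);
        }
        return current;
    }
}
```

For example, `run(run(start, List.of(a, b, c)), List.of(d))` chains two passes with no conversion in between. A design with a separate conversion step would instead give condensed and runnable images distinct types, which is the trade-off between intermediate and final formats raised in this thread.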
The devil is in the details, of course; different condensers may do different degrees of condensation. You cited as your use case:
starting from a JVM and a set of modules, we might apply some condensers (jlink being the tool for such activities today), and produce a resulting image.
The key word that is confusing us here is *some*; let's say you're applying condensers A, B, and C. Should we produce a complete, runnable, binary application image with AppCDS archives and all that between A and B, and then again between B and C? That's what I mean by "intermediate format".
On Wed, Feb 15, 2023 at 9:36 AM Brian Goetz <brian.goetz@oracle.com> wrote:
The devil is in the details, of course; different condensers may do different degrees of condensation. You cited as your use case:
starting from a JVM and a set of modules, we might apply some condensers (jlink being the tool for such activities today), and produce a resulting image.
The key word that is confusing us here is *some*; let's say you're applying condensers A, B, and C. Should we produce a complete, runnable, binary application image with AppCDS archives and all that between A and B, and then again between B and C? That's what I mean by "intermediate format".
I think we're on the same page. My expectation was I'd have a runnable image at the end of A->B->C and could then, assuming non-terminal condensers, either execute it or feed it through a new condenser pass of D->E->F.
--Dan
I think we're on the same page about "what should you be able to do", though clearly we need some more refined terminology for describing it, since there are a number of possible states.
On Tue, 2023-02-14 at 16:50 -0500, Brian Goetz wrote:
Nice analysis and reverse-engineering. (Also, nice monospace font; is that Roboto?)
Thanks. The font is Roboto Mono.
I agree with your list of "useful properties"; storing existing JVM artifacts (such as CDS archives) as well as new JVM artifacts (heap object snapshots, AOT) are going to be required, and jimage is a candidate here (though not the only one.) Similarly, there's an evaluation to be done, for example, whether to update the CDS format to store more things (AOT, more general heap object storage), or to have separate stores for these, or to overhaul the format. But whichever is chosen, embedding in jimage seems a possibility. The one assumption that I would quibble with is the suggestion for using jimage as an intermediate format in the condenser pipeline. It seems credible that separate intermediate and final formats will be preferable to a one-size-fits-all, since the needs of the next condenser in the pipeline at build time are not likely to be the same as those of the VM at runtime, and formats like CDS are optimized for the VM to consume them, not to be, for example, easily queried and updated.
Perhaps it would be useful to define "format" in this context. Is "format" a single file? Could it be many? Perhaps so. In terms of composability, a single file doesn't seem strictly necessary: the way jlink (or whatever future interface drives the condenser pipeline) currently works, it also includes transformations for binaries and shared objects.
Thanks, Severin
Participants (4): Brian Goetz, Dan Heidinga, Ioi Lam, Severin Gehwolf