From akozlov at openjdk.java.net Tue Feb 1 08:48:42 2022
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Tue, 1 Feb 2022 08:48:42 GMT
Subject: [crac] RFR: Ensure empty Reference Handler and Cleaners queues [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 31 Jan 2022 13:54:37 GMT, Dan Heidinga wrote:

>> Anton Kozlov has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>>
>>   Ensure empty Reference Handler and Cleaners queues
>
> src/java.base/share/classes/java/lang/ref/Reference.java line 344:
>
>> 342:         @Override
>> 343:         public void beforeCheckpoint(Context context) throws Exception {
>> 344:             System.gc();
>
> Is a single `System.gc()` sufficient for the Hotspot collectors? With OpenJ9, we used to treat back to back System.gc() calls specially as requiring extra effort. Does Hotspot do something similar?

Although I could not find immediate counter-examples after a quick look, I don't really expect this to be sufficient. Hence TODO.

-------------

PR: https://git.openjdk.java.net/crac/pull/13

From akozlov at openjdk.java.net Tue Feb 1 08:55:42 2022
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Tue, 1 Feb 2022 08:55:42 GMT
Subject: [crac] RFR: Ensure empty Reference Handler and Cleaners queues [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 31 Jan 2022 12:51:22 GMT, Anton Kozlov wrote:

>> At the time of checkpoint, a set of References may need handling. This change ensures no References pending in ReferenceHandler and in Cleaners.
>>
>> System.gc() is a best effort attempt to make GC to look for References. Default VM flags (-DisableExplicitGC, -ExplicitGCInvokesConcurrent) should not block the call, but additional investigation is needed to make sure GC found all references.
>
> Anton Kozlov has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
>   Ensure empty Reference Handler and Cleaners queues

Thanks for the look!

> I'm not sold on the extra `notifyAll` in ReferenceQueue::remove

At least it should be correct :) In the common case of a single thread handling a queue, notifyAll will be effectively a no-op (no threads are waiting), so the overhead is small. If more than one thread is handling a queue, they could suffer from unnecessary wake-ups.

I'll proceed with the integration, but will seek review on the core-libs-dev list.

-------------

PR: https://git.openjdk.java.net/crac/pull/13

From akozlov at azul.com Tue Feb 1 09:11:19 2022
From: akozlov at azul.com (Anton Kozlov)
Date: Tue, 1 Feb 2022 12:11:19 +0300
Subject: [crac] RFR: Ensure empty Reference Handler and Cleaners queues
In-Reply-To:
References:
Message-ID:

Cross-posting RFR from CRaC Project. The change touches Reference class, so I would be glad to receive any feedback from core-libs-dev.

In CRaC project, java code participates in the preparation of the platform state that can be safely stored to the image. The image can be attempted at any time, so the image may capture unprocessed References. Recently I found cases when objects became unreachable during preparation for the checkpoint, and their associated clean-up actions were needed to close external resources (which we don't allow to be open when the image is stored). So it's become necessary to ensure as many References as possible are processed before the image is created. As a nice additional feature, restored java instances won't start with the same Reference processing.
With the change, the image is not created until the VM's queue of pending j.l.References is drained and then, as an example, each j.l.ref.Cleaner queue is drained; only then is the VM called to prepare the image. More Reference handling threads will be changed like the Cleaner ones. I'm looking for possible problems or general comments about this approach.

Thanks,
Anton

On 1/31/22 14:51, Anton Kozlov wrote:
> At the time of checkpoint, a set of References may need handling. This change ensures no References pending in ReferenceHandler and in Cleaners.
>
> System.gc() is a best effort attempt to make GC to look for References. Default VM flags (-DisableExplicitGC, -ExplicitGCInvokesConcurrent) should not block the call, but additional investigation is needed to make sure GC found all references.
>
> -------------
>
> Commit messages:
>  - Ensure empty Reference Handler and Cleaners queues
>
> Changes: https://git.openjdk.java.net/crac/pull/13/files
>  Webrev: https://webrevs.openjdk.java.net/?repo=crac&pr=13&range=00
>   Stats: 126 lines in 5 files changed: 124 ins; 0 del; 2 mod
>   Patch: https://git.openjdk.java.net/crac/pull/13.diff
>   Fetch: git fetch https://git.openjdk.java.net/crac pull/13/head:pull/13
>
> PR: https://git.openjdk.java.net/crac/pull/13

From akozlov at openjdk.java.net Tue Feb 1 13:13:39 2022
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Tue, 1 Feb 2022 13:13:39 GMT
Subject: [crac] Integrated: Ensure empty Reference Handler and Cleaners queues
In-Reply-To:
References:
Message-ID:

On Mon, 31 Jan 2022 11:45:25 GMT, Anton Kozlov wrote:

> At the time of checkpoint, a set of References may need handling. This change ensures no References pending in ReferenceHandler and in Cleaners.
>
> System.gc() is a best effort attempt to make GC to look for References. Default VM flags (-DisableExplicitGC, -ExplicitGCInvokesConcurrent) should not block the call, but additional investigation is needed to make sure GC found all references.
This pull request has now been integrated.

Changeset: 9cf19956
Author:    Anton Kozlov
URL:       https://git.openjdk.java.net/crac/commit/9cf1995693eead85d3807fb4c83ab38c14e27042
Stats:     130 lines in 5 files changed: 128 ins; 0 del; 2 mod

Ensure empty Reference Handler and Cleaners queues

Reviewed-by: heidinga

-------------

PR: https://git.openjdk.java.net/crac/pull/13

From akozlov at openjdk.java.net Tue Feb 1 14:43:08 2022
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Tue, 1 Feb 2022 14:43:08 GMT
Subject: [crac] RFR: Clear JarFileFactory cache on checkpoint
Message-ID:

Even after an URL object referring to a jar file is closed, the JarFile remains cached and open. This change cleans the cache, so the JarFile becomes reclaimable. The only user of the changed class, sun.net.www.protocol.jar.JarURLConnection, does not assume any state of the cache.

-------------

Commit messages:
 - Clear JarFileFactory cache on checkpoint

Changes: https://git.openjdk.java.net/crac/pull/14/files
 Webrev: https://webrevs.openjdk.java.net/?repo=crac&pr=14&range=00
  Stats: 84 lines in 3 files changed: 83 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/crac/pull/14.diff
  Fetch: git fetch https://git.openjdk.java.net/crac pull/14/head:pull/14

PR: https://git.openjdk.java.net/crac/pull/14

From Alan.Bateman at oracle.com Wed Feb 2 14:48:33 2022
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Wed, 2 Feb 2022 14:48:33 +0000
Subject: [crac] RFR: Ensure empty Reference Handler and Cleaners queues
In-Reply-To:
References:
Message-ID: <1c2136a0-90c2-cabc-a948-bc4a02f1533b@oracle.com>

On 01/02/2022 09:11, Anton Kozlov wrote:
> Cross-posting RFR from CRaC Project. The change touches Reference class, so I would be glad to receive any feedback from core-libs-dev.
>
> In CRaC project, java code participates in the preparation of the platform state that can be safely stored to the image. The image can be attempted at any time, so the image may capture unprocessed References.
> Recently I found cases when objects became unreachable during preparation for the checkpoint, and their associated clean-up actions were needed to close external resources (which we don't allow to be open when the image is stored). So it's become necessary to ensure as many References as possible are processed before the image is created. As a nice additional feature, restored java instances won't start with the same Reference processing.
>
> With the change, the image is not created until the VM's queue of pending j.l.References is drained and then, as an example, each j.l.ref.Cleaner queue is drained; only then is the VM called to prepare the image. More Reference handling threads will be changed like the Cleaner ones. I'm looking for possible problems or general comments about this approach.

At a high level it should be okay to provide a JDK-internal way to await quiescent. You've added it as a public API which might be okay for the current exploration but I don't think it would be exposed in its current form. Once the method returns then there is no guarantee that the number of waiters hasn't changed, but I think you know that.

-Alan.

From duke at openjdk.java.net Wed Feb 2 20:20:34 2022
From: duke at openjdk.java.net (Larry-N)
Date: Wed, 2 Feb 2022 20:20:34 GMT
Subject: [crac] RFR: Run native CRaC checks after failed beforeCheckpoint
In-Reply-To:
References:
Message-ID:

On Fri, 28 Jan 2022 12:00:01 GMT, Anton Kozlov wrote:

> After checkpoint failed at the Java level, it's worth to make a "dry-run" checkpoint at the native state: check file descriptors, process -XX:+CRHeapDumpOnCheckpointException, etc.
>
> The patch also removes unused parameter of `checkpoint_restore(FdsInfo* fds)`

Looks good to me

-------------

PR: https://git.openjdk.java.net/crac/pull/11

From akozlov at azul.com Thu Feb 3 10:18:59 2022
From: akozlov at azul.com (Anton Kozlov)
Date: Thu, 3 Feb 2022 13:18:59 +0300
Subject: [crac] RFR: Ensure empty Reference Handler and Cleaners queues
In-Reply-To: <1c2136a0-90c2-cabc-a948-bc4a02f1533b@oracle.com>
References: <1c2136a0-90c2-cabc-a948-bc4a02f1533b@oracle.com>
Message-ID:

On 2/2/22 17:48, Alan Bateman wrote:
> At a high level it should be okay to provide a JDK-internal way to await quiescent. You've added it as a public API which might be okay for the current exploration but I don't think it would be exposed in its current form. Once the method returns then there is no guarantee that the number of waiters hasn't changed, but I think you know that

I thought about blocking waiters regardless of whether References are available in the Queue. This would leave threads in the quiescent state, but References would pile up in the pendingReferenceQueue exposed by the VM. So I settled on the method being a synchronization point rather than a blocking point.

I hoped to guarantee all Queues are empty by waiting for a sufficient number of waiters on each Queue, in the order in which Queues pass References to each other (for a single thread). But now even there, I see that handling a Reference later in the order may make another one pending, filling up a Queue that was supposed to be empty. For a strong guarantee that all Queues are empty, some sort of iteration may be required that checks that no Queue has received a new reference since the last check.

I think a public API is needed, as users may have the same problem as we do. But the current code does not support this (we need to allow user code to run after JDK Queues are emptied). Interesting...

Thanks!
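The iteration mentioned in this message (repeat until no queue has received new work since the last check) can be sketched as a toy model. This is only an illustration under stated assumptions: the queues here are plain single-threaded `Deque<Runnable>` instances, not `ReferenceQueue`s, and `QueueDrainSketch` and `drainUntilQuiescent` are hypothetical names that do not appear in the CRaC patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Toy model: each queue holds pending clean-up actions; an action may enqueue more work elsewhere. */
final class QueueDrainSketch {

    /**
     * Repeats full passes over all queues until one complete pass finds every queue empty.
     * Returns the number of actions that were run.
     */
    static int drainUntilQuiescent(List<Deque<Runnable>> queues) {
        int processed = 0;
        boolean sawWork = true;
        while (sawWork) {
            sawWork = false;
            for (Deque<Runnable> queue : queues) {
                Runnable action;
                while ((action = queue.poll()) != null) {
                    action.run();      // may add entries to any queue, including earlier ones
                    processed++;
                    sawWork = true;    // something ran, so another full pass is needed
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        Deque<Runnable> first = new ArrayDeque<>();
        Deque<Runnable> second = new ArrayDeque<>();
        // Handling an entry in the later queue makes a new entry pending in the earlier one.
        second.add(() -> first.add(() -> { }));
        int processed = drainUntilQuiescent(List.of(first, second));
        System.out.println(processed + " actions run, all queues empty: "
                + (first.isEmpty() && second.isEmpty()));
    }
}
```

A single ordered pass would leave the first queue non-empty in this example; the extra passes are what give the stronger guarantee. In a real VM, references become pending concurrently with the check, so this model says nothing about the GC side of the problem.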
Anton

From duke at openjdk.java.net Thu Feb 3 10:30:39 2022
From: duke at openjdk.java.net (Larry-N)
Date: Thu, 3 Feb 2022 10:30:39 GMT
Subject: [crac] RFR: Run native CRaC checks after failed beforeCheckpoint
In-Reply-To:
References:
Message-ID:

On Fri, 28 Jan 2022 12:00:01 GMT, Anton Kozlov wrote:

> After checkpoint failed at the Java level, it's worth to make a "dry-run" checkpoint at the native state: check file descriptors, process -XX:+CRHeapDumpOnCheckpointException, etc.
>
> The patch also removes unused parameter of `checkpoint_restore(FdsInfo* fds)`

Marked as reviewed by Larry-N at github.com (no known OpenJDK username).

-------------

PR: https://git.openjdk.java.net/crac/pull/11

From asmehra at redhat.com Fri Feb 4 17:38:47 2022
From: asmehra at redhat.com (Ashutosh Mehra)
Date: Fri, 4 Feb 2022 12:38:47 -0500
Subject: Portability of checkpoints?
Message-ID:

Hi,

We are doing some experiments to understand the impact of change in the environment (could be anything like cpu features, number of cpus, cache line size etc) on the functionality of the JVM after it restores from a checkpoint.

It is a work in progress and while we continue our investigation, I thought it would be good to document and summarize our findings so far. I have done a write-up [1] describing the problems we faced and our observations.

> 1. Between machines with the same operating system distribution. The CPU
> features set is a good example of this. Also, available memory resources can
> change between checkpoint and restore. We'll likely need to change JVM to
> handle the difference. Here we have containers -- it's interesting that even
> when starting on the same physical machine (same CPU), a container instance
> used for the checkpoint and a container for the restore may have different
> hard memory limits.

We are currently looking at this configuration. The write-up focuses on the effects of change in cpu features.

Couple of things that stand out:
1.
Portability of a checkpoint is not just a JVM problem. The problems that JVM faces may apply to native libraries as well. If that happens a coordinated effort would be needed from all the parties involved.
2. Need for an option that would allow Hotspot to generate portable code at runtime.

Feel free to provide feedback/suggestions.

[1] http://cr.openjdk.java.net/~heidinga/crac/Portability_of_checkpoints.pdf

Thanks,
Ashutosh Mehra

From avstepan at openjdk.java.net Fri Feb 4 18:14:13 2022
From: avstepan at openjdk.java.net (Alexander Stepanov)
Date: Fri, 4 Feb 2022 18:14:13 GMT
Subject: [crac] RFR: [TEST] check if some j.l.* methods time out on restore immediately
Message-ID:

add a test to check if Thread.join(timeout), Thread.sleep(timeout) and Object.wait(timeout) will be completed on restore immediately if their end time fell on the CRaC pause period

checked on Ubuntu 20.04 Linux (x86-64), passed

-------------

Commit messages:
 - update copyright
 - add JoinSleepWaitOnCRPauseTest

Changes: https://git.openjdk.java.net/crac/pull/15/files
 Webrev: https://webrevs.openjdk.java.net/?repo=crac&pr=15&range=00
  Stats: 202 lines in 1 file changed: 202 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/crac/pull/15.diff
  Fetch: git fetch https://git.openjdk.java.net/crac pull/15/head:pull/15

PR: https://git.openjdk.java.net/crac/pull/15

From asmehra at redhat.com Fri Feb 4 18:41:05 2022
From: asmehra at redhat.com (Ashutosh Mehra)
Date: Fri, 4 Feb 2022 13:41:05 -0500
Subject: Portability of checkpoints?
In-Reply-To:
References:
Message-ID:

For reference, the previous discussion on portability can be seen at:
https://mail.openjdk.java.net/pipermail/crac-dev/2021-October/000029.html

Thanks,
Ashutosh Mehra

On Fri, Feb 4, 2022 at 12:38 PM Ashutosh Mehra wrote:

> Hi,
>
> We are doing some experiments to understand the impact of change in the environment (could be anything like cpu features, number of cpus, cache line size etc) on the functionality of the JVM after it restores from a checkpoint.
>
> It is a work in progress and while we continue our investigation, I thought it would be good to document and summarize our findings so far. I have done a write-up [1] describing the problems we faced and our observations.
>
>> 1. Between machines with the same operating system distribution. The CPU
>> features set is a good example of this. Also, available memory resources can
>> change between checkpoint and restore. We'll likely need to change JVM to
>> handle the difference. Here we have containers -- it's interesting that even
>> when starting on the same physical machine (same CPU), a container instance
>> used for the checkpoint and a container for the restore may have different
>> hard memory limits.
>
> We are currently looking at this configuration. The write-up focuses on the effects of change in cpu features.
>
> Couple of things that stand out:
> 1. Portability of a checkpoint is not just a JVM problem. The problems that JVM faces may apply to native libraries as well. If that happens a coordinated effort would be needed from all the parties involved.
> 2. Need for an option that would allow Hotspot to generate portable code at runtime.
>
> Feel free to provide feedback/suggestions.
>
> [1] http://cr.openjdk.java.net/~heidinga/crac/Portability_of_checkpoints.pdf
>
> Thanks,
> Ashutosh Mehra
>

From akozlov at openjdk.java.net Tue Feb 8 13:37:08 2022
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Tue, 8 Feb 2022 13:37:08 GMT
Subject: [crac] RFR: Clear JarFileFactory cache on checkpoint [v2]
In-Reply-To:
References:
Message-ID:

> Even after an URL object referring to a jar file is closed, the JarFile remains cached and open. This change cleans the cache, so the JarFile becomes reclaimable. The only user of the changed class, sun.net.www.protocol.jar.JarURLConnection, does not assume any state of the cache.

Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision:

  Avoid beforeCheckpoint race with JarURLConnection.connect()

-------------

Changes:
  - all: https://git.openjdk.java.net/crac/pull/14/files
  - new: https://git.openjdk.java.net/crac/pull/14/files/ab5ac701..3960e454

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=crac&pr=14&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=crac&pr=14&range=00-01

  Stats: 16 lines in 1 file changed: 13 ins; 1 del; 2 mod
  Patch: https://git.openjdk.java.net/crac/pull/14.diff
  Fetch: git fetch https://git.openjdk.java.net/crac pull/14/head:pull/14

PR: https://git.openjdk.java.net/crac/pull/14

From akozlov at openjdk.java.net Tue Feb 8 14:00:22 2022
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Tue, 8 Feb 2022 14:00:22 GMT
Subject: [crac] RFR: Clear JarFileFactory cache on checkpoint [v2]
In-Reply-To:
References:
Message-ID: <7LAB7_yFvSVTGO8SZbi0ysRUWYIsXe-DJNVDhmuPUsk=.1b08b4bf-65f5-4d57-a60f-97120e3ff8f5@github.com>

On Tue, 8 Feb 2022 13:37:08 GMT, Anton Kozlov wrote:

>> Even after an URL object referring to a jar file is closed, the JarFile remains cached and open. This change cleans the cache, so the JarFile becomes reclaimable.
>> The only user of the changed class, sun.net.www.protocol.jar.JarURLConnection, does not assume any state of the cache.
>
> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
>   Avoid beforeCheckpoint race with JarURLConnection.connect()

@alexeybakhtin pointed out that the JarURLConnection.connect method was not synchronized with JarFileFactory.beforeCheckpoint, which could lead to an NPE if an entry is cleaned from the cache while connect is executing. Now beforeCheckpoint detects JarFile entries that should be cleaned by moving them to a weak cache map for a short period of time.

-------------

PR: https://git.openjdk.java.net/crac/pull/14

From abakhtin at openjdk.java.net Tue Feb 8 16:07:37 2022
From: abakhtin at openjdk.java.net (Alexey Bakhtin)
Date: Tue, 8 Feb 2022 16:07:37 GMT
Subject: [crac] RFR: Clear JarFileFactory cache on checkpoint [v2]
In-Reply-To: <7LAB7_yFvSVTGO8SZbi0ysRUWYIsXe-DJNVDhmuPUsk=.1b08b4bf-65f5-4d57-a60f-97120e3ff8f5@github.com>
References: <7LAB7_yFvSVTGO8SZbi0ysRUWYIsXe-DJNVDhmuPUsk=.1b08b4bf-65f5-4d57-a60f-97120e3ff8f5@github.com>
Message-ID:

On Tue, 8 Feb 2022 13:57:14 GMT, Anton Kozlov wrote:

> @alexeybakhtin pointed out that the JarURLConnection.connect method was not synchronized with JarFileFactory.beforeCheckpoint, which could lead to an NPE if an entry is cleaned from the cache while connect is executing. Now beforeCheckpoint detects JarFile entries that should be cleaned by moving them to a weak cache map for a short period of time.

Thank you. LGTM

-------------

PR: https://git.openjdk.java.net/crac/pull/14

From akozlov at azul.com Wed Feb 9 08:07:05 2022
From: akozlov at azul.com (Anton Kozlov)
Date: Wed, 9 Feb 2022 11:07:05 +0300
Subject: Portability of checkpoints?
In-Reply-To:
References:
Message-ID: <7b1e4398-4720-1afe-595b-bad9c4241377@azul.com>

On 2/4/22 20:38, Ashutosh Mehra wrote:
> I have done a write-up [1] describing the problems we faced and our observations.
Very nice write-up, thank you.

> 1. Portability of a checkpoint is not just a JVM problem. The problems that
> JVM faces may apply to native libraries as well.

I like the separation of the native code into the "lower" level (libc, CRIU) and the "higher" one above the JVM (JNI code).

For the JNI code, what are our options as platform developers? Applications usually provide separate builds of JNI libraries for different operating systems or flavors like glibc/musl. Assuming that this kind of burden is acceptable, maybe the portability is also OK? Hopefully, there is not much native code that caches CPU features and is also as hard to fix as glibc.

Regarding libc, is it possible to avoid calls like memset that cache CPU features? For the JVM implementation, we may use our own implementations of standard functions when the latter are inadequate. Hotspot already implements enough memory operations, which are also optimized for the available CPU feature set, probably better than glibc's. We already have jio_snprintf -- a portable replacement for a standard function.

It's a different question if memset, etc. are used by the JNI code or implicitly. As a wild thought, is it a good idea to provide a library in LD_PRELOAD that routes some standard functions to the JVM's implementation? Just as a workaround until glibc is fixed.

> 2. Need for an option that would allow Hotspot to generate portable code at
> runtime.

I think the option should accept the target CPU feature set, just as a C compiler has a parameter for the target CPU. It could be a raw CPUID value or a textual representation. This would give users the power to choose between portability and performance.
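To make the "textual representation" idea concrete, here is a minimal sketch of the check such an option would imply: code generated for a target feature set can run on a restore host only if the host offers every targeted feature. Everything in it is an assumption for illustration: the comma-separated format, the feature names, and the `CpuFeatureSetSketch` class are hypothetical and do not correspond to any existing Hotspot flag.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of a textual CPU feature set; not an existing Hotspot option. */
final class CpuFeatureSetSketch {

    /** Parses a comma-separated, case-insensitive list such as "sse2, avx". */
    static Set<String> parse(String text) {
        Set<String> features = new TreeSet<>();
        for (String part : text.split(",")) {
            String name = part.trim().toLowerCase(Locale.ROOT);
            if (!name.isEmpty()) {
                features.add(name);
            }
        }
        return features;
    }

    /** Code generated for 'target' can run on 'host' only if the host has every targeted feature. */
    static boolean canRunOn(Set<String> target, Set<String> host) {
        return host.containsAll(target);
    }

    public static void main(String[] args) {
        Set<String> target = parse("sse2, avx");            // chosen at checkpoint time
        Set<String> restoreHost = parse("sse2,avx,avx2");   // detected at restore time
        System.out.println("restore allowed: " + canRunOn(target, restoreHost));
    }
}
```

A smaller target set makes the image restorable on more machines at the cost of less specialized generated code, which is the portability/performance trade-off described in the message.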
Thanks,
Anton

From abakhtin at openjdk.java.net Thu Feb 10 14:56:42 2022
From: abakhtin at openjdk.java.net (Alexey Bakhtin)
Date: Thu, 10 Feb 2022 14:56:42 GMT
Subject: [crac] RFR: Clear JarFileFactory cache on checkpoint [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 8 Feb 2022 13:37:08 GMT, Anton Kozlov wrote:

>> Even after an URL object referring to a jar file is closed, the JarFile remains cached and open. This change cleans the cache, so the JarFile becomes reclaimable. The only user of the changed class, sun.net.www.protocol.jar.JarURLConnection, does not assume any state of the cache.
>
> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
>   Avoid beforeCheckpoint race with JarURLConnection.connect()

Marked as reviewed by abakhtin (no project role).

-------------

PR: https://git.openjdk.java.net/crac/pull/14

From akozlov at openjdk.java.net Thu Feb 10 15:06:45 2022
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Thu, 10 Feb 2022 15:06:45 GMT
Subject: [crac] RFR: Run native CRaC checks after failed beforeCheckpoint
In-Reply-To:
References:
Message-ID:

On Fri, 28 Jan 2022 12:00:01 GMT, Anton Kozlov wrote:

> After checkpoint failed at the Java level, it's worth to make a "dry-run" checkpoint at the native state: check file descriptors, process -XX:+CRHeapDumpOnCheckpointException, etc.
>
> The patch also removes unused parameter of `checkpoint_restore(FdsInfo* fds)`

Thanks!

-------------

PR: https://git.openjdk.java.net/crac/pull/11

From akozlov at openjdk.java.net Thu Feb 10 15:07:46 2022
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Thu, 10 Feb 2022 15:07:46 GMT
Subject: [crac] RFR: Clear JarFileFactory cache on checkpoint [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 8 Feb 2022 13:37:08 GMT, Anton Kozlov wrote:

>> Even after an URL object referring to a jar file is closed, the JarFile remains cached and open. This change cleans the cache, so the JarFile becomes reclaimable.
>> The only user of the changed class, sun.net.www.protocol.jar.JarURLConnection, does not assume any state of the cache.
>
> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
>   Avoid beforeCheckpoint race with JarURLConnection.connect()

Thanks for review!

-------------

PR: https://git.openjdk.java.net/crac/pull/14

From akozlov at openjdk.java.net Thu Feb 10 15:07:48 2022
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Thu, 10 Feb 2022 15:07:48 GMT
Subject: [crac] Integrated: Clear JarFileFactory cache on checkpoint
In-Reply-To:
References:
Message-ID:

On Tue, 1 Feb 2022 14:36:05 GMT, Anton Kozlov wrote:

> Even after an URL object referring to a jar file is closed, the JarFile remains cached and open. This change cleans the cache, so the JarFile becomes reclaimable. The only user of the changed class, sun.net.www.protocol.jar.JarURLConnection, does not assume any state of the cache.

This pull request has now been integrated.

Changeset: e3f6d11c
Author:    Anton Kozlov
URL:       https://git.openjdk.java.net/crac/commit/e3f6d11cc913456997685a6b005dbf7540e2d8e0
Stats:     96 lines in 3 files changed: 95 ins; 0 del; 1 mod

Clear JarFileFactory cache on checkpoint

Reviewed-by: abakhtin

-------------

PR: https://git.openjdk.java.net/crac/pull/14

From akozlov at openjdk.java.net Thu Feb 10 15:09:38 2022
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Thu, 10 Feb 2022 15:09:38 GMT
Subject: [crac] Integrated: Run native CRaC checks after failed beforeCheckpoint
In-Reply-To:
References:
Message-ID:

On Fri, 28 Jan 2022 12:00:01 GMT, Anton Kozlov wrote:

> After checkpoint failed at the Java level, it's worth to make a "dry-run" checkpoint at the native state: check file descriptors, process -XX:+CRHeapDumpOnCheckpointException, etc.
>
> The patch also removes unused parameter of `checkpoint_restore(FdsInfo* fds)`

This pull request has now been integrated.
Changeset: aa3f8050
Author:    Anton Kozlov
URL:       https://git.openjdk.java.net/crac/commit/aa3f805055e71694df07dd2f4a0912dc58c96646
Stats:     134 lines in 7 files changed: 90 ins; 11 del; 33 mod

Run native CRaC checks after failed beforeCheckpoint

Reviewed-by: inakonechnyy

-------------

PR: https://git.openjdk.java.net/crac/pull/11

From akozlov at azul.com Thu Feb 10 16:43:03 2022
From: akozlov at azul.com (Anton Kozlov)
Date: Thu, 10 Feb 2022 19:43:03 +0300
Subject: CRaC EA build
Message-ID:

Hi,

After integration of the few recent changes, I would like to do another EA build. It will be "17-crac+2" based on revision [1]. The build will be hosted on the same infra as the previous one [2].

I don't think there will be objections, but if there are, please provide your feedback in the next day or two. Anyway, subsequent builds are possible shortly after; please request one if you need it.

Thanks,
Anton

[1] https://github.com/openjdk/crac/commit/aa3f805055e71694df07dd2f4a0912dc58c96646
[2] https://github.com/CRaC/openjdk-builds/releases/tag/17-crac%2B1

From akozlov at openjdk.java.net Fri Feb 11 12:03:04 2022
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Fri, 11 Feb 2022 12:03:04 GMT
Subject: [crac] RFR: Provide arguments for restore
Message-ID:

This change adds a new API and implementation to receive a new set of command-line arguments in the restored Java instance. The supplied demo code shows a faster replacement for `javac`.

The current implementation obligates the first argument of the new set not to start with the dash, otherwise, the java launcher will interpret it as its own parameter. So the first argument should be a "verb" similar to the Main class.
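The "verb first" convention described above can be sketched on the application side as follows. The message does not name the new API, so this sketch does not use it: it simply takes the new arguments as a plain `String[]`, and `RestoreArgsSketch`, `Parsed`, and `parse` are hypothetical names introduced for illustration.

```java
import java.util.Arrays;

/** Hypothetical application-side handling of the arguments supplied on restore. */
final class RestoreArgsSketch {

    /** Hypothetical holder for the leading "verb" and the arguments that follow it. */
    static final class Parsed {
        final String verb;
        final String[] rest;

        Parsed(String verb, String[] rest) {
            this.verb = verb;
            this.rest = rest;
        }
    }

    /** Splits the new argument set into the verb and the remaining arguments. */
    static Parsed parse(String[] newArgs) {
        if (newArgs.length == 0) {
            throw new IllegalArgumentException("expected a verb as the first argument");
        }
        if (newArgs[0].startsWith("-")) {
            // Mirrors the launcher restriction: a leading dash would be taken as a java option.
            throw new IllegalArgumentException("first argument must not start with a dash: " + newArgs[0]);
        }
        return new Parsed(newArgs[0], Arrays.copyOfRange(newArgs, 1, newArgs.length));
    }

    public static void main(String[] args) {
        Parsed parsed = parse(new String[] {"Compile", "-nowarn", "HelloWorld.java"});
        System.out.println(parsed.verb + " " + Arrays.toString(parsed.rest));
    }
}
```

Only the first argument is constrained; everything after the verb may start with a dash, which is why compiler options can follow it unchanged.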
-------------

Commit messages:
 - Provide arguments for restore

Changes: https://git.openjdk.java.net/crac/pull/16/files
 Webrev: https://webrevs.openjdk.java.net/?repo=crac&pr=16&range=00
  Stats: 164 lines in 7 files changed: 135 ins; 10 del; 19 mod
  Patch: https://git.openjdk.java.net/crac/pull/16.diff
  Fetch: git fetch https://git.openjdk.java.net/crac pull/16/head:pull/16

PR: https://git.openjdk.java.net/crac/pull/16

From akozlov at openjdk.java.net Fri Feb 11 12:03:04 2022
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Fri, 11 Feb 2022 12:03:04 GMT
Subject: [crac] RFR: Provide arguments for restore
In-Reply-To:
References:
Message-ID:

On Fri, 11 Feb 2022 11:56:10 GMT, Anton Kozlov wrote:

> This change adds a new API and implementation to receive a new set of command-line arguments in the restored Java instance. The supplied demo code shows a faster replacement for `javac`.
>
> The current implementation obligates the first argument of the new set not to start with the dash, otherwise, the java launcher will interpret it as its own parameter. So the first argument should be a "verb" similar to the Main class.
The workflow for checkpoint could be:

    $BUILDDIR/images/jdk/bin/java -XX:CRaCCheckpointTo=$CRDIR
        -Xshare:off # disable CDS for faster start after restore
        -XX:-UsePerfData # disable jps for
        -jar $BUILDDIR/images/jdk/demo/crac/JavaCompilerCRaC/JavaCompilerCRaC.jar
        # provide a set of warm-up workloads
        -nowarn -d tmp @${BUILDDIR}/jdk/modules/java.base/_the.java.base_batch.filelist --
        -nowarn -XDignore.symbol.file=true -d tmp @${BUILDDIR}/jdk/modules/java.desktop/_the.java.desktop_batch.filelist --
        -nowarn -XDignore.symbol.file=true -d tmp @${BUILDDIR}/jdk/modules/java.xml/_the.java.xml_batch.filelist

And after that, compile anything by

    $BUILDDIR/images/jdk/bin/java -XX:CRaCRestoreFrom=$CRDIR
        Compile # does not mean anything
        "${TARGET_ARGS[@]}"

Or more concrete:

    $ time jdk/bin/java -XX:CRaCRestoreFrom=./cr Compile HelloWorld.java

    real 0m0.124s
    user 0m0.269s
    sys  0m0.075s

    $ time jdk/bin/javac HelloWorld.java

    real 0m0.380s
    user 0m0.817s
    sys  0m0.072s

-------------

PR: https://git.openjdk.java.net/crac/pull/16

From akozlov at openjdk.java.net Fri Feb 11 12:20:01 2022
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Fri, 11 Feb 2022 12:20:01 GMT
Subject: [crac] RFR: Provide arguments for restore [v2]
In-Reply-To:
References:
Message-ID:

> This change adds a new API and implementation to receive a new set of command-line arguments in the restored Java instance. The supplied demo code shows a faster replacement for `javac`.
>
> The current implementation obligates the first argument of the new set not to start with the dash, otherwise, the java launcher will interpret it as its own parameter. So the first argument should be a "verb" similar to the Main class.

Anton Kozlov has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains two commits:

 - Merge remote-tracking branch 'jdk/crac/crac' into args
 - Provide arguments for restore

-------------

Changes: https://git.openjdk.java.net/crac/pull/16/files
 Webrev: https://webrevs.openjdk.java.net/?repo=crac&pr=16&range=01
  Stats: 166 lines in 7 files changed: 137 ins; 10 del; 19 mod
  Patch: https://git.openjdk.java.net/crac/pull/16.diff
  Fetch: git fetch https://git.openjdk.java.net/crac pull/16/head:pull/16

PR: https://git.openjdk.java.net/crac/pull/16

From alexander.smirnoff at gmail.com Mon Feb 21 14:04:15 2022
From: alexander.smirnoff at gmail.com (Alexander Smirnov)
Date: Mon, 21 Feb 2022 14:04:15 +0000
Subject: Release timeline
Message-ID:

Hi folks,

I was looking into CRaC, mostly reading docs and trying to understand how it works. It looks to be a promising technology and a future industry standard for checkpointing and restoring Java applications.

However, it is not clear what the timeline for the project is. Will it be a part of the next major OpenJDK release? Will it be backported to previous OpenJDK releases (I doubt it is possible, though)?

I apologize if this information was already available; I've done my best to research it.

Thank you,
Alexander

From akozlov at azul.com Mon Feb 21 16:53:21 2022
From: akozlov at azul.com (Anton Kozlov)
Date: Mon, 21 Feb 2022 19:53:21 +0300
Subject: Release timeline
In-Reply-To:
References:
Message-ID: <549b5275-6160-a345-5816-b730756f6fc4@azul.com>

Hi,

On 2/21/22 17:04, Alexander Smirnov wrote:
> However, it is not clear what is the timeline for the project. Will it be a
> part of the next major OpenJDK release? Will it be backported to previous
> OpenJDK releases (doubt it is possible though)

Sorry, there can be no commitment yet on the target OpenJDK release. It won't be JDK 18, definitely :)

Backports are possible in a more mature phase.
Although it won't be possible to extend the official API of already released JDKs, an application could, with some burden of accessing internal details, reach an internal one via reflection or with org.crac [1].

Backports could be possible even now, but they would require some discipline in backporting patches to keep feature parity. We can experiment with this, given enough demand and resources.

Thanks,
Anton

[1] https://github.com/CRaC/org.crac
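As a rough illustration of the "via reflection" route mentioned in the last message, the sketch below probes the running JDK for a CRaC entry-point class without linking against it. The candidate class names `javax.crac.Core` and `jdk.crac.Core` are assumptions made for this example and are not stated in the thread; `CracApiProbeSketch` and `findCore` are hypothetical names as well.

```java
import java.util.Optional;

/** Looks up a CRaC API class by name so the application still starts on JDKs without one. */
final class CracApiProbeSketch {

    // Candidate class names are assumptions for illustration, not confirmed by the thread.
    private static final String[] CANDIDATES = { "javax.crac.Core", "jdk.crac.Core" };

    static Optional<Class<?>> findCore() {
        ClassLoader loader = CracApiProbeSketch.class.getClassLoader();
        for (String name : CANDIDATES) {
            try {
                // initialize=false: only look the class up, do not run its static initializer
                return Optional.of(Class.forName(name, false, loader));
            } catch (ClassNotFoundException | LinkageError e) {
                // not present in this runtime; try the next candidate
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(findCore()
                .map(c -> "found " + c.getName())
                .orElse("no CRaC API found; running without checkpoint support"));
    }
}
```

A wrapper library such as org.crac exists to hide this kind of lookup from applications, so the sketch is mainly useful for seeing why no compile-time dependency on the JDK-internal API is needed.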