From philip.race at oracle.com Wed Jul 7 13:24:03 2021
From: philip.race at oracle.com (Philip Race)
Date: Wed, 7 Jul 2021 06:24:03 -0700
Subject: Call for Discussion : New Project to support the Wayland display server on Linux
Message-ID: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com>

For a number of years the Linux community has been working on a complete
replacement for the 1980s-era X11 desktop display server protocol, with
new protocols and libraries that support client-side rendering and a
compositing desktop windowing system.
This work is being done under the auspices of the Wayland project [1],
and there is a reference implementation of a Wayland compositor called
"Weston".

A new client written for the Wayland desktop has no dependency at all on
X11, but Wayland also provides a compatibility mode where the X.org X11
server runs alongside Wayland, so that thousands of X11 applications can
continue to run.

Presently all distros that ship the Wayland server also still ship the
pure X11 server, and the user can select which one to use on the login
screen. However, there will come a time when Wayland is the only choice,
and Wayland is already the default on RHEL 8, OL 8, Ubuntu 21.04 and,
I am sure, others too.

At that time Java for Linux will "mostly" run via the X11 compatibility
layer, but would not pass the JCK, since some important APIs, notably
those of the java.awt.Robot class [2], will fail with Wayland.
We need to solve this so that pure Java applications and applications
which mix Java and X11 APIs can work.

But even then this would mean Java on Linux is not a first-class desktop
citizen, which is not desirable for the long term, given the importance
of Linux to many Java developers as well as to active individual and
corporate contributors to the JDK project.

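To make the XWayland and Robot concerns above a little more concrete, here
is a minimal sketch in Java. It is an illustration only, not part of any
proposed design. It guesses the kind of session from the DISPLAY,
WAYLAND_DISPLAY and XDG_SESSION_TYPE environment variables, and then makes
two representative java.awt.Robot calls (a pointer move and a small screen
capture). The class name SessionProbe, the Mode enum and the classification
heuristic are assumptions made for this example.

```java
import java.awt.GraphicsEnvironment;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.image.BufferedImage;
import java.util.Map;

// Hypothetical helper for illustration; not an existing JDK class.
final class SessionProbe {

    enum Mode { X11_ONLY, WAYLAND_WITH_XWAYLAND, WAYLAND_ONLY, NO_DISPLAY }

    // Heuristic (an assumption): a Wayland session usually exports
    // WAYLAND_DISPLAY and/or XDG_SESSION_TYPE=wayland, while DISPLAY
    // indicates that an X11 server (possibly XWayland) is reachable.
    static Mode classify(Map<String, String> env) {
        boolean wayland = isSet(env.get("WAYLAND_DISPLAY"))
                || "wayland".equalsIgnoreCase(env.get("XDG_SESSION_TYPE"));
        boolean x11 = isSet(env.get("DISPLAY"));
        if (wayland && x11) return Mode.WAYLAND_WITH_XWAYLAND;
        if (wayland) return Mode.WAYLAND_ONLY;
        if (x11) return Mode.X11_ONLY;
        return Mode.NO_DISPLAY;
    }

    private static boolean isSet(String value) {
        return value != null && !value.isEmpty();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Session looks like: " + classify(System.getenv()));
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("Headless environment, skipping Robot calls");
            return;
        }
        // These are the kinds of calls that may misbehave when the JDK
        // runs as an X11 client on top of a Wayland compositor.
        Robot robot = new Robot();
        robot.mouseMove(100, 100);
        BufferedImage shot = robot.createScreenCapture(new Rectangle(0, 0, 64, 64));
        System.out.println("Captured " + shot.getWidth() + "x" + shot.getHeight() + " pixels");
    }
}
```

Running this under a plain X11 login and again under a Wayland login is one
simple way to compare the behaviour of the Robot calls in the two setups.
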
Indeed, there have already been informal discussions for some time with
various parties that have expressed interest in helping towards the
outcome of supporting Wayland.

Consequently we expect quite shortly to propose an OpenJDK project that
will consider the goals of
- a short to medium term solution for JDK running on Wayland in X11
  compatibility mode
- a medium to long term solution for JDK running as a native Wayland
  client.

There are some unknowns and questions, such as: what are the options for
supporting Robot? What other support is missing? What platform APIs
should the implementation use? How does a native Wayland solution
interoperate with OpenJFX?

Comments, expressions of interest etc. are welcome.

-Phil Race, for the Java client groups.

[1] : https://wayland.freedesktop.org/
[2] : https://docs.oracle.com/en/java/javase/11/docs/api/java.desktop/java/awt/Robot.html

From johan at lodgon.com Wed Jul 7 16:19:23 2021
From: johan at lodgon.com (Johan Vos)
Date: Wed, 7 Jul 2021 18:19:23 +0200
Subject: Call for Discussion : New Project to support the Wayland display server on Linux
In-Reply-To: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com>
References: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com>
Message-ID:

Thanks for starting this discussion, and for the great description of
the landscape and evolutions.
I've been looking at Wayland in the past, and as far as JavaFX is
concerned, I believe only a small number of changes are needed in glass
(leveraging the fact that GDK can use Wayland as a backend), and then a
Wayland version of the prism-es2 pipeline, which should be relatively
small as well.

- Johan

On Wed, 7 Jul 2021 at 15:28, Philip Race wrote:
> [...]

From philip.race at oracle.com Wed Jul 7 16:56:08 2021
From: philip.race at oracle.com (Philip Race)
Date: Wed, 7 Jul 2021 09:56:08 -0700
Subject: Call for Discussion : New Project to support the Wayland display server on Linux
In-Reply-To:
References: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com>
Message-ID: <56ae8b21-0059-8e40-d3e7-15f9848ebc14@oracle.com>

Rendering directly using EGL is an option for the 2D code too, but it
has no ES2 pipeline like FX does.
It is tightly integrated with GLX. We should consider the other options
too, of course.
And AWT has no dependency on GTK / GDK. GTK is used only for the theming
support for the Swing GTK L&F.

So there's a lot to do on the client-libs side.

-phil.

On 7/7/21 9:19 AM, Johan Vos wrote:
> [...]

From kevin.rushforth at oracle.com Wed Jul 7 17:07:44 2021
From: kevin.rushforth at oracle.com (Kevin Rushforth)
Date: Wed, 7 Jul 2021 10:07:44 -0700
Subject: Call for Discussion : New Project to support the Wayland display server on Linux
In-Reply-To: <56ae8b21-0059-8e40-d3e7-15f9848ebc14@oracle.com>
References: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com> <56ae8b21-0059-8e40-d3e7-15f9848ebc14@oracle.com>
Message-ID:

Yes, given that most of the JavaFX windowing code on Linux uses GDK /
GTK with a fairly small amount of Xlib, a native Wayland port for JavaFX
should be less work than for AWT. Doing a Wayland version of the
"prism-es2" pipeline for Linux should be straightforward, as Johan
mentions.

As with AWT, the first step is to get JavaFX running in XWayland mode.
One of the common concerns that AWT and JavaFX share is how to support
Robot (which will carry over to a native Wayland port).

-- Kevin

On 7/7/2021 9:56 AM, Philip Race wrote:
> [...]

From alexander.zvegintsev at oracle.com Wed Jul 7 21:52:21 2021
From: alexander.zvegintsev at oracle.com (Alexander Zvegintsev)
Date: Wed, 7 Jul 2021 23:52:21 +0200
Subject: Call for Discussion : New Project to support the Wayland display server on Linux
In-Reply-To:
References: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com>
Message-ID:

(adding awt-dev)

Let me add a few comments.

> already Wayland is the default on RHEL 8, OL 8, Ubuntu 21.04 and I am
> sure others too.

It is the default, but not in every case. Wayland may be turned off
deliberately if you are using an Nvidia graphics card, so you need to
take extra steps to get it working.

I faced this issue on Ubuntu 21.04.

However, things are getting better with the Nvidia 470 drivers, whose
beta was released on 2021.6.22. So it probably won't be a problem in the
near future.

> Consequently we expect quite shortly to propose an OpenJDK project
> that will consider the goals of
> - a short to medium term solution for JDK running on Wayland in X11
> compatibility mode
> - a medium to long term solution for JDK running as a native Wayland
> client.

Both goals have a common task: we will need to implement java.awt.Robot
functionality for Wayland (at least).

So it makes sense to make XToolkit's java.awt.Robot work under Wayland
first, and then reuse this code for the native Wayland client.

I see two major tasks here: taking screenshots and mouse/keyboard
control. As far as I know, there is no standard way to implement them
across all display servers yet.

Possible ways to implement this:

  * taking a screenshot:
      o open issue against Wayland, may be resolved someday:
        https://gitlab.freedesktop.org/wayland/wayland/-/issues/32
      o https://flatpak.github.io/xdg-desktop-portal/portal-docs.html#gdbus-org.freedesktop.portal.Screenshot
      o via a DBUS interface (display server dependent), e.g.
        https://github.com/flameshot-org/flameshot/blob/master/src/utils/screengrabber.cpp#L43
  * generating key/mouse events:
      o https://flatpak.github.io/xdg-desktop-portal/portal-docs.html#gdbus-org.freedesktop.impl.portal.RemoteDesktop
      o generating a new virtual input device with uinput; superuser
        privileges are required, and it looks too flaky:
        https://unix.stackexchange.com/questions/422698/how-to-set-absolute-mouse-cursor-position-in-wayland-without-using-mouse

This still needs more investigation, but xdg-desktop-portal looks like
the preferable way for now.

Please see below some caveats for an OpenJDK native Wayland client:

You can't control the position of a top-level window.

This will also affect splash screen windows. It is still possible to
control the position under XWayland, though.

It looks like we will just have to accept it.

Top-level window decorations.

Initially Wayland had only client-side decorations (you have to draw
them yourself).

As of now, server-side decorations are available via the XDG-Decoration
protocol.

However, server-side window decorations are not mandatory and not all
compositors support them, e.g. Mutter (GNOME Shell's compositor).

GNOME Shell is the default on Ubuntu, so we will need to provide window
decorations somehow. One possible solution is to use GTK to create a
window for us.

--
Thanks,
Alexander.

On 7/7/21 6:24 AM, Philip Race wrote:
> [...]

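The xdg-desktop-portal Screenshot route listed among the options above can
be sketched roughly as follows. This is only an illustration under stated
assumptions: it assumes the `gdbus` command-line tool is installed, it only
builds the command line rather than running it, and the class and method
names are hypothetical. The bus name, object path and method name are the
ones given in the portal documentation linked above. The real call returns
a request handle, and the result arrives asynchronously as a signal on that
request, so a real Robot implementation would need a proper D-Bus client
rather than a one-shot command.

```java
import java.util.List;

// Hypothetical sketch; not an existing JDK or portal API.
final class PortalScreenshotSketch {

    // Names taken from the xdg-desktop-portal documentation.
    static final String BUS_NAME = "org.freedesktop.portal.Desktop";
    static final String OBJECT_PATH = "/org/freedesktop/portal/desktop";
    static final String METHOD = "org.freedesktop.portal.Screenshot.Screenshot";

    // Builds (but does not run) a gdbus command line for the request.
    // parentWindow is passed as a GVariant string literal; an empty
    // value means "no parent window". No escaping is done here.
    static List<String> screenshotCommand(String parentWindow) {
        return List.of(
                "gdbus", "call", "--session",
                "--dest", BUS_NAME,
                "--object-path", OBJECT_PATH,
                "--method", METHOD,
                "'" + parentWindow + "'", // parent_window (type s)
                "{}");                    // options (type a{sv}), empty
    }

    public static void main(String[] args) {
        // Print one argument per line instead of executing anything.
        screenshotCommand("").forEach(System.out::println);
    }
}
```

Whether the compositor shows a permission dialog for such a request depends
on the portal backend in use, which is one of the things that would need
investigating for automated Robot-style tests.
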
From ben at netzgut.net Thu Jul 8 15:16:16 2021
From: ben at netzgut.net (Ben Weidig)
Date: Thu, 8 Jul 2021 17:16:16 +0200
Subject: Introducing myself
Message-ID:

Hello!

I'm new to the list and wanted to introduce myself and what I'm trying
to contribute to the OpenJDK.

Almost two decades ago, I started my professional career as a dotnet
developer at an international clinical research organization, learning
the ropes of "professional software development" and best practices.
For the last eight years, I've been running my own company, and Java
became my primary language.

My first "real" contact with open source contributions was last year
when I became a committer for Apache Tapestry.
It's great to give back to the community and not just be "consuming"
open source.

Right now, I'm working on a book about a more functional approach to
Java.
That led me to read *a lot* of JDK source code to better understand how
certain types and features work.
And I believe I found some parts that could be improved, and that's what
I want to try to contribute.

For example, there's no feature parity between the functional
interfaces, for instance regarding composition (Function has 'compose'
and 'andThen', other related types only 'andThen').
Suppliers aren't composable at all.
And the specialized types for primitives differ even more.

Another group of types that piqued my interest is the Optional types.
The primitive specializations lack multiple methods, like 'filter',
'flatMap', 'map', and 'or', compared to Optional.
Also, a 'boxed()' method might increase the interoperability with other
Optional-based code.

The next steps for me will be reading up on the general process to
contribute and trying to get involved in the community.

Cheers,
Ben

From neugens.limasoftware at gmail.com Thu Jul 8 16:07:10 2021
From: neugens.limasoftware at gmail.com (Mario Torre)
Date: Thu, 8 Jul 2021 18:07:10 +0200
Subject: Call for Discussion : New Project to support the Wayland display server on Linux
In-Reply-To:
References: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com>
Message-ID: <72966394-A115-42CD-8E8E-C5B23A049DB2@gmail.com>

I tend to agree with this. We certainly need to identify the underlying
APIs that are problematic (screenshot and mouse/keyboard control, as far
as I am aware), then see what we can do from there. I would like to
first address the XWayland use case before turning to a fully fledged
new toolkit implementation (but I agree with Phil that this is the
long-term necessity).

What I would like to do, however, is to see if it makes sense to create
a GTK toolkit rather than a Wayland one, and let GTK deal with every
abstraction. We can be pretty confident that GTK will always be there,
and will work on X11 and Wayland transparently, so even a user on a pure
KDE desktop won't really need to deal directly with Wayland. Ideally,
users with pure KDE distributions (openSUSE maybe?) may help here to
analyse what the requirements are.

One problem that comes to mind when using GTK as a toolkit is the
interaction with other GTK-based applications that may be embedded, e.g.
Swing views in Eclipse, Firefox-based web views, etc. But I think this
is already an issue today and is already dealt with in the current 2D
and JFX code.

Cheers,
Mario

> On 7. Jul 2021, at 23:52, Alexander Zvegintsev wrote:
> [...]

--
Mario Torre
Java Champion and OpenJDK contributor

PGP Key: 0BAB254E
Fingerprint: AB1C 7C6F 7181 895F E581 93A9 C6B8 A242 0BAB 254E

Twitter: @neugens
Web: https://www.mario-torre.eu/
Music: https://mario-torre.bandcamp.com/

From neugens at redhat.com Thu Jul 8 16:11:01 2021
From: neugens at redhat.com (Mario Torre)
Date: Thu, 8 Jul 2021 18:11:01 +0200
Subject: Call for Discussion : New Project to support the Wayland display server on Linux
In-Reply-To:
References: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com>
Message-ID:

[resending with hopefully the correct email address as I have completely
lost which mailing list I'm subscribed to with which email!!!]

I tend to agree with this. We certainly need to identify the underlying
APIs that are problematic (screenshot and mouse/keyboard control, as far
as I am aware), then see what we can do from there. I would like to
first address the XWayland use case before turning to a fully fledged
new toolkit implementation (but I agree with Phil that this is the
long-term necessity).

What I would like to do, however, is to see if it makes sense to create
a GTK toolkit rather than a Wayland one, and let GTK deal with every
abstraction. We can be pretty confident that GTK will always be there,
and will work on X11 and Wayland transparently, so even a user on a pure
KDE desktop won't really need to deal directly with Wayland. Ideally,
users with pure KDE distributions (openSUSE maybe?) may help here to
analyse what the requirements are.

One problem that comes to mind when using GTK as a toolkit is the
interaction with other GTK-based applications that may be embedded, e.g.
Swing views in Eclipse, Firefox-based web views, etc. But I think this
is already an issue today and is already dealt with in the current 2D
and JFX code.

Cheers,
Mario

On Wed, Jul 7, 2021 at 11:56 PM Alexander Zvegintsev wrote:
> [...]

As far as I know there is no standard way to implement them > across all display servers yet. > > Possible ways to implement this: > > * taking screenshot: > o open issue against Wayland, may be resolved > someday: https://gitlab.freedesktop.org/wayland/wayland/-/issues/32 > > o https://flatpak.github.io/xdg-desktop-portal/portal-docs.html#gdbus-org.freedesktop.portal.Screenshot > > o via DBUS interface (display server dependent), > e.g. https://github.com/flameshot-org/flameshot/blob/master/src/utils/screengrabber.cpp#L43 > > * generating key/mouse events: > o https://flatpak.github.io/xdg-desktop-portal/portal-docs.html#gdbus-org.freedesktop.impl.portal.RemoteDesktop > > o generating new virtual input device, uinput, superuser > privileges required, looks too flaky > https://unix.stackexchange.com/questions/422698/how-to-set-absolute-mouse-cursor-position-in-wayland-without-using-mouse > > > This still needs more investigation, but xdg-desktop-portal > looks like the preferable way > for now. > > > Please see below some caveats for an OpenJDK native Wayland client: > > You can't control the position of a top-level window. > > > This will also affect splash screen windows. It is still possible to > control position under XWayland though. > > Looks like we will just accept it. > > > Top-level window decorations. > > Initially Wayland had only client-side decorations (you have to draw them > by yourself). > > As of now server-side decorations are available via the XDG-Decoration > protocol. > > However server-side window decorations are not mandatory and not all > compositors support them, e.g. Mutter (Gnome Shell's compositor). > > Gnome Shell is the default on Ubuntu, so we will need to provide window > decorations somehow. One of the possible solutions is to use Gtk to create a > window for us. > > > -- > Thanks, > Alexander.
> > On 7/7/21 6:24 AM, Philip Race wrote: > > > > For a number of years the Linux community has been working on a > > complete replacement for the 1980's era X11 desktop display server > > protocol > > with new protocols and libraries that support client-side rendering > > and a compositing desktop windowing system. > > This work is being done under the auspices of the Wayland project [1] > > and there is a reference > > implementation of a Wayland compositor called "Weston". > > > > A new client written for the Wayland desktop has no dependency at all > > on X11, but Wayland also provides a compatibility > > mode where the X.org X11 server runs along side Wayland, so that > > thousands of X11 applications can continue to run. > > > > Presently all distros that ship the Wayland server, also still ship > > the pure X11 server and the user can select > > which one to use on the login screen. However there will come a time > > when Wayland is the only choice and > > already Wayland is the default on RHEL 8, OL 8, Ubuntu 21.04 and I am > > sure others too. > > > > At that time Java for Linux will "mostly" run via the X11 > > compatibility layer, but would not pass the JCK, > > since some important APIs, notably those of the java.awt.Robot class > > [2] will fail with Wayland. > > We need to solve this so that pure Java and applications which mix > > Java and X11 APis can work. > > > > But even then this would mean Java on Linux is not a first class > > desktop citizen, which is not desirable for > > the long term, given the importance of Linux to many Java developers > > as well as to active > > individual and corporate contributors to the JDK project. 
> > > > Indeed there have already been informal discussions for some time with > > various parties that have expressed > > interest in helping towards the outcome of supporting Wayland > > > > Consequently we expect quite shortly to propose an OpenJDK project > > that will consider the goals of > > - a short to medium term solution for JDK running on Wayland in X11 > > compatibility mode > > - a medium to long term solution for JDK running as a native Wayland > > client. > > > > There are some unknowns and questions, such as what are the options > > for supporting Robot ? > > What other support is missing ? What platform APIs should the > > implementation use ? > > How does a native Wayland solution interoperate with OpenJFX ? > > > > > > Comments, expressions of interest etc are welcome. > > > > > > -Phil Race, for the Java client groups. > > > > > > [1] : https://wayland.freedesktop.org/ > > [2] : > > https://docs.oracle.com/en/java/javase/11/docs/api/java.desktop/java/awt/Robot.html > -- Mario Torre Manager, Software Engineering Red Hat GmbH 9704 A60C B4BE A8B8 0F30 9205 5D7E 4952 3F65 7898 From alex.buckley at oracle.com Thu Jul 8 17:05:01 2021 From: alex.buckley at oracle.com (Alex Buckley) Date: Thu, 8 Jul 2021 10:05:01 -0700 Subject: Introducing myself In-Reply-To: References: Message-ID: <0ec0799d-e9ea-ebec-7f7c-27c4578b5fa1@oracle.com> On 7/8/2021 8:16 AM, Ben Weidig wrote: > Right now, I'm working on a book about a more functional approach to Java. > That lead me to read *a lot* of JDK source code to understand better how > certain types and features work. > And I believe I found some parts that could be improved, and that's what I > want to try to contribute. > > For example, there's no feature-parity between the functional interfaces, ... > > Another group of types that piqued my interest are the Optional-types. ... 
Beyond the JDK source code, I recommend reviewing the lambda-libs-spec-experts list (i.e., read every mail) to get a sense of why the Java 8 API ended up the way it did: https://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/ Alex From philip.race at oracle.com Thu Jul 8 17:14:14 2021 From: philip.race at oracle.com (Philip Race) Date: Thu, 8 Jul 2021 10:14:14 -0700 Subject: Call for Discussion : New Project to support the Wayland display server on Linux In-Reply-To: References: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com> Message-ID: And I think the first phase of the project needs to be an investigation of the alternatives and consideration of how well each matches all the requirements. I would not want to see a "code first, figure it out later" approach. Significant documentation and justification of the options and reasons for choices is required. Even with the much simpler "Metal" pipeline for macOS we had to spend time deciding if we'd use MetalKit or the lower level Metal APIs, and the latter is what we found we needed. Very often it seems if you are writing a platform you end up needing to go for the lower level. Not saying that's where we'll end up here, but it all needs to be thought through and written down. It seems a Wayland compositor needs to work on "non-desktop" machines. Think all those millions of rack mounted boxes with minimal graphics sitting in data centres. So it can't be fully "accelerated graphics or nothing". 2D has s/w loops for everything already so can work in that world, and it may be that a GTK port can end up doing *most* things we need, but it has to do *all* the things we need. Options to investigate include (at least) : - GTK - EGL - Vulkan - Software rendering and then we have to see how the rendering side of it (2D) fits with the AWT part. -phil.
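As a small illustration of the "2D has s/w loops for everything" point, the sketch below renders into an in-memory BufferedImage with the display forced off via the standard java.awt.headless property. It says nothing about how a Wayland port would present the pixels; it only shows that the rendering side does not need accelerated graphics or a display connection. The class name and the chosen shapes are arbitrary.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Illustrative only: pure software rendering into an off-screen image.
final class SoftwareRenderSketch {

    /** Paints a white background with a blue rectangle in the middle. */
    static BufferedImage render(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.setColor(Color.BLUE);
            g.fillRect(width / 4, height / 4, width / 2, height / 2);
        } finally {
            g.dispose();
        }
        return img;
    }

    public static void main(String[] args) {
        // Force headless mode so that no display server is contacted at all.
        System.setProperty("java.awt.headless", "true");
        BufferedImage img = render(64, 64);
        System.out.printf("corner pixel = 0x%08X, centre pixel = 0x%08X%n",
                img.getRGB(0, 0), img.getRGB(32, 32));
    }
}
```

Whatever option is chosen from the list above, a path like this one would presumably remain the fallback for machines without usable graphics hardware.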
On 7/8/21 9:11 AM, Mario Torre wrote: > What I would like to do however is to see if it makes sense to create > a GTK toolkit rather than a Wayland one, and let GTK deal with every > abstraction. We can be pretty confident that GTK will always be there, > and will work on X11 and Wayland transparently, so even a user on a > pure KDE desktop won't really need to deal directly with Wayland. From neugens at redhat.com Thu Jul 8 17:32:48 2021 From: neugens at redhat.com (Mario Torre) Date: Thu, 8 Jul 2021 19:32:48 +0200 Subject: Call for Discussion : New Project to support the Wayland display server on Linux In-Reply-To: References: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com> Message-ID: On Thu, Jul 8, 2021 at 7:14 PM Philip Race wrote: > > And I think the first phase of the project needs to be an investigation > of the alternatives > and consideration of how well it matches all the requirements. > I would not want to see a "code first, figure it out later" approach. > Significant documentation and justification of the options and reasons > for choices is required. > Even with the much simpler "Metal" pipeline for macOS we had to spend > time deciding if > we'd use MetalKit or the lower level Metal APIs and the latter is what > we found we needed. > Very often it seems if you are writing a platform you end up needing to > go for the lower level > Not saying that's where we'll end up here, but it all needs to be > thought through and written down > > It seems a wayland compositor needs to work on "non-desktop" machines. > Think all those millions of rack mounted boxes with minimal graphics > sitting in data centres.
> So it can't be fully "accelerated graphics or nothing" > > 2D has s/w loops for everything already so can work in that world and it may > be that a GTK port can end up doing *most* things we need but it has to > do *all* the things we need > > Options to investigate include (at least) : > > - GTK > - EGL > - Vulkan > - Software rendering > > and then we have to see how the rendering side of it (2D) fits with the > AWT part. +1! Cheers, Mario -- Mario Torre Manager, Software Engineering Red Hat GmbH 9704 A60C B4BE A8B8 0F30 9205 5D7E 4952 3F65 7898 From neugens at redhat.com Fri Jul 9 11:04:24 2021 From: neugens at redhat.com (Mario Torre) Date: Fri, 9 Jul 2021 13:04:24 +0200 Subject: Call for Discussion : New Project to support the Wayland display server on Linux In-Reply-To: <431674FA-6FE1-493A-9D9B-9542BB7FBD44@jetbrains.com> References: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com> <431674FA-6FE1-493A-9D9B-9542BB7FBD44@jetbrains.com> Message-ID: On Fri, Jul 9, 2021 at 12:17 PM Alexey Ushakov wrote: > > > > On 8 Jul 2021, at 19:11, Mario Torre wrote: > > [resending with hopefully the correct email address as I have > completely lost which mailing list I'm subscribed to with which > email!!!] > > I tend to agree with this, we certainly need to identify the > underlying APIs that are problematic (screenshot and mouse/keyboard > control as far as I am aware), then see what we can do from there, and > I would like to first address the XWayland use case before turning > into a fully fledged new toolkit implementation (but I agree with Phil > that this is the long term necessity). > > What I would like to do however is to see if it makes sense to create > a GTK toolkit rather than a Wayland one, and let GTK deal with every > abstraction. We can be pretty confident that GTK will always be there, > and will work on X11 and Wayland transparently, so even a user on a > pure KDE desktop won't really need to deal directly with Wayland.
> > > I'm not sure that it's the right direction. We had a similar situation in the early days of Java. We had a Motif toolkit that was then replaced by the XAWT implementation. I think in the long term we need to be as close to low level interfaces as possible. We (JetBrains) have a lot of users on different desktops and having only GTK is not an option for us. So, I think that we need to have something like XAWT for Wayland. Yeah, but the experience with that in the past led to the rather interesting result of having a lot of unmanageable special casing code to support all sorts of corner cases with window managers and toolkits. Also, Wayland is not a window manager but a protocol for a compositor; as such it doesn't really do much itself, and we rely on existing implementations. This means that if we have a pure Wayland toolkit it won't work on every KDE instance, for example, only where KDE supports Wayland (but yes, that may not be an issue, since the X11 variant will be there anyway). I'm not advocating GTK as a sole solution, though. I fully agree with Phil that we need to first explore the options, and it may turn out that GTK isn't what we want after all and indeed a pure Wayland toolkit is the right approach as you suggest, but I think we should first explore existing abstractions and implementations and then decide where to go. I would be very interested in your experience here. When you say you have a lot of users on different desktops, what are the combinations you encounter? Can you share this information? Can we be sure GTK is not part of those distributions? Cheers, Mario -- Mario Torre Manager, Software Engineering Red Hat GmbH 9704 A60C B4BE A8B8 0F30 9205 5D7E 4952 3F65 7898 From peter.firmstone at zeus.net.au Sat Jul 10 07:35:32 2021 From: peter.firmstone at zeus.net.au (Peter Firmstone) Date: Sat, 10 Jul 2021 17:35:32 +1000 Subject: Authorization layer API and low level access checks.
In-Reply-To: <974bd92d-4a7b-a454-af54-8b395e411abc@zeus.net.au> References: <896e5ba9-fee6-1aca-3efa-dc2e6e8fb61e@zeus.net.au> <265e217d-6e98-1916-3584-b17844dbc7c2@redhat.com> <5176c40c-56cb-85da-6e96-9a237a783fdc@zeus.net.au> <6523763c-a3ee-07a9-de55-103c5d790a5b@oracle.com> <2f315680-1cdb-1694-34a3-95312bf42ca7@zeus.net.au> <23b99a92-d29e-38c5-5b0d-e2cb70d99c45@zeus.net.au> <9f6d70c1-f073-482d-cf46-c354ad3c902e@gmail.com> <974bd92d-4a7b-a454-af54-8b395e411abc@zeus.net.au> Message-ID: <53634dab-3669-7a41-9095-7e5ac2e4422f@zeus.net.au> Updated authorization layer prototype: https://github.com/pfirmstone/HighPerformanceSecurity On 30/06/2021 9:38 pm, Peter Firmstone wrote: > A draft Authorization implementation, untested. > -- Regards, Peter Firmstone From maxim.kartashev at jetbrains.com Thu Jul 15 16:53:15 2021 From: maxim.kartashev at jetbrains.com (Maxim Kartashev) Date: Thu, 15 Jul 2021 19:53:15 +0300 Subject: FYI: A CFD for a Wayland project has been posted to discuss@openjdk.java.net In-Reply-To: References: Message-ID: > At that time Java for Linux will "mostly" run via the X11 compatibility layer There's a quality-of-service problem with running via the compatibility layer as under certain circumstances native X windows look blurry. Users with small(ish) HiDPI displays tend to enable fractional scaling and with that enabled (regardless of the actual scale), XWayland pretends that the screen size is smaller and then pixel-stretches the resulting window according to the global scale. This works as a temporary solution, but people get quickly tired of looking at text that is blurry. See https://gitlab.gnome.org/GNOME/mutter/-/issues/402 and https://github.com/swaywm/sway/issues/5917 for some more details. On Wed, Jul 7, 2021 at 4:59 PM Philip Race wrote: > > https://mail.openjdk.java.net/pipermail/discuss/2021-July/005846.html > > -phil. 
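The arithmetic behind the blurriness described above can be sketched as follows. This is a simplified model that takes the description at face value (the X11 client is told a screen size divided by the global scale, and its output is then stretched by that scale); the panel size and the scale of 1.5 are hypothetical values, and real compositors may differ in detail.

```java
// Simplified model of the scaling described above; not taken from any compositor's code.
final class XWaylandScaleSketch {

    /** Size reported to the X11 client, assuming physical size divided by the scale. */
    static int logical(int physicalPixels, double scale) {
        return (int) Math.round(physicalPixels / scale);
    }

    /** Physical extent covered once the compositor stretches a logical extent. */
    static int stretched(int logicalPixels, double scale) {
        return (int) Math.round(logicalPixels * scale);
    }

    public static void main(String[] args) {
        double scale = 1.5;                      // hypothetical global scale
        int panelW = 3840, panelH = 2160;        // hypothetical HiDPI panel
        System.out.println("X11 client sees " + logical(panelW, scale)
                + "x" + logical(panelH, scale));
        // An 800 px wide X11 window is rendered with 800 columns of pixels
        // but ends up covering more physical columns than it has data for.
        System.out.println("800 px window covers " + stretched(800, scale)
                + " physical px");
    }
}
```

Because the client never renders more pixels than the reduced size it is told about, the stretch has to interpolate, which is where the soft text comes from.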
From robert at marcanoonline.com Fri Jul 16 13:18:43 2021 From: robert at marcanoonline.com (Robert Marcano) Date: Fri, 16 Jul 2021 09:18:43 -0400 Subject: Call for Discussion : New Project to support the Wayland display server on Linux In-Reply-To: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com> References: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com> Message-ID: <257093bf-e3fd-8f61-c5d2-59e5ae10ade9@marcanoonline.com> On 7/7/21 9:24 AM, Philip Race wrote: > >> > There are some unknowns and questions, such as what are the options for > supporting Robot ? > What other support is missing ? What platform APIs should the > implementation use ? > How does a native Wayland solution interoperate with OpenJFX ? Among the APIs to consider are all those related to window positioning. Wayland doesn't allow clients to specify the window position; the reason is that Wayland compositors aren't always expected to be traditional desktops, for example a tiling window manager. There was a proposal to add a cookie-like extension to allow applications to tell Wayland to store the current position and then be able to restore it later, but I don't think that progressed too much. > > > Comments, expressions of interest etc are welcome. > > > -Phil Race, for the Java client groups. > > > [1] : https://wayland.freedesktop.org/ > [2] : > https://docs.oracle.com/en/java/javase/11/docs/api/java.desktop/java/awt/Robot.html > From philip.race at oracle.com Fri Jul 16 19:49:11 2021 From: philip.race at oracle.com (Philip Race) Date: Fri, 16 Jul 2021 12:49:11 -0700 Subject: Call for Discussion : New Project to support the Wayland display server on Linux In-Reply-To: <257093bf-e3fd-8f61-c5d2-59e5ae10ade9@marcanoonline.com> References: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com> <257093bf-e3fd-8f61-c5d2-59e5ae10ade9@marcanoonline.com> Message-ID: <25cc97d7-c61c-2881-6f8a-40e0aa238122@oracle.com> I have heard this but we'll see how it plays out in practice.
X11 already allows a window manager to not place the window exactly where you asked for it, or ignore other requests to do with window geometry. However in practice you usually get something very close to what you requested. It can be written like that to provide for some situations in which the desktop has no way to exactly honour the client's request without it being a complete unpredictable mess. I'm sceptical that anyone will be happy with a desktop that when you ask it to show that file manager dialog in the middle of the screen instead decides the bottom right hand corner of your screen is where it will show it today and tomorrow top-left .. so a spec. allowing something isn't the same as it always happening. And the AWT spec. already allows for this : ======== https://docs.oracle.com/en/java/javase/11/docs/api/java.desktop/java/awt/Window.html Note: the location and size of top-level windows (including Windows, Frames, and Dialogs) are under the control of the desktop's window management system. Calls to setLocation, setSize, and setBounds are requests (not directives) which are forwarded to the window management system. Every effort will be made to honor such requests. However, in some cases the window management system may ignore such requests, or modify the requested geometry in order to place and size the Window in a way that more closely matches the desktop settings. ======= If anything about wayland *requires* a spec. update that would be a problem as it would make it more onerous to backport to older JDKs if it is needed. -phil. On 7/16/21 6:18 AM, Robert Marcano wrote: > On 7/7/21 9:24 AM, Philip Race wrote: >> >>> >> There are some unknowns and questions, such as what are the options >> for supporting Robot ? >> What other support is missing ? What platform APIs should the >> implementation use ? >> How does a native Wayland solution interoperate with OpenJFX ? > > Among the APIs in consideration are all related to window positioning.
> Wayland doesn't allow clients to specify the window position; the reason > is that Wayland compositors aren't always expected to be traditional > desktops, for example a tiling window manager. > > There was a proposal to add a cookie-like extension to allow > applications to tell Wayland to store the current position and then > be able to restore it later, but I don't think that progressed too much. > >> >> >> Comments, expressions of interest etc are welcome. >> >> >> -Phil Race, for the Java client groups. >> >> >> [1] : https://wayland.freedesktop.org/ >> [2] : >> https://docs.oracle.com/en/java/javase/11/docs/api/java.desktop/java/awt/Robot.html >> > From akozlov at azul.com Sun Jul 18 14:48:33 2021 From: akozlov at azul.com (Anton Kozlov) Date: Sun, 18 Jul 2021 17:48:33 +0300 Subject: Call for Discussion: New Project: CRaC Message-ID: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com> Hi, It's been a while since we presented Coordinated Restore at Checkpoint for the first time [0]. We are still committed to the idea and researching this topic. Java applications can avoid the long start-up and warm-up by saving the state of the Java runtime (snapshot, checkpoint). The saved state is then used to start instances fast (restored). But after the state was saved, the execution environment could change. Also, if multiple instances are started from the saved state simultaneously, they should obtain some uniqueness, and their executions should diverge at some point. We believe that the practical way to solve these problems is to make Java applications aware of when the state is saved and restored. Then an application will be able to handle environmental changes. The application will also be able to obtain uniqueness from the environment. The CRaC project aims to research a Java API for coordination between application and runtime to save and restore the state.
The runtime should support multiple ways to save the state: virtual machine snapshot, container snapshot, CRIU project on Linux, etc. We hope to come up with an API that is general enough for any underlying mechanism. We also plan to explore safety checks in the API and runtime, which prevent saving the state if it may not be restored or work correctly after the restore. I propose myself as a Project Lead of the CRaC Project. If you're interested or want to be a committer, please drop me a message. A fork of JDK [1] would be a starting point of this project. Thanks, Anton [0] https://mail.openjdk.java.net/pipermail/discuss/2020-September/005594.html [1] https://github.com/CRaC/jdk From volker.simonis at gmail.com Mon Jul 19 16:46:27 2021 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 19 Jul 2021 18:46:27 +0200 Subject: Call for Discussion: New Project: CRaC In-Reply-To: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com> References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com> Message-ID: Hi Anton, I think this will be a useful project. I support the creation of a dedicated OpenJDK Project for researching checkpoint/restore technologies within the JVM and defining a standard API for checkpoint/restore events. I'm interested in becoming a Committer in this project once it is created. I think we should make it clear that this project will investigate and implement different, orthogonal features which can be delivered independently. In accordance with your description I currently see three main work streams: 1. Define a Java API which allows applications to coordinate with checkpoint/restore events. This API should be generic enough to work with any underlying mechanism no matter if it is initiated by the JVM itself or externally, just for the JVM process or for the whole OS. 2. Investigate what it takes to make the JVM checkpoint/restore aware and safe. Again, this should be as much as possible independent of the underlying checkpointing mechanism. 3.
Investigate possibilities to implement checkpoint/restore functionality right within the JVM. Your proof of concept implementation [1] based on CRIU [2] is certainly a good starting point. We in the Amazon Corretto team are currently experimenting with full OS checkpointing and restore as provided by the snapshotting functionality [3] in Firecracker [4]. Compared to CRIU and Docker Checkpoint&Restore, Firecracker Snapshotting is different in the sense that it saves not just a single process or container but a whole running OS. This has some advantages (e.g. you don't have to care about file handles because they will still be valid after resume) but also some drawbacks (e.g. the need to reseed /dev/random and to sync the system clock). We were wondering if we could use the API which you've proposed in your initial post (i.e. jdk.crac [3]) to notify the JDK and applications of suspend and resume events. In the case of Firecracker, the source of these events would be either the kernel (through a System Generation ID kernel driver [5]) or SystemD [6]. There is ongoing work to push these mechanisms into the respective upstream projects but SystemD "inhibitors" [7,8] events could be used already now to trigger the callbacks of the envisioned API. There are several issues which we are currently investigating and which we'd like to discuss in this project: - Does it make sense to add timeouts (or a TimeoutException) to the proposed API? - How to deal with Pseudo Random Generators like j.u.Random? They are specified to be deterministic and applications might rely on this determinism. But we might also run into problems if several cloned JVM instances are using the same random values (e.g. as UIDs). - How to make the JVM/JDK behave gracefully after "time-jumps"? - Is there anything special required to make the JVM "snap-safe" if checkpointing can be initiated from outside the JVM at any arbitrary time?
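To make the coordination idea and the question about cloned random values more tangible, here is a minimal, self-contained sketch. Everything in it is hypothetical: the CheckpointAware interface and the Registry class are stand-ins invented for this example and are not the jdk.crac API, and the "checkpoint" and "restore" are simulated by plain method calls rather than by CRIU or a VM snapshot. Calling the callbacks in reverse registration order before a checkpoint is a choice made for this sketch only.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types exist in the JDK.
final class CheckpointSketch {

    /** Callbacks an application object would implement to cooperate with the runtime. */
    interface CheckpointAware {
        void beforeCheckpoint();
        void afterRestore();
    }

    /** Stand-in for the runtime side that notifies registered objects. */
    static final class Registry {
        private final List<CheckpointAware> resources = new ArrayList<>();

        void register(CheckpointAware resource) {
            resources.add(resource);
        }

        void simulateCheckpoint() {
            // Sketch choice: notify in reverse registration order before saving state.
            for (int i = resources.size() - 1; i >= 0; i--) {
                resources.get(i).beforeCheckpoint();
            }
        }

        void simulateRestore() {
            for (CheckpointAware resource : resources) {
                resource.afterRestore();
            }
        }
    }

    /** An instance identifier that must not be shared by clones of one snapshot. */
    static final class InstanceId implements CheckpointAware {
        private SecureRandom rng = new SecureRandom();
        private long id = rng.nextLong();
        private int generation = 0;

        long id() { return id; }
        int generation() { return generation; }
        boolean isSuspended() { return rng == null; }

        @Override
        public void beforeCheckpoint() {
            rng = null;                 // drop generator state so it is not captured
        }

        @Override
        public void afterRestore() {
            rng = new SecureRandom();   // fresh entropy from the restored environment
            id = rng.nextLong();        // each restored instance picks a new id
            generation++;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        InstanceId instanceId = new InstanceId();
        registry.register(instanceId);

        registry.simulateCheckpoint();
        registry.simulateRestore();
        System.out.println("generation after restore: " + instanceId.generation());
    }
}
```

A deterministic generator such as j.u.Random cannot simply be reseeded like this without breaking applications that rely on its sequence, which is exactly why that bullet is listed as an open question.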
I hope we will find at least one "Sponsoring Group" [9] for this Project such that we can continue the discussion on a dedicated mailing list. Thanks for proposing this group and your great work on this topic, Volker [1] https://github.com/org-crac/jdk/compare/jdk-base..jdk-crac [2] https://github.com/checkpoint-restore/criu [3] https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md [4] https://firecracker-microvm.github.io/ [5] https://lkml.org/lkml/2021/3/8/677 [6] https://github.com/systemd/systemd/issues/19269 [7] https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html [8] https://github.com/systemd/systemd/blob/main/src/login/inhibit.c [9] https://openjdk.java.net/projects/#new-project On Sun, Jul 18, 2021 at 4:49 PM Anton Kozlov wrote: > > Hi, > > It's been a while since we presented Coordinated Restore at Checkpoint for the > first time [0]. We are still committed to the idea and researching this topic. > > Java applications can avoid the long start-up and warm-up by saving the state > of the Java runtime (snapshot, checkpoint). The saved state is then used to > start instances fast (restored). But after the state was saved, the execution > environment could change. Also, if multiple instances are started from the > saved state simultaneously, they should obtain some uniqueness, and their > executions should diverge at some point. > > We believe that the practical way to solve these problems is to make Java > applications aware of when the state is saved and restored. Then an > application will be able to handle environmental changes. The application will > also be able to obtain uniqueness from the environment. > > The CRaC project aims to research Java API for coordination between application > and runtime to save and restore the state. Runtime should support multiple > ways to save the state: virtual machine snapshot, container snapshot, CRIU > project on Linux, etc. 
We hope to come up with an API that is general enough for > any underlying mechanism. We also plan to explore safety checks in the API and > runtime, which prevent saving the state if it may not be restored or work > correctly after the restore. > > I propose myself as a Project Lead of the CRaC Project. If you're interested > or want to be the committer, please drop me a message. > > A fork of JDK [1] would be a starting point of this project. > > Thanks, > Anton > > [0] https://mail.openjdk.java.net/pipermail/discuss/2020-September/005594.html > [1] https://github.com/CRaC/jdk > From akozlov at azul.com Mon Jul 19 21:40:23 2021 From: akozlov at azul.com (Anton Kozlov) Date: Tue, 20 Jul 2021 00:40:23 +0300 Subject: Call for Discussion: New Project: CRaC In-Reply-To: References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com> Message-ID: <2723b1f7-e675-4256-f3cc-f0e274da7786@azul.com> Hi Volker, thanks for your reply! It will be great to see you as a Committer. Indeed, CRaC is not about a single checkpoint/restore implementation. A set of deliverables would likely include an API and something in the JVM, but I'm not sure about the total set -- it would be part of the research, depending on what we'll learn. It's correct that there is no specific order between deliverables. The current implementation based on CRIU is just an example (that needs revisiting). It was, however, necessary to understand the practical implications of checkpoint/restore and to surface the first problems. Having more mechanisms early will be very useful. A VM like Firecracker is probably at the other end of the spectrum of mechanisms for checkpoint/restore. The VM environment looks rather different compared to a process. So it is interesting to see what it demands from the API and safety checks. Such work should highlight what I inadvertently hard-coded for CRIU. The questions likely need elaboration and likely experimenting/testing.
I would love to have this discussed on the appropriate mail list of the project. Yes, we need a sponsoring group. Thanks, Anton On 7/19/21 7:46 PM, Volker Simonis wrote: > Hi Anton, > > I think this will be a useful project. I support the creation of a > dedicated OpenJDK Project for researching checkpoint/restore > technologies within the JVM and defining of a standard API for > chekpoint/restore events. I'm interested in becoming a Committer in > this project once it will be created. > > I think we should make it clear that this project will investigate and > implement different, orthogonal features which can be delivered > independently. In accordance with your description I currently see > three main work streams: > > 1. Define a Java API which allows applications to coordinate with > checkpoint/restore events. This API should be generic enough to work > with any underlying mechanism no matter if it is initiated by the JVM > itself or externally, just for the JVM process or for the whole OS. > > 2. Investigate what it takes to make the JVM checkpoint/restore aware > and safe. Again, this should be as much as possible independent of the > underlying checkpointing mechanism. > > 3. Investigate possibilities to implement checkpoint/restore > functionality right within the JVM. Your proof of concept > implementation [1] based on CRIU [2] is certainly a good starting > point. > > We in the Amazon Corretto team are currently experimenting with full > OS checkpointing and restore as provided by the snapshotting > functionality [3] in Firecracker [4]. Compared to CRIU and Docker > Checkpoint&Restore, Firecracker Snapshotting is different in the sense > that it does not only saves a single process or container but a whole > running OS. This has some advantages (i.e. you don't have to care > about file handles because they will be still valid after resume) but > also some drawbacks (i.e. the need to reseed /dev/random and to sync > the system clock). 
> > We were wondering if we could use the API which you've proposed in > your initial post (i.e. jdk.crac [3]) to notify the JDK and > applications of suspend and resume events. In the case of Firecracker, > the source of these events would be either the kernel (through a > System Generation ID kernel driver [5] or SystemD [6]). There is > ongoing work to push these mechanisms into the respective upstream > projects but SystemD "inhibitors" [7,8] events could be used already > now to trigger the callbacks of the envisioned API. > > There are several issues which we are currently investigating and > which we'd like to discuss in this project: > - Doe's it make sense to add timeouts (or a TimeoutException) to the > proposed API? > - How to deal with Pseudo Random Generators like j.u.Random? They are > specified to be deterministic and applications might rely on this > determinism. But we might also run into problems if several, cloned > JVM instances are using the same random values (e.g. as UIDs). > - How to make the JVM/JDK behave gracefully after "time-jumps". > - Is there anything special required to make the JVM "snap-safe" if > checkpointing can be initiated from outside the JVM at any arbitrary > time. > > I hope we will find at least one "Sponsoring Group" [9] for this > Project such that we can continue the discussion on a dedicated > mailing list. 
> > Thanks for proposing this group and your great work on this topic, > Volker > > [1] https://github.com/org-crac/jdk/compare/jdk-base..jdk-crac > [2] https://github.com/checkpoint-restore/criu > [3] https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md > [4] https://firecracker-microvm.github.io/ > [5] https://lkml.org/lkml/2021/3/8/677 > [6] https://github.com/systemd/systemd/issues/19269 > [7] https://www.freedesktop.org/software/systemd/man/systemd-inhibit.html > [8] https://github.com/systemd/systemd/blob/main/src/login/inhibit.c > [9] https://openjdk.java.net/projects/#new-project > > On Sun, Jul 18, 2021 at 4:49 PM Anton Kozlov wrote: >> >> Hi, >> >> It's been a while since we presented Coordinated Restore at Checkpoint for the >> first time [0]. We are still committed to the idea and researching this topic. >> >> Java applications can avoid the long start-up and warm-up by saving the state >> of the Java runtime (snapshot, checkpoint). The saved state is then used to >> start instances fast (restored). But after the state was saved, the execution >> environment could change. Also, if multiple instances are started from the >> saved state simultaneously, they should obtain some uniqueness, and their >> executions should diverge at some point. >> >> We believe that the practical way to solve these problems is to make Java >> applications aware of when the state is saved and restored. Then an >> application will be able to handle environmental changes. The application will >> also be able to obtain uniqueness from the environment. >> >> The CRaC project aims to research Java API for coordination between application >> and runtime to save and restore the state. Runtime should support multiple >> ways to save the state: virtual machine snapshot, container snapshot, CRIU >> project on Linux, etc. We hope to come with an API that is general enough for >> any underlying mechanism. 
We also plan to explore safety checks in the API and
>> runtime, which prevent saving the state if it may not be restored or work
>> correctly after the restore.
>>
>> I propose myself as a Project Lead of the CRaC Project. If you're interested
>> or want to be the committer, please drop me a message.
>>
>> A fork of JDK [1] would be a starting point of this project.
>>
>> Thanks,
>> Anton
>>
>> [0] https://mail.openjdk.java.net/pipermail/discuss/2020-September/005594.html
>> [1] https://github.com/CRaC/jdk
>>
>

From robert at marcanoonline.com Mon Jul 19 13:56:59 2021
From: robert at marcanoonline.com (Robert Marcano)
Date: Mon, 19 Jul 2021 09:56:59 -0400
Subject: Call for Discussion : New Project to support the Wayland display server on Linux
In-Reply-To: <25cc97d7-c61c-2881-6f8a-40e0aa238122@oracle.com>
References: <31d8ebcf-6c63-fdb9-e8e8-88573975dd2f@oracle.com> <257093bf-e3fd-8f61-c5d2-59e5ae10ade9@marcanoonline.com> <25cc97d7-c61c-2881-6f8a-40e0aa238122@oracle.com>
Message-ID:

On 7/16/21 3:49 PM, Philip Race wrote:
> I have heard this but we'll see how it plays out in practice.
> X11 already allows a window manager to not place the window exactly
> where you asked for it, or
> ignore other requests to do with window geometry.
>
> However in practice you usually get something very close to what you
> requested.
> It can be written like that to provide for some situations in which the
> desktop has no way to exactly honour the client's request
> without it being a complete unpredictable mess.
> I'm sceptical that anyone will be happy with a desktop that when you ask
> it to show that file manager dialog
> in the middle of the screen instead decides the bottom right hand corner
> of your screen is where it will show it today
> and tomorrow top-left .. so a spec. allowing something isn't the same as
> it always happening.
>
>
> And the AWT spec.
already allows for this :
> ========
> https://docs.oracle.com/en/java/javase/11/docs/api/java.desktop/java/awt/Window.html
>
>
> Note: the location and size of top-level windows (including Windows,
> Frames, and Dialogs) are
> under the control of the desktop's window management system. Calls to
> setLocation, setSize, and setBounds
> are requests (not directives) which are forwarded to the window
> management system.
> Every effort will be made to honor such requests. However, in some cases
> the window management
> system may ignore such requests, or modify the requested geometry in
> order to place and size the
> Window in a way that more closely matches the desktop settings
>
> =======
>
> If anything about wayland *requires* a spec. update that would be a
> problem as it would make it more
> onerous to backport to older JDKs if it is needed.
>
> -phil.

True, the Java API spec. allows the window manager not to honor exact
positions, but Wayland doesn't even have a way to get the current
position, and I am not sure how ComponentListener.componentMoved() will
behave in that situation. IIRC XWayland was initially unable to set
window positions, so a private "protocol" between Wayland and XWayland
was established because it was noticed that too many legacy applications
were having problems. In the real world, many people coded their
applications expecting window position requests to be respected, even
though the spec. says this isn't guaranteed.

>
> On 7/16/21 6:18 AM, Robert Marcano wrote:
>> On 7/7/21 9:24 AM, Philip Race wrote:
>>>
>>>>
>>> There are some unknowns and questions, such as what are the options
>>> for supporting Robot ?
>>> What other support is missing ? What platform APIs should the
>>> implementation use ?
>>> How does a native Wayland solution interoperate with OpenJFX ?
>>
>> Among the APIs in consideration are all related to window positioning.
>> Wayland doesn't allow clients to tell the window position, the reason
>> is that Wayland compositors aren't expected to be always traditional
>> desktops, for example a tiling window manager.
>>
>> There was a proposal to add a cookie-like extension to allow
>> applications to tell Wayland to store the current position and then
>> be able to restore it later, but I don't think that progressed too much.
>>
>>>
>>>
>>> Comments, expressions of interest etc are welcome.
>>>
>>>
>>> -Phil Race, for the Java client groups.
>>>
>>>
>>> [1] : https://wayland.freedesktop.org/
>>> [2] :
>>> https://docs.oracle.com/en/java/javase/11/docs/api/java.desktop/java/awt/Robot.html
>>>
>>
>

From mbien42 at gmail.com Tue Jul 20 15:31:20 2021
From: mbien42 at gmail.com (Michael Bien)
Date: Tue, 20 Jul 2021 17:31:20 +0200
Subject: Call for Discussion: New Project: CRaC
In-Reply-To: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
Message-ID:

Hello,

great to hear that there is research done in this area.

I did some experimenting myself by just binding to the CRIU C-API via
panama some time ago[1][2]. It quickly became clear that, although it
worked surprisingly well, it probably required a lower level approach to
properly implement it. (I was mostly interested in CRIU's rootless
mode[3] and restoring warmed up JVMs, which came with its own issues and
kernel bugs)

Checkpointing the JVM is probably much safer when all threads have
stopped and more economical when the heap is compacted - the JVM itself
is in a better position to do that than the Java application.

CRIU can't deal with situations when files changed between checkpoint
and restore. Restoring a Java program which is logging to a file will
only work once, a second attempt would fail since the file changed due
to the first restore.
An API might be able to mitigate a lot of this, e.g. a logger
could rotate the log to an empty file, or close the file on checkpoint
and reopen it on restore. JFR should do this out of the box. I was
wondering if the IO stream impl itself could help in some situations.

Non-file related APIs might have to be made restore-aware too. For
example SecureRandom might require re-seeding, keystores/SSL certs might
need special attention etc.


Although it worked surprisingly well (restoring was also quite fast),
implementing it at the Java application level would be fairly limited.
Looking forward to hearing/seeing more from CRaC!

best regards,
michael

[1] https://github.com/mbien/JCRIU/
[2] https://mbien.dev/blog/entry/java-and-rootless-criu-using
[3] https://github.com/checkpoint-restore/criu/pull/1155

On 18.07.21 16:48, Anton Kozlov wrote:
> Hi,
>
> It's been a while since we presented Coordinated Restore at Checkpoint
> for the
> first time [0]. We are still committed to the idea and researching
> this topic.
>
> Java applications can avoid the long start-up and warm-up by saving
> the state
> of the Java runtime (snapshot, checkpoint). The saved state is then
> used to
> start instances fast (restored). But after the state was saved, the
> execution
> environment could change. Also, if multiple instances are started
> from the
> saved state simultaneously, they should obtain some uniqueness, and their
> executions should diverge at some point.
>
> We believe that the practical way to solve these problems is to make Java
> applications aware of when the state is saved and restored. Then an
> application will be able to handle environmental changes. The
> application will
> also be able to obtain uniqueness from the environment.
>
> The CRaC project aims to research Java API for coordination between
> application
> and runtime to save and restore the state.
Runtime should support
> multiple
> ways to save the state: virtual machine snapshot, container snapshot,
> CRIU
> project on Linux, etc. We hope to come with an API that is general
> enough for
> any underlying mechanism. We also plan to explore safety checks in
> the API and
> runtime, which prevent saving the state if it may not be restored or work
> correctly after the restore.
>
> I propose myself as a Project Lead of the CRaC Project. If you're
> interested
> or want to be the committer, please drop me a message.
>
> A fork of JDK [1] would be a starting point of this project.
>
> Thanks,
> Anton
>
> [0]
> https://mail.openjdk.java.net/pipermail/discuss/2020-September/005594.html
> [1] https://github.com/CRaC/jdk
>

From chf at redhat.com Tue Jul 20 17:05:30 2021
From: chf at redhat.com (Christine Flood)
Date: Tue, 20 Jul 2021 13:05:30 -0400
Subject: Call for Discussion: New Project: CRaC
In-Reply-To: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
Message-ID:

We at Red Hat have been working on this problem as well and I think now is
a great time to sync our efforts.

Our current project, Jigawatts 1.21, is based on allowing the user to
specify precise checkpoints either by adding a method call, or manipulating
bytecodes via Byteman.
This code is separate from OpenJDK and will be distributed in its own
Linux rpm.

The next phase will require some changes to OpenJDK, specifically we are
looking to do some optimizations at checkpoint time to improve
startup/runtime.

Here are two ideas.

1) Shrink the heap to just the live data size, this both guarantees that
there are no secrets hidden in garbage objects and minimizes restore time.

We can restore and immediately grow the heap.

2) Hot swap garbage collectors, this allows us to give fast startup and
fast runtime by using the epsilon collector on restore, eliminating the
space for card table and time for gc barriers.
This will be particularly
useful for programs which wish to run fast small apps against an already
initialized data set.

So, my question to you is: does it make sense to combine these into one
effort, or do we want to keep the projects separate for now? The efforts
are focusing on two different areas. Specifically, my understanding is that
CRaC wants to be able to checkpoint a JVM based on an external signal, so at
any point in the runtime, while Jigawatts is based more on user-controlled
and JVM-optimized checkpoints.


Christine





On Sun, Jul 18, 2021 at 10:50 AM Anton Kozlov wrote:

> Hi,
>
> It's been a while since we presented Coordinated Restore at Checkpoint for
> the
> first time [0]. We are still committed to the idea and researching this
> topic.
>
> Java applications can avoid the long start-up and warm-up by saving the
> state
> of the Java runtime (snapshot, checkpoint). The saved state is then used
> to
> start instances fast (restored). But after the state was saved, the
> execution
> environment could change. Also, if multiple instances are started from the
> saved state simultaneously, they should obtain some uniqueness, and their
> executions should diverge at some point.
>
> We believe that the practical way to solve these problems is to make Java
> applications aware of when the state is saved and restored. Then an
> application will be able to handle environmental changes. The application
> will
> also be able to obtain uniqueness from the environment.
>
> The CRaC project aims to research Java API for coordination between
> application
> and runtime to save and restore the state. Runtime should support multiple
> ways to save the state: virtual machine snapshot, container snapshot, CRIU
> project on Linux, etc. We hope to come with an API that is general enough
> for
> any underlying mechanism.
We also plan to explore safety checks in the
> API and
> runtime, which prevent saving the state if it may not be restored or work
> correctly after the restore.
>
> I propose myself as a Project Lead of the CRaC Project. If you're
> interested
> or want to be the committer, please drop me a message.
>
> A fork of JDK [1] would be a starting point of this project.
>
> Thanks,
> Anton
>
> [0]
> https://mail.openjdk.java.net/pipermail/discuss/2020-September/005594.html
> [1] https://github.com/CRaC/jdk
>
>

From volker.simonis at gmail.com Wed Jul 21 07:56:23 2021
From: volker.simonis at gmail.com (Volker Simonis)
Date: Wed, 21 Jul 2021 09:56:23 +0200
Subject: Call for Discussion: New Project: CRaC
In-Reply-To:
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
Message-ID:

Hi Christine,

thanks for joining the discussion :)
Please find my further comments inline.

On Tue, Jul 20, 2021 at 7:06 PM Christine Flood wrote:
>
> We at Red Hat have been working on this problem as well and I think now is
> a great time to sync our efforts.
>
> Our current project, Jigawatts 1.21 is based on allowing the user to
> specify precise checkpoints either by adding a method call, or manipulating
> bytecodes via Byteman.
> This code is separate from OpenJDK and will be distributed in it's own
> Linux rpm.
>
> The next phase will require some changes to OpenJDK, specifically we are
> looking to do some optimizations at checkpoint time to improve
> startup/runtime.
>
> Here are two ideas.
>
> 1) Shrink the heap to just the live data size, this both guarantees that
> there are no secrets hidden in garbage objects and minimizes restore time.
>
> We can restore and immediately grow the heap.
>
> 2) Hot swap garbage collectors, this allows us to give fast startup and
> fast runtime by using the epsilon collector on restore, eliminating the
> space for card table and time for gc barriers.
This will be particularly
> useful for programs which wish to run fast small apps against an already
> initialized data set.
>
> So, my question to you is does it make sense to combine these into one
> effort, or do we want to keep the projects separate for now? The efforts
> are focusing in two different areas, specifically my understanding is that
> CRAC wants to be able to checkpoint a JVM based on an external signal so at
> any point in the runtime while Jigawatts is based more on user controlled
> and JVM optimized checkpoints.

From my point of view it makes sense to combine the efforts. I think
CRaC should explore different ideas and directions (see my previous
mail). One of them will be how the JVM can implement and control
checkpointing functionality. That's what your Jigawatts project is
doing, but also what CRaC did in a POC based on CRIU.

The other direction CRaC should explore is how the JVM could react to
externally triggered checkpointing events.

Finally, I think we need a mechanism exposed through a standard API
which allows applications and frameworks to react to checkpointing
events no matter if these events are triggered internally, by the JVM
or externally. Such a mechanism is especially needed in situations
where an application is not simply suspended and resumed but also
cloned several times after it was resumed (or resumed several times
from the same checkpointed state).

>
>
> Christine
>
>
>
>
>
> On Sun, Jul 18, 2021 at 10:50 AM Anton Kozlov wrote:
>
> > Hi,
> >
> > It's been a while since we presented Coordinated Restore at Checkpoint for
> > the
> > first time [0]. We are still committed to the idea and researching this
> > topic.
> >
> > Java applications can avoid the long start-up and warm-up by saving the
> > state
> > of the Java runtime (snapshot, checkpoint). The saved state is then used
> > to
> > start instances fast (restored). But after the state was saved, the
> > execution
> > environment could change.
Also, if multiple instances are started from the
> > saved state simultaneously, they should obtain some uniqueness, and their
> > executions should diverge at some point.
> >
> > We believe that the practical way to solve these problems is to make Java
> > applications aware of when the state is saved and restored. Then an
> > application will be able to handle environmental changes. The application
> > will
> > also be able to obtain uniqueness from the environment.
> >
> > The CRaC project aims to research Java API for coordination between
> > application
> > and runtime to save and restore the state. Runtime should support multiple
> > ways to save the state: virtual machine snapshot, container snapshot, CRIU
> > project on Linux, etc. We hope to come with an API that is general enough
> > for
> > any underlying mechanism. We also plan to explore safety checks in the
> > API and
> > runtime, which prevent saving the state if it may not be restored or work
> > correctly after the restore.
> >
> > I propose myself as a Project Lead of the CRaC Project. If you're
> > interested
> > or want to be the committer, please drop me a message.
> >
> > A fork of JDK [1] would be a starting point of this project.
> >
> > Thanks,
> > Anton
> >
> > [0]
> > https://mail.openjdk.java.net/pipermail/discuss/2020-September/005594.html
> > [1] https://github.com/CRaC/jdk
> >
> >

From akozlov at azul.com Wed Jul 21 08:56:15 2021
From: akozlov at azul.com (Anton Kozlov)
Date: Wed, 21 Jul 2021 11:56:15 +0300
Subject: Call for Discussion: New Project: CRaC
In-Reply-To:
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
Message-ID: <0a492fef-7f14-de61-d03a-3dbb35583f0e@azul.com>

Hi Christine,

For me, it also makes sense to combine efforts.

1) In CRaC, the heap is already shrunk at the checkpoint [1]. However, it
required minor changes in each GC, and ZGC is not covered yet. The target
of this optimization is solely image size.
As we wrote, we optimized
image loading size with page-in of the image data.

2) I'm not sure how much GC barriers cost for a small application; do you
have some data showing it is that high? This use-case is definitely
interesting, but this optimization seems rather complex to implement.

As Volker correctly noted, coordination is required no matter if the
checkpoint request came externally or internally. Actually, in CRaC there
is an API for internal checkpoint requests [2], which is used to implement
the external request via jcmd. In the Tomcat Catalina example the method is
used as one of the starting modes [3], which saves the Tomcat instance right
after it was initialized and is ready to serve requests. Strictly
speaking, this is another kind of coordination, but one of the targets of
the project is to produce an API that is general enough for different
use-cases and mechanisms. So I don't see a big difference in what we are
trying to do, if we continue exposing this kind of API for internal
checkpoint requests.

Thanks,
Anton

[1] https://github.com/CRaC/jdk/compare/jdk-base..jdk-crac#diff-c3cf48d74cc5c7b6326bd3602e87cd7ea7277a5b856c7aa0940ec307f51f5281
[2] https://crac.github.io/jdk/jdk-crac/api/java.base/jdk/crac/Core.html#checkpointRestore()
[3] https://github.com/CRaC/tomcat/compare/release-crac-jdbc...crac#diff-2d3e1f08ceeedd89b50366360b3541dddba9f2b7b1602a2071ca6359ace4a62eR529

On 7/21/21 10:56 AM, Volker Simonis wrote:
> Hi Christine,
>
> thanks for joining the discussion :)
> Please find my further comments inline.
>
> On Tue, Jul 20, 2021 at 7:06 PM Christine Flood wrote:
>>
>> We at Red Hat have been working on this problem as well and I think now is
>> a great time to sync our efforts.
>>
>> Our current project, Jigawatts 1.21 is based on allowing the user to
>> specify precise checkpoints either by adding a method call, or manipulating
>> bytecodes via Byteman.
>> This code is separate from OpenJDK and will be distributed in it's own
>> Linux rpm.
>>
>> The next phase will require some changes to OpenJDK, specifically we are
>> looking to do some optimizations at checkpoint time to improve
>> startup/runtime.
>>
>> Here are two ideas.
>>
>> 1) Shrink the heap to just the live data size, this both guarantees that
>> there are no secrets hidden in garbage objects and minimizes restore time.
>>
>> We can restore and immediately grow the heap.
>>
>> 2) Hot swap garbage collectors, this allows us to give fast startup and
>> fast runtime by using the epsilon collector on restore, eliminating the
>> space for card table and time for gc barriers. This will be particularly
>> useful for programs which wish to run fast small apps against an already
>> initialized data set.
>>
>> So, my question to you is does it make sense to combine these into one
>> effort, or do we want to keep the projects separate for now? The efforts
>> are focusing in two different areas, specifically my understanding is that
>> CRAC wants to be able to checkpoint a JVM based on an external signal so at
>> any point in the runtime while Jigawatts is based more on user controlled
>> and JVM optimized checkpoints.
>
> From my point of view it makes sense to combine the efforts. I think
> CRAC should explore different ideas and directions (see my previous
> mail). One of them will be be how the JVM can implement and control
> checkpointing functionality. That's what your Jigawatts project is
> doing, but also what CRAC did in a POC based on CRIU.
>
> The other direction CRAC should explore is how the JVM could react an
> externally triggered checkpointing events.
>
> Finally, I think we need a mechanism exposed through a standard API
> which allows applications and frameworks to react on checkpointing
> events no matter if these events are triggered internally, by the JVM
> or externally.
Such a mechanism is especially needed in situations
> where an applications is not simply suspended and resumed but also
> cloned several times after it was resumed (or resumed several times
> from the same checkpointed state).
>
>>
>>
>> Christine
>>
>>
>>
>>
>>
>> On Sun, Jul 18, 2021 at 10:50 AM Anton Kozlov wrote:
>>
>>> Hi,
>>>
>>> It's been a while since we presented Coordinated Restore at Checkpoint for
>>> the
>>> first time [0]. We are still committed to the idea and researching this
>>> topic.
>>>
>>> Java applications can avoid the long start-up and warm-up by saving the
>>> state
>>> of the Java runtime (snapshot, checkpoint). The saved state is then used
>>> to
>>> start instances fast (restored). But after the state was saved, the
>>> execution
>>> environment could change. Also, if multiple instances are started from the
>>> saved state simultaneously, they should obtain some uniqueness, and their
>>> executions should diverge at some point.
>>>
>>> We believe that the practical way to solve these problems is to make Java
>>> applications aware of when the state is saved and restored. Then an
>>> application will be able to handle environmental changes. The application
>>> will
>>> also be able to obtain uniqueness from the environment.
>>>
>>> The CRaC project aims to research Java API for coordination between
>>> application
>>> and runtime to save and restore the state. Runtime should support multiple
>>> ways to save the state: virtual machine snapshot, container snapshot, CRIU
>>> project on Linux, etc. We hope to come with an API that is general enough
>>> for
>>> any underlying mechanism. We also plan to explore safety checks in the
>>> API and
>>> runtime, which prevent saving the state if it may not be restored or work
>>> correctly after the restore.
>>>
>>> I propose myself as a Project Lead of the CRaC Project. If you're
>>> interested
>>> or want to be the committer, please drop me a message.
>>>
>>> A fork of JDK [1] would be a starting point of this project.
>>>
>>> Thanks,
>>> Anton
>>>
>>> [0]
>>> https://mail.openjdk.java.net/pipermail/discuss/2020-September/005594.html
>>> [1] https://github.com/CRaC/jdk
>>>
>>>

From adinn at redhat.com Wed Jul 21 09:26:28 2021
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 21 Jul 2021 10:26:28 +0100
Subject: Call for Discussion: New Project: CRaC
In-Reply-To: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
Message-ID: <695e1fac-d924-bfcc-1f13-511f35158ba7@redhat.com>

On 18/07/2021 15:48, Anton Kozlov wrote:
> Java applications can avoid the long start-up and warm-up by saving the
> state
> of the Java runtime (snapshot, checkpoint). The saved state is then
> used to
> start instances fast (restored). But after the state was saved, the
> execution
> environment could change. Also, if multiple instances are started from the
> saved state simultaneously, they should obtain some uniqueness, and their
> executions should diverge at some point.

This proposal rings bells with project Leyden. I'm not proposing the
need for any absolute tie between the two projects but I just want to
note that Leyden faces some similar concerns and that an integrated
approach to resolving them might be beneficial.

With static Java programs much of the fast startup and low footprint
comes from having a pre-populated heap that contains primitive data and
objects created by running static initialization during generation of
the static image. This is somewhat dissimilar to CRaC in that the
initial heap is not really a snapshot of a prior heap state. Instead it
is an explicitly constructed initial data state for operation of the
static compiled program. That includes linking it to an associated,
complete and closed meta-data model. Despite that difference, similar
concerns arise.

Experience from GraalVM indicates that not all heap data can be fully
constructed in advance to cater for all possible variations in the
target platform. This suggests that it would be beneficial to provide
some language or runtime mechanism to 'complete' or 'repair' the initial
heap state at startup. If we can see some way to align the needs of
these two projects then we might be able to align any language or
runtime capabilities support required to resolve those needs.

regards,


Andrew Dinn
-----------

From akozlov at azul.com Wed Jul 21 10:36:41 2021
From: akozlov at azul.com (Anton Kozlov)
Date: Wed, 21 Jul 2021 13:36:41 +0300
Subject: Call for Discussion: New Project: CRaC
In-Reply-To:
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
Message-ID: <104c81ed-8d37-7010-16ba-a94b31d65cce@azul.com>

Hi Michael,

Interesting links!

The CRIU project did a terrific job in checkpointing and restoring an
arbitrary process. But if we think about how to continue the execution of
the saved Java runtime instance, multiple times simultaneously, the
examples are what we should do better. The internal state of runtime,
standard library, or application (like a crypto random seed) needs fixing
after the restore. External resources cannot always be captured. These
are files for a process-based checkpoint or network connections for
VM-based snapshotting.

JFR is a good point and more such changes will likely appear over time.
CRaC handles the perfdata temp file, which is used to implement jcmd and
jps functionality. Without special care on the JVM side, the missing
perfdata will likely prevent the second restore with CRIU (the first
restored instance deletes the file as not needed) or the restore after
reboot.

The Logger example is invaluable to demonstrate why coordination is
needed. Without knowledge of the semantics, it's impossible to
distinguish between e.g. a log file, previous content of which is not
important, and a config file, which should be re-read after restore.
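
The log-file versus config-file distinction can be illustrated with a
minimal Java sketch. It is only an illustration under stated
assumptions: CheckpointResource, LogFile, ConfigFile and
simulateCheckpointRestore are hypothetical names defined here, not the
jdk.crac API, and the "checkpoint" is simulated inside a single process
rather than performed by CRIU or a VM snapshot.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

class CoordinationSketch {

    /** Hypothetical callback interface; it mirrors the idea only, not jdk.crac. */
    interface CheckpointResource {
        void beforeCheckpoint() throws IOException;
        void afterRestore() throws IOException;
    }

    /** Log file: previous content is unimportant, so close it and reopen in append mode. */
    static final class LogFile implements CheckpointResource {
        private final Path path;
        private Writer out;

        LogFile(Path path) throws IOException {
            this.path = path;
            open();
        }

        private void open() throws IOException {
            out = Files.newBufferedWriter(path, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }

        void log(String line) throws IOException {
            out.write(line + "\n");
            out.flush();
        }

        boolean isOpen() {
            return out != null;
        }

        @Override
        public void beforeCheckpoint() throws IOException {
            out.close(); // no open file handle is left in the saved state
            out = null;
        }

        @Override
        public void afterRestore() throws IOException {
            open(); // may throw, so the application can handle a missing file
        }
    }

    /** Config file: the cached value may be stale, so it is re-read after restore. */
    static final class ConfigFile implements CheckpointResource {
        private final Path path;
        private String value;

        ConfigFile(Path path) throws IOException {
            this.path = path;
            load();
        }

        private void load() throws IOException {
            value = Files.readString(path).trim();
        }

        String value() {
            return value;
        }

        @Override
        public void beforeCheckpoint() {
            // nothing is held open between reads
        }

        @Override
        public void afterRestore() throws IOException {
            load();
        }
    }

    /** Simulated checkpoint/restore: notify, let the environment change, notify again. */
    static void simulateCheckpointRestore(List<? extends CheckpointResource> resources,
                                          Runnable environmentChange) throws IOException {
        for (CheckpointResource r : resources) {
            r.beforeCheckpoint();
        }
        environmentChange.run();
        for (CheckpointResource r : resources) {
            r.afterRestore();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("crac-sketch");
        Path cfg = dir.resolve("app.cfg");
        Files.writeString(cfg, "v1");
        LogFile log = new LogFile(dir.resolve("app.log"));
        ConfigFile config = new ConfigFile(cfg);
        log.log("config before checkpoint: " + config.value());
        simulateCheckpointRestore(List.of(log, config), () -> {
            try {
                Files.writeString(cfg, "v2"); // environment changes while "checkpointed"
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        log.log("config after restore: " + config.value());
        System.out.println(Files.readAllLines(dir.resolve("app.log")));
    }
}
```

Both classes receive the same two notifications but react differently,
which is exactly the semantic knowledge that a generic file-handling
layer would lack.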
So automatic handling of files does not seem possible in general. Some
convenience can be implemented (like automatic log rotation), but this
needs to be done with awareness of the semantics and should allow error
handling on the Java application side. We require resources to be
re-acquired at restore and allow such code to throw exceptions.

Thanks,
Anton

On 7/20/21 6:31 PM, Michael Bien wrote:
> Hello,
>
> great to hear that there is research done in this area.
>
> I did some experimenting myself by just binding to the CRIU C-API via panama some time ago[1][2]. It quickly became clear that, although it worked surprisingly well, it probably required a lower level approach to properly implement it. (I was mostly interested in CRIUs rootless mode[3] and restoring warmed up JVMs, which came with its own issues and kernel bugs)
>
> Checkpointing the JVM is probably much safer when all threads have stopped and more economical when the heap is compacted - the JVM itself is in a better position to do that than the java application.
>
> CRIU can't deal with situations when files changed between checkpoint and restore. Restoring a java program which is logging to a file will only work once, a second attempt would fail since the file changed due to the first restore. An API might be able to mitigate a lot of this, e.g a logger could rotate the log to a empty file, or close the file on checkpoint an reopen it on restore. JFR should do this out of the box. I was wondering if the IO stream impl itself could help in some situations.
>
> Non-file related APIs might have to be made restore-aware too. For example SecureRandom might require re-seeding, keystores/SSL certs might need special attention etc.
>
>
> although it worked surprisingly well (restoring was also quite fast), implementing it at the java application level would be fairly limited. Looking forward to hear/see more from CRaC!
>
> best regards,
> michael
>
> [1] https://github.com/mbien/JCRIU/
> [2] https://mbien.dev/blog/entry/java-and-rootless-criu-using
> [3] https://github.com/checkpoint-restore/criu/pull/1155
>
> On 18.07.21 16:48, Anton Kozlov wrote:
>> Hi,
>>
>> It's been a while since we presented Coordinated Restore at Checkpoint for the
>> first time [0]. We are still committed to the idea and researching this topic.
>>
>> Java applications can avoid the long start-up and warm-up by saving the state
>> of the Java runtime (snapshot, checkpoint). The saved state is then used to
>> start instances fast (restored). But after the state was saved, the execution
>> environment could change. Also, if multiple instances are started from the
>> saved state simultaneously, they should obtain some uniqueness, and their
>> executions should diverge at some point.
>>
>> We believe that the practical way to solve these problems is to make Java
>> applications aware of when the state is saved and restored. Then an
>> application will be able to handle environmental changes. The application will
>> also be able to obtain uniqueness from the environment.
>>
>> The CRaC project aims to research Java API for coordination between application
>> and runtime to save and restore the state. Runtime should support multiple
>> ways to save the state: virtual machine snapshot, container snapshot, CRIU
>> project on Linux, etc. We hope to come with an API that is general enough for
>> any underlying mechanism. We also plan to explore safety checks in the API and
>> runtime, which prevent saving the state if it may not be restored or work
>> correctly after the restore.
>>
>> I propose myself as a Project Lead of the CRaC Project. If you're interested
>> or want to be the committer, please drop me a message.
>>
>> A fork of JDK [1] would be a starting point of this project.
>>
>> Thanks,
>> Anton
>>
>> [0] https://mail.openjdk.java.net/pipermail/discuss/2020-September/005594.html
>> [1] https://github.com/CRaC/jdk
>>
>

From kasperni at gmail.com Wed Jul 21 10:59:30 2021
From: kasperni at gmail.com (Kasper Nielsen)
Date: Wed, 21 Jul 2021 11:59:30 +0100
Subject: Call for Discussion: New Project: CRaC
In-Reply-To: <104c81ed-8d37-7010-16ba-a94b31d65cce@azul.com>
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com> <104c81ed-8d37-7010-16ba-a94b31d65cce@azul.com>
Message-ID:

On Wed, 21 Jul 2021 at 11:38, Anton Kozlov wrote:

> But if we think about how to continue the execution of the saved Java
> runtime
> instance, multiple times simultaneously, the examples are what we should do
> better. The internal state of runtime, standard library, or application
> (like
> a crypto random seed) needs fixing after the restore.

These are issues that are relevant to native executable images as well.
For example, GraalVM native image by default defers static initialization
of classes until runtime instead of build-time to avoid such issues. That
solution comes with its own set of problems and would not be relevant here.

/Kasper

From akozlov at azul.com Wed Jul 21 12:11:43 2021
From: akozlov at azul.com (Anton Kozlov)
Date: Wed, 21 Jul 2021 15:11:43 +0300
Subject: Call for Discussion: New Project: CRaC
In-Reply-To: <695e1fac-d924-bfcc-1f13-511f35158ba7@redhat.com>
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com> <695e1fac-d924-bfcc-1f13-511f35158ba7@redhat.com>
Message-ID: <0fcaa219-3969-c521-8483-1d620bd6de06@azul.com>

Hi Andrew,

This is a very interesting topic. The targeted problems are similar, but
the means and implications are different. There may be a few
intersections. However, I'm not sure how practical they are.

Taking this another way, to reduce the start-up of the static Java
program, as much of the execution as possible is performed at compile time.
From GraalVM, sometimes you need human assistance or a profiling run that
provides missing pieces of information. The warm-up is solved by analyzing and
optimizing all the future execution paths.

A run with CRaC API could be a profiling phase that constructs 100% of the
state without a complex analysis of the initialization phase. We don't need an
analysis if we have a complete result of a single run. Even if it caught too
many details of that run, a Java program can register a beforeCheckpoint
callback to clear some parts of the state before it is saved. For example, it
could be possible to dump the Java heap, metaspace, etc. at the time of
"checkpoint", while the result is not a real runtime image. And then use this
JVM data as a starting point for the static Java program compilation, compiling
in afterRestore methods that update the state after the restore in CRaC. In
theory, any choice that cannot be decided by static analysis from the initial
run could even be treated as unreachable. So the amount of information for
static analysis and the outcome of compilation will be regulated by the
extensiveness of the profiling run.

And still, the same Java code with CRaC API could be saved as a part of
a complete Java runtime image, done by VM, CRIU, etc.

Thanks,
Anton

On 7/21/21 12:26 PM, Andrew Dinn wrote:
> On 18/07/2021 15:48, Anton Kozlov wrote:
>> Java applications can avoid the long start-up and warm-up by saving the state
>> of the Java runtime (snapshot, checkpoint). The saved state is then used to
>> start instances fast (restored). But after the state was saved, the execution
>> environment could change. Also, if multiple instances are started from the
>> saved state simultaneously, they should obtain some uniqueness, and their
>> executions should diverge at some point.
> This proposal rings bells with project Leyden.
> I'm not proposing the need for any absolute tie between the two projects but I just want to note that Leyden faces some similar concerns and that an integrated approach to resolving them might be beneficial.
>
> With static Java programs much of the fast startup and low footprint comes from having a pre-populated heap that contains primitive data and objects created by running static initialization during generation of the static image. This is somewhat dissimilar to CRAC in that the initial heap is not really a snapshot of a prior heap state. Instead it is an explicitly constructed initial data state for operation of the static compiled program. That includes linking it to an associated, complete and closed meta-data model. Despite that difference similar concerns arise.
>
> Experience from GraalVM indicates that not all heap data can be fully constructed in advance to cater for all possible variations in the target platform. This suggests that it would be beneficial to provide some language or runtime mechanism to 'complete' or 'repair' the initial heap state at startup. If we can see some way to align the needs of these two projects then we might be able to align any language or runtime capabilities support required to resolve those needs.
>
> regards,
>
>
> Andrew Dinn
> -----------
>

From mbien42 at gmail.com Wed Jul 21 12:08:13 2021
From: mbien42 at gmail.com (Michael Bien)
Date: Wed, 21 Jul 2021 14:08:13 +0200
Subject: Call for Discussion: New Project: CRaC
In-Reply-To: <104c81ed-8d37-7010-16ba-a94b31d65cce@azul.com>
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com> <104c81ed-8d37-7010-16ba-a94b31d65cce@azul.com>
Message-ID: <9f19ee53-4bde-3ee7-46af-67269c7d784d@gmail.com>

I remember from discussions on #criu, CRIU interestingly doesn't actually care that much if a file changed between checkpoint and restore. It has the rudimentary check comparing file size (!)
mostly only for the scenario where a loaded lib changed, causing seg faults on restore, in best case.

Maybe criu could skip changed files instead of failing and notify the JVM. The JVM would apply black magic and let IO streams throw IOE on next read or write - giving the application a chance to recover even without knowing of a CRaC-API. There might be situations where throwing an IOE isn't even needed.

just a thought, i don't know if any of this is doable,

best regards,
michael

From adinn at redhat.com Wed Jul 21 13:27:56 2021
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 21 Jul 2021 14:27:56 +0100
Subject: Call for Discussion: New Project: CRaC
In-Reply-To: <0fcaa219-3969-c521-8483-1d620bd6de06@azul.com>
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com> <695e1fac-d924-bfcc-1f13-511f35158ba7@redhat.com> <0fcaa219-3969-c521-8483-1d620bd6de06@azul.com>
Message-ID: <0a78943f-729e-c85e-4b71-9b2c0bfc15a7@redhat.com>

Hi Anton,

On 21/07/2021 13:11, Anton Kozlov wrote:
> This is a very interesting topic. The targeted problems are similar, but the
> means and implications are different. There may be few intersections.
> However, I'm not sure how practical they are.

Me neither, but I still think it would be well worth considering if there are any commonalities.

> Taking this another way, to reduce the start-up of the static Java program, as
> much as possible of execution is performed in the compile time. From GraalVM,
> sometimes you need human assistance or a profiling run that provides missing
> pieces of information. The warm-up is solved by analyzing and optimizing all
> the future execution paths.

Well, that's how it works at present given the current language spec and behaviour of the JVM, runtime, middleware and apps that are all designed with the expectation of operating in a single continuous dynamic runtime. But it does not necessarily have to remain that way.

One option for Java that would accommodate the static case might be to factor out static state initialization into computation of state that it is acceptable to precompute vs state that needs either to be initialized because it was not precomputed or, if it was already precomputed, re-initialized at runtime. There is also the opportunity to determine when it gets re-initialized -- e.g.
before the program starts running or at the point where the program needs to make use of it.

It would be interesting to consider how a restart of a frozen JVM state might also profit from a similar mechanism. There is clearly the potential to recompute existing (static) class state when an app is restarted, just as with a static app startup. However, there may also be a need to refresh instance state.

> A run with CRaC API could be a profiling phase that constructs 100% of the
> state without a complex analysis of the initialization phase. We don't need an
> analysis if we have a complete result of a single run. Even if it caught too
> many details of that run, a java program can register beforeCheckpoint callback
> to clear some parts of the state before it is saved. For example, it could be
> possible to dump java heap, metaspace, etc at the time of "checkpoint", while
> the result is not a real runtime image. And then use this JVM data as a
> starting point for the static Java program compilation, compiling in
> afterRestore methods that update the state after the restore in CRaC. In
> theory, any choice that cannot be decided by static analysis from the initial
> run could even be treated as unreachable. So the amount of information for
> static analysis and the outcome of compilation will be regulated by the
> extensiveness of the profiling run.

Well, yes, currently build time init requires a complex analysis of static init code. However, you would not necessarily need to have a static analysis if the language, runtime, middleware and apps were provided with a mechanism to define what can be pre-computed vs runtime computed vs re-computed.

It seems clear that for some apps it will be easy for them to correct their own internal app state using callbacks. However, it is not clear that JDK runtime state or JVM state will always be correct at restart of a frozen app.
Certainly not if you transplant the app to a host with different hardware, OS or process environment. So, I think there is a need to look into how we can make the JDK and JVM play ball here and preferably in much the same way as we need to look at making it play ball with a static compile model.

regards,


Andrew Dinn
-----------

From akozlov at azul.com Wed Jul 21 17:09:21 2021
From: akozlov at azul.com (Anton Kozlov)
Date: Wed, 21 Jul 2021 20:09:21 +0300
Subject: Call for Discussion: New Project: CRaC
In-Reply-To: <9f19ee53-4bde-3ee7-46af-67269c7d784d@gmail.com>
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com> <104c81ed-8d37-7010-16ba-a94b31d65cce@azul.com> <9f19ee53-4bde-3ee7-46af-67269c7d784d@gmail.com>
Message-ID: <19e45abd-7776-900a-2832-fbfabfeb1e50@azul.com>

I think for CRIU such checks are required since it works with an unaware
process. Any change in the environment could be disastrous if the process has
made decisions in the past that are incompatible with the new file content.
File size and access modes are checked, modification time too, IIRC. Content
of files could also be checked, but I suppose it is just too expensive.

For CRaC so far we require no open files at the checkpoint (so CRIU or any
other checkpoint mechanism is not bothered with them). On restore, you have to
reopen necessary files, handling all problems along the way. But if you have
an open file, the checkpoint is aborted with an exception. You cannot have an
image that may be OK or may not be if e.g. some Java code is not ready to
handle a missing file on the restore. So the checkpoint should be successful
only if your application state is consistent with any possible future
execution environment. That is, the application should not assume anything
about the environment at the time of the checkpoint, for the latter to succeed.
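
As a rough illustration of the "no open files at the checkpoint, reopen on restore" rule described above, here is a minimal sketch. The Resource interface below is a hypothetical stand-in that only borrows the beforeCheckpoint/afterRestore names used in this thread; it is not the jdk.crac API, and the LogFile class and the simulated checkpoint in main are assumptions made purely for the example.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class ReopenOnRestore {

    /** Hypothetical callback pair; a stand-in, not the jdk.crac API. */
    interface Resource {
        void beforeCheckpoint() throws IOException;
        void afterRestore() throws IOException;
    }

    /** A log that holds no open file while a checkpoint is taken. */
    static final class LogFile implements Resource {
        private final Path path;
        private Writer out; // null while the file is closed

        LogFile(Path path) throws IOException {
            this.path = path;
            open();
        }

        private void open() throws IOException {
            out = Files.newBufferedWriter(path, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }

        boolean isOpen() {
            return out != null;
        }

        void log(String line) throws IOException {
            if (out == null) {
                throw new IOException("log is closed");
            }
            out.write(line);
            out.write('\n');
            out.flush();
        }

        @Override
        public void beforeCheckpoint() throws IOException {
            // Release the file so the saved state does not depend on it.
            out.close();
            out = null;
        }

        @Override
        public void afterRestore() throws IOException {
            // Re-acquire the file; this may throw if the new environment
            // does not provide it, and the application can handle that.
            open();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("app", ".log");
        LogFile log = new LogFile(path);
        log.log("before checkpoint");

        log.beforeCheckpoint();
        // A real checkpoint/restore would happen here; this sketch only
        // simulates the two callbacks being invoked in order.
        log.afterRestore();

        log.log("after restore");
        List<String> lines = Files.readAllLines(path);
        System.out.println(lines);
    }
}
```

The point of the sketch is only the ordering: the file is closed before the state is saved, and reopening is allowed to fail with an exception on the restore side.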

Thanks,
Anton

From scott at adligo.com Wed Jul 21 23:22:23 2021
From: scott at adligo.com (Scott Morgan)
Date: Wed, 21 Jul 2021 18:22:23 -0500
Subject: implicit 'return this' JCP suggestion
In-Reply-To:
References:
Message-ID:

Hi All,

I have a JCP suggestion (between my first and 2nd signatures below), and
I am wondering if anyone on this list can submit it on my behalf, or if
they are interested in collaboration on it.

Cheers and TIA!,
Scott

Summary
-------

Implicit return of this;

Provide better syntactic sugar for method chaining in Java, through
allowing the 'this' keyword to be used in the return class slot of Java
methods. In addition methods marked as returning 'this' would implicitly
return this without the explicit 'return this;' statement at the end of
the method. For example;

public interface MyInterface {
  this setFoo(String foo);
}

public class MyClass implements MyInterface {
  private String foo;

  public this setFoo(String foo) {
    this.foo = foo;
    //note the implicit return this;
  }
}

public class MyChildClass extends MyClass {
  // NO NEED FOR THIS METHOD NOW YEA, it now returns MyChildClass!
  // public MyChildClass setFoo(String foo) {
  //   super.setFoo(foo);
  //   return this;
  // }
}

Goals
-----

The primary goal is to make Java easier to use, less typing required and
improved readability. Removal of repetitive code.

Non-Goals
---------

This is not designed to promote method chaining, as that is subjective.
However for the developers who like method chaining, it should improve
their experience with Java.

Success Metrics
---------------

Take a vote of JCP members to see if they think this is a good idea, for
developer productivity with the language.

Motivation
----------

I have been spending a lot of time overriding methods in child classes in
order to make all of the methods return the same type. This creates a lot of
nearly identical code, violating the DRY principles (don't repeat yourself).
The only difference is the return type in the leaf-most class of the class
parent child tree.

Description
-----------

I'm hoping someone else will implement this JEP. However I think
it would be a fairly straightforward step in the java compilers.
When the compiler encounters a 'this' in the return type slot
it would return the current class. For all subclasses and
implementations the compiler would have bytecode generated that would
call the super method and then return this.

Alternatives
------------

I haven't considered any alternatives. The current workaround is pretty
repetitive.

Testing
-------

The parent child classes in the Summary of this JEP should suffice.

Risks and Assumptions
---------------------

I don't see any risks. I am assuming that this will NOT require a change
to any bytecode, only a change to the language.

Dependencies
------------

This would require a change to all Java compilers.

--
Regards,
Scott Morgan
President & CEO Adligo Inc
http://www.adligo.com
https://www.linkedin.com/in/scott-morgan-21739415
A+ Better Business Bureau Rating
https://github.com/adligo
By Appointment Only: 1-866-968-1893 Ex 101
scott at adligo.com
skype:adligo1?call
Send Me Files Securely:
*https://www.sendthisfile.com/f.jsp?id=ewOnyeFQM18IDRf7MMIdolfI *

From brian.goetz at oracle.com Thu Jul 22 16:07:02 2021
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 22 Jul 2021 12:07:02 -0400
Subject: implicit 'return this' JCP suggestion
In-Reply-To:
References:
Message-ID: <1ed02433-2b0f-5ce3-b9bc-be89d0dcde30@oracle.com>

This is well-traveled ground.

Clearly you could write your setter to return its receiver with today's
language.
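
For readers who want to see what returning the receiver looks like with today's language, the following is a minimal sketch of one common idiom (a self-typed base class), not necessarily the form the author had in mind. The names Base, self() and setBar are assumptions for the example; MyChildClass and setFoo are borrowed from the proposal.

```java
public class FluentSetters {

    /** Base class parameterized by its concrete subtype (self-type idiom). */
    abstract static class Base<T extends Base<T>> {
        private String foo;

        /** Each concrete subclass returns itself, typed as itself. */
        protected abstract T self();

        public T setFoo(String foo) {
            this.foo = foo;
            return self(); // explicit "return this", typed as the subclass
        }

        public String getFoo() {
            return foo;
        }
    }

    /** Inherits setFoo with return type MyChildClass, without overriding it. */
    static final class MyChildClass extends Base<MyChildClass> {
        private int bar;

        @Override
        protected MyChildClass self() {
            return this;
        }

        public MyChildClass setBar(int bar) {
            this.bar = bar;
            return this;
        }

        public int getBar() {
            return bar;
        }
    }

    public static void main(String[] args) {
        // setFoo is declared in Base but still chains into setBar.
        MyChildClass c = new MyChildClass().setFoo("x").setBar(3);
        System.out.println(c.getFoo() + " " + c.getBar());
    }
}
```

The cost of this idiom is the extra type parameter and the self() method in every concrete class, which is part of the repetition the proposal complains about.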
In reality, the elision of 'return this' is a pretty small
benefit; in reality, the implicit goal of your proposal is to make
the transition from void-return to self-return for _existing libraries_
a _binary-compatible_ change, by having self-return be handled on the
client side (permission to reuse the receiver).

But this still has a deficit, which is that libraries must explicitly
perform the migration from void-return to self-return, and many will
not, either because they have other priorities, or because they don't
want to tie themselves to the Java 18+ classfile version (limiting the
applicability of their library.) This means that it will be the better
part of a decade before most libraries have migrated, with the result
that an inconsistent mix of libraries will support it, to the
frustration of users.

An alternate approach that has been floated is to simply permit
client-side receiver chaining on void-return methods:

   foo.setX(3).setY(4)

by simply reusing the receiver when chaining void methods.

But still, this has a deficit; the real payoff for such a feature would
be that one could create, set up, and return an object _in a single
expression_. The above only gets us halfway there; instead of:

   Foo f = new Foo();
   f.setX(3);
   f.setY(4);
   use(f);

we can do:

   Foo f = new Foo();
   f.setX(3).setY(4);
   use(f);

But this only eliminates the lesser of the annoyances; the biggie is
that we cannot create the Foo and initialize it in one go:

    use(new Foo().setX(3).setY(4).WHAT_GOES_HERE())

To get there, we'd have to further interpret x.m(), where m() is a void
method, as evaluating to x:

    use(new Foo().setX(3).setY(4))

which is now a more invasive change (and conflicts with the desire to
make `void` an actual type under Project Valhalla.)

Further, under such a scheme, it is likely that existing API decisions
will give users grief.
For example, Collection::add is not a void method,
it returns a boolean indicating whether a change was made or not. So
users will surely try:

    new ArrayList<>().add(3).add(4)

and be surprised it doesn't work. (Moral: mid-game rule changes are
always more disruptive than you think.)

In summary, while the goal of this proposal is well-intentioned, and
parts of the proposed solution are workable, when you work through the
details, you end up somewhere that is more invasive than expected, for a
relatively small benefit.

From akozlov at azul.com Thu Jul 22 19:17:29 2021
From: akozlov at azul.com (Anton Kozlov)
Date: Thu, 22 Jul 2021 22:17:29 +0300
Subject: Call for Discussion: New Project: CRaC
In-Reply-To:
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
Message-ID: <16f8b4c7-027b-d80b-96f3-e53a9e02a442@azul.com>

On 7/19/21 7:46 PM, Volker Simonis wrote:
> We were wondering if we could use the API which you've proposed in
> your initial post (i.e. jdk.crac [3]) to notify the JDK and
> applications of suspend and resume events. In the case of Firecracker,
> the source of these events would be either the kernel (through a
> System Generation ID kernel driver [5] or SystemD [6]). There is
> ongoing work to push these mechanisms into the respective upstream
> projects but SystemD "inhibitors" [7,8] events could be used already
> now to trigger the callbacks of the envisioned API.

I would prefer a pure file interface to something systemd specific (a file
managed by systemd is OK). In this case, the coordination could be
implemented in the JVM without any dependencies, or be done in pure Java. For
example, a thread waits for updates from the special file, triggers CRaC's
beforeCheckpoint's, and signals OK to the snapshotting.

However, I don't see how prepare-for-snapshot is communicated via the kernel
file. The LKML doc suggests only a workload-specific way [1, Example
snapshot-safe workflow, point 1]. The interface seems to provide only a
notification after the VM is resumed. How systemd would help is not clear:
since the inhibitors are locks, how would one know when the lock should be
taken to run the beforeCheckpoint's?

From CRaC's side, it would be possible to break a single checkpointRestore
method [2] into two steps (one calls beforeCheckpoint's and another calls
afterRestore's). Steps can be exposed via API and e.g. jcmd. Will that help
running beforeCheckpoint's before the actual snapshot is taken, e.g. to clear
up the state from secrets?
> There are several issues which we are currently investigating and
> which we'd like to discuss in this project:
> - Does it make sense to add timeouts (or a TimeoutException) to the
> proposed API?

This seems reasonable in some cases. But in others, you may want synchronous
execution of callbacks that may take arbitrarily long. E.g. you want to shut
down a web server that is connected to a database, and you want to be sure
all clients are served before shutting down the DB connection.

So a resource with a timeout should probably be built on top of the
synchronous Resource notification: something like a separate thread that is
waited on with the timeout and is interrupted once the time is out. The
callback is not guaranteed to stop executing, but the Context will know
immediately that the callback has failed to finish in time. It looks good
that time-restricted Resources will specify the timeout themselves.

> - How to deal with Pseudo Random Generators like j.u.Random? They are
> specified to be deterministic and applications might rely on this
> determinism. But we might also run into problems if several, cloned
> JVM instances are using the same random values (e.g. as UIDs).

Apparently, only the user of a j.u.Random can distinguish the two cases. At
least, a Random that should provide distinct random values can be manually
re-seeded after the restore. Probably, it's possible to differentiate two
classes of j.u.Random instances (ones with deterministic outputs after
restore and ones without) and handle them automatically by looking at whether
they were constructed with a seed or not. But this needs to be checked
thoroughly.

> - How to make the JVM/JDK behave gracefully after "time-jumps".

I assume there should be no correctness problems, as a time-jump does not
substantially differ from time spent off-CPU due to OS scheduling. Some
internal counters could overflow, but this looks like nothing more than a bug
that needs fixing.
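
For illustration only, the timeout idea discussed above (a separate thread
that is waited on with a timeout and interrupted once the time is out) could
be sketched with plain java.util.concurrent. TimedCallback and runWithTimeout
are hypothetical names, not part of jdk.crac, and a Runnable stands in for a
synchronous Resource notification.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (an assumption, not a jdk.crac class): runs a
// synchronous callback on a separate thread and stops waiting after a timeout.
final class TimedCallback {

    /** Returns true if the callback finished in time, false otherwise. */
    static boolean runWithTimeout(Runnable callback, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<?> done = worker.submit(callback);
        try {
            done.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Interrupt the worker; the callback is free to ignore this,
            // but the caller knows right away that it ran out of time.
            done.cancel(true);
            return false;
        } catch (InterruptedException e) {
            done.cancel(true);
            Thread.currentThread().interrupt(); // keep the caller's interrupt status
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("callback failed", e.getCause());
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean fast = runWithTimeout(() -> { }, 1_000);
        boolean slow = runWithTimeout(() -> {
            try {
                Thread.sleep(5_000); // simulates a callback that takes too long
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 100);
        System.out.println("fast finished in time: " + fast);
        System.out.println("slow finished in time: " + slow);
    }
}
```

The synchronous callback stays the primitive here; the timeout is layered on
top, so callbacks that must run to completion simply do not use the wrapper.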
However, I saw cases where CRIU did restore the monotonic clock, which broke
timed waits, causing 100% CPU load with an improper time limit. Once the
clock was not restored at all, the issue went away. That brought us back to
the time jump, which was handled correctly.

> - Is there anything special required to make the JVM "snap-safe" if
> checkpointing can be initiated from outside the JVM at any arbitrary
> time.

Currently in CRaC, after the beforeCheckpoint callbacks have run, the actual
checkpoint is done while the JVM is at a safepoint. There we check that there
are no open files or sockets and then call CRIU against the Java process. I'm
not aware of problems with snapshotting the process at an arbitrary moment,
so the safepoint matters only for the checks.

Thanks,
Anton

[1] https://lkml.org/lkml/2021/3/8/677
[2] https://github.com/CRaC/jdk/blob/jdk-crac/src/java.base/share/classes/jdk/crac/Core.java#L102

From forax at univ-mlv.fr Thu Jul 22 20:31:35 2021
From: forax at univ-mlv.fr (Remi Forax)
Date: Thu, 22 Jul 2021 22:31:35 +0200 (CEST)
Subject: implicit 'return this' JCP suggestion
In-Reply-To: <1ed02433-2b0f-5ce3-b9bc-be89d0dcde30@oracle.com>
References: <1ed02433-2b0f-5ce3-b9bc-be89d0dcde30@oracle.com>
Message-ID: <942697929.339349.1626985895188.JavaMail.zimbra@u-pem.fr>

----- Original Message -----
> From: "Brian Goetz"
> To: "Scott Morgan" , "discuss"
> Sent: Thursday, 22 July 2021 18:07:02
> Subject: Re: implicit 'return this' JCP suggestion

> This is well-traveled ground.
>
> Clearly you could write your setter to return its receiver with today's
> language. In reality, the elision of 'return this' is a pretty small
> benefit; in reality, the implicit goal of your proposal is to make
> the transition from void-return to self-return for _existing libraries_
> a _binary-compatible_ change, by having self-return be handled on the
> client side (permission to reuse the receiver).
>
> But this still has a deficit, which is that libraries must explicitly
> perform the migration from void-return to self-return, and many will
> not, either because they have other priorities, or because they don't
> want to tie themselves to the Java 18+ classfile version (limiting the
> applicability of their library.) This means that it will be the better
> part of a decade before most libraries have migrated, with the result
> that an inconsistent mix of libraries will support it, to the
> frustration of users.
>
> An alternate approach that has been floated is to simply permit
> client-side receiver chaining on void-return methods:
>
>     foo.setX(3).setY(4)
>
> by simply reusing the receiver when chaining void methods.
>
> But still, this has a deficit; the real payoff for such a feature would
> be that one could create, set up, and return an object _in a single
> expression_. The above only gets us halfway there; instead of:
>
>     Foo f = new Foo();
>     f.setX(3);
>     f.setY(4);
>     use(f);
>
> we can do:
>
>     Foo f = new Foo();
>     f.setX(3).setY(4);
>     use(f);
>
> But this only eliminates the lesser of the annoyances; the biggie is
> that we cannot create the Foo and initialize it in one go:
>
>     use(new Foo().setX(3).setY(4).WHAT_GOES_HERE())
>
> To get there, we'd have to further interpret x.m(), where m() is a void
> method, as evaluating to x:
>
>     use(new Foo().setX(3).setY(4))
>
> which is now a more invasive change (and conflicts with the desire to
> make `void` an actual type under Project Valhalla.)
>
> Further, under such a scheme, it is likely that existing API decisions
> will give users grief. For example, Collection::add is not a void
> method, it returns a boolean indicating whether a change was made or
> not. So users will surely try:
>
>     new ArrayList<>().add(3).add(4)
>
> and be surprised it doesn't work. (Moral: mid-game rule changes are
> always more disruptive than you think.)
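
For illustration only (not part of the quoted message), this is a minimal
sketch of what is already expressible with today's language: setters that
return the receiver explicitly, and a recursive generic "self type" so that a
subclass keeps its own type without overriding each setter. Foo, setX, setY
and use follow the names in the quoted text; SelfTyped, Child and
ChainingDemo are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Explicit self-return: works today, at the cost of writing 'return this'.
class Foo {
    int x, y;
    Foo setX(int x) { this.x = x; return this; }
    Foo setY(int y) { this.y = y; return this; }
}

// Recursive generic self type (hypothetical names): inherited setters
// return the subclass type, so the subclass needs no overrides.
abstract class SelfTyped<T extends SelfTyped<T>> {
    String foo;

    @SuppressWarnings("unchecked")
    T setFoo(String foo) { this.foo = foo; return (T) this; }
}

final class Child extends SelfTyped<Child> {
    int bar;
    Child setBar(int bar) { this.bar = bar; return this; }
}

final class ChainingDemo {
    static int use(Foo f) { return f.x + f.y; }

    public static void main(String[] args) {
        // Create, set up, and pass an object in a single expression.
        System.out.println(use(new Foo().setX(3).setY(4)));

        // setFoo is declared in the parent but returns Child here.
        Child c = new Child().setFoo("a").setBar(1);
        System.out.println(c.foo + " " + c.bar);

        // Collection::add returns boolean, so it cannot be chained.
        List<Integer> list = new ArrayList<>();
        boolean changed = list.add(3);
        System.out.println(changed);
    }
}
```

The self-type approach removes the per-subclass overrides, but it needs an
unchecked cast and only helps libraries that were designed with it.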
>
> In summary, while the goal of this proposal is well-intentioned, and
> parts of the proposed solution are workable, when you work through the
> details, you end up somewhere that is more invasive than expected, for a
> relatively small benefit.

Also, the builder pattern is used more in Java than in other languages
because there are no keyword arguments in Java.

As part of Amber, it would be cool to have a way to match Map entries. This
can be done using an ad-hoc syntax, but it can also be done by saying that,
exactly like we use the dual of the varargs syntax to match arrays, we should
use the dual of the keyword argument syntax to match Map entries.

So introducing both a keyword argument syntax and a keyword matching syntax
at the same time may make sense.

Rémi

From mbien42 at gmail.com Thu Jul 22 20:51:53 2021
From: mbien42 at gmail.com (Michael Bien)
Date: Thu, 22 Jul 2021 22:51:53 +0200
Subject: Call for Discussion: New Project: CRaC
In-Reply-To: <16f8b4c7-027b-d80b-96f3-e53a9e02a442@azul.com>
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com> <16f8b4c7-027b-d80b-96f3-e53a9e02a442@azul.com>
Message-ID: <61279d07-d00e-f29e-cd59-38e2ab1608c8@gmail.com>

On 22.07.21 21:17, Anton Kozlov wrote:
>> - How to make the JVM/JDK behave gracefully after "time-jumps".
>
> I assume there should be no correctness problems, as the time-jump does not
> substantially differ from a time spent off-CPU due to OS scheduling. Some
> internal counters could overflow, but this does not look more than just a bug
> that needs fixing.

This might certainly cause some interesting issues, e.g. GC ergonomics
getting confused after thinking the last pause lasted 5 days :)

That is another reason why I believe the only way to properly implement this
is with the cooperation of the JVM. CRIU via Panama was nice for experiments
but it would never be reliable.

>
> However, I saw cases when CRIU did restore monotonic clock that broke timed
> waits, causing 100% of CPU loaded with an improper time limit. After not
> restoring the clock completely, the issue has gone away. That brought us again
> to the time jump, which was correctly handled.

If we are thinking of the same bug, this was fixed in Linux 5.10
(https://lkml.org/lkml/2020/10/15/582), possibly also backported. After 5.10
I never encountered 100% load after restoring JVMs again.
-michael

From akozlov at azul.com Fri Jul 23 07:38:15 2021
From: akozlov at azul.com (Anton Kozlov)
Date: Fri, 23 Jul 2021 10:38:15 +0300
Subject: Call for Discussion: New Project: CRaC
In-Reply-To: <0a78943f-729e-c85e-4b71-9b2c0bfc15a7@redhat.com>
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com> <695e1fac-d924-bfcc-1f13-511f35158ba7@redhat.com> <0fcaa219-3969-c521-8483-1d620bd6de06@azul.com> <0a78943f-729e-c85e-4b71-9b2c0bfc15a7@redhat.com>
Message-ID:

On 7/21/21 4:27 PM, Andrew Dinn wrote:
>
> Well, that's how it works at present given the current language spec and
> behaviour of the JVM, runtime, middleware and apps that are all designed
> with the expectation of operating in a single continuous dynamic runtime.
> But it does not necessarily have to remain that way.
>
> One option for Java that would accommodate the static case might be to
> factor out static state initialization into computation of state that it
> is acceptable to precompute vs state that needs either to be initialized
> because it was not precomputed or, if it was already precomputed,
> re-initialized at runtime. There is also the opportunity to determine when
> it gets re-initialized -- e.g. before the program starts running or at the
> point where the program needs to make use of it.
> ...
> Well, yes, currently build time init requires a complex analysis of static
> init code. However, you would not necessarily need to have a static
> analysis if the language, runtime, middleware and apps were provided with
> a mechanism to define what can be pre-computed vs runtime computed vs
> re-computed.

New annotations sound like a substantial change in the language first.

I'm not sure I completely understand re-compute, and why it is needed if you
have runtime initialization. It's either equivalent to the callbacks, which
are rather simple and provide a way to change parts of the state.
Or it is capable of automatically tracking dependencies from the re-computed
execution, potentially creating two "worlds": before and after the recompute.
For example, what happens if there is a singleton object, referenced from a
static field, and it is swapped during re-computation? What happens to other
instances that have seen the previous singleton? Or if the singleton is
referenced from a field of an instance that is referenced from a static
field, is it the same?

> It would be interesting to consider how a restart of a frozen JVM state
> might also profit from a similar mechanism. There is clearly the potential
> to recompute existing (static) class state when an app is restarted, just
> as with a static app startup. However, there may also be a need to refresh
> instance state.

If there were a magic mechanism that automatically decides what needs to be
changed before run time, it could be re-used, roughly speaking, for automatic
generation of CRaC API callbacks. But it should not invalidate too much,
eliminating the benefit of having the complete saved state that is ready to
run. The ability to manually write the code that prepares for checkpoint and
updates the state after restore provides better control over the state, and
the behavior and performance at run time are clearer for a user.

Interestingly, it seems possible to express static Java app start-up with
CRaC, like

    doPrecompute(); // generated by javac
    jdk.crac.Core.checkpointRestore(); // also handles re-compute?
    doRuntimeCompute(); // generated by javac
    main(argc, argv); // some computations probably are re-computed lazily

If we have a magic checkpoint/restore mechanism that optimizes the state and
compiles future executions, we'll have a static app. But with CRaC, the
`checkpointRestore` call as well as updating the state can happen after the
main method has started and can capture some of the actual runtime behavior.
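
For illustration only, the callback style discussed above can be sketched
without a CRaC-enabled JDK. SketchResource and SketchContext are hypothetical
stand-ins for the jdk.crac Resource and Context types (the real callbacks
also take a Context argument, omitted here), and checkpointRestore() only
simulates the snapshot instead of saving any process state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-in for a CRaC-style resource (an assumption, simplified).
interface SketchResource {
    void beforeCheckpoint();
    void afterRestore();
}

// Hypothetical coordinator: notifies resources around a simulated checkpoint.
final class SketchContext {
    private final List<SketchResource> resources = new ArrayList<>();

    void register(SketchResource resource) {
        resources.add(resource);
    }

    void checkpointRestore() {
        for (SketchResource r : resources) {
            r.beforeCheckpoint();
        }
        // A real implementation would save the process state here, and every
        // restored instance would continue from this point. Notification
        // order is a simplification in this sketch.
        for (SketchResource r : resources) {
            r.afterRestore();
        }
    }
}

// Example resource: an identifier that must differ between restored clones.
final class InstanceId implements SketchResource {
    private String id = UUID.randomUUID().toString();

    String id() { return id; }

    @Override
    public void beforeCheckpoint() {
        id = null; // drop state that must not be shared by clones
    }

    @Override
    public void afterRestore() {
        id = UUID.randomUUID().toString(); // obtain uniqueness after restore
    }
}

final class CracSketch {
    public static void main(String[] args) {
        SketchContext context = new SketchContext();
        InstanceId instance = new InstanceId();
        context.register(instance);

        String before = instance.id();  // "precomputed" state
        context.checkpointRestore();    // simulated checkpoint and restore
        System.out.println("id changed after restore: "
                + !before.equals(instance.id()));
    }
}
```

The point of the manual callbacks is visible in InstanceId: only the state
that has to change is touched, and everything else survives the checkpoint.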
> It seems clear that for some apps it will be easy for them to correct
> their own internal app state using callbacks. However, it is not clear
> that JDK runtime state or JVM state will always be correct at restart of a
> frozen app. Certainly not if you transplant the app to a host with a
> different hardware, OS or process environment.
>
> So, I think there is a need to look into how we can make the JDK and JVM
> play ball here and preferably in much the same way as we need to look at
> making it play ball with a static compile model.

In CRaC, the JVM and JDK are already involved in saving and restoring the
state. They will very likely remain so with an external request to save the
state (e.g. a VM snapshot). The footprint of the required changes is rather
small, I assume much smaller than would be required for the static image.

The JDK library has a few resources that need to be released on checkpoint
and acquired on restore. In this sense it does not differ much from an
application or framework. The JVM synchronizes its own state with the
checkpoint, and also runs safety checks for resources. Such checks and the
implementation are intended to provide some independence from the process
environment.

In a proper implementation of CRaC, the JVM and JDK should be correct if
restored on the same CPU and the same operating system. This could be relaxed
in one way or another.

In theory, leaving JNI aside, it's possible to imagine a fully abstract
checkpoint/restore mechanism that saves the runtime state in an abstract form
that can be restored on another CPU and OS. But that is not a goal for the
CRaC project.

A more practical mechanism would sacrifice generality for the ability to use
existing generated code. For CRaC we especially hope to reuse JIT-compiled
code. Handling minor differences between CPU features on the same arch is
what we likely need to do in the Project. For example, this can be done by
using a conservative set before the checkpoint and switching to the full set
after the restore.
The template interpreter and JIT code could optionally be regenerated, likely
while the old versions are still executing. For the JIT code, this does not
seem hard.

Thanks,
Anton

From michal at kleczek.org Fri Jul 23 07:50:04 2021
From: michal at kleczek.org (Michał Kłeczek)
Date: Fri, 23 Jul 2021 09:50:04 +0200
Subject: Call for Discussion: New Project: CRaC
In-Reply-To:
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com> <695e1fac-d924-bfcc-1f13-511f35158ba7@redhat.com> <0fcaa219-3969-c521-8483-1d620bd6de06@azul.com> <0a78943f-729e-c85e-4b71-9b2c0bfc15a7@redhat.com>
Message-ID: <660DD485-4048-4579-9677-BA07B233F666@kleczek.org>

>
> In theory, leaving JNI aside, it's possible to imagine a fully abstract
> checkpoint/restore mechanism, that saves the runtime state in an abstract form,
> that can be restored on another CPU and OS.

I would say this overlaps with serialisation a lot (my 2 cents).

Michal

From akozlov at azul.com Fri Jul 23 07:56:24 2021
From: akozlov at azul.com (Anton Kozlov)
Date: Fri, 23 Jul 2021 10:56:24 +0300
Subject: Call for Discussion: New Project: CRaC
In-Reply-To: <61279d07-d00e-f29e-cd59-38e2ab1608c8@gmail.com>
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com> <16f8b4c7-027b-d80b-96f3-e53a9e02a442@azul.com> <61279d07-d00e-f29e-cd59-38e2ab1608c8@gmail.com>
Message-ID: <9548bd9a-4523-3bfe-f5a5-6716ebea3479@azul.com>

On 7/22/21 11:51 PM, Michael Bien wrote:
>
> On 22.07.21 21:17, Anton Kozlov wrote:
>>> - How to make the JVM/JDK behave gracefully after "time-jumps".
>>
>> I assume there should be no correctness problems, as the time-jump does not
>> substantially differ from a time spent off-CPU due to OS scheduling. Some
>> internal counters could overflow, but this does not look more than just a bug
>> that needs fixing.
> this might certainly cause some interesting issues, e.g GC ergonomics
> getting confused after thinking the last pause lasted 5 days :)

Heuristics may suffer, agree.
Not sure if this is the case now (in CRaC), since the time spent in
checkpoint should not be attributed to a GC pause. But this or similar issues
are possible. Testing will be required, with possible tuning afterwards.

>
> if we are thinking of the same bug, this was fixed in linux 5.10
> (https://lkml.org/lkml/2020/10/15/582 ) - possibly also backported. After
> 5.10 I never encountered 100% load after restoring JVMs again.

This looks very relevant. I didn't dig down to the root of the problem, but
the description is very close. Thanks, good to know.

Thanks,
Anton

From vladimir.kozlov at oracle.com Tue Jul 27 17:07:56 2021
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 27 Jul 2021 10:07:56 -0700
Subject: Call for Discussion: New Project: CRaC
In-Reply-To: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
Message-ID:

The HotSpot group will participate in sponsoring this project.

Regards,
Vladimir Kozlov

On 7/18/21 7:48 AM, Anton Kozlov wrote:
> Hi,
>
> It's been a while since we presented Coordinated Restore at Checkpoint for the
> first time [0]. We are still committed to the idea and researching this topic.
>
> Java applications can avoid the long start-up and warm-up by saving the state
> of the Java runtime (snapshot, checkpoint). The saved state is then used to
> start instances fast (restored). But after the state was saved, the execution
> environment could change. Also, if multiple instances are started from the
> saved state simultaneously, they should obtain some uniqueness, and their
> executions should diverge at some point.
>
> We believe that the practical way to solve these problems is to make Java
> applications aware of when the state is saved and restored. Then an
> application will be able to handle environmental changes. The application will
> also be able to obtain uniqueness from the environment.
>
> The CRaC project aims to research Java API for coordination between application
> and runtime to save and restore the state. Runtime should support multiple
> ways to save the state: virtual machine snapshot, container snapshot, CRIU
> project on Linux, etc. We hope to come up with an API that is general enough for
> any underlying mechanism. We also plan to explore safety checks in the API and
> runtime, which prevent saving the state if it may not be restored or work
> correctly after the restore.
>
> I propose myself as a Project Lead of the CRaC Project. If you're interested
> or want to be the committer, please drop me a message.
>
> A fork of JDK [1] would be a starting point of this project.
>
> Thanks,
> Anton
>
> [0] https://mail.openjdk.java.net/pipermail/discuss/2020-September/005594.html
> [1] https://github.com/CRaC/jdk
>

From akozlov at azul.com Tue Jul 27 20:45:23 2021
From: akozlov at azul.com (Anton Kozlov)
Date: Tue, 27 Jul 2021 23:45:23 +0300
Subject: Call for Discussion: New Project: CRaC
In-Reply-To:
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
Message-ID: <128c612b-be0f-5b01-0c3e-7b4ccff2754a@azul.com>

On 7/27/21 8:07 PM, Vladimir Kozlov wrote:
> The HotSpot group will participate in sponsoring this project.

Vladimir, thank you and the HotSpot group!

Now I can send the Project Proposal [1] (Step 1). I'll do this in a couple of
days, in case late comments come in.

Thanks,
Anton

[1] https://openjdk.java.net/projects/#new-project

From rs at jelastic.com Wed Jul 28 08:16:53 2021
From: rs at jelastic.com (Ruslan Synytsky)
Date: Wed, 28 Jul 2021 11:16:53 +0300
Subject: Call for Discussion: New Project: CRaC
Message-ID:

Hi Anton, thank you for bringing this discussion up. At Jelastic, we have
been using CRIU technology for many years in combination with various
application runtimes, mainly for live migration.
The ability to speed up the startup time of the Java runtime is certainly an
interesting feature for cloud service providers and their customers. We would
like to participate in the development and testing of this improvement. I
also forwarded this thread to Virtuozzo, the team that invented CRIU, for
getting their support if needed.

Tech question: what do you think about the need to adjust the heap size after
restoration from a checkpointed runtime? As I understand, in some cases, the
restored runtimes may need different heap size compared to the initial
runtime from which the state was saved. There is a JEP
https://openjdk.java.net/jeps/8204088 that might be relevant to this
discussion.

Regards
--
Ruslan Synytsky
CEO @ Jelastic Multi-Cloud PaaS

From heidinga at redhat.com Wed Jul 28 14:23:03 2021
From: heidinga at redhat.com (Dan Heidinga)
Date: Wed, 28 Jul 2021 10:23:03 -0400
Subject: Call for Discussion: New Project: CRaC
In-Reply-To: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
Message-ID:

Hi, I'm a little late to the conversation but wanted to add that the
Eclipse OpenJ9 project [1] is interested in collaborating in this area
as well.

We've been exploring implementing a checkpoint/restore mechanism at
the JVM level [2] along with the required language-level Lifecycle
APIs ("hooks") to allow pre-snapshot / post-restore fixups. Recently,
we've shifted gears to base our efforts on CRIU as the checkpoint
mechanism to make faster progress on the hooks support. There's
overlap between the various approaches in this space and we see a lot
of benefit to standardizing the language support for saving/restoring
state.

Count us in as interested in engaging in this project.

--Dan

[1] https://github.com/eclipse-openj9/openj9
[2] https://danheidinga.github.io/Everyone_wants_fast_startup/

From akozlov at azul.com Wed Jul 28 18:45:49 2021
From: akozlov at azul.com (Anton Kozlov)
Date: Wed, 28 Jul 2021 21:45:49 +0300
Subject: Call for Discussion: New Project: CRaC
In-Reply-To:
References:
Message-ID:

Hi Ruslan,

On 7/28/21 11:16 AM, Ruslan Synytsky wrote:
> Tech question: what do you think about the need to adjust the heap size
> after restoration from a checkpointed runtime?
> As I understand, in some
> cases, the restored runtimes may need different heap size compared to the
> initial runtime from which the state was saved. There is a JEP
> https://openjdk.java.net/jeps/8204088 that might be relevant to this
> discussion.

Before saving the state, in GCs where it was easy to implement, we uncommit
unused parts of the heap. In other cases (except ZGC), we re-commit parts of
the heap, so the RSS is still low. The driver was to avoid saving garbage,
but this also makes the RSS equal to the size of the live set of the heap.
Coordination with the checkpoint/restore mechanism includes the JVM, so there
is a trigger to give up resources that may be unneeded after restoring the
state.

Resizing the heap, in general, seems to be a lot of effort. However, if an
implementation of the enhancement existed, it likely could be reused for what
we need and do.

Thanks,
Anton

From rs at jelastic.com Wed Jul 28 19:46:30 2021
From: rs at jelastic.com (Ruslan Synytsky)
Date: Wed, 28 Jul 2021 22:46:30 +0300
Subject: Call for Discussion: New Project: CRaC
In-Reply-To:
References:
Message-ID:

On Wed, 28 Jul 2021 at 21:46, Anton Kozlov wrote:
> Hi Ruslan,
>
> On 7/28/21 11:16 AM, Ruslan Synytsky wrote:
> > Tech question: what do you think about the need to adjust the heap size
> > after restoration from a checkpointed runtime? As I understand, in some
> > cases, the restored runtimes may need different heap size compared to the
> > initial runtime from which the state was saved. There is a JEP
> > https://openjdk.java.net/jeps/8204088 that might be relevant to this
> > discussion.
>
> Before saving the state, in GCs where it was easy to implement, we uncommit
> unused parts of the heap. In other cases (except ZGC), we re-commit parts of
> the heap, so the RSS is still low. The driver was to avoid saving garbage, but
> this also makes RSS size equal to the size of the live set of the heap.
> Coordination with checkpoint/restore mechanism includes JVM, so there is a
> trigger to give up resources that may be unneeded after restoring the
> state.
>

Hi Anton, very good, it sounds like a major use case for the memory uncommit
improvements that were introduced in different GCs.

> Resizing the heap, in general, seems to be a lot of effort. However, if the
> implementation for the enhancement would exist, it likely could be reused
> for what we need and do.
>

Thank you for the confirmation. While it's not a blocker, in my opinion the
ability to adjust the Xmx after restoring the state, according to the new
environment's requirements, will unlock even more value from CRaC and reduce
overhead at the orchestration level. It's much more flexible and
cost-efficient to have one state template that can be restored with different
memory limits than to store multiple identical templates with a variety of
heap sizes. Indeed, it's a complex topic, but the good news is that Rodrigo
Bruno (cc'd) has implemented a working prototype for changing Xmx on the fly.
And it has not cost too much effort, at least for now. So, we have made some
progress on this as well.

Also, it will be useful to get a better understanding of what can be improved
outside of the JVM, at the container level. It may help us avoid fixing
issues that should be resolved in general or can be resolved easily at the
underlying level. Any thoughts and ideas on this are welcome as well.
Regards

> Thanks,
> Anton
>

--
Ruslan Synytsky
CEO @ Jelastic Multi-Cloud PaaS

From akozlov at azul.com Thu Jul 29 07:36:30 2021
From: akozlov at azul.com (Anton Kozlov)
Date: Thu, 29 Jul 2021 10:36:30 +0300
Subject: Call for Discussion: New Project: CRaC
In-Reply-To:
References: <71d4c0c3-f9d9-ed0d-29f5-51bac94d24e9@azul.com>
Message-ID: <5886e5a0-182c-72ce-2e0c-f17e0fed4c2d@azul.com>

Hi Dan,

On 7/28/21 5:23 PM, Dan Heidinga wrote:
> Hi, I'm a little late to the conversation but wanted to add that the
> Eclipse OpenJ9 project [1] is interested in collaborating in this area
> as well.
>
> We've been exploring implementing a checkpoint/restore mechanism at
> the JVM level [2] along with the required language-level Lifecycle
> APIs ("hooks") to allow pre-snapshot / post-restore fixups. Recently,
> we've shifted gears to base our efforts on CRIU as the checkpoint
> mechanism to make faster progress on the hooks support. There's
> overlap between the various approaches in this space and we see a lot
> of benefit to standardizing the language support for saving/restoring
> state.
>
> Count us in as interested in engaging in this project.

It's really nice to have you here!

I will be glad to hear how the CRaC API aligns with what you're doing and how
it fits your use cases. Although we are rather far from standardization, the
outcome should, of course, be something general and suitable for other
implementations.

The experience of implementing the in-JVM checkpoint/restore mechanism is
also very interesting, as is its usability. We always considered CRIU the
simplest bootstrap mechanism, with others possible. So the API should require
no changes for an in-JVM mechanism, at least a hypothetical one. It will be
interesting to look at where we are in this sense.

Thanks,
Anton