JVM Pretouch
Volker Simonis
volker.simonis at gmail.com
Thu Nov 28 17:14:59 UTC 2024
Martin Welgemoed <martin.welgemoed at commercetools.com> wrote on Fri, 15
Nov 2024, 16:55:
> Hey Anton,
>
> > To keep restore time low, we mmap image pages, so the first access
> > has to load the content first. You can try this before executing the
> > restore command:
> >
> > export CRAC_CRIU_OPTS=--no-mmap-page-image
> >
> > Now the decision is made by the integration with CRIU. It should make
> > image pages available right away.
>
> Thanks for the suggestion. This did help a little bit (the process
> after restore has slightly more physical memory in use) but
> unfortunately the problem is a bit more involved:
>
> * We start the Java application with a few gigabytes of RAM as Xms and
> with AlwaysPretouch
> * This causes many gigabytes of RAM to be physically mapped into the
> process
> * A CRaC snapshot is taken
> * CRaC in "crac::checkpoint" does a full GC that explicitly ignores
> Xms. This is a good thing, because otherwise it'd dump gigabytes of
> empty heap to disk.
> * The heap is shrunk significantly by this, way below Xms
> * CRIU dumps this much smaller heap to disk
> * The snapshot is restored with this much smaller heap
> * The "G1HeapSizingPolicy::full_collection_resize_amount" does not
> check if "capacity_after_gc" is below MinHeapSize (because this is an
> invariant for G1GC)
> * As the application takes on full load, requests trigger heap growth,
> which worsens tail latencies until the heap reaches its normal
> operating size
>
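To observe the sequence above, it may help to log the heap figures before the checkpoint and again after the restore. The following is a minimal sketch using the standard `java.lang.management` API; the class name `HeapCommitReport` and its method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of CRaC): reports heap sizes as seen by the JVM.
public class HeapCommitReport {
    private static final long MIB = 1024L * 1024L;

    /** One-line summary of initial (-Xms), committed and used heap in MiB. */
    public static String describe() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getInit() may be -1 if the initial size is undefined.
        return String.format("init=%dMiB committed=%dMiB used=%dMiB",
                heap.getInit() / MIB, heap.getCommitted() / MIB, heap.getUsed() / MIB);
    }

    /** True if the committed heap is currently smaller than the initial heap size. */
    public static boolean committedBelowInit() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getInit() > 0 && heap.getCommitted() < heap.getInit();
    }

    public static void main(String[] args) {
        System.out.println(describe());
        System.out.println("committed below init: " + committedBelowInit());
    }
}
```

Note that "committed" here is what the JVM has reserved for the heap, which is not necessarily the same as pages that are resident in physical memory.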
> Essentially after the snapshot restore I'd like the application to be
> in the same position as before, that is with gigabytes of heap already
> committed and mapped to physical memory.
>
> Given the generic heap interface available to crac::checkpoint I don't
> see a clean way to work around this in the forked OpenJDK during a
> restore. Patching G1HeapSizingPolicy to work around this is fairly
> trivial but I'm not sure if that's desirable.
>
> Until now the best workaround I've found is still just allocating a
> giant blob of memory after the restore to grow the heap and then
> immediately freeing it. This does work because once the heap is past
> MinHeapSize the G1GC won't let it shrink below it again. But it feels
> a bit hacky.
>
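The workaround described above could look roughly like the sketch below. The class `HeapGrower` and its parameters are hypothetical; the target size would have to be chosen by the application and stay well below -Xmx to avoid an OutOfMemoryError.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: forces heap growth by holding live allocations, then drops them.
public class HeapGrower {
    /**
     * Allocates whole chunks of chunkBytes until adding another chunk would
     * exceed targetBytes, keeps them reachable so the heap has to grow,
     * then releases them. Returns the number of bytes that were allocated.
     */
    public static long growAndRelease(long targetBytes, int chunkBytes) {
        if (targetBytes < 0 || chunkBytes <= 0) {
            throw new IllegalArgumentException("targetBytes >= 0 and chunkBytes > 0 required");
        }
        List<byte[]> ballast = new ArrayList<>();
        long allocated = 0;
        while (allocated + chunkBytes <= targetBytes) {
            ballast.add(new byte[chunkBytes]); // zero-filled by the JVM
            allocated += chunkBytes;
        }
        ballast.clear(); // arrays become garbage; the next GC can reclaim them
        return allocated;
    }
}
```

It would be called once after the restore, for example `HeapGrower.growAndRelease(2L << 30, 1 << 20)` for about 2 GiB in 1 MiB chunks.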
You could do the allocation in the beforeCheckpoint() hook and immediately
free it in the afterRestore() hook. But that would considerably increase
the image size.
I think it would also be possible to implement a CRaC version of
AlwaysPretouch which pretouches the complete heap on restore. This would
keep your image small and only increase the restore time.
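A rough sketch of the first idea, assuming a hypothetical `HeapBallast` class: the ballast is acquired before the checkpoint and dropped after the restore. To keep the example free of dependencies it does not use the org.crac API directly; in an application, acquire() and release() would presumably be called from the beforeCheckpoint() and afterRestore() methods of a registered org.crac.Resource.

```java
// Hypothetical helper: holds live arrays across a checkpoint so the heap stays large.
public class HeapBallast {
    private byte[][] chunks = new byte[0][];

    /** Intended to be called from the beforeCheckpoint() hook. */
    public void acquire(int chunkCount, int chunkBytes) {
        byte[][] fresh = new byte[chunkCount][];
        for (int i = 0; i < chunkCount; i++) {
            fresh[i] = new byte[chunkBytes];
        }
        chunks = fresh;
    }

    /** Number of bytes currently held as ballast. */
    public long heldBytes() {
        long total = 0;
        for (byte[] chunk : chunks) {
            total += chunk.length;
        }
        return total;
    }

    /** Intended to be called from the afterRestore() hook. */
    public void release() {
        chunks = new byte[0][];
    }
}
```

Because the ballast is still reachable during the checkpoint, it survives the full GC and ends up in the image, which is where the larger image size comes from.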
> Kind regards,
> --
> Martin Welgemoed
> Scala Engineer
>
> martin.welgemoed at commercetools.com
> T. +49 (89) 99 82 996-0
> commercetools B.V. | Prins Bernhardplein 200 | 1097 JB Amsterdam