RFR: Convert old-gen single threaded pretouch to multi-threaded during

Amit Pawar amith.pawar at gmail.com
Mon Jan 18 15:46:20 UTC 2021


On Fri, Jan 8, 2021 at 6:38 PM Amit Pawar <amith.pawar at gmail.com> wrote:

> Hi
>
> I am trying to improve the pre-touch time taken during old-gen resizing
> and would like your suggestions on whether the following change would be
> accepted.
>
> What is happening?
> A GC thread resizes the old-gen during object promotion if there is not
> enough room for the object. After expanding, that GC thread pre-touches the
> new pages on its own; it can't pre-touch in parallel using PretouchTask
> because it is already executing a GC task. The total GC pause time
> therefore depends on the resize size and the number of resizes.
>
> What is the fix?
> Create another WorkGang so that the GC thread can execute the pre-touch
> task with this new WorkGang and reduce the pre-touch time. The code change
> is given below.
>
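To make the idea concrete outside HotSpot, here is a minimal stand-alone sketch of a parallel pre-touch. It is not the patch's code: `pretouch_parallel` is a hypothetical name, `std::thread` stands in for a WorkGang, and the sketch assumes `chunk_size` is a multiple of `page_size`.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-alone illustration only; pretouch_parallel is a hypothetical name and
// std::thread stands in for HotSpot's WorkGang.
// Touches one byte every page_size bytes in [start, end). The range is split
// into chunk_size pieces that workers claim through a shared atomic counter.
// Returns the number of worker threads used.
inline unsigned pretouch_parallel(char* start, char* end, std::size_t page_size,
                                  std::size_t chunk_size, unsigned max_workers) {
  const std::size_t total_bytes = static_cast<std::size_t>(end - start);
  if (total_bytes == 0 || page_size == 0 || chunk_size == 0 || max_workers == 0) {
    return 0;
  }
  // Same rounding-up split as in the quoted patch.
  const std::size_t num_chunks = (total_bytes + chunk_size - 1) / chunk_size;
  const unsigned num_workers =
      static_cast<unsigned>(std::min<std::size_t>(num_chunks, max_workers));

  std::atomic<std::size_t> next_chunk{0};
  auto work = [&]() {
    for (;;) {
      const std::size_t chunk = next_chunk.fetch_add(1);
      if (chunk >= num_chunks) break;
      const std::size_t chunk_begin = chunk * chunk_size;
      const std::size_t chunk_end = std::min(chunk_begin + chunk_size, total_bytes);
      for (std::size_t off = chunk_begin; off < chunk_end; off += page_size) {
        volatile char* p = start + off;
        *p = *p;  // read and write back the same value so contents are unchanged
      }
    }
  };

  // The caller waits while the workers touch the pages.
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < num_workers; i++) workers.emplace_back(work);
  for (auto& t : workers) t.join();
  return num_workers;
}
```

Because chunks are disjoint, no two workers touch the same byte. A range smaller than one chunk ends up with a single worker, which is the limitation described in point 3 below.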
> Improvement:
> 1. Pre-touch time improved by 50-70% for the SPECjbb composite test.
> 2. The gain depends on the number of resize requests and the resize size.
> SPECjbb composite testing shows old-gen resizes of 2MB-32MB with G1GC and
> up to 64MB with ParallelGC. The number of resizes is also more than
> 100-200.
> 3. The PretouchTask class uses PreTouchParallelChunkSize (current default
> is 4MB on x86) to split the pre-touch task. So the time taken depends on
> the old-gen resize size, and this change won't help if the resize is
> smaller than the PreTouchParallelChunkSize value.
> 4. Please refer to the Excel file in the bug report for more details on
> the improvement for different sizes:
> https://bugs.openjdk.java.net/browse/JDK-8254699
>
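The effect of the chunk size in point 3 can be worked through with the same arithmetic the patch uses for `num_chunks` and `num_workers`. This is only a sketch: `split_pretouch` and `PretouchSplit` are hypothetical names, and the worker count of 8 in the note that follows is an assumed example value.

```cpp
#include <algorithm>
#include <cstddef>

constexpr std::size_t MB = 1024 * 1024;

// Hypothetical helper mirroring the patch's arithmetic, not HotSpot code.
struct PretouchSplit {
  std::size_t num_chunks;   // work units of at most chunk_size bytes
  std::size_t num_workers;  // workers that can actually be kept busy
};

inline PretouchSplit split_pretouch(std::size_t total_bytes, std::size_t chunk_size,
                                    std::size_t gang_workers) {
  // Round up so a partial trailing chunk still counts as one work unit.
  const std::size_t num_chunks = (total_bytes + chunk_size - 1) / chunk_size;
  return {num_chunks, std::min(num_chunks, gang_workers)};
}
```

With a 4MB chunk size and 8 available workers, a 2MB resize gives 1 chunk and 1 worker (no parallelism), a 32MB resize gives 8 chunks and 8 workers, and a 64MB resize gives 16 chunks shared among 8 workers.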
> Although this helps to reduce the pre-touch time, I am not sure whether
> adding another WorkGang is allowed. Please suggest.
>
> diff --git a/src/hotspot/share/gc/shared/gc_globals.hpp b/src/hotspot/share/gc/shared/gc_globals.hpp
> index aca8d6b6c34..b5d40b47480 100644
> --- a/src/hotspot/share/gc/shared/gc_globals.hpp
> +++ b/src/hotspot/share/gc/shared/gc_globals.hpp
> @@ -200,6 +200,12 @@
>    product(bool, AlwaysPreTouch, false,                                      \
>            "Force all freshly committed pages to be pre-touched")            \
>                                                                              \
> +  product(size_t, OldGenPreTouchWorkers, 1,                                 \
> +          "During object promotion old-gen can be expanded as required by " \
> +          "ParallelGCThreads. OldGenPreTouchWorkers can be used to "        \
> +          "pre-touch the pages by ParallelGCThreads")                       \
> +          range(1,  1024)                                                   \
> +                                                                            \
>    product_pd(size_t, PreTouchParallelChunkSize,                             \
>            "Per-thread chunk size for parallel memory pre-touch.")           \
>            range(4*K, SIZE_MAX / 2)                                          \
> diff --git a/src/hotspot/share/gc/shared/pretouchTask.cpp b/src/hotspot/share/gc/shared/pretouchTask.cpp
> index 4398d3924cc..435ec2ee76f 100644
> --- a/src/hotspot/share/gc/shared/pretouchTask.cpp
> +++ b/src/hotspot/share/gc/shared/pretouchTask.cpp
> @@ -27,6 +27,7 @@
>  #include "runtime/atomic.hpp"
>  #include "runtime/globals.hpp"
>  #include "runtime/os.hpp"
> +#include "utilities/ticks.hpp"
>
>  PretouchTask::PretouchTask(const char* task_name,
>                             char* start_address,
> @@ -62,6 +63,8 @@ void PretouchTask::work(uint worker_id) {
>    }
>  }
>
> +#define TIME_FORMAT "%0.3lfms"
> +
>  void PretouchTask::pretouch(const char* task_name, char* start_address, char* end_address,
>                              size_t page_size, WorkGang* pretouch_gang) {
>
> @@ -83,14 +86,30 @@ void PretouchTask::pretouch(const char* task_name, char* start_address, char* en
>      size_t num_chunks = (total_bytes + chunk_size - 1) / chunk_size;
>
>      uint num_workers = (uint)MIN2(num_chunks, (size_t)pretouch_gang->total_workers());
> -    log_debug(gc, heap)("Running %s with %u workers for " SIZE_FORMAT " work units pre-touching " SIZE_FORMAT "B.",
> -                        task.name(), num_workers, num_chunks, total_bytes);
> -
> +    Ticks mark_start = Ticks::now();
>      pretouch_gang->run_task(&task, num_workers);
> +    Ticks mark_end = Ticks::now();
> +    log_debug(gc, heap)("Running %s with %u workers for " SIZE_FORMAT " work units pre-touching " SIZE_FORMAT "B. " TIME_FORMAT ,
> +                        task.name(), num_workers, num_chunks, total_bytes, (mark_end-mark_start).seconds() * 1000.0);
> +
>    } else {
> -    log_debug(gc, heap)("Running %s pre-touching " SIZE_FORMAT "B.",
> -                        task.name(), total_bytes);
> -    task.work(0);
> +    if(OldGenPreTouchWorkers > 1) {
> +      const char *oldgen_workers="Old-gen Pre-touch workers";
> +      static WorkGang *pretouch_workers= NULL ;
> +      if (! pretouch_workers) {
> + // pretouch_workers are used when pretouch_gang is null. This usually happens during old-gen
> + // resizing due to object promotion.
> +        pretouch_workers = new WorkGang(oldgen_workers, OldGenPreTouchWorkers, true, false);
> +        pretouch_workers->initialize_workers();
> +      }
> +      pretouch(oldgen_workers, start_address, end_address, page_size, pretouch_workers);
> +    } else {
> +      Ticks mark_start = Ticks::now();
> +      task.work(0);
> +      Ticks mark_end = Ticks::now();
> +      log_debug(gc, heap)("Running %s pre-touching " SIZE_FORMAT "B. " TIME_FORMAT,
> +                          task.name(), total_bytes, (mark_end-mark_start).seconds() * 1000.0);
> +    }
>    }
>  }
>
>
>
>
Ping!


