RFR: 8254758: Change G1ServiceThread to be task based [v4]

Fri Oct 23 15:04:43 UTC 2020

On Tue, 20 Oct 2020 12:59:32 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:

>> Please review this enhancement to make `G1ServiceThread` task based. This thread has evolved from having a single responsibility to do more things. And there are plans to add more to it. To make this easier and to allow different tasks to run at different intervals this change adds a simple task mechanism to the service thread.
>> 
>> The idea is that the service thread keeps a list of `G1ServiceTask`-objects that are ordered by when they should execute. When no tasks are ready to be run the service thread check how far into the future the next task is scheduled and sleeps/waits for that duration. If a new task is added while the thread is waiting it will be woken up to check for ready tasks. 
>> 
>> There are currently two tasks in the list. They have been extracted from the old service thread into the tasks: `G1PeriodicGCTask` and `G1RemSetSamplingTask`. These are more or less identical to the old code, but in the form of tasks. The only difference is that `G1PeriodicGCTask` no longer need to keep track of the last attempt, since it will be rescheduled with the correct interval. 
>> 
>> To reschedule a task is to update the time when it should be executed next and adding it back to the list. This insertion is O(n) and it should be fine since the expected number of tasks handled by the service thread is low. If a task don't want to be rescheduled it can set a negative timeout and that will cause the task to be dropped from the list. There are currently no such tasks and adding such tasks come with the responsibility of making sure resources are taken care of when the task is no longer active.
>> 
>> I've added a gtest to do some basic VM-testing and apart from a lot of local testing I've run mach5 tier 1-4 a few times with good results.
>> 
>> I have created a few follow up tasks to look at moving the current tasks to more appropriate locations and also updated a few old service thread related tasks with the label: [`gc-g1-service-thread`](https://bugs.openjdk.java.net/issues/?jql=labels%20%3D%20gc-g1-service-thread)
>
> Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Self review: Test cleanup
>  - Albert review: Remove run_tasks and _ms post-fix

Changes requested by tschatzl (Reviewer).

src/hotspot/share/gc/g1/g1ServiceThread.hpp line 87:

> 85: //   - check if a periodic GC should be scheduled.
> 86: class G1ServiceThread: public ConcurrentGCThread {
> 87: private:

unnecessary "private"

src/hotspot/share/gc/g1/g1ServiceThread.hpp line 88:

> 86: class G1ServiceThread: public ConcurrentGCThread {
> 87: private:
> 88:   // The monitor is used to ensure thread saftey for the task queue

s/saftey/safety

test/hotspot/gtest/gc/g1/test_g1ServiceThread.cpp line 73:

> 71: // waiting gets run in a timely manner.
> 72: TEST_VM(G1ServiceThread, test_add_while_waiting) {
> 73:   // Make sure default tasks use long intervals.

... so that they do not interfere with the test as the ServiceTask automatically registers them.

test/hotspot/gtest/gc/g1/test_g1ServiceThread.cpp line 135:

> 133: 
> 134:   // Now fake a run-loop, that reschedules the tasks using a
> 135:   // random multiplyer.

s/multiplyer/multiplier

test/hotspot/gtest/gc/g1/test_g1ServiceThread.cpp line 137:

> 135:   // random multiplyer.
> 136:   for (double now = 0; now < 1000; now++) {
> 137:     // Random multiplyier is at least 1 to ensure progress.

s/multiplyer/multiplier

src/hotspot/share/gc/g1/g1ServiceThread.hpp line 32:

> 30: 
> 31: class G1ServiceTask : public CHeapObj<mtGC> {
> 32:   // The next time this task should be executed.

is this absolute or relative time? Would be nice to state here.

src/hotspot/share/gc/g1/g1ServiceThread.hpp line 62:

> 60:   virtual void execute();
> 61:   virtual uint64_t delay_ms();
> 62:   virtual bool should_reschedule();

I was wondering whether it wouldn't be an easier interface that the Task must call something like

  reschedule(delay-in-ms); (protected method)

every time it wants to be rescheduled?

That seems easier and more direct than having to fill in more methods.

src/hotspot/share/gc/g1/g1ServiceThread.cpp line 283:

> 281:   task->execute();
> 282: 
> 283:   double duration = os::elapsedVTime() - start;

use of elapsedVTime() seems wrong here and above as I would only care about wall time between executions.

src/hotspot/share/gc/g1/g1ServiceThread.cpp line 308:

> 306:       _vtime_accum = (os::elapsedVTime() - vtime_start);
> 307:     } else {
> 308:       _vtime_accum = 0.0;

Why is this use of elapsedVTime() guarded by os::supports_vtime() and not the others (if they are valid)?

src/hotspot/share/gc/g1/g1ServiceThread.cpp line 223:

> 221:   assert(!_task_queue.is_empty(), "Should not be called for empty list");
> 222: 
> 223:   double time_diff = _task_queue.peek()->time() - os::elapsedTime();

I dislike the use of os::elapsedTime() wherever I see it :) I would prefer if an integer time were used.

Also, this code implicitly assumes that start time is start of the VM, not start of the thread. So if startup takes 100ms (random number), all items registered in the queue are immediately executed (assuming they have an interval < 100ms).

The os class actually provides a os::elapsed_counter()/javaTimeNanos/Millis for that which I think is better here.

src/hotspot/share/gc/g1/g1ServiceThread.cpp line 188:

> 186: 
> 187:   virtual uint64_t delay_ms() {
> 188:     return G1ConcRefinementServiceIntervalMillis;

I think there is a (pre-existing) issue: we do not really want to re-schedule the G1RemSetSamplingTask every 100ms (i.e. time stamp 100/200/300/...), but 100ms after the last GC. Eg. consider that the task will be scheduled at 300 for 400 (G1ConcRefinementServiceIntervalMillis), then there is a 200ms GC making the G1RemSetSamplingTask due right after GC.

That's why a "schedule(delay)" call on the task (or the queue, or the servicethread) makes a lot of sense to me (see also below for rationale).

src/hotspot/share/gc/g1/g1ServiceThread.cpp line 112:

> 110:     // again to see if the value has been updated. Otherwise use the
> 111:     // real value provided.
> 112:     return G1PeriodicGCInterval == 0 ? 1000 : G1PeriodicGCInterval;

pre-existing: I think this delay has the same issue as the young rem set sampling thread below: G1PeriodicGCInterval specifies that there should be at least one GC after G1PeriodicGCInterval ms after the last gc. Not every G1PeriodicGCInterval ms.

We probably need a task that polls the flag which then schedules the actual periodic gc or something like that.

-------------

PR: https://git.openjdk.java.net/jdk/pull/734