RFR: 8254758: Change G1ServiceThread to be task based [v4]

Tue Oct 27 21:36:38 UTC 2020

On Fri, 23 Oct 2020 14:46:26 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Self review: Test cleanup
>>  - Albert review: Remove run_tasks and _ms post-fix
>
> src/hotspot/share/gc/g1/g1ServiceThread.cpp line 223:
> 
>> 221:   assert(!_task_queue.is_empty(), "Should not be called for empty list");
>> 222: 
>> 223:   double time_diff = _task_queue.peek()->time() - os::elapsedTime();
> 
> I dislike the use of os::elapsedTime() wherever I see it :) I would prefer if an integer time were used.
> 
> Also, this code implicitly assumes that start time is start of the VM, not start of the thread. So if startup takes 100ms (random number), all items registered in the queue are immediately executed (assuming they have an interval < 100ms).
> 
> The os class actually provides a os::elapsed_counter()/javaTimeNanos/Millis for that which I think is better here.

I kind of agree, I started looking at using an integer time but switched after looking a bit at how the timing is done for the concurrent mark tasks and also that the logging is using elapsedTime. 

Since both you and Albert (and me too) prefer using an integer I will take a look at using elapsedCounter, this should be easily converted to the same timestamps as used by the logging.

> src/hotspot/share/gc/g1/g1ServiceThread.cpp line 283:
> 
>> 281:   task->execute();
>> 282: 
>> 283:   double duration = os::elapsedVTime() - start;
> 
> use of elapsedVTime() seems wrong here and above as I would only care about wall time between executions.

For walltime **between** executions you will have to enable `gc+task+start` to get the start time of tasks and look in the log how long it was between two executions (or have the specific task log this). This is measuring the time it took to execute a task, the reason I use vTime is because a task can join the suspendible thread set, and because of this be waiting for a big part of the execution. Using vTime will then show how much the task really used.

If it should be allowed that a task wait for a GC to complete could of course be changed, and then using wall clock time would work.

> src/hotspot/share/gc/g1/g1ServiceThread.cpp line 308:
> 
>> 306:       _vtime_accum = (os::elapsedVTime() - vtime_start);
>> 307:     } else {
>> 308:       _vtime_accum = 0.0;
> 
> Why is this use of elapsedVTime() guarded by os::supports_vtime() and not the others (if they are valid)?

Here it might make sense to not report anything if we don't get a "real" VTime. For the duration of the task I think returning normal elapsedTime is fine in the event of `supports_vtime()` returning `false`. 
If we want to change this use I think we can do that when moving the measurement into the `G1RemSetSamplingTask`, see [8252645](https://bugs.openjdk.java.net/browse/JDK-8252645).

> src/hotspot/share/gc/g1/g1ServiceThread.hpp line 32:
> 
>> 30: 
>> 31: class G1ServiceTask : public CHeapObj<mtGC> {
>> 32:   // The next time this task should be executed.
> 
> is this absolute or relative time? Would be nice to state here.

Fixed

> src/hotspot/share/gc/g1/g1ServiceThread.hpp line 87:
> 
>> 85: //   - check if a periodic GC should be scheduled.
>> 86: class G1ServiceThread: public ConcurrentGCThread {
>> 87: private:
> 
> unnecessary "private"

Fixed

> src/hotspot/share/gc/g1/g1ServiceThread.hpp line 88:
> 
>> 86: class G1ServiceThread: public ConcurrentGCThread {
>> 87: private:
>> 88:   // The monitor is used to ensure thread saftey for the task queue
> 
> s/saftey/safety

Fixed

> test/hotspot/gtest/gc/g1/test_g1ServiceThread.cpp line 73:
> 
>> 71: // waiting gets run in a timely manner.
>> 72: TEST_VM(G1ServiceThread, test_add_while_waiting) {
>> 73:   // Make sure default tasks use long intervals.
> 
> ... so that they do not interfere with the test as the ServiceTask automatically registers them.

Added to the comment, the reason we use the long interval is to make sure that the service thread is waiting when the new task is added and is woken up correctly.

> test/hotspot/gtest/gc/g1/test_g1ServiceThread.cpp line 137:
> 
>> 135:   // random multiplyer.
>> 136:   for (double now = 0; now < 1000; now++) {
>> 137:     // Random multiplyier is at least 1 to ensure progress.
> 
> s/multiplyer/multiplier

Fixed, how many ways can one spell multipler without getting it correct :)

> test/hotspot/gtest/gc/g1/test_g1ServiceThread.cpp line 135:
> 
>> 133: 
>> 134:   // Now fake a run-loop, that reschedules the tasks using a
>> 135:   // random multiplyer.
> 
> s/multiplyer/multiplier

Fixed

> src/hotspot/share/gc/g1/g1ServiceThread.hpp line 62:
> 
>> 60:   virtual void execute();
>> 61:   virtual uint64_t delay_ms();
>> 62:   virtual bool should_reschedule();
> 
> I was wondering whether it wouldn't be an easier interface that the Task must call something like
> 
>   reschedule(delay-in-ms); (protected method)
> 
> every time it wants to be rescheduled?
> 
> That seems easier and more direct than having to fill in more methods.

I agree that a more explicit approach has advantages. What do you think about this approach:
* Changed `G1ServiceThread::reschedule_task(task)` `G1ServiceThread::schedule_task(task, delay)`
* Add protected `G1ServiceThread::reschedule(delay)` that calls `G1ServiceThread::schedule_task()`
* Removed call to `reschedule_task()` from main loop.

> src/hotspot/share/gc/g1/g1ServiceThread.cpp line 188:
> 
>> 186: 
>> 187:   virtual uint64_t delay_ms() {
>> 188:     return G1ConcRefinementServiceIntervalMillis;
> 
> I think there is a (pre-existing) issue: we do not really want to re-schedule the G1RemSetSamplingTask every 100ms (i.e. time stamp 100/200/300/...), but 100ms after the last GC. Eg. consider that the task will be scheduled at 300 for 400 (G1ConcRefinementServiceIntervalMillis), then there is a 200ms GC making the G1RemSetSamplingTask due right after GC.
> 
> That's why a "schedule(delay)" call on the task (or the queue, or the servicethread) makes a lot of sense to me (see also below for rationale).

I didn't want to change when the tasks are run in this change and I think could be addressed as part of [JDK-8254999](https://bugs.openjdk.java.net/browse/JDK-8254999). 

But I totally agree with your comment, it would probably make sense to check when the last GC occured and schedule the task with that in mind.

> src/hotspot/share/gc/g1/g1ServiceThread.cpp line 112:
> 
>> 110:     // again to see if the value has been updated. Otherwise use the
>> 111:     // real value provided.
>> 112:     return G1PeriodicGCInterval == 0 ? 1000 : G1PeriodicGCInterval;
> 
> pre-existing: I think this delay has the same issue as the young rem set sampling thread below: G1PeriodicGCInterval specifies that there should be at least one GC after G1PeriodicGCInterval ms after the last gc. Not every G1PeriodicGCInterval ms.
> 
> We probably need a task that polls the flag which then schedules the actual periodic gc or something like that.

I agree that with this new approach we can do better than before and schedule the task to check at a more appropriate time. I think this should be addressed as part of [JDK-8255001](https://bugs.openjdk.java.net/browse/JDK-8255001), so I added a comment to the issue.

-------------

PR: https://git.openjdk.java.net/jdk/pull/734