RFR: 8314850: SharedRuntime::handle_wrong_method() gets called too often when resolving Continuation.enter

Thu Aug 24 18:08:29 UTC 2023

On Wed, 23 Aug 2023 15:06:16 GMT, Ron Pressler <rpressler at openjdk.org> wrote:

>> Thanks for working on this.
>> 
>> I'm not sure if this is useful, but I just did a quick test with the patch and I see some interesting performance changes. Context switches in the test go from ~160k to ~1m, and total time increases from 4 seconds to 4.8 seconds. However I do see lower CPU utilization.
>> 
>> 
>> import java.util.concurrent.*;
>> 
>> public class Main {
>>     public static void main(String[] args) throws Exception {
>>         for (int i = 0; i < 100; i++) {
>>             ExecutorService e = Executors.newVirtualThreadPerTaskExecutor();
>>             long start = System.currentTimeMillis();
>>             for (int j = 0; j < 100_000; j++) e.submit(Main::task);
>>             e.shutdown(); e.awaitTermination(100, TimeUnit.MINUTES);
>>             long dur = System.currentTimeMillis() - start;
>>             System.out.println(dur + "ms");
>>         }
>>     }
>> 
>>     static void task() { }
>> }
>> 
>> 
>> 
>> perf stat java -XX:+UnlockExperimentalVMOptions -XX:-DoJVMTIVirtualThreadTransitions Main.java
>> 
>> 
>> Before patch
>> 
>>          45,069.79 msec task-clock                #   11.214 CPUs utilized
>>            159,503      context-switches          #    0.004 M/sec
>>             16,250      cpu-migrations            #    0.361 K/sec
>>            182,820      page-faults               #    0.004 M/sec
>>            
>>        4.019077281 seconds time elapsed
>> 
>>       44.004389000 seconds user
>>        1.844216000 seconds sys
>> 
>> 
>> 
>> After patch
>> 
>>          39,830.77 msec task-clock                #    8.217 CPUs utilized <--- reduced
>>          1,094,192      context-switches          #    0.027 M/sec <--- increased
>>             91,803      cpu-migrations            #    0.002 M/sec
>>            185,841      page-faults               #    0.005 M/sec
>> 
>>        4.847430580 seconds time elapsed <--- increased
>> 
>>       34.643548000 seconds user
>>       10.396489000 seconds sys
>> 
>> 
>> I appreciate this test may be totally unrealistic and flawed, and I haven't had time to check where the context switches are coming from, but I figured I'd share the early results just in case.
>
> @olivergillespie That happens because the scheduler is starved. This is a known phenomenon that usually appears in artificial benchmarks with a lot of work but low parallelism. We've considered trying to fix that, but a fix would probably harm real workloads. If you set the parallelism to a lower number you will see lower latency.
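
For anyone who wants to try the suggestion about lowering the parallelism, here is a minimal sketch. It assumes the `jdk.virtualThreadScheduler.parallelism` system property described in the `java.lang.Thread` documentation is the knob being referred to. The class name `ParallelismCheck`, the helper `runOnce`, and the value 4 used below are illustrative only and are not part of the patch or the original test.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelismCheck {
    // Hypothetical helper: submits `tasks` near-empty tasks, waits for the
    // executor to drain, and returns how many tasks actually ran.
    static int runOnce(ExecutorService e, int tasks) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        for (int j = 0; j < tasks; j++) {
            e.submit(() -> { done.incrementAndGet(); });
        }
        e.shutdown();
        e.awaitTermination(100, TimeUnit.MINUTES);
        return done.get();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the scheduler reads this property at startup, so it is
        // passed with -D on the command line rather than set here.
        String p = System.getProperty("jdk.virtualThreadScheduler.parallelism",
                                      "unset (defaults to available processors)");
        System.out.println("parallelism: " + p);

        long start = System.currentTimeMillis();
        int ran = runOnce(Executors.newVirtualThreadPerTaskExecutor(), 100_000);
        long dur = System.currentTimeMillis() - start;
        System.out.println(ran + " tasks in " + dur + "ms");
    }
}
```

It could be run as `java -Djdk.virtualThreadScheduler.parallelism=4 ParallelismCheck.java`, optionally under `perf stat` as in the test above, to compare elapsed time and context switches against the default setting.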

Thanks for the reviews @pron and @theRealAph, and @olivergillespie for reporting this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15403#issuecomment-1692181523