RFR: 8178287: AsyncGetCallTrace fails to traverse valid Java stacks [v4]
    Marcus Hirt 
    hirt at openjdk.java.net
       
    Wed Aug 18 19:19:33 UTC 2021
    
    
  
On Mon, 19 Jul 2021 09:25:59 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:
>> When the signal sent for AsyncGetCallTrace or JFR would land on a runtime stub (like arraycopy), a vtable stub, or the prolog of a compiled method,  it wouldn't be able to detect the sender (caller) frame for multiple reasons. This patch fixes these cases through adding CodeBlob-specific frame parser which are in the best position to know how a frame is setup.
>> 
>> The following examples have been profiled with honest-profiler which uses `AsyncGetCallTrace`.
>> 
>> # `Prof1`
>> 
>> public class Prof1 {
>> 
>>     public static void main(String[] args) {
>>         StringBuilder sb = new StringBuilder();
>>         for (int i = 0; i < 1000000; i++) {
>>             sb.append("ab");
>>             sb.delete(0, 1);
>>         }
>>         System.out.println(sb.length());
>>     }
>> }
>> 
>> 
>> - Baseline:
>> 
>> Flat Profile (by method):
>>         (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5]
>>         (t  0.5,s  0.2) Prof1::main
>>         (t  0.2,s  0.2) java.lang.AbstractStringBuilder::append
>>         (t  0.1,s  0.1) AGCT::Unknown not Java[ERR=-3]
>>         (t  0.0,s  0.0) java.lang.AbstractStringBuilder::ensureCapacityInternal
>>         (t  0.0,s  0.0) java.lang.AbstractStringBuilder::shift
>>         (t  0.0,s  0.0) java.lang.String::getBytes
>>         (t  0.0,s  0.0) java.lang.AbstractStringBuilder::putStringAt
>>         (t  0.0,s  0.0) java.lang.StringBuilder::delete
>>         (t  0.2,s  0.0) java.lang.StringBuilder::append
>>         (t  0.0,s  0.0) java.lang.AbstractStringBuilder::delete
>>         (t  0.0,s  0.0) java.lang.AbstractStringBuilder::putStringAt
>> 
>> - With `StubRoutinesBlob::FrameParser`:
>> 
>> Flat Profile (by method):
>>         (t 98.7,s 98.7) java.lang.AbstractStringBuilder::ensureCapacityInternal
>>         (t  0.9,s  0.9) java.lang.AbstractStringBuilder::delete
>>         (t 99.8,s  0.2) Prof1::main
>>         (t  0.1,s  0.1) AGCT::Unknown not Java[ERR=-3]
>>         (t  0.0,s  0.0) AGCT::Unknown Java[ERR=-5]
>>         (t 98.8,s  0.0) java.lang.AbstractStringBuilder::append
>>         (t 98.8,s  0.0) java.lang.StringBuilder::append
>>         (t  0.9,s  0.0) java.lang.StringBuilder::delete
>> 
>> 
>> # `Prof2`
>> 
>> import java.util.function.Supplier;
>> 
>> public class Prof2 {
>> 
>>     public static void main(String[] args) {
>>         var rand = new java.util.Random(0);
>>         Supplier[] suppliers = {
>>                 () -> 0,
>>                 () -> 1,
>>                 () -> 2,
>>                 () -> 3,
>>         };
>> 
>>         long sum = 0;
>>         for (int i = 0; i >= 0; i++) {
>>             sum += (int)suppliers[i % suppliers.length].get();
>>         }
>>     }
>> }
>> 
>> 
>> - Baseline:
>> 
>> Flat Profile (by method):
>>         (t 60.7,s 60.7) AGCT::Unknown Java[ERR=-5]
>>         (t 39.2,s 35.2) Prof2::main
>>         (t  1.4,s  1.4) Prof2::lambda$main$3
>>         (t  1.0,s  1.0) Prof2::lambda$main$2
>>         (t  0.9,s  0.9) Prof2::lambda$main$1
>>         (t  0.7,s  0.7) Prof2::lambda$main$0
>>         (t  0.1,s  0.1) AGCT::Unknown not Java[ERR=-3]
>>         (t  0.0,s  0.0) java.lang.Thread::exit
>>         (t  0.9,s  0.0) Prof2$$Lambda$2.0x0000000800c00c28::get
>>         (t  1.0,s  0.0) Prof2$$Lambda$3.0x0000000800c01000::get
>>         (t  1.4,s  0.0) Prof2$$Lambda$4.0x0000000800c01220::get
>>         (t  0.7,s  0.0) Prof2$$Lambda$1.0x0000000800c00a08::get
>> 
>> 
>> - With `VtableBlob::FrameParser` and `nmethod::FrameParser`:
>> 
>> Flat Profile (by method):
>>         (t 74.1,s 70.3) Prof2::main
>>         (t  6.5,s  5.5) Prof2$$Lambda$29.0x0000000800081220::get
>>         (t  6.6,s  5.4) Prof2$$Lambda$28.0x0000000800081000::get
>>         (t  5.7,s  5.0) Prof2$$Lambda$26.0x0000000800080a08::get
>>         (t  5.9,s  5.0) Prof2$$Lambda$27.0x0000000800080c28::get
>>         (t  4.9,s  4.9) AGCT::Unknown Java[ERR=-5]
>>         (t  1.2,s  1.2) Prof2::lambda$main$2
>>         (t  0.9,s  0.9) Prof2::lambda$main$3
>>         (t  0.9,s  0.9) Prof2::lambda$main$1
>>         (t  0.7,s  0.7) Prof2::lambda$main$0
>>         (t  0.1,s  0.1) AGCT::Unknown not Java[ERR=-3]
>
> Ludovic Henry has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
> 
>  - Remove FrameParser and the required allocation
>    
>    The need for the allocation would be it non async-safe. However,
>    AsyncGetCallTrace is async-safe and thus can't allow for allocations.
>  - Merge branch 'master' of https://github.com/openjdk/jdk into fix-8178287
>  - Fix comments
>  - Disable checks in FrameParser when known to be safe
>  - Allow AsyncGetCallTrace and JFR to unwind stack from vtable stub
>    
>    The program is the following:
>    
>    ```
>    import java.util.function.Supplier;
>    
>    public class Prof2 {
>    
>        public static void main(String[] args) {
>            var rand = new java.util.Random(0);
>            Supplier[] suppliers = {
>                    () -> 0,
>                    () -> 1,
>                    () -> 2,
>                    () -> 3,
>            };
>    
>            long sum = 0;
>            for (int i = 0; i >= 0; i++) {
>                sum += (int)suppliers[i % suppliers.length].get();
>            }
>        }
>    }
>    ```
>    
>    The results are as follows:
>    
>    - Baseline (from previous commit):
>    
>    Flat Profile (by method):
>                (t 39.3,s 39.3) AGCT::Unknown Java[ERR=-5]
>                (t 40.3,s 36.1) Prof2::main
>                (t  6.4,s  5.3) Prof2$$Lambda$28.0x0000000800081000::get
>                (t  6.1,s  5.1) Prof2$$Lambda$29.0x0000000800081220::get
>                (t  6.0,s  5.0) Prof2$$Lambda$27.0x0000000800080c28::get
>                (t  6.1,s  5.0) Prof2$$Lambda$26.0x0000000800080a08::get
>                (t  1.1,s  1.1) Prof2::lambda$main$2
>                (t  1.1,s  1.1) Prof2::lambda$main$0
>                (t  1.0,s  1.0) Prof2::lambda$main$1
>                (t  0.9,s  0.9) Prof2::lambda$main$3
>                (t  0.1,s  0.1) AGCT::Unknown not Java[ERR=-3]
>    
>    - With unwind from vtable stub
>    
>    Flat Profile (by method):
>            (t 74.1,s 70.3) Prof2::main
>            (t  6.5,s  5.5) Prof2$$Lambda$29.0x0000000800081220::get
>            (t  6.6,s  5.4) Prof2$$Lambda$28.0x0000000800081000::get
>            (t  5.7,s  5.0) Prof2$$Lambda$26.0x0000000800080a08::get
>            (t  5.9,s  5.0) Prof2$$Lambda$27.0x0000000800080c28::get
>            (t  4.9,s  4.9) AGCT::Unknown Java[ERR=-5]
>            (t  1.2,s  1.2) Prof2::lambda$main$2
>            (t  0.9,s  0.9) Prof2::lambda$main$3
>            (t  0.9,s  0.9) Prof2::lambda$main$1
>            (t  0.7,s  0.7) Prof2::lambda$main$0
>            (t  0.1,s  0.1) AGCT::Unknown not Java[ERR=-3]
>    
>    We attribute the vtable stub to the caller and not the callee, which is
>    already an improvement from the existing case.
>  - Allow AsyncGetCallTrace and JFR to unwind stack from nmethod's prolog
>    
>    When sampling hits the prolog of a method, Hotspot assumes it's unable
>    to parse the frame. This change allows to parse such frame on x86 by
>    specializing which instruction it's hitting in the prolog.
>    
>    The results are as follows:
>    
>    - Baseline:
>    
>    Flat Profile (by method):
>            (t 60.7,s 60.7) AGCT::Unknown Java[ERR=-5]
>            (t 39.2,s 35.2) Prof2::main
>            (t  1.4,s  1.4) Prof2::lambda$main$3
>            (t  1.0,s  1.0) Prof2::lambda$main$2
>            (t  0.9,s  0.9) Prof2::lambda$main$1
>            (t  0.7,s  0.7) Prof2::lambda$main$0
>            (t  0.1,s  0.1) AGCT::Unknown not Java[ERR=-3]
>            (t  0.0,s  0.0) java.lang.Thread::exit
>            (t  0.9,s  0.0) Prof2$$Lambda$2.0x0000000800c00c28::get
>            (t  1.0,s  0.0) Prof2$$Lambda$3.0x0000000800c01000::get
>            (t  1.4,s  0.0) Prof2$$Lambda$4.0x0000000800c01220::get
>            (t  0.7,s  0.0) Prof2$$Lambda$1.0x0000000800c00a08::get
>    
>    - With incomplete frame parsing:
>    
>    Flat Profile (by method):
>            (t 39.3,s 39.3) AGCT::Unknown Java[ERR=-5]
>            (t 40.3,s 36.1) Prof2::main
>            (t  6.4,s  5.3) Prof2$$Lambda$28.0x0000000800081000::get
>            (t  6.1,s  5.1) Prof2$$Lambda$29.0x0000000800081220::get
>            (t  6.0,s  5.0) Prof2$$Lambda$27.0x0000000800080c28::get
>            (t  6.1,s  5.0) Prof2$$Lambda$26.0x0000000800080a08::get
>            (t  1.1,s  1.1) Prof2::lambda$main$2
>            (t  1.1,s  1.1) Prof2::lambda$main$0
>            (t  1.0,s  1.0) Prof2::lambda$main$1
>            (t  0.9,s  0.9) Prof2::lambda$main$3
>            (t  0.1,s  0.1) AGCT::Unknown not Java[ERR=-3]
>            (t  0.0,s  0.0) java.util.Locale::getInstance
>            (t  0.0,s  0.0) AGCT::Not walkable Java[ERR=-6]
>            (t  0.0,s  0.0) jdk.internal.loader.BuiltinClassLoader::loadClassOrNull
>            (t  0.0,s  0.0) java.lang.ClassLoader::loadClass
>            (t  0.0,s  0.0) sun.net.util.URLUtil::urlNoFragString
>            (t  0.0,s  0.0) java.lang.Class::forName0
>            (t  0.0,s  0.0) java.util.Locale::initDefault
>            (t  0.0,s  0.0) jdk.internal.loader.BuiltinClassLoader::loadClass
>            (t  0.0,s  0.0) jdk.internal.loader.URLClassPath::getLoader
>            (t  0.0,s  0.0) jdk.internal.loader.URLClassPath::getResource
>            (t  0.0,s  0.0) java.lang.String::toLowerCase
>            (t  0.0,s  0.0) sun.launcher.LauncherHelper::loadMainClass
>            (t  0.0,s  0.0) sun.launcher.LauncherHelper::checkAndLoadMain
>            (t  0.0,s  0.0) java.util.Locale::<clinit>
>            (t  0.0,s  0.0) jdk.internal.loader.BuiltinClassLoader::findClassOnClassPathOrNull
>            (t  0.0,s  0.0) jdk.internal.loader.ClassLoaders$AppClassLoader::loadClass
>            (t  0.0,s  0.0) java.lang.Class::forName
>    
>    The program is as follows:
>    
>    ```
>    import java.util.function.Supplier;
>    
>    public class Prof2 {
>    
>        public static void main(String[] args) {
>            var rand = new java.util.Random(0);
>            Supplier[] suppliers = {
>                    () -> 0,
>                    () -> 1,
>                    () -> 2,
>                    () -> 3,
>            };
>    
>            long sum = 0;
>            for (int i = 0; i >= 0; i++) {
>                sum += (int)suppliers[i % suppliers.length].get();
>            }
>        }
>    }
>    ```
>    
>    We see that the results are particularely useful in this case as the
>    methods are very short (it only returns an integer), and the probability
>    of hitting the prolog is then very high.
>  - Allow AsyncGetCallTrace and JFR to walk a stub frame
>    
>    When the signal sent for AsyncGetCallTrace or JFR would land on a stub
>    (like arraycopy), it wouldn't be able to detect the sender (caller)
>    frame because `_cb->frame_size() == 0`.
>    
>    Because we fully control how the prolog and epilog of stub code is
>    generated, we know there are two cases:
>    1. A stack frame is allocated via macroAssembler->enter(), and consists
>    in `push rbp; mov rsp, rbp;`.
>    2. No stack frames are allocated and rbp is left unchanged and rsp is
>    decremented with the `call` instruction that push the return `pc` on the
>    stack.
>    
>    For case 1., we can easily know the sender frame by simply looking at
>    rbp, especially since we know that all stubs preserve the frame pointer
>    (on x86 at least).
>    
>    For case 2., we end up returning the sender's sender, but that already
>    gives us more information than what we have today.
>    
>    The results are as follows:
>    
>    - Baseline:
>    
>    Flat Profile (by method):
>            (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5]
>            (t  0.5,s  0.2) Prof1::main
>            (t  0.2,s  0.2) java.lang.AbstractStringBuilder::append
>            (t  0.1,s  0.1) AGCT::Unknown not Java[ERR=-3]
>            (t  0.0,s  0.0) java.lang.AbstractStringBuilder::ensureCapacityInternal
>            (t  0.0,s  0.0) java.lang.AbstractStringBuilder::shift
>            (t  0.0,s  0.0) java.lang.String::getBytes
>            (t  0.0,s  0.0) java.lang.AbstractStringBuilder::putStringAt
>            (t  0.0,s  0.0) java.lang.StringBuilder::delete
>            (t  0.2,s  0.0) java.lang.StringBuilder::append
>            (t  0.0,s  0.0) java.lang.AbstractStringBuilder::delete
>            (t  0.0,s  0.0) java.lang.AbstractStringBuilder::putStringAt
>    
>    - With StubRoutinesBlob::FrameParser
>    
>    Flat Profile (by method):
>            (t 98.7,s 98.7) java.lang.AbstractStringBuilder::ensureCapacityInternal
>            (t  0.9,s  0.9) java.lang.AbstractStringBuilder::delete
>            (t 99.8,s  0.2) Prof1::main
>            (t  0.1,s  0.1) AGCT::Unknown not Java[ERR=-3]
>            (t  0.0,s  0.0) AGCT::Unknown Java[ERR=-5]
>            (t 98.8,s  0.0) java.lang.AbstractStringBuilder::append
>            (t 98.8,s  0.0) java.lang.StringBuilder::append
>            (t  0.9,s  0.0) java.lang.StringBuilder::delete
>    
>    The program is as follows:
>    
>    ```
>    public class Prof1 {
>    
>        public static void main(String[] args) {
>            StringBuilder sb = new StringBuilder();
>            for (int i = 0; i < 1000000; i++) {
>                sb.append("ab");
>                sb.delete(0, 1);
>            }
>            System.out.println(sb.length());
>        }
>    }
>    ```
>    
>    We now account for the arraycopy stub which is called by
>    AbstractStringBuilder::ensureCapacityInternal. It was previously ignored
>    because it would not know how to parse the frame for the arraycopy stub
>    and would fall in the AGCT::Unknown Java[ERR=-5] section.
>    
>    However, it still isn't perfect since it doesn't point to the arraycopy stub
>    directly.
>  - Extract sender frame parsing to CodeBlock::FrameParser
>    
>    Whether and how a frame is setup is controlled by the code generator
>    for the specific CodeBlock. The CodeBlock is then in the best place to know how
>    to parse the sender's frame from the current frame in the given CodeBlock.
>    
>    This refactoring proposes to extract this parsing out of `frame` and into a
>    `CodeBlock::FrameParser`. This FrameParser is then specialized in the relevant
>    inherited children of CodeBlock.
>    
>    This change is to largely facilitate adding new supported cases for JDK-8252417
>    like runtime stubs.
Ok. Do you know who should be the primary reviewer? Rickard Bäckman? Nils Eliasson?
-------------
PR: https://git.openjdk.java.net/jdk/pull/4436
    
    
More information about the hotspot-jfr-dev
mailing list