RFR: 8178287: AsyncGetCallTrace fails to traverse valid Java stacks [v4]
Serguei Spitsyn
sspitsyn at openjdk.java.net
Wed Aug 18 06:54:34 UTC 2021
On Mon, 19 Jul 2021 09:25:59 GMT, Ludovic Henry <luhenry at openjdk.org> wrote:
>> When the signal sent for AsyncGetCallTrace or JFR would land on a runtime stub (like arraycopy), a vtable stub, or the prolog of a compiled method, it wouldn't be able to detect the sender (caller) frame for multiple reasons. This patch fixes these cases through adding CodeBlob-specific frame parser which are in the best position to know how a frame is setup.
>>
>> The following examples have been profiled with honest-profiler which uses `AsyncGetCallTrace`.
>>
>> # `Prof1`
>>
>> public class Prof1 {
>>
>> public static void main(String[] args) {
>> StringBuilder sb = new StringBuilder();
>> for (int i = 0; i < 1000000; i++) {
>> sb.append("ab");
>> sb.delete(0, 1);
>> }
>> System.out.println(sb.length());
>> }
>> }
>>
>>
>> - Baseline:
>>
>> Flat Profile (by method):
>> (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5]
>> (t 0.5,s 0.2) Prof1::main
>> (t 0.2,s 0.2) java.lang.AbstractStringBuilder::append
>> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
>> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::ensureCapacityInternal
>> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::shift
>> (t 0.0,s 0.0) java.lang.String::getBytes
>> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt
>> (t 0.0,s 0.0) java.lang.StringBuilder::delete
>> (t 0.2,s 0.0) java.lang.StringBuilder::append
>> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::delete
>> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt
>>
>> - With `StubRoutinesBlob::FrameParser`:
>>
>> Flat Profile (by method):
>> (t 98.7,s 98.7) java.lang.AbstractStringBuilder::ensureCapacityInternal
>> (t 0.9,s 0.9) java.lang.AbstractStringBuilder::delete
>> (t 99.8,s 0.2) Prof1::main
>> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
>> (t 0.0,s 0.0) AGCT::Unknown Java[ERR=-5]
>> (t 98.8,s 0.0) java.lang.AbstractStringBuilder::append
>> (t 98.8,s 0.0) java.lang.StringBuilder::append
>> (t 0.9,s 0.0) java.lang.StringBuilder::delete
>>
>>
>> # `Prof2`
>>
>> import java.util.function.Supplier;
>>
>> public class Prof2 {
>>
>> public static void main(String[] args) {
>> var rand = new java.util.Random(0);
>> Supplier[] suppliers = {
>> () -> 0,
>> () -> 1,
>> () -> 2,
>> () -> 3,
>> };
>>
>> long sum = 0;
>> for (int i = 0; i >= 0; i++) {
>> sum += (int)suppliers[i % suppliers.length].get();
>> }
>> }
>> }
>>
>>
>> - Baseline:
>>
>> Flat Profile (by method):
>> (t 60.7,s 60.7) AGCT::Unknown Java[ERR=-5]
>> (t 39.2,s 35.2) Prof2::main
>> (t 1.4,s 1.4) Prof2::lambda$main$3
>> (t 1.0,s 1.0) Prof2::lambda$main$2
>> (t 0.9,s 0.9) Prof2::lambda$main$1
>> (t 0.7,s 0.7) Prof2::lambda$main$0
>> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
>> (t 0.0,s 0.0) java.lang.Thread::exit
>> (t 0.9,s 0.0) Prof2$$Lambda$2.0x0000000800c00c28::get
>> (t 1.0,s 0.0) Prof2$$Lambda$3.0x0000000800c01000::get
>> (t 1.4,s 0.0) Prof2$$Lambda$4.0x0000000800c01220::get
>> (t 0.7,s 0.0) Prof2$$Lambda$1.0x0000000800c00a08::get
>>
>>
>> - With `VtableBlob::FrameParser` and `nmethod::FrameParser`:
>>
>> Flat Profile (by method):
>> (t 74.1,s 70.3) Prof2::main
>> (t 6.5,s 5.5) Prof2$$Lambda$29.0x0000000800081220::get
>> (t 6.6,s 5.4) Prof2$$Lambda$28.0x0000000800081000::get
>> (t 5.7,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get
>> (t 5.9,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get
>> (t 4.9,s 4.9) AGCT::Unknown Java[ERR=-5]
>> (t 1.2,s 1.2) Prof2::lambda$main$2
>> (t 0.9,s 0.9) Prof2::lambda$main$3
>> (t 0.9,s 0.9) Prof2::lambda$main$1
>> (t 0.7,s 0.7) Prof2::lambda$main$0
>> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
>
> Ludovic Henry has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
>
> - Remove FrameParser and the required allocation
>
> The need for the allocation would be it non async-safe. However,
> AsyncGetCallTrace is async-safe and thus can't allow for allocations.
> - Merge branch 'master' of https://github.com/openjdk/jdk into fix-8178287
> - Fix comments
> - Disable checks in FrameParser when known to be safe
> - Allow AsyncGetCallTrace and JFR to unwind stack from vtable stub
>
> The program is the following:
>
> ```
> import java.util.function.Supplier;
>
> public class Prof2 {
>
> public static void main(String[] args) {
> var rand = new java.util.Random(0);
> Supplier[] suppliers = {
> () -> 0,
> () -> 1,
> () -> 2,
> () -> 3,
> };
>
> long sum = 0;
> for (int i = 0; i >= 0; i++) {
> sum += (int)suppliers[i % suppliers.length].get();
> }
> }
> }
> ```
>
> The results are as follows:
>
> - Baseline (from previous commit):
>
> Flat Profile (by method):
> (t 39.3,s 39.3) AGCT::Unknown Java[ERR=-5]
> (t 40.3,s 36.1) Prof2::main
> (t 6.4,s 5.3) Prof2$$Lambda$28.0x0000000800081000::get
> (t 6.1,s 5.1) Prof2$$Lambda$29.0x0000000800081220::get
> (t 6.0,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get
> (t 6.1,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get
> (t 1.1,s 1.1) Prof2::lambda$main$2
> (t 1.1,s 1.1) Prof2::lambda$main$0
> (t 1.0,s 1.0) Prof2::lambda$main$1
> (t 0.9,s 0.9) Prof2::lambda$main$3
> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
>
> - With unwind from vtable stub
>
> Flat Profile (by method):
> (t 74.1,s 70.3) Prof2::main
> (t 6.5,s 5.5) Prof2$$Lambda$29.0x0000000800081220::get
> (t 6.6,s 5.4) Prof2$$Lambda$28.0x0000000800081000::get
> (t 5.7,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get
> (t 5.9,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get
> (t 4.9,s 4.9) AGCT::Unknown Java[ERR=-5]
> (t 1.2,s 1.2) Prof2::lambda$main$2
> (t 0.9,s 0.9) Prof2::lambda$main$3
> (t 0.9,s 0.9) Prof2::lambda$main$1
> (t 0.7,s 0.7) Prof2::lambda$main$0
> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
>
> We attribute the vtable stub to the caller and not the callee, which is
> already an improvement from the existing case.
> - Allow AsyncGetCallTrace and JFR to unwind stack from nmethod's prolog
>
> When sampling hits the prolog of a method, Hotspot assumes it's unable
> to parse the frame. This change allows to parse such frame on x86 by
> specializing which instruction it's hitting in the prolog.
>
> The results are as follows:
>
> - Baseline:
>
> Flat Profile (by method):
> (t 60.7,s 60.7) AGCT::Unknown Java[ERR=-5]
> (t 39.2,s 35.2) Prof2::main
> (t 1.4,s 1.4) Prof2::lambda$main$3
> (t 1.0,s 1.0) Prof2::lambda$main$2
> (t 0.9,s 0.9) Prof2::lambda$main$1
> (t 0.7,s 0.7) Prof2::lambda$main$0
> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
> (t 0.0,s 0.0) java.lang.Thread::exit
> (t 0.9,s 0.0) Prof2$$Lambda$2.0x0000000800c00c28::get
> (t 1.0,s 0.0) Prof2$$Lambda$3.0x0000000800c01000::get
> (t 1.4,s 0.0) Prof2$$Lambda$4.0x0000000800c01220::get
> (t 0.7,s 0.0) Prof2$$Lambda$1.0x0000000800c00a08::get
>
> - With incomplete frame parsing:
>
> Flat Profile (by method):
> (t 39.3,s 39.3) AGCT::Unknown Java[ERR=-5]
> (t 40.3,s 36.1) Prof2::main
> (t 6.4,s 5.3) Prof2$$Lambda$28.0x0000000800081000::get
> (t 6.1,s 5.1) Prof2$$Lambda$29.0x0000000800081220::get
> (t 6.0,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get
> (t 6.1,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get
> (t 1.1,s 1.1) Prof2::lambda$main$2
> (t 1.1,s 1.1) Prof2::lambda$main$0
> (t 1.0,s 1.0) Prof2::lambda$main$1
> (t 0.9,s 0.9) Prof2::lambda$main$3
> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
> (t 0.0,s 0.0) java.util.Locale::getInstance
> (t 0.0,s 0.0) AGCT::Not walkable Java[ERR=-6]
> (t 0.0,s 0.0) jdk.internal.loader.BuiltinClassLoader::loadClassOrNull
> (t 0.0,s 0.0) java.lang.ClassLoader::loadClass
> (t 0.0,s 0.0) sun.net.util.URLUtil::urlNoFragString
> (t 0.0,s 0.0) java.lang.Class::forName0
> (t 0.0,s 0.0) java.util.Locale::initDefault
> (t 0.0,s 0.0) jdk.internal.loader.BuiltinClassLoader::loadClass
> (t 0.0,s 0.0) jdk.internal.loader.URLClassPath::getLoader
> (t 0.0,s 0.0) jdk.internal.loader.URLClassPath::getResource
> (t 0.0,s 0.0) java.lang.String::toLowerCase
> (t 0.0,s 0.0) sun.launcher.LauncherHelper::loadMainClass
> (t 0.0,s 0.0) sun.launcher.LauncherHelper::checkAndLoadMain
> (t 0.0,s 0.0) java.util.Locale::<clinit>
> (t 0.0,s 0.0) jdk.internal.loader.BuiltinClassLoader::findClassOnClassPathOrNull
> (t 0.0,s 0.0) jdk.internal.loader.ClassLoaders$AppClassLoader::loadClass
> (t 0.0,s 0.0) java.lang.Class::forName
>
> The program is as follows:
>
> ```
> import java.util.function.Supplier;
>
> public class Prof2 {
>
> public static void main(String[] args) {
> var rand = new java.util.Random(0);
> Supplier[] suppliers = {
> () -> 0,
> () -> 1,
> () -> 2,
> () -> 3,
> };
>
> long sum = 0;
> for (int i = 0; i >= 0; i++) {
> sum += (int)suppliers[i % suppliers.length].get();
> }
> }
> }
> ```
>
> We see that the results are particularely useful in this case as the
> methods are very short (it only returns an integer), and the probability
> of hitting the prolog is then very high.
> - Allow AsyncGetCallTrace and JFR to walk a stub frame
>
> When the signal sent for AsyncGetCallTrace or JFR would land on a stub
> (like arraycopy), it wouldn't be able to detect the sender (caller)
> frame because `_cb->frame_size() == 0`.
>
> Because we fully control how the prolog and epilog of stub code is
> generated, we know there are two cases:
> 1. A stack frame is allocated via macroAssembler->enter(), and consists
> in `push rbp; mov rsp, rbp;`.
> 2. No stack frames are allocated and rbp is left unchanged and rsp is
> decremented with the `call` instruction that push the return `pc` on the
> stack.
>
> For case 1., we can easily know the sender frame by simply looking at
> rbp, especially since we know that all stubs preserve the frame pointer
> (on x86 at least).
>
> For case 2., we end up returning the sender's sender, but that already
> gives us more information than what we have today.
>
> The results are as follows:
>
> - Baseline:
>
> Flat Profile (by method):
> (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5]
> (t 0.5,s 0.2) Prof1::main
> (t 0.2,s 0.2) java.lang.AbstractStringBuilder::append
> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::ensureCapacityInternal
> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::shift
> (t 0.0,s 0.0) java.lang.String::getBytes
> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt
> (t 0.0,s 0.0) java.lang.StringBuilder::delete
> (t 0.2,s 0.0) java.lang.StringBuilder::append
> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::delete
> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt
>
> - With StubRoutinesBlob::FrameParser
>
> Flat Profile (by method):
> (t 98.7,s 98.7) java.lang.AbstractStringBuilder::ensureCapacityInternal
> (t 0.9,s 0.9) java.lang.AbstractStringBuilder::delete
> (t 99.8,s 0.2) Prof1::main
> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
> (t 0.0,s 0.0) AGCT::Unknown Java[ERR=-5]
> (t 98.8,s 0.0) java.lang.AbstractStringBuilder::append
> (t 98.8,s 0.0) java.lang.StringBuilder::append
> (t 0.9,s 0.0) java.lang.StringBuilder::delete
>
> The program is as follows:
>
> ```
> public class Prof1 {
>
> public static void main(String[] args) {
> StringBuilder sb = new StringBuilder();
> for (int i = 0; i < 1000000; i++) {
> sb.append("ab");
> sb.delete(0, 1);
> }
> System.out.println(sb.length());
> }
> }
> ```
>
> We now account for the arraycopy stub which is called by
> AbstractStringBuilder::ensureCapacityInternal. It was previously ignored
> because it would not know how to parse the frame for the arraycopy stub
> and would fall in the AGCT::Unknown Java[ERR=-5] section.
>
> However, it still isn't perfect since it doesn't point to the arraycopy stub
> directly.
> - Extract sender frame parsing to CodeBlock::FrameParser
>
> Whether and how a frame is setup is controlled by the code generator
> for the specific CodeBlock. The CodeBlock is then in the best place to know how
> to parse the sender's frame from the current frame in the given CodeBlock.
>
> This refactoring proposes to extract this parsing out of `frame` and into a
> `CodeBlock::FrameParser`. This FrameParser is then specialized in the relevant
> inherited children of CodeBlock.
>
> This change is to largely facilitate adding new supported cases for JDK-8252417
> like runtime stubs.
Hi Markus! I have a lack of expertize in this area and can't be the primary reviewer.
The problem is we have no test coverage to verify fixes.
-------------
PR: https://git.openjdk.java.net/jdk/pull/4436
More information about the serviceability-dev
mailing list