RFR: 8268829: Provide an optimized way to walk the stack with Class object only

Mandy Chung mchung at openjdk.org
Mon Aug 21 20:37:47 UTC 2023


8268829: Provide an optimized way to walk the stack with Class object only

`StackWalker::walk` creates one `StackFrame` per frame, and the current implementation
allocates one `StackFrameInfo` and one `MemberName` object per frame. Some frameworks,
such as logging, may be interested only in the `Class` object and not in the method name or the BCI;
for example, a logger filters out its own implementation classes to find the caller class. This is
similar to `StackWalker::getCallerClass` but allows a predicate to filter out elements.
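
As a point of reference, the caller lookup described above can be sketched with the existing `StackWalker` API only (no new option). The class names `CallerLookup`, `Logger`, and `App` are hypothetical stand-ins for a logging framework and its user; the walker here still materializes full `StackFrame`s even though only the declaring class is read.

```java
import java.lang.StackWalker.Option;
import java.lang.StackWalker.StackFrame;
import java.util.Optional;
import java.util.Set;

class CallerLookup {
    // Hypothetical set of framework "implementation" classes to skip.
    static final Set<Class<?>> IMPLEMENTATION_CLASSES =
            Set.of(CallerLookup.class, Logger.class);

    // Stand-in for a logging framework entry point.
    static class Logger {
        static Optional<Class<?>> log() {
            return CallerLookup.findCaller();
        }
    }

    // Stand-in for application code that calls the logger.
    static class App {
        static Optional<Class<?>> run() {
            return Logger.log();
        }
    }

    // Walks the stack and returns the first class that is not part of the framework.
    static Optional<Class<?>> findCaller() {
        StackWalker walker = StackWalker.getInstance(Option.RETAIN_CLASS_REFERENCE);
        return walker.walk(frames ->
                frames.map(StackFrame::getDeclaringClass)
                      .filter(c -> !IMPLEMENTATION_CLASSES.contains(c))
                      .findFirst());
    }

    public static void main(String[] args) {
        System.out.println(App.run());
    }
}
```

Only `getDeclaringClass` is called on each frame, which is the usage pattern where the method name and BCI are never needed.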

This PR proposes adding a new stack walking option, `Option.NO_METHOD_INFO`.
If no method information is needed, this option can be used so that the
stack walker saves (1) the overhead of extracting the method information
and (2) the memory used for stack walking. In addition, this can also fix

- [8311500](https://bugs.openjdk.org/browse/JDK-8311500): StackWalker.getCallerClass() throws UOE if invoked reflectively

Adding the `NO_METHOD_INFO` option provides a simple way for existing code,
for example logging frameworks, to take advantage of this enhancement with
minimal change, as it can keep its existing implementation that traverses
`StackFrame`s.

For example, to find the first caller while filtering a known list of implementation classes,
existing code can just add `NO_METHOD_INFO` in the call to `StackWalker::getInstance`
to create a stack walker instance:


     StackWalker walker = StackWalker.getInstance(Set.of(Option.RETAIN_CLASS_REFERENCE, Option.NO_METHOD_INFO));
     Optional<Class<?>> callerClass = walker.walk(s ->
             s.map(StackFrame::getDeclaringClass)
              .filter(interestingClasses::contains)
              .findFirst());


If method information is accessed on the `StackFrame`s produced by this stack walker, for example
via `StackFrame::getMethodName`, an `UnsupportedOperationException` is thrown.

The alternative considered is to provide a new API:
`<T> T walkClass(Function<? super Stream<Class<?>>, ? extends T> function)`

In this case, the caller would need to pass a function that takes a stream
of `Class` objects instead of `StackFrame`s. Existing code would have to
change its calls from `walk` to `walkClass` and rewrite the function body.
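
To make the shape of that alternative concrete, here is a minimal sketch that emulates it as a static helper layered on the existing `walk` method. The helper `walkClass`, the class `WalkClassSketch`, and the nested `App` are hypothetical; this is not the proposed implementation, and it does not save any allocation since it still goes through `StackFrame`.

```java
import java.lang.StackWalker.Option;
import java.lang.StackWalker.StackFrame;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Stream;

class WalkClassSketch {
    // Hypothetical helper mirroring the signature of the alternative API.
    static <T> T walkClass(StackWalker walker,
                           Function<? super Stream<Class<?>>, ? extends T> function) {
        return walker.walk(frames -> {
            // Map each frame to its declaring class before handing the stream over.
            Stream<Class<?>> classes = frames.map(StackFrame::getDeclaringClass);
            return function.apply(classes);
        });
    }

    // Stand-in for caller code: its function body now works on Class, not StackFrame.
    static class App {
        static Optional<Class<?>> whoAmI() {
            StackWalker walker = StackWalker.getInstance(Option.RETAIN_CLASS_REFERENCE);
            // The helper's own frame is on the stack, so it is filtered out here.
            return walkClass(walker, classes ->
                    classes.filter(c -> c != WalkClassSketch.class).findFirst());
        }
    }

    public static void main(String[] args) {
        System.out.println(App.whoAmI());
    }
}
```

Compared with adding an option to `getInstance`, both the call site and the lambda body change, which is the migration cost noted above.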

### Implementation Details

If `NO_METHOD_INFO` is set, the implementation creates a `ClassFrameInfo[]`
buffer that is filled by the VM during stack walking. `ClassFrameInfo` holds the
`Class` instance plus `flags`, which indicate whether the frame is caller-sensitive or hidden.
With this change, `StackWalker::getCallerClass` can also use a `ClassFrameInfo[]` buffer
in place of the special `Class` buffer, which removes the special check in the VM.
JDK-8311500 can then be fixed in Java.

If `NO_METHOD_INFO` is not set, the implementation creates a `StackFrameInfo[]` buffer.
`StackFrameInfo` is a subclass of `ClassFrameInfo`. It keeps the `ResolvedMethodName` and all other method information.
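
The relationship between the two buffer element types can be pictured with a simplified model. Everything below is an illustrative assumption, not the JDK source: the flag bit values, the field names, and the use of a plain `String` in place of `ResolvedMethodName` are all hypothetical, and in the real implementation the VM fills the buffer.

```java
class FrameInfoSketch {
    // Class-only frame record: the declaring class plus flag bits.
    static class ClassFrameInfo {
        // Hypothetical bit values for the two properties the flags encode.
        static final int CALLER_SENSITIVE = 1 << 0;
        static final int HIDDEN = 1 << 1;

        final Class<?> declaringClass;
        final int flags;

        ClassFrameInfo(Class<?> declaringClass, int flags) {
            this.declaringClass = declaringClass;
            this.flags = flags;
        }

        boolean isCallerSensitive() { return (flags & CALLER_SENSITIVE) != 0; }
        boolean isHidden() { return (flags & HIDDEN) != 0; }

        // No method information is kept at this level.
        String getMethodName() {
            throw new UnsupportedOperationException("no method information");
        }
    }

    // Full frame record: adds method information on top of the class-only record.
    static class StackFrameInfo extends ClassFrameInfo {
        final String methodName; // stands in for ResolvedMethodName and related data

        StackFrameInfo(Class<?> declaringClass, int flags, String methodName) {
            super(declaringClass, flags);
            this.methodName = methodName;
        }

        @Override
        String getMethodName() { return methodName; }
    }

    public static void main(String[] args) {
        // One array type can hold either kind of element.
        ClassFrameInfo[] buffer = {
            new ClassFrameInfo(String.class, ClassFrameInfo.HIDDEN),
            new StackFrameInfo(Object.class, 0, "toString")
        };
        for (ClassFrameInfo f : buffer) {
            System.out.println(f.declaringClass.getName() + " hidden=" + f.isHidden());
        }
    }
}
```

In this model, filtering on the flags can happen in Java code over the buffer, which is the direction the description takes for `getCallerClass`.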

### Performance

The memory usage of the data structures (shown by `-XX:+PrintFieldLayout`):

| Type | Instance size |
| ---- | ------------ |
| `ClassFrameInfo` | 24 bytes |
| `StackFrameInfo` | 48 bytes |
| `ResolvedMethodName` | 24 bytes |
| `MemberName` | 48 bytes |
| `StackFrameInfo` (old) | 32 bytes |

The existing implementation allocates a total of 104 bytes for each frame
(old `StackFrameInfo` + `MemberName` + `ResolvedMethodName`).
The new implementation, without the `NO_METHOD_INFO` option, allocates a total of 72 bytes
for each frame (`StackFrameInfo` + `ResolvedMethodName`), which saves about 30% of the
buffer memory.
With `NO_METHOD_INFO`, 24 bytes are allocated for each frame (only `ClassFrameInfo` is needed).
In addition, it saves the overhead of creating the `ResolvedMethodName` object in the VM.
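
The per-frame totals follow directly from the instance sizes in the table; the arithmetic can be checked with a few lines of Java. The class and constant names below are illustrative only.

```java
class BufferMemory {
    // Instance sizes in bytes, taken from the table above.
    static final int CLASS_FRAME_INFO = 24;
    static final int STACK_FRAME_INFO = 48;
    static final int RESOLVED_METHOD_NAME = 24;
    static final int MEMBER_NAME = 48;
    static final int OLD_STACK_FRAME_INFO = 32;

    // Old implementation: StackFrameInfo (old) + MemberName + ResolvedMethodName.
    static int oldPerFrame() {
        return OLD_STACK_FRAME_INFO + MEMBER_NAME + RESOLVED_METHOD_NAME;
    }

    // New implementation with method information: StackFrameInfo + ResolvedMethodName.
    static int newPerFrame() {
        return STACK_FRAME_INFO + RESOLVED_METHOD_NAME;
    }

    // New implementation with NO_METHOD_INFO: ClassFrameInfo only.
    static int classOnlyPerFrame() {
        return CLASS_FRAME_INFO;
    }

    // Relative saving of 'after' compared with 'before', in percent.
    static double savingPercent(int before, int after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        System.out.println("old: " + oldPerFrame() + " bytes/frame");       // 104
        System.out.println("new: " + newPerFrame() + " bytes/frame");       // 72
        System.out.println("class-only: " + classOnlyPerFrame() + " bytes/frame"); // 24
        System.out.printf("new vs old saving: %.1f%%%n",
                savingPercent(oldPerFrame(), newPerFrame()));
        System.out.printf("class-only vs new saving: %.1f%%%n",
                savingPercent(newPerFrame(), classOnlyPerFrame()));
    }
}
```

The first percentage is the roughly 30% figure quoted above (32 of 104 bytes).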

The microbenchmark shows that stack walking with method information
is 15-31% faster than in the old implementation. With the `NO_METHOD_INFO` option, it
is about 21-43% faster than traversing the frames without `NO_METHOD_INFO`
in the new implementation.

#### `StackWalker::getCallerClass`

`StackWalkBench::getCallerClass` shows about 30-60 ns degradation. The old implementation creates
a `Class` array of size 8, whereas the new implementation creates a `ClassFrameInfo` array
of size 8 initialized with 6 `ClassFrameInfo` elements. The new implementation of `getCallerClass`
therefore allocates 144 bytes more (6 x 24 bytes), which is insignificant. The benefit is that the
filtering is done in Java rather than in the VM.

`StackWalkBench::getCallerClass` includes the cost of setting up the call stack. There is no good way
to write a JMH benchmark that measures only the performance of `getCallerClass`. Measured with a
standalone benchmark, `getCallerClass` takes roughly 8500-10000 ns on
my macos-aarch64 machine, so the 30-60 ns degradation is negligible.

-------------

Commit messages:
 - fix trailing whitespace
 - Merge branch 'master' of https://github.com/openjdk/jdk into stackwalker-class
 - Update LocalLongHelper.java
 - clean up
 - Merge branch 'master' of https://github.com/openjdk/jdk into stackwalker-class
 - Add StackWalker.Option.NO_METHOD_INFO
 - Merge branch 'master' of https://github.com/openjdk/jdk into stackwalker-class
 - Merge branch 'master' of https://github.com/openjdk/jdk into stackwalker-class
 - Merge branch 'master' of https://github.com/openjdk/jdk into stackwalker-class
 - Merge branch 'master' of https://github.com/openjdk/jdk into stackwalker-class
 - ... and 3 more: https://git.openjdk.org/jdk/compare/78f74bc8...e9e2b390

Changes: https://git.openjdk.org/jdk/pull/15370/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15370&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8268829
  Stats: 1026 lines in 30 files changed: 628 ins; 178 del; 220 mod
  Patch: https://git.openjdk.org/jdk/pull/15370.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15370/head:pull/15370

PR: https://git.openjdk.org/jdk/pull/15370

