RFC: JFR Recording Context
Ludovic Henry
ludovic.henry at datadoghq.com
Tue Jun 15 14:49:35 UTC 2021
Hello Erik,
Thank you for your suggestion. This is what we do today with what we call
CheckpointEvent (same as what you suggest with TracingContext, see [1]).
The main and major problem with this approach is the cost of such a
solution, especially in async-heavy projects (like Akka, Play Framework, or
reactive programming in general). In such cases, you can have millions of
transitions in a very short amount of time, leading to the creation of
millions of CheckpointEvent while little to no JFR Events carrying useful
data are generated. There is no heuristics that can predict whether an
event is going to be generated during the execution of this trace on this
thread/task so we _need_ to generate a CheckpointEvent. We are improving
when exactly a CheckpointEvent is being generated to minimize duplication,
but there are still a large number of these events.
If we are to attach context directly to JFR Events, then we do not need
this overhead of generating a CheckpointEvent when starting/stopping a
trace on a thread/task and the only added overhead is a single long field
on the emitted events. In practice, we observe the size and CPU time
overhead of generating these CheckpointEvent to be greater than adding this
single long field.
Let me know if this answers your question.
Thank you,
Ludovic
[1]
https://github.com/DataDog/dd-trace-java/blob/master/dd-trace-core/jfr-openjdk/src/main/java/datadog/trace/core/jfr/openjdk/CheckpointEvent.java
On Tue, Jun 15, 2021 at 4:30 PM Erik Gahlin <erik.gahlin at oracle.com> wrote:
> Hello Ludovic,
>
> To better understand the rationale for this API, let me first go through
> what can and can’t be done today. You probably know this, but it's good to
> have everybody on the same page.
>
> Using today’s API, it’s possible to create your own user-defined event and
> have a field where a context ID can be stored. It’s also possible to define
> you own annotation, i.e. Context, and user-defined setting, i.e.
> ContextSetting, for turning context tracing on and off.
>
> What is not possible, is to add a field to other events, defined in the
> JVM/JDK or in a third-party library, and populate it with some
> value/context.
>
> What you can do, is to create an event, i.e. TracingContext, with an ID
> field and call begin() when the context starts and end() when it ends, and
> then when parsing the recording, see what other events happened, in the
> same thread, during that event.
>
> What would be the problem with such a solution?
>
> Thanks
> Erik
> ------------------------------
> *From:* hotspot-jfr-dev <hotspot-jfr-dev-retn at openjdk.java.net> on behalf
> of Ludovic Henry <ludovic.henry at datadoghq.com>
> *Sent:* Friday, June 11, 2021 10:09 AM
> *To:* hotspot-jfr-dev at openjdk.java.net <hotspot-jfr-dev at openjdk.java.net>
> *Subject:* RFC: JFR Recording Context
>
> Hello,
>
> I want to gather your feedback on the following proposal to add a concept
> of Context to JFR Events. This context is initialized by the application
> and is captured and serialized in relevant JFR events.
>
> That is an important feature for us at Datadog to enable multiple
> scenarios. The main one is the ability to correlate an Event to a specific
> Trace (it marks the execution of a specific request/transaction across many
> services). With this correlation, we are able to extract Events which are
> relevant to a specific Trace, allowing us to give more specific information
> to our customers.
>
> This context would have the following properties:
> - it is enabled similarly to stacktraces, and would only be captured on
> specified Events (with a `context=“true”` attribute in the metadata.xml or
> a `@Context` annotation on the Event class).
> - the context is captured when committing the Event
> - the context consist conceptually in a `Map<String,String>`
> - the entries in the context are defined by the application
> - there is a maximum number of entries in the context
> - that setting can be modified through configuration but still in a
> predetermined range (for example, default of maximum 64 context entries,
> with a configurable maximum value of 512)
> - once initialized, the entries are immutable (the value can’t be changed)
> and keys can’t be removed without overriding the context.
> - contexts are stackable, a new context doesn’t override an older one but
> complement it, and closing a context doesn’t close any previous ones.
> - the relevant context is propagated through tasks, threads, and other
> async operations in the Class Libraries, similarly to ScopeLocal or
> ThreadLocal
> - for external libraries, a capture/restore mechanism is available
> - it is serialized efficiently in the JFR recording
> - it is assumed that the keys and values are repeated across many events,
> similarly to stacktraces
> - the immutability of keys and values greatly helps caching and
> compression (with `StringPool` and a `JfrContextRepository`, for example)
> - the context is accessible from the VM without having to switch to Java
> - performance would otherwise be poor, and it would limit where and when a
> native Event can be committed.
>
> To achieve these properties, I propose the following public Java API:
> ```
> class RecordingContextEntry {
> public static RecordingContextEntry forName(String name);
> public static RecordingContextEntry inheritableForName(String name);
> public boolean isBound();
> }
> class RecordingContext implements AutoCloseable {
> // snapshot + run
> public static class Snapshot {}
> public static Snapshot snapshot();
> public static <R> R callWithSnapshot(Callable<R> op, Snapshot s);
> public static void runWithSnapshot(Runnable op, Snapshot s);
> // initialize
> public static class Builder {
> // implicitly initialized with current context’s inheritable entries
> // build and set current context
> public Builder where(RecordingContextEntry key, String value);
> public RecordingContext build();
> }
> public static Builder builder();
> // close
> public void close();
> }
> ```
>
> Let's take the example of a sample Trace class that uses this
> RecordingContext class to associate a trace_id to events:
> ```
> class Trace implements AutoCloseable {
> static RecordingContextEntry contextTraceId =
> RecordingContextEntry.inheritableForName(“trace_id”);
>
> RecordingContext context;
>
> public Trace() {
> context =
> RecordingContext.builder()
> .where(contextTraceId, this.getTrace().getId())
> .build();
> }
>
> public void close() {
> context.close();
> }
> }
> ```
>
> Let’s also take a hypothetical async-heavy library which uses a custom
> threadpool:
> ```
> class CustomThreadPool {
> public void run(Runnable r) {
> this.workQueue.push(r);
> }
> // background threads then pop from this.workQueue
> // and execute the Runnable.
> }
> ```
>
> To propagate the context in this threadpool’s threads, the library or any
> tracing framework would instrument the code with:
> ```
> class RunableWithSnapshot implements Runnable {
> final private Runnable runnable;
> final private RecordingContext.Snapshot snapshot;
> RunableWithSnapshot(Runnable runnable) {
> this.runnable = runnable;
> this.snapshot = RecordingContext.snapshot();
> }
> public void run() {
> RecordingContext.runWithSnapshot(runnable, snapshot);
> }
> }
> class CustomThreadPool {
> public void run(Runnable r) {
> r = new RunnableWithSnapshot(r);
> this.workQueue.push(r);
> }
> // there are then background threads which pop from
> // this.workQueue and execute the Runnable.
> }
> ```
>
> The main challenges I have faced when implementing this feature [1] are:
> - most of the changes are in Hotspot
> - it’s tempting to rely on the ScopeLocal proposal but it doesn’t provide
> the required API to integrate to existing TWR-like APIs
> (try-with-resources).
> - it changes the serialized format of an Event since it adds a long field
> for the index of the context in the JfrContextRepository (serialized in
> each chunk).
>
> Thank you, and looking for your feedback.
> Ludovic
>
> [1] Beware, it’s a very early stage proof of concept, it doesn’t fully
> capture the context, and the implementation is out of date with this API
> proposal.
> https://github.com/luhenry/jdk/compare/openjdk:master...jfr-context
>
More information about the hotspot-jfr-dev
mailing list