RFC: JFR Recording Context
Ludovic Henry
ludovic.henry at datadoghq.com
Thu Jul 8 07:20:10 UTC 2021
Hello,
As it's been some time already, I want to give an update on where I'm
standing with this prototype today. The good news is there is something
working. You can find the WIP at [1].
With the test case [2], we have the following output (removed some fields
for conciseness, see [3] for full output):
```
Event: jdk.ThreadSleep {
context = [
Key1 -> Key1Value2,
]
}
Event: jdk.ThreadSleep {
context = [
Key2 -> Key2Value1,
Key1 -> Key1Value1,
]
}
Event: jdk.ThreadSleep {
context = [
Key1 -> Key1Value3,
Key2 -> Key2Value3,
]
}
Event: jdk.ThreadSleep {
context = [
Key1 -> Key1Value3,
Key2 -> Key2Value3,
]
}
Event: jdk.ThreadSleep {
context = [
Key1 -> Key1Value4.1,
]
}
Event: jdk.ThreadSleep {
context = [
Key1 -> Key1Value5.1,
Key1 -> Key1Value5.2,
]
}
```
Some things to note:
- I've slightly tweaked the API to better suite practical needs
- the contexts strings are deduplicated and a single additional `long`
field is needed on events enabling support for Context (it works similarly
to StackTrace)
- in the last case, deduplication of keys doesn't work properly, it needs
fixing
- lifetime and propagation of the context is done in java only
- it's trivial today and needs more work
The next steps are to measure overhead over our current CheckpointEvent (as
described in the previous email). I'll come back to you as soon as I've
some more numbers.
I'm still very much looking forward to your feedback on this proposal.
Thank you,
Ludovic
[1] https://github.com/luhenry/jdk/tree/jfr-context
[2]
```
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import jdk.jfr.Recording;
import jdk.jfr.RecordingContext;
import jdk.jfr.RecordingContextKey;
import jdk.jfr.consumer.RecordedContext;
import jdk.jfr.consumer.RecordedContextEntry;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;
public class TestFullContext {
private final static RecordingContextKey contextKey1 =
RecordingContextKey.forName("Key1");
private final static RecordingContextKey contextKey2 =
RecordingContextKey.forName("Key2");
public static void main(String[] args) throws Throwable {
try (Recording recording = new Recording()) {
recording.enable("jdk.ThreadSleep");
recording.start();
try (RecordingContext context =
RecordingContext
.where(contextKey1, "Key1Value1")
.build()) {
Thread.sleep(1);
}
try (RecordingContext context =
RecordingContext
.where(contextKey1, "Key1Value2")
.where(contextKey2, "Key2Value2")
.build()) {
Thread.sleep(1);
}
try (
RecordingContext context1 =
RecordingContext
.where(contextKey1, "Key1Value3")
.build();
RecordingContext context2 =
RecordingContext
.where(contextKey2, "Key2Value3")
.build()) {
Thread.sleep(1);
}
try (
RecordingContext context1 =
RecordingContext
.where(contextKey1, "Key1Value3")
.build();
RecordingContext context2 =
RecordingContext
.where(contextKey2, "Key2Value3")
.build()) {
Thread.sleep(1);
}
try (
RecordingContext context1 =
RecordingContext
.where(contextKey1, "Key1Value4.1")
.where(contextKey1, "Key1Value4.2")
.build()) {
Thread.sleep(1);
}
try (
RecordingContext context1 =
RecordingContext
.where(contextKey1, "Key1Value5.1")
.build();
RecordingContext context2 =
RecordingContext
.where(contextKey1, "Key1Value5.2")
.build()) {
Thread.sleep(1);
}
recording.stop();
for (RecordedEvent event : fromRecording(recording)) {
System.out.println("Event: " + event);
}
}
}
private static List<RecordedEvent> fromRecording(Recording recording)
throws Throwable {
return RecordingFile.readAllEvents(makeCopy(recording));
}
private static Path makeCopy(Recording recording) throws Throwable {
Path p = recording.getDestination();
if (p == null) {
File directory = new File(".");
// FIXME: Must come up with a way to give human-readable name
// this will at least not clash when running parallel.
ProcessHandle h = ProcessHandle.current();
p = new File(directory.getAbsolutePath(), "recording-" +
recording.getId() + "-pid" + h.pid() + ".jfr").toPath();
recording.dump(p);
}
return p;
}
}
```
[3]
```
Event: jdk.ThreadSleep {
startTime = 14:39:27.403
duration = 1.11 ms
time = 1.00 ms
eventThread = "main" (javaThreadId = 1)
stackTrace = [
java.lang.Thread.sleep(long)
TestFullContext.main(String[]) line: 62
]
context = [
Key1 -> Key1Value1,
]
}
Event: jdk.ThreadSleep {
startTime = 14:39:27.405
duration = 1.09 ms
time = 1.00 ms
eventThread = "main" (javaThreadId = 1)
stackTrace = [
java.lang.Thread.sleep(long)
TestFullContext.main(String[]) line: 69
]
context = [
Key2 -> Key2Value2,
Key1 -> Key1Value2,
]
}
Event: jdk.ThreadSleep {
startTime = 14:39:27.406
duration = 1.08 ms
time = 1.00 ms
eventThread = "main" (javaThreadId = 1)
stackTrace = [
java.lang.Thread.sleep(long)
TestFullContext.main(String[]) line: 80
]
context = [
Key1 -> Key1Value3,
Key2 -> Key2Value3,
]
}
Event: jdk.ThreadSleep {
startTime = 14:39:27.407
duration = 1.06 ms
time = 1.00 ms
eventThread = "main" (javaThreadId = 1)
stackTrace = [
java.lang.Thread.sleep(long)
TestFullContext.main(String[]) line: 91
]
context = [
Key1 -> Key1Value3,
Key2 -> Key2Value3,
]
}
Event: jdk.ThreadSleep {
startTime = 14:39:27.409
duration = 1.08 ms
time = 1.00 ms
eventThread = "main" (javaThreadId = 1)
stackTrace = [
java.lang.Thread.sleep(long)
TestFullContext.main(String[]) line: 99
]
context = [
Key1 -> Key1Value4.1,
]
}
Event: jdk.ThreadSleep {
startTime = 14:39:27.410
duration = 1.08 ms
time = 1.00 ms
eventThread = "main" (javaThreadId = 1)
stackTrace = [
java.lang.Thread.sleep(long)
TestFullContext.main(String[]) line: 110
]
context = [
Key1 -> Key1Value5.1,
Key1 -> Key1Value5.2,
]
}
```
On Tue, Jun 15, 2021 at 4:49 PM Ludovic Henry <ludovic.henry at datadoghq.com>
wrote:
> Hello Erik,
>
> Thank you for your suggestion. This is what we do today with what we call
> CheckpointEvent (same as what you suggest with TracingContext, see [1]).
> The main and major problem with this approach is the cost of such a
> solution, especially in async-heavy projects (like Akka, Play Framework, or
> reactive programming in general). In such cases, you can have millions of
> transitions in a very short amount of time, leading to the creation of
> millions of CheckpointEvent while little to no JFR Events carrying useful
> data are generated. There is no heuristics that can predict whether an
> event is going to be generated during the execution of this trace on this
> thread/task so we _need_ to generate a CheckpointEvent. We are improving
> when exactly a CheckpointEvent is being generated to minimize duplication,
> but there are still a large number of these events.
>
> If we are to attach context directly to JFR Events, then we do not need
> this overhead of generating a CheckpointEvent when starting/stopping a
> trace on a thread/task and the only added overhead is a single long field
> on the emitted events. In practice, we observe the size and CPU time
> overhead of generating these CheckpointEvent to be greater than adding this
> single long field.
>
> Let me know if this answers your question.
>
> Thank you,
> Ludovic
>
> [1]
> https://github.com/DataDog/dd-trace-java/blob/master/dd-trace-core/jfr-openjdk/src/main/java/datadog/trace/core/jfr/openjdk/CheckpointEvent.java
>
> On Tue, Jun 15, 2021 at 4:30 PM Erik Gahlin <erik.gahlin at oracle.com>
> wrote:
>
>> Hello Ludovic,
>>
>> To better understand the rationale for this API, let me first go through
>> what can and can’t be done today. You probably know this, but it's good to
>> have everybody on the same page.
>>
>> Using today’s API, it’s possible to create your own user-defined event
>> and have a field where a context ID can be stored. It’s also possible to
>> define you own annotation, i.e. Context, and user-defined setting, i.e.
>> ContextSetting, for turning context tracing on and off.
>>
>> What is not possible, is to add a field to other events, defined in the
>> JVM/JDK or in a third-party library, and populate it with some
>> value/context.
>>
>> What you can do, is to create an event, i.e. TracingContext, with an ID
>> field and call begin() when the context starts and end() when it ends, and
>> then when parsing the recording, see what other events happened, in the
>> same thread, during that event.
>>
>> What would be the problem with such a solution?
>>
>> Thanks
>> Erik
>> ------------------------------
>> *From:* hotspot-jfr-dev <hotspot-jfr-dev-retn at openjdk.java.net> on
>> behalf of Ludovic Henry <ludovic.henry at datadoghq.com>
>> *Sent:* Friday, June 11, 2021 10:09 AM
>> *To:* hotspot-jfr-dev at openjdk.java.net <hotspot-jfr-dev at openjdk.java.net>
>> *Subject:* RFC: JFR Recording Context
>>
>> Hello,
>>
>> I want to gather your feedback on the following proposal to add a concept
>> of Context to JFR Events. This context is initialized by the application
>> and is captured and serialized in relevant JFR events.
>>
>> That is an important feature for us at Datadog to enable multiple
>> scenarios. The main one is the ability to correlate an Event to a specific
>> Trace (it marks the execution of a specific request/transaction across
>> many
>> services). With this correlation, we are able to extract Events which are
>> relevant to a specific Trace, allowing us to give more specific
>> information
>> to our customers.
>>
>> This context would have the following properties:
>> - it is enabled similarly to stacktraces, and would only be captured on
>> specified Events (with a `context=“true”` attribute in the metadata.xml or
>> a `@Context` annotation on the Event class).
>> - the context is captured when committing the Event
>> - the context consist conceptually in a `Map<String,String>`
>> - the entries in the context are defined by the application
>> - there is a maximum number of entries in the context
>> - that setting can be modified through configuration but still in a
>> predetermined range (for example, default of maximum 64 context entries,
>> with a configurable maximum value of 512)
>> - once initialized, the entries are immutable (the value can’t be changed)
>> and keys can’t be removed without overriding the context.
>> - contexts are stackable, a new context doesn’t override an older one but
>> complement it, and closing a context doesn’t close any previous ones.
>> - the relevant context is propagated through tasks, threads, and other
>> async operations in the Class Libraries, similarly to ScopeLocal or
>> ThreadLocal
>> - for external libraries, a capture/restore mechanism is available
>> - it is serialized efficiently in the JFR recording
>> - it is assumed that the keys and values are repeated across many events,
>> similarly to stacktraces
>> - the immutability of keys and values greatly helps caching and
>> compression (with `StringPool` and a `JfrContextRepository`, for example)
>> - the context is accessible from the VM without having to switch to Java
>> - performance would otherwise be poor, and it would limit where and when
>> a
>> native Event can be committed.
>>
>> To achieve these properties, I propose the following public Java API:
>> ```
>> class RecordingContextEntry {
>> public static RecordingContextEntry forName(String name);
>> public static RecordingContextEntry inheritableForName(String name);
>> public boolean isBound();
>> }
>> class RecordingContext implements AutoCloseable {
>> // snapshot + run
>> public static class Snapshot {}
>> public static Snapshot snapshot();
>> public static <R> R callWithSnapshot(Callable<R> op, Snapshot s);
>> public static void runWithSnapshot(Runnable op, Snapshot s);
>> // initialize
>> public static class Builder {
>> // implicitly initialized with current context’s inheritable entries
>> // build and set current context
>> public Builder where(RecordingContextEntry key, String value);
>> public RecordingContext build();
>> }
>> public static Builder builder();
>> // close
>> public void close();
>> }
>> ```
>>
>> Let's take the example of a sample Trace class that uses this
>> RecordingContext class to associate a trace_id to events:
>> ```
>> class Trace implements AutoCloseable {
>> static RecordingContextEntry contextTraceId =
>> RecordingContextEntry.inheritableForName(“trace_id”);
>>
>> RecordingContext context;
>>
>> public Trace() {
>> context =
>> RecordingContext.builder()
>> .where(contextTraceId, this.getTrace().getId())
>> .build();
>> }
>>
>> public void close() {
>> context.close();
>> }
>> }
>> ```
>>
>> Let’s also take a hypothetical async-heavy library which uses a custom
>> threadpool:
>> ```
>> class CustomThreadPool {
>> public void run(Runnable r) {
>> this.workQueue.push(r);
>> }
>> // background threads then pop from this.workQueue
>> // and execute the Runnable.
>> }
>> ```
>>
>> To propagate the context in this threadpool’s threads, the library or any
>> tracing framework would instrument the code with:
>> ```
>> class RunableWithSnapshot implements Runnable {
>> final private Runnable runnable;
>> final private RecordingContext.Snapshot snapshot;
>> RunableWithSnapshot(Runnable runnable) {
>> this.runnable = runnable;
>> this.snapshot = RecordingContext.snapshot();
>> }
>> public void run() {
>> RecordingContext.runWithSnapshot(runnable, snapshot);
>> }
>> }
>> class CustomThreadPool {
>> public void run(Runnable r) {
>> r = new RunnableWithSnapshot(r);
>> this.workQueue.push(r);
>> }
>> // there are then background threads which pop from
>> // this.workQueue and execute the Runnable.
>> }
>> ```
>>
>> The main challenges I have faced when implementing this feature [1] are:
>> - most of the changes are in Hotspot
>> - it’s tempting to rely on the ScopeLocal proposal but it doesn’t provide
>> the required API to integrate to existing TWR-like APIs
>> (try-with-resources).
>> - it changes the serialized format of an Event since it adds a long field
>> for the index of the context in the JfrContextRepository (serialized in
>> each chunk).
>>
>> Thank you, and looking for your feedback.
>> Ludovic
>>
>> [1] Beware, it’s a very early stage proof of concept, it doesn’t fully
>> capture the context, and the implementation is out of date with this API
>> proposal.
>> https://github.com/luhenry/jdk/compare/openjdk:master...jfr-context
>>
>
More information about the hotspot-jfr-dev
mailing list