RFR: 8333182: Add truncated tracing mode for TraceBytecodes

Thu May 30 08:20:02 UTC 2024

On Wed, 29 May 2024 15:37:07 GMT, Claes Redestad <redestad at openjdk.org> wrote:

> We routinely use `-XX:+TraceBytecodes` to instrument what's going on during application startup. However, much of the information traced in this mode is not really used, and adding a mode which truncates non-relevant information would be helpful to speed up instrumentation, especially on larger apps.
> 
> To illustrate, running `-XX:+TraceBytecodes` on a version of netty can produce more than 844MB of output:
> 
> 14759490 bytecodes executed in 151.1s (0.098MHz)
> [BytecodeCounter::counter_value = 14759531]
> 
> 
> A sample output block looks like this:
> 
> [34051] static jobject sun.reflect.annotation.AnnotationParser.parseAnnotations2(jobject, jobject, jobject, jobject)
> [34051] 14759490    58  astore #9
> [34051] 14759491    60  aload #9
> [34051] 14759492    62  invokestatic 57 
> ... 19 more lines
> [34051] 14759512    41  invokestatic 47 <sun/reflect/annotation/AnnotationParser.parseAnnotation2(Ljava/nio/ByteBuffer;Ljdk/internal/reflect/ConstantPool;Ljava/lang/Class;Z[Ljava/lang/Class;)Ljava/lang/annotation/Annotation;> 
> 
> [34051] static jboolean sun.reflect.annotation.AnnotationParser.contains(jobject, jobject)
> [34051] 14759513     0  nofast_aload_0
> [34051] 14759514     1  astore_2
> [34051] 14759515     2  aload_2
> ...
> 
> 
> This PR add a develop flag, `TraceBytecodesTruncated` meant to be used in addition to `TraceBytecodes`. It has the effect that all but the first bytecode in each method transition as well as returns and throws will be omitted. With the patch the output may look like this:
> 
> 
> [42243] static jobject sun.reflect.annotation.AnnotationParser.parseAnnotations2(jobject, jobject, jobject, jobject)
> [42243] 14858715    58  astore
> 
> [42243] static jboolean sun.reflect.annotation.AnnotationParser.contains(jobject, jobject)
> [42243] 14858737     0  nofast_aload_0
> [42243] 14858772    35  ireturn
> 
> 
> This truncated output is sufficient for certain purposes such as when digesting the output with [bytestacks](https://github.com/cl4es/bytestacks).
> 
> Running the same application with the patch and the added flag, `-XX:+TraceBytecodes -XX:+TraceBytecodesTruncated` generates a file that is 177MB, and in about a third of the time:
> 
> 14867664 bytecodes executed in 52.4s (0.284MHz)
> [BytecodeCounter::counter_value = 14867704]
> 
> 
> This enhancement would greatly speed up some of our diagnostics, and enable using these tools on even larger apps.

The tracer code isn't all that large or convoluted, but some of the printing code is shared with `Method::print_codes_on`, which is used in a number of places in product code. Some of the use cases of that, like the one in idealGraphPrinter.cpp, doesn't seem easily portable to UL since we're streaming the output into some XML document. Of course we could disentangle all that if we accept a bit of code duplication. 

Porting to UL also adds a bit of (optional) logging to each line, growing the output and requiring parser changes. Perhaps there are some ways to tame UL to output exactly - or almost exactly - the same thing, but regardless it definitely complicates parsing. That also begs the question if we need to retain `-XX:+TraceBytecodes` and the exact format it outputs in case there are other consumers out there. I don't know. 

I don't have a strong opinion. I do _really_ want this feature in ASAP (to speed up my work and make the startup tracing useful on larger apps), but I have no idea how high of a priority it is to port this to UL.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19457#issuecomment-2138967875