Decreased latency performance with Stack Walker API compared to sun.misc.JavaLangAccess

Fri Oct 20 13:54:57 UTC 2017

Hi Rafael,
stream.iterator() is usually super slow*, did you try toArray() or forEach() instead?

Rémi
* because it forces a push-based API (Stream) to be consumed as a pull-based API (Iterator)
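
To make the suggestion concrete, here is a minimal sketch of the three ways to take the top frames inside StackWalker.walk(). The class and method names are made up for illustration, and the sketch only assumes that the benchmark drains the stream through iterator(); it is not taken from the Gist.

```java
import java.lang.StackWalker.StackFrame;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch class, not part of the original benchmark.
public class StackWalkSketch {

    private static final StackWalker WALKER = StackWalker.getInstance();

    // Pull-based: the frame stream is drained through an Iterator.
    static List<StackFrame> topFramesWithIterator(int max) {
        return WALKER.walk(stream -> {
            List<StackFrame> frames = new ArrayList<>();
            Iterator<StackFrame> it = stream.iterator();
            while (frames.size() < max && it.hasNext()) {
                frames.add(it.next());
            }
            return frames;
        });
    }

    // Push-based: the stream pipeline fills the array itself.
    static StackFrame[] topFramesWithToArray(int max) {
        return WALKER.walk(stream -> stream.limit(max).toArray(StackFrame[]::new));
    }

    // Push-based: the stream pipeline pushes each frame into the list.
    static List<StackFrame> topFramesWithForEach(int max) {
        return WALKER.walk(stream -> {
            List<StackFrame> frames = new ArrayList<>();
            stream.limit(max).forEach(frames::add);
            return frames;
        });
    }

    public static void main(String[] args) {
        System.out.println("iterator: " + topFramesWithIterator(35).size());
        System.out.println("toArray:  " + topFramesWithToArray(35).length);
        System.out.println("forEach:  " + topFramesWithForEach(35).size());
    }
}
```

All three variants return the same frames; whether the push-based ones are actually faster for a given stack depth would have to be measured, for example with the JMH setup already used for the benchmark.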

----- Original message -----
> From: "Rafael Winterhalter" <rafael.wth at gmail.com>
> To: "core-libs-dev" <core-libs-dev at openjdk.java.net>
> Sent: Friday, 20 October 2017 15:32:33
> Subject: Decreased latency performance with Stack Walker API compared to sun.misc.JavaLangAccess

> Hello,
> 
> a typical pattern when reading the stack of the current thread in tooling
> like performance monitoring used to be to create an instance of
> Throwable and to process this instance's attached stack in another thread.
> The performance cost is split roughly 10/90 between creating a new throwable
> and reading its frames, so this is really a worthwhile optimization.
> 
> It is also common to use the JavaLangAccess API which offers selective
> access of single frames.
> 
> This API no longer exists, as it was superseded by the Stack Walker API,
> which is of course much safer and even a more performant alternative when
> looking at the total performance. However, when using a stack walker, it is
> no longer possible to move the stack processing out of the user thread; it
> must be done at the moment the snapshot of the stack is taken. It turns out
> that this increases latency dramatically when processing stacks compared to
> the asynchronous alternative.
> 
> In a quick benchmark, it seems like walking 35 frames of a 100-frame stack
> allows me 70k operations per second whereas creating a new throwable yields
> about 200k operations per second. Also, within a less isolated test, I can
> infer this additional overhead from the actual latency numbers of a web
> service when using the stack walker API to extract the top 35 frames
> compared to the "old" solution using JavaLangAccess.
> 
> For this reason, it seems to be the best solution to avoid the stack walker
> when aiming for latency at the moment if the stack is not required
> immediately and if access resources are available in other threads.
> 
> I would therefore like to propose extending the stack walker API to allow
> walking the stack of an existing throwable, to allow for similar performance
> as with JavaLangAccess. I understand that the VM must do more work
> altogether. When receiving the full stack from a throwable, this takes about
> three times as long. In practice, for a product I am involved in, this
> causes a noticeable overhead when running a Java 9 VM compared to Java 8.
> 
> Alternatively, it would of course be even better if one could take a
> snapshot of only the top x frames to walk on this object rather than a
> throwable.
> 
> I have added my benchmarks (snapshot for the current user thread operation,
> complete for the entire processing) into this Gist:
> https://gist.github.com/raphw/96e7c81d7c719cf7991b361bb7266c70
> 
> Thank you for any feedback on my finding!
> 
> Best regards, Rafael
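
For readers unfamiliar with the pattern described in the quoted message, the following is a minimal sketch of capturing a Throwable on the user thread and resolving its frames on another thread. It uses only the public Throwable.getStackTrace() API rather than JavaLangAccess, and the class and method names are hypothetical, not taken from the Gist or from any product.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch class illustrating the deferred-processing pattern.
public class DeferredStackSketch {

    // Records the stack on the calling thread, then hands the throwable to a
    // worker thread that turns it into StackTraceElement instances.
    static Future<StackTraceElement[]> snapshot(ExecutorService worker) {
        Throwable marker = new Throwable(); // stack is captured here, in the caller
        Callable<StackTraceElement[]> task = marker::getStackTrace; // runs off-thread
        return worker.submit(task);
    }

    // Convenience wrapper that waits for the result and hides checked exceptions.
    static StackTraceElement[] demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return snapshot(worker).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        StackTraceElement[] frames = demo();
        System.out.println(frames.length + " frames, top: " + frames[0].getMethodName());
    }
}
```

The user thread only pays for constructing the Throwable; the more expensive conversion into frames happens on the worker. A StackWalker, by contrast, has to do all of its processing inside walk() on the thread whose stack is being read.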