Decreased latency performance with Stack Walker API compared to sun.misc.JavaLangAccess

Fri Oct 20 14:03:35 UTC 2017

Hi Remi,
thank you for your quick reply! I just changed the benchmark to use:

stackFrameStream.skip(startFrame).limit(stackDepth).collect(Collectors.toList());

and it yields a small improvement.
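
For context, a minimal sketch of how that call sits inside a complete walk. The class and helper names (StackSnapshot, snapshot, recurse) are hypothetical and chosen only for illustration; startFrame and stackDepth are the benchmark parameters from the line above. This is not the code from the Gist, just a self-contained approximation that needs Java 9 or later.

```java
import java.lang.StackWalker.StackFrame;
import java.util.List;
import java.util.stream.Collectors;

final class StackSnapshot {

    // Hypothetical helper: skips the first startFrame frames, then collects
    // at most stackDepth frames. The stream is only valid inside walk(...),
    // so the frames must be collected before the lambda returns.
    static List<StackFrame> snapshot(int startFrame, int stackDepth) {
        return StackWalker.getInstance().walk(stackFrameStream ->
                stackFrameStream.skip(startFrame).limit(stackDepth).collect(Collectors.toList()));
    }

    // Builds an artificially deep stack before taking the snapshot.
    // skip(1) drops the frame of snapshot(...) itself.
    static List<StackFrame> recurse(int remaining) {
        return remaining == 0 ? snapshot(1, 35) : recurse(remaining - 1);
    }

    public static void main(String[] args) {
        List<StackFrame> frames = recurse(100);
        System.out.println(frames.size() + " frames, top: " + frames.get(0).getMethodName());
    }
}
```

Because limit(stackDepth) short-circuits the stream, the walker does not need to visit frames beyond the requested depth.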
Best regards, Rafael

2017-10-20 15:54 GMT+02:00 Remi Forax <forax at univ-mlv.fr>:

> Hi Rafael,
> stream.iterator() is usually super slow*, did you try toArray() or
> forEach() instead?
>
> Rémi
> * it makes a push-based API (Stream) behave as a pull-based API (Iterator)
>
> ----- Original message -----
> > From: "Rafael Winterhalter" <rafael.wth at gmail.com>
> > To: "core-libs-dev" <core-libs-dev at openjdk.java.net>
> > Sent: Friday, 20 October 2017 15:32:33
> > Subject: Decreased latency performance with Stack Walker API compared to
> > sun.misc.JavaLangAccess
>
> > Hello,
> >
> > a typical pattern when reading the stack of the current thread in tooling
> > such as performance monitoring used to be to create an instance of
> > Throwable and to process its attached stack in another thread. The cost
> > splits roughly 10/90 between creating the throwable and reading its
> > frames, so this is a really worthwhile optimization.
> >
> > It is also common to use the JavaLangAccess API, which offers selective
> > access to single frames.
> >
> > This API no longer exists, as it was superseded by the Stack Walker API,
> > which is of course much safer and even the more performant alternative
> > in terms of total performance. However, with a stack walker it is no
> > longer possible to move the stack processing out of the user thread; it
> > must be done at the moment the snapshot of the stack is taken. It turns
> > out that this increases latency dramatically compared to the
> > asynchronous alternative.
> >
> > In a quick benchmark, walking 35 frames of a 100-frame stack allows me
> > 70k operations per second, whereas creating a new throwable yields
> > about 200k operations per second. In a less isolated test, I can also
> > infer this additional overhead from the actual latency numbers of a web
> > service when using the stack walker API to extract the top 35 frames
> > compared to the "old" solution using JavaLangAccess.
> >
> > For this reason, the best solution at the moment when aiming for low
> > latency seems to be to avoid the stack walker if the stack is not
> > required immediately and if spare resources are available in other
> > threads.
> >
> > I would therefore like to propose extending the stack walker API to allow
> > walking the stack of an existing throwable, which would allow for
> > performance similar to JavaLangAccess. I understand that the VM must do
> > more work altogether. When retrieving the full stack from a throwable,
> > this takes about three times as long. In practice, for a product I am
> > involved in, this causes a noticeable overhead when running a Java 9 VM
> > compared to Java 8.
> >
> > Alternatively, it would of course be even better if one could take a
> > snapshot of only the top x frames and walk that object rather than a
> > throwable.
> >
> > I have added my benchmarks ("snapshot" for the operation on the current
> > user thread, "complete" for the entire processing) to this Gist:
> > https://gist.github.com/raphw/96e7c81d7c719cf7991b361bb7266c70
> >
> > Thank you for any feedback on my findings!
> >
> > Best regards, Rafael
>
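
The deferred pattern described in the original message (create a Throwable on the user thread, read its frames on another thread) can be sketched as follows. This is only an illustration under stated assumptions: the class and method names (DeferredStackCapture, capture, topFrames) are hypothetical, and it uses the public Throwable.getStackTrace(), which materializes the whole stack, rather than the internal JavaLangAccess API, which offered access to single frames.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class DeferredStackCapture {

    // Runs on the user thread: constructing the Throwable records the stack.
    static Throwable capture() {
        return new Throwable();
    }

    // Intended to run on a worker thread: reads the recorded frames and keeps
    // only the top `depth` of them. getStackTrace() returns the full stack,
    // so the truncation happens after the frames have been read.
    static StackTraceElement[] topFrames(Throwable snapshot, int depth) {
        StackTraceElement[] all = snapshot.getStackTrace();
        return Arrays.copyOf(all, Math.min(depth, all.length));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Throwable snapshot = capture();          // user thread
            Future<StackTraceElement[]> frames =
                    worker.submit(() -> topFrames(snapshot, 35)); // worker thread
            System.out.println(frames.get().length + " frames read off-thread");
        } finally {
            worker.shutdown();
        }
    }
}
```

In a real monitoring tool the user thread would not wait on the Future; the call to get() is here only so the example prints something.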