Source.baseURL() overhead when using custom URL protocol scheme / stream handler

Tue Jun 28 07:02:39 UTC 2016

Hi Axel,

Thanks for the explanation and code to reproduce the problem.

I’m looking at it right now.

Hannes

> On 27.06.2016 at 23:53, Axel Faust <axel.faust.g at googlemail.com> wrote:
> 
> Hello,
> 
> TL;DR: I use custom URL protocol schemes and stream handlers that are not
> globally registered. This causes excessive handler resolution overhead in
> URL.getURLStreamHandler(), which is called implicitly in Source.baseURL(). I
> can't find a way to avoid this overhead (in JDK 1.8.0_71) other than two
> choices that are not viable for me: a complete refactoring or registering a
> JVM-global URLStreamHandlerFactory.
> A test case for sampling the overhead is provided in
> https://gist.github.com/AFaust/04ec0c65a560e306b6b547dcaf38fd21
> 
> 
> 
> This is a follow-up to a tweet of mine from yesterday:
> https://twitter.com/ReluctantBird83/status/747145726703075328
> In this tweet I was commenting on an observation I made while CPU sampling
> the current state of my Nashorn-based script engine for the open source ECM
> platform Alfresco (https://github.com/AFaust/alfresco-nashorn-script-engine
> ).
> 
> What prompted the comment were the following hot spot methods from my
> jvisualvm sampling session, when I was testing a trivial ReST endpoint
> backed by a Nashorn-executed script:
> 
> "Hot Spots - Method","Self Time [%]","Self Time","Self Time (CPU)","Total Time","Total Time (CPU)","Samples"
> "java.lang.invoke.LambdaForm$MH.771977685.linkToCallSite()","15.152575","793.365 ms","793.365 ms","1126.483 ms","1126.483 ms","63"
> "java.net.URL.<init>()","11.350913","594.316 ms","594.316 ms","594.316 ms","594.316 ms","33"
> "java.lang.Throwable.<init>()","7.248728","379.532 ms","379.532 ms","379.532 ms","379.532 ms","21"
> [...]
> "jdk.nashorn.internal.runtime.Source.baseURL()","0.0","0.0 ms","0.0 ms","594.316 ms","594.316 ms","33"
> [...]
> 
> The 1st and 3rd hot spots are directly related to frequently called code in
> my scripts / my utilities and somewhat expected, but I was not expecting
> the URL constructor to be up there.
> The backtraces view of the snapshot showed Source.baseURL() as the
> immediate and only caller of the URL constructor, even though I have other
> calls in my code which apparently don't trigger the sampling threshold.
> The total time per execution of the script is around 50-60ms with few
> outliers up to 90-100ms (sampling started only after reasonably stable
> state was reached). Sampling was limited specifically to the jdk.nashorn.*,
> jdk.internal.* and de.* packages.
> 
> A bit of background on my Alfresco Nashorn engine:
> - embedded into a web application that may potentially run in Tomcat or JEE
> servers (JBoss, WebSphere...)
> - JavaScript in Alfresco is extensively used for embedded rules, policies
> (event handling), ReST API endpoints and server-side UI pre-composition
> - use of an AMD-like module system allowing flexible extension of script
> API by 3rd party developers of Alfresco "addons"
> - one file per module, lazily loaded when required by another module or
> executed script
> - frequently used "core" modules will be pre-loaded and cached on startup
> - scripts are referenced via "logical" URLs using custom protocol schemes
> to denote different script resolution and load scopes/mechanisms (example:
> "webscript:///my/module/id" for a module in the lookup scope for ReST
> endpoint scripts; some scripts may be user-managed within the content
> repository / database itself)
> - custom protocol schemes are handled by custom URL stream handlers *NOT*
> globally registered (to avoid interfering with other web applications or
> other URL-related functionality in the same JVM)
> 
> 
> It turns out that the last two points are essential. I created a
> generalised test case in a GitHub gist:
> https://gist.github.com/AFaust/04ec0c65a560e306b6b547dcaf38fd21
> Essentially it is URL.getURLStreamHandler() which is responsible for the
> overhead. Source.baseURL() creates a "base" name from the source URL
> and if the protocol is not "file" then a new URL will be created. Since
> I use custom URL stream handlers and have not registered a global stream
> handler factory (and won't ever do so), the new URL will try to resolve the
> handler via URL.getURLStreamHandler(), jump through all the hoops and always
> fail in the end. A failed resolution is never cached, so every time
> Source.baseURL() is called this whole process / overhead is repeated.
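
A minimal sketch of the behaviour described above. It is not the Nashorn
source; the class BaseUrlSketch, its methods and the HANDLER instance are
hypothetical names chosen for illustration. It assumes that no handler for
the "webscript" scheme is registered in the JVM. Building the "base" URL
without a handler forces a lookup that fails, while passing the already
known handler to the URL constructor skips the lookup.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

final class BaseUrlSketch {

    // Hypothetical handler for the "webscript" scheme; deliberately NOT registered globally.
    static final URLStreamHandler HANDLER = new URLStreamHandler() {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            throw new IOException("opening connections is not part of this sketch");
        }
    };

    // Parse a spec with an explicit handler, so no global lookup is involved.
    static URL parse(String spec) {
        try {
            return new URL(null, spec, HANDLER);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Path up to and including the last '/'.
    static String parentPath(URL url) {
        String path = url.getPath();
        return path.substring(0, path.lastIndexOf('/') + 1);
    }

    // Builds the base URL without a handler: the constructor has to look one up,
    // and throws MalformedURLException when the protocol is unknown to the JVM.
    static String baseWithLookup(URL url) {
        try {
            return new URL(url.getProtocol(), url.getHost(), url.getPort(), parentPath(url)).toString();
        } catch (MalformedURLException e) {
            return null; // lookup failed; it will be repeated on the next call
        }
    }

    // Builds the base URL with the known handler: no lookup is needed.
    static String baseWithHandler(URL url) {
        try {
            return new URL(url.getProtocol(), url.getHost(), url.getPort(), parentPath(url), HANDLER).toString();
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        URL script = parse("webscript:///my/module/id");
        System.out.println("with lookup:  " + baseWithLookup(script));
        System.out.println("with handler: " + baseWithHandler(script));
    }
}
```

Timing baseWithLookup() in a loop is one way to sample the repeated cost of
the failed resolution; the gist linked above remains the actual test case.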
> 
> 
> I am currently trying to reduce all global overheads of my script engine
> setup, but can't find a way to avoid this overhead without either registering
> a global URL stream handler factory, which is out of the question for various
> reasons (web application; 3rd party loaders; engine-specific semantics), or
> completely refactoring the engine so all scripts are copied to simple
> "file://" URLs before execution (requiring constant sync-checking with the
> original script in its source storage location).
> 
> Ideally, I would like to see options to provide both a base URL myself as
> pre-resolved information via URLReader/Global.load() and register a custom
> stream handler factory with my Nashorn engine instance. This would allow
> "simple" loaders to use simple URL-Strings instead of real URL instances to
> load script files via Global.load(), as well as "complex" loaders to
> continue using stateful custom URL stream handlers where necessary. And it
> would allow Nashorn to resolve a potential custom URL stream handler before
> relegating to default JVM global handling if no handler is found.
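
The lookup order proposed above could be sketched roughly as follows. This
is not an existing Nashorn API: EngineLocalUrls, toUrl() and webscriptOnly()
are hypothetical names, and the sketch only assumes the standard
java.net.URL constructors. A URL string is first offered to an
engine-specific URLStreamHandlerFactory; only if that returns no handler
does resolution fall back to the JVM-global mechanism.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

final class EngineLocalUrls {

    // Factory owned by one engine instance; never installed via URL.setURLStreamHandlerFactory().
    private final URLStreamHandlerFactory engineFactory;

    EngineLocalUrls(URLStreamHandlerFactory engineFactory) {
        this.engineFactory = engineFactory;
    }

    // Turns a URL string into a URL, asking the engine-local factory first.
    URL toUrl(String spec) {
        int colon = spec.indexOf(':');
        URLStreamHandler handler = colon > 0
                ? engineFactory.createURLStreamHandler(spec.substring(0, colon))
                : null;
        try {
            return handler != null
                    ? new URL(null, spec, handler) // engine-local handler, no global lookup
                    : new URL(spec);               // default JVM-global handling
        } catch (MalformedURLException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical factory that only knows the "webscript" scheme.
    static URLStreamHandlerFactory webscriptOnly() {
        return protocol -> "webscript".equals(protocol)
                ? new URLStreamHandler() {
                    @Override
                    protected URLConnection openConnection(URL u) throws IOException {
                        throw new IOException("opening connections is not part of this sketch");
                    }
                }
                : null;
    }
}
```

With new EngineLocalUrls(EngineLocalUrls.webscriptOnly()), a "webscript:"
string is resolved by the engine-local handler, while an "http:" string is
left to the handlers the JVM already provides.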
> 
> I am sure I am not aware of all the implications - and certainly I am aware
> that such a change in a core class might be impossible - but
> URL.getURLStreamHandler() should really cache failed stream handler
> resolutions and avoid repeating the entire lookup routine...
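
To illustrate what caching a failed resolution means, here is a small
sketch of a negative cache. It is not how java.net.URL is implemented;
CachingHandlerResolver and resolve() are hypothetical names. Both hits and
misses are stored per protocol, so the expensive lookup runs at most once
for each scheme.

```java
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class CachingHandlerResolver {

    // The (possibly expensive) lookup being wrapped.
    private final URLStreamHandlerFactory delegate;

    // Optional.empty() records a failed lookup, so misses are remembered as well.
    private final ConcurrentMap<String, Optional<URLStreamHandler>> cache = new ConcurrentHashMap<>();

    CachingHandlerResolver(URLStreamHandlerFactory delegate) {
        this.delegate = delegate;
    }

    // Returns the handler for the protocol, or null if none exists.
    // The delegate is consulted only on the first call per protocol.
    URLStreamHandler resolve(String protocol) {
        return cache
                .computeIfAbsent(protocol, p -> Optional.ofNullable(delegate.createURLStreamHandler(p)))
                .orElse(null);
    }
}
```

A real implementation would also need a way to invalidate remembered
misses, for example when a handler becomes available later; this sketch
leaves that out.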
> 
> 
> Kind regards, and sorry for this overly long "summary"
> Axel Faust