Performance issue with Nashorn and C2's global code motion

Vladimir Kozlov vladimir.kozlov at oracle.com
Thu Sep 10 20:15:55 UTC 2015


Hi Martin,

This is the first time I have heard of a performance problem in this method.
That code has not been changed for a very long time, since we never thought we would need to optimize it.

   // '_stack' is emulating a real _stack.  The 'visit-all-users' loop has been
   // made stateless, so I do not need to record the index 'i' on my _stack.
   // Instead I visit all users each time, scanning for unvisited users.

Maybe we can optimize it by not going through all users each time. We should file an RFE for this.
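
To illustrate the difference, here is a minimal toy sketch in C++. It is not the HotSpot code: Graph, Result, walk_stateless and walk_indexed are hypothetical names, and the real Node_Backward_Iterator::next does more than a plain post-order walk. The sketch only contrasts rescanning all users on every step with remembering the index of the next user on the stack, and counts how many user edges each variant inspects.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy graph: users[n] lists the nodes that use node n.
// Hypothetical stand-in for illustration, not HotSpot's Node class.
using Graph = std::vector<std::vector<int>>;

struct Result {
    std::vector<int> post_order;  // nodes in the order they were finished
    std::size_t edge_checks = 0;  // number of user edges inspected
};

// Stateless variant: whenever a node is on top of the stack, rescan all of
// its users from the start, looking for one that is still unvisited.
Result walk_stateless(const Graph& users, int root) {
    assert(root >= 0 && static_cast<std::size_t>(root) < users.size());
    Result r;
    std::vector<bool> visited(users.size(), false);
    std::vector<int> stack{root};
    visited[root] = true;
    while (!stack.empty()) {
        int n = stack.back();
        int next = -1;
        for (int u : users[n]) {
            ++r.edge_checks;
            if (!visited[u]) { next = u; break; }
        }
        if (next >= 0) {
            visited[next] = true;
            stack.push_back(next);
        } else {
            r.post_order.push_back(n);  // no unvisited users left
            stack.pop_back();
        }
    }
    return r;
}

// Indexed variant: each stack entry remembers which user to look at next,
// so every user edge is inspected exactly once.
Result walk_indexed(const Graph& users, int root) {
    assert(root >= 0 && static_cast<std::size_t>(root) < users.size());
    Result r;
    std::vector<bool> visited(users.size(), false);
    std::vector<std::pair<int, std::size_t>> stack;
    stack.push_back({root, 0});
    visited[root] = true;
    while (!stack.empty()) {
        int n = stack.back().first;
        std::size_t& i = stack.back().second;  // not used after push_back below
        int next = -1;
        while (i < users[n].size()) {
            int u = users[n][i++];
            ++r.edge_checks;
            if (!visited[u]) { next = u; break; }
        }
        if (next >= 0) {
            visited[next] = true;
            stack.push_back({next, 0});
        } else {
            r.post_order.push_back(n);
            stack.pop_back();
        }
    }
    return r;
}
```

In this toy model both walks finish the nodes in the same order. For a node with k users that have no users themselves, the stateless walk inspects k(k+1)/2 + k user edges, while the indexed walk inspects k, which is the kind of growth one would expect to hurt when many extra edges are kept alive.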

We would need to know how to reproduce it.

Thanks,
Vladimir

On 9/10/15 5:17 AM, Doerr, Martin wrote:
> Hi,
>
> we were running the Octane benchmark and noticed a very significant performance drop with JVMTI.
>
> VTune measurements showed that the JVM spent the majority of its CPU time in Node_Backward_Iterator::next
> during PhaseCFG::schedule_late when JvmtiExport::_can_access_local_variables is on
>
> (see http://cr.openjdk.java.net/~mdoerr/OctaneVTune.jpg).
>
> We were using OpenJDK 8 with and without the following option:
>
> -agentlib:jdwp=transport=dt_socket,address=8000,server=y,suspend=n
>
> This option activates the JVMTI capability can_access_local_variables, which prevents C2 from killing dead locals,
> leading to a higher number of edges in the graph.
>
> If we don’t use this option, PhaseCFG::schedule_late no longer plays a significant role in CPU time.
>
> Have you noticed this before? Is this of interest to you?
>
> For us, this is a significant issue, as we have can_access_local_variables on by default.
>
> As a solution, we could think of limiting the node iterations in schedule_late and generating a quicker, less
> optimized schedule in extreme cases.
>
> Best regards,
>
> Martin
>

