hg: hsx/hotspot-comp/hotspot: 7013347: allow crypto functions to be called inline to enhance performance

Krystal Mok rednaxelafx at gmail.com
Tue Feb 7 18:05:25 PST 2012


Hi John,

Yes, I've read that blog post in the link before. We'll make sure we
investigate thoroughly and post empirical test results the next time we
post performance-related patches.

Thanks for reminding that :-)

There's a related question that I'm looking for advice. We've seen some of
our Java apps having to call native code extensively, through JNI. The team
that developed the library in native code doesn't have the resource to
re-write the whole thing to a Java version, so the clients from Java side
are stuck with JNI. That's not really "typical workload", but it's
happening on our production site. Running standard benchmarks like SPECjvm
or SPECjbb wouldn't exercise JNI invocations that frequently, so perf
improvements in the JNI wrappers won't show up significantly in these
benchmarks.

How should we benchmark for situations like this? Should we run the
standard benchmarks to show that performance doesn't degrade on typical
workloads, and that it improves in a special case?

Regards,
Kris Mok

On Wed, Feb 8, 2012 at 7:46 AM, John Rose <john.r.rose at oracle.com> wrote:

> On Feb 6, 2012, at 6:20 PM, Krystal Mok wrote:
>
> The original thinking behind modifying the UEP sequence in native wrapper
> is that the more space it takes, the more likely it'll become a code bubble
> that affects i-cache. The patch in the earlier post also modified the place
> where it checks thread state, which should also reduce code bubble for the
> common case.
>
>
> Let's remember to control our tendency (I mean yours, mine, and
> everybody's) to make optimizations prematurely.
>
> Just because a bit of code looks ugly, or a data structure contains a
> surplus byte, does *not* mean it will pay to smooth it out.  Remember the
> following maxim: "If you touch it it will break."[1]  In the case of
> Hotspot, we have many times seen simple, innocuous changes cause
> unanticipated breakage, in either correctness or performance.  To change
> the metaphor to binoculars or camera lenses, vigorously removing the lint
> can scratch the finish.  A well-intentioned change can include
> unintentional costs, including escalating source code complexity,
> surprising performance interactions, noisy skew between versions or across
> ports, or schedule slips due to stress test failures.
>
> In this case, when removing icache bubbles, it would be best to detect
> them in real benchmarks (not micro-benchmarks) and then follow this
> paradigm:  First fix the problems that are known to affect performance.
>  Such work will probably occupy more than half of our professional time,
> because there are plenty of such problems.  To turn around the logic, if
> you see impossibly bubbly code (like nop sequences) in hotspot output, that
> is prima facie evidence that the "bad" code is irrelevant to performance,
> and probably also non-trivial to clean up.
>
> I apologize if you have already done the empirical investigations for this
> case, and I know everyone already knows these generalities.  But I can also
> tell you that, at least in the teams I have been on, if we don't remind
> ourselves from time to time to pick the tasks that matter (and can be
> proved to matter), we will become less effective as engineers.  Nobody
> wants that, so I'm reminding.
>
> In the case of UEP, it seems likely that those instructions are usually
> not in code cache for typical workloads.  A code cache block contains many
> words of header information, immediately preceding the UEP, and all of that
> stuff appears to be irrelevant to performance.  If we want to experiment
> with cache effects, I suggest that a more fruitful line of investigation
> would be to either rigidly segregate hot nmethod code from all other
> nmethod contents, or do cache-line aware assignment (coloring) and padding
> to pack the hot nmethod code (post UEP, almost always) most tightly into
> icache.  I think OS engineers use similar techniques for perturbing thread
> stack addresses, to spread stack frames more evenly across dcache.
>
> (And, to be clear, I do believe passionately in the value of code
> cleanups, and hope that every change can leave the workspace cleaner than
> before.  The best engineering change, IMO, is one which throws away bad
> code and finds a cleaner-and-better solution for a difficult-and-practical
> problem.  Another good change tweaks the system in an obviously correct
> way, without increasing source complexity, to achieve a localized
> beneficial result.  Tom's suggestion of moving the UEP up past the nops
> sounds to me like such a thing.)
>
> Hope this helps...
> -- John
>
> [1] http://blogs.oracle.com/jrose/entry/three_software_proverbs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20120208/6d4bd698/attachment.html 


More information about the hotspot-compiler-dev mailing list