Native wrapper optimization
Krystal Mok
rednaxelafx at gmail.com
Sun Nov 20 20:53:49 PST 2011
Hi all,
(In case my company email strips the attachment again, I'm replying from
my personal email.)
I've got the patch ported to 32-bit x86. See attachment.
Additional comments about the patch:
Joseph's original patch moves the IC miss jump out-of-line, but on x64 with
compressed oops, that doesn't really save space in the unverified entry
point code sequence, due to the 8-byte alignment. Examples in [1].
The version in this mail's attachment uses jump_cc() inline instead of
jcc() and an out-of-line jump(). The UEP code generated by both C1 and C2
uses the same pattern.
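As a rough sketch of the trade-off between the two patterns, the snippet below only adds up standard x86 branch encoding sizes. It assumes the IC miss stub is reachable with a 32-bit displacement, so that the inline variant is a single conditional jump; the struct and function names are made up for illustration and are not HotSpot APIs.

```cpp
#include <cassert>

// Standard x86 branch encoding sizes, in bytes.
constexpr int kJccRel8  = 2;  // 0x70+cc, disp8
constexpr int kJccRel32 = 6;  // 0x0F, 0x80+cc, disp32
constexpr int kJmpRel32 = 5;  // 0xE9, disp32

// Bytes spent inside the entry sequence vs. bytes emitted overall.
struct BranchCost {
    int inline_bytes;
    int total_bytes;
};

// Pattern A: short jcc in the entry sequence, plus a jump emitted out of line.
inline BranchCost jcc_plus_out_of_line_jump() {
    return BranchCost{kJccRel8, kJccRel8 + kJmpRel32};
}

// Pattern B: one conditional jump straight to the far target
// (assumption: the target is reachable with a rel32 displacement).
inline BranchCost inline_conditional_jump() {
    return BranchCost{kJccRel32, kJccRel32};
}
```

Under that assumption, pattern A keeps the entry sequence 4 bytes shorter but emits 7 bytes in total, while pattern B emits 6 bytes and needs no separate out-of-line jump.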
There's a similar pattern in generate_i2c2i_adapters() that could have used
jump_cc() to call the ic_miss_stub. But the gains don't look significant
enough, so I didn't modify it.
Another note:
In x64's version of SharedRuntime::generate_dtrace_nmethod(), the IC check
isn't using load_klass().
    __ verify_oop(receiver);
    __ cmpl(ic_reg, Address(receiver, oopDesc::klass_offset_in_bytes()));
    __ jcc(Assembler::equal, hit);
Is this correct, or should it be modified to use load_klass(), too? My take
is the latter.
load_klass() was introduced in [2], and later, generate_dtrace_nmethod()
was introduced in [3]. I think [3] missed the compressed oops changes.
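To make the concern concrete, here is a toy model, not HotSpot code. It assumes that with compressed oops the klass field in the object header holds a 32-bit encoded value while the inline-cache register holds the full klass pointer. The base, shift, and function names are hypothetical and are not HotSpot's actual encoding parameters.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding parameters, for illustration only.
constexpr std::uint64_t kBase  = 0x0000000800000000ULL;
constexpr unsigned      kShift = 3;

// Compress a full klass pointer into a 32-bit header field.
constexpr std::uint32_t encode_klass(std::uint64_t klass) {
    return static_cast<std::uint32_t>((klass - kBase) >> kShift);
}

// Recover the full pointer from the 32-bit field (the step the
// raw compare skips in this model).
constexpr std::uint64_t decode_klass(std::uint32_t narrow) {
    return kBase + (static_cast<std::uint64_t>(narrow) << kShift);
}

// Models a 32-bit compare of the IC register against the raw header field.
constexpr bool raw_compare_hits(std::uint64_t ic_reg, std::uint32_t klass_field) {
    return static_cast<std::uint32_t>(ic_reg) == klass_field;
}

// Models decoding the field first, then comparing all 64 bits.
constexpr bool decoded_compare_hits(std::uint64_t ic_reg, std::uint32_t klass_field) {
    return ic_reg == decode_klass(klass_field);
}
```

In this model, a receiver whose klass really matches the inline cache still fails the raw 32-bit compare, because the header field is encoded and the register is not. Decoding before the compare gives the expected hit.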
Regards,
Kris Mok
Software Engineer, Taobao (http://www.taobao.com)
[1]: https://gist.github.com/1380416#file_notes.md
[2]: http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/ba764ed4b6f2/src/cpu/x86/vm/sharedRuntime_x86_64.cpp
[3]: http://hg.openjdk.java.net/hsx/hotspot-main/hotspot/file/018d5b58dd4f/src/cpu/x86/vm/sharedRuntime_x86_64.cpp
2011/11/18 changren <changren at taobao.com>
> Ok, Kris will help to port to 32bit.
> Thanks,
> Joseph
>
> On 2011-11-18 17:09, Christian Thalinger wrote:
> > Looks like a good patch to me. What about 32-bit x86?
> >
> > -- Chris
> >
> > On Nov 18, 2011, at 7:39 AM, changren wrote:
> >
> >> Hi, all
> >> The attached patch (a diff against hsx20) is intended to speed up native
> >> invocation. It rearranges the compiled-to-native wrapper code to
> >> straighten branches, which improves spatial locality. A micro
> >> benchmark (500m consecutive JNI invocations with warm-up) shows that the
> >> stalled CPU cycles caused by instruction fetch due to L1 ICache misses
> >> decrease by 3.4% on the Intel Nehalem microarchitecture and by 9.6% on the
> >> Core microarchitecture. The real execution time of the micro benchmark
> >> also decreases by 5-10%, which reflects the improvement.
> >> Thanks,
> >> Joseph
> >>
> >>
> >> <JNIWrapperOpt.patch>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: JNI_wrapper_ver2.patch
Type: application/octet-stream
Size: 7936 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/hotspot-dev/attachments/20111121/68f7eb42/JNI_wrapper_ver2.patch