Add instrumentation in the TemplateInterpreter

Fri Feb 19 18:18:44 UTC 2016

> On Feb 18, 2016, at 1:43 PM, Khanh Nguyen <ktruong.nguyen at gmail.com> wrote:
> 
> Yes, you are correct. This is for a research project. We try to remember those references so that later we can do something similar to a GC to update them. The base assumption is that we only have a small number of these kinds of objects, so the cost should be acceptable.
> 
> A side reason for staying with the TemplateInterpreter is that I can't convince my team to switch to the BytecodeInterpreter in ZeroShark. The unknown performance of ZeroShark is partially responsible for my inability to convince them.
> 
> 

Are you running your experiments interpreted only or in a tiered environment with C1/C2?

> On Feb 18, 2016 2:50 PM, "Christian Thalinger" <christian.thalinger at oracle.com> wrote:
> 
>> On Feb 18, 2016, at 9:30 AM, Khanh Nguyen <ktruong.nguyen at gmail.com> wrote:
>> 
>> The main reason is the performance difference between the TemplateInterpreter and the BytecodeInterpreter in Zero.
>> I have not verified the difference myself, but according to this mailing list it is 10x.
>> 
>> 
> 
> But the instrumentation you are adding is quite expensive and I’m assuming this is for research or academia?
> 
>> And since we are talking about Zero: do you by any chance know how large the performance difference between ZeroShark and standard HotSpot is?
>> 
>> Thanks
>> 
>> On Feb 18, 2016 11:14 AM, "Christian Thalinger" <christian.thalinger at oracle.com> wrote:
>> Can you share the reason?
>> 
>>> On Feb 18, 2016, at 8:01 AM, Khanh Nguyen <ktruong.nguyen at gmail.com> wrote:
>>> 
>>> Unfortunately it has to be the Template Interpreter.
>>> 
>>> On Feb 18, 2016 9:27 AM, "Christian Thalinger" <christian.thalinger at oracle.com> wrote:
>>> Does it have to be the template interpreter or could you do your work with Zero as well?
>>> 
>>> > On Feb 9, 2016, at 11:19 PM, Khanh Nguyen <ktruong.nguyen at gmail.com> wrote:
>>> >
>>> > Hello,
>>> >
>>> > I want to add instrumentation to monitor all reads and writes in the
>>> > TemplateInterpreter. I think I have found the correct place for it in
>>> > /cpu/x86/vm/templateTable_x86_64.cpp. Can someone please tell me if I'm
>>> > doing it right?
>>> >
>>> > For writes:
>>> > static void do_oop_store(InterpreterMacroAssembler* _masm,
>>> >                         Address obj,
>>> >                         Register val,
>>> >                         BarrierSet::Name barrier,
>>> >                         bool precise) {
>>> > [...]
>>> >    case BarrierSet::CardTableModRef:
>>> >    case BarrierSet::CardTableExtension:
>>> >      {
>>> >        if (val == noreg) {
>>> >          __ store_heap_oop_null(obj);
>>> >        } else {
>>> >          __ store_heap_oop(obj, val);
>>> >
>>> > /*mycodeA*/  __ movptr(c_rarg1, obj.base()); // save this value, otherwise it will be changed?
>>> >
>>> >          // flatten object address if needed
>>> >          if (!precise || (obj.index() == noreg && obj.disp() == 0)) {
>>> >            __ store_check(obj.base());
>>> > /*mycodeB*/ __ call_VM(noreg, //void
>>> >                       CAST_FROM_FN_PTR(address,
>>> >                                        InterpreterRuntime::write_helper),
>>> >                       c_rarg1,  // obj
>>> >                       c_rarg1,  // field address, because store check is called on field address
>>> >                       val);
>>> >          } else {
>>> >            __ leaq(rdx, obj);
>>> >            __ store_check(rdx);
>>> > /*mycodeC*/ __ call_VM(noreg, //void
>>> >                         CAST_FROM_FN_PTR(address,
>>> >                                          InterpreterRuntime::write_helper),
>>> >                         c_rarg1,  // obj
>>> >                         rdx,      // field address, because store check is called on field address
>>> >                         val);
>>> >          }
>>> >        }
>>> >      }
>>> >      break;
>>> >
>>> > For reads:
>>> > case Bytecodes::_fast_agetfield:
>>> >    __ load_heap_oop(rax, field);
>>> >
>>> > /*mycodeD*/     __ call_VM(noreg,
>>> >               CAST_FROM_FN_PTR(address,
>>> >                                InterpreterRuntime::read_barrier_helper),
>>> >               rax);
>>> >
>>> >    __ verify_oop(rax);
>>> >    break;
>>> >
>>> > My questions are:
>>> >
>>> > 1) I thought this code represents a putfield a.f=b, where a.f is
>>> > represented by the parameter obj of type Address and b is obviously the
>>> > parameter val of type Register. In particular, obj has the fields base,
>>> > index and disp. But when I run this, it looks like obj is actually the
>>> > field address (the case mycodeB). I haven't found a test case that
>>> > triggers the case mycodeC to see its behavior (i.e., whether rdx gets
>>> > destroyed and I get a random value back, or c_rarg1 is the obj address
>>> > and rdx is the field address).
>>> >
>>> > 2) Before this, I tried to insert the same __ call_VM in fast_aputfield
>>> > before do_oop_store, but it results in a JVM crash. I don't understand
>>> > why. All I do in the call is print the parameters. I did see the values
>>> > printed (only the first time execution reaches the method), but then the
>>> > VM crashed. I thought __ call_VM would preserve all registers' values and
>>> > restore them properly when it returns. My instrumentation has no side
>>> > effects; I just observe and record the values (actually, I just print
>>> > them as a test).
>>> >
>>> > 3) Is the line /*mycodeA*/ strictly required? I tried passing obj.base()
>>> > twice in the mycodeB line, and I got build errors for "smashed
>>> > args".
>>> >
>>> > I greatly appreciate your time,
>>> >
>>> > Best,
>>> >
>>> > Khanh Nguyen
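
On question 1 in the original message: a minimal standalone sketch of how an x86 memory operand with base, index and disp resolves to a single effective address, which is what leaq computes. ModelAddress, effective_address and base_is_effective_address are hypothetical names made up for illustration. They are not HotSpot's Address class or any HotSpot API, and the sketch assumes only the usual x86 rule base + index * scale + disp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an x86 memory operand. This is NOT HotSpot's
// Address class; it only models the three parts named in the question.
struct ModelAddress {
    std::uintptr_t base;   // value held in the base register
    std::uintptr_t index;  // value held in the index register
    bool has_index;        // false models "index == noreg"
    unsigned scale;        // 1, 2, 4 or 8
    std::intptr_t disp;    // signed displacement
};

// What a lea-style instruction would compute for this operand:
// base + index * scale + disp (the index term is skipped when absent).
inline std::uintptr_t effective_address(const ModelAddress& a) {
    std::uintptr_t ea = a.base + static_cast<std::uintptr_t>(a.disp);
    if (a.has_index) {
        ea += a.index * a.scale;
    }
    return ea;
}

// Models only the second clause of the quoted condition
// (obj.index() == noreg && obj.disp() == 0): in that case the base
// register alone already equals the effective address.
inline bool base_is_effective_address(const ModelAddress& a) {
    return !a.has_index && a.disp == 0;
}

// A few worked cases, checked with assert.
inline void run_examples() {
    // No index, zero displacement: base is the effective address.
    ModelAddress plain{0x1000, 0, false, 1, 0};
    assert(base_is_effective_address(plain));
    assert(effective_address(plain) == 0x1000);

    // Displacement only: the effective address is 16 bytes past base.
    ModelAddress with_disp{0x1000, 0, false, 1, 16};
    assert(!base_is_effective_address(with_disp));
    assert(effective_address(with_disp) == 0x1010);

    // Base + index * scale + disp: 0x1000 + 3 * 8 + 16 = 0x1028.
    ModelAddress full{0x1000, 3, true, 8, 16};
    assert(effective_address(full) == 0x1028);
}
```

Under this model, an operand with no index and a zero displacement is exactly the case where the base register already holds the address being stored to; any other operand needs the lea-style computation first. The sketch does not model the !precise clause of the quoted condition.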