NullPointerException in HotSpotRuntime:lower when using Snippets

Sun Jul 14 09:10:05 PDT 2013

Ok, now that you stated what you are trying to achieve, it's easy to give better feedback ;-)

How important is it that your instrumentation achieves complete coverage? Do you need to see every single array access in the program? This will not be possible in the current GraalVM because it's hard to guarantee that execution always happens in compiled code. For this, you really need to do bytecode instrumentation.

Assuming the answer to the above is no, then you run into the issue of doing arbitrary instrumentation (as I've mentioned) with snippets. The underlying issue is that snippets need to essentially be atomic with respect to the runtime. Writing deopt free snippets is generally a trick task (as you've seen). We might be able improve this over time by techniques such as making the inlining for snippets for more aggressive, support annotations stating that certain field updates are not real side effects etc. However, none of this exists right now. 

I'm cc'ing graal-dev in case there's some other way to achieve your goals within the current Graal system.

-Doug

On Jul 14, 2013, at 2:01 PM, ATKIN-GRANVILLE Chris <s1255753 at sms.ed.ac.uk> wrote:

> 
> On 14 Jul 2013, at 10:52, Doug Simon <doug.simon at oracle.com>
> wrote:
> 
>> 
>> On Jul 14, 2013, at 3:59 AM, ATKIN-GRANVILLE Chris <s1255753 at sms.ed.ac.uk> wrote:
>> 
>>> I'm trying to work out what the code actually does, but the documentation (or rather, lack of) makes it difficult. To try and understand the code, I'm trying to modify it to include each dynamic access as a 'separate' item, rather than incrementing a counter.
>> 
>> That probably won't be possible in the current system. We can cheat in terms of pretending a counter update has no side effect but it's impossible to do the same for a dynamic resizable data structure (such as a list of accesses). I'm curious as to why a counter update doesn't give you all the information you need since your original prototype was not capturing anything specific about each dynamic access.
> 
> The problem is that we're trying to perform dependency analysis at run-time, not compile-time (i.e., dynamic analysis instead of static analysis). I'm not interested in getting a list of accesses per-se, but rather the ability to create arbitrary structures. The data that we're interested in is /both/ the information available at compile-time, and the specific run-time address (or equivalent) being written to.
> 
> Take for example a loop, the one I'm using to test:
> 
> public static int[] loopDependency(int[] a, int[] b) {
> 	int[] c = new int[a.length];
> 		
> 	for (int i = 0; i < a.length; i++)
> 		c[i] = a[b[i]];
> 		
> 	return c;
> }
> 
> Assume a.length == b.length == 5 and the maximum value in each array is 4 but each value is unique.
> 
> The code at the moment produces this for the store associated with the c[i] = a[b[i]]:
> 
> 597|Write 596|IndexedLocation { stamp=Extension, valueKind=int, indexScaling=4, displacement=24, locationIdentity=Array: int,  } 4
> 
> (I switched it from working with StoreIndexedNode/LoadIndexedNode to WriteNode/ReadNode, but it's an easy change to turn it back)
> 
> The problem is that each time the write is 'triggered', I think there should be a different IndexedLocations associated with it (the indexScaling/displacement should change, right? It can't stay the same for all 4 writes).
> 
> If I switch back to StoreIndexedNode/LoadIndexedNode, I see that the location associated with the node (i.e., the index) is actually a PhiNode. I'd like to be able to "dereference" the value of the PhiNode - which can only be done at run-time. That's basically what we're trying to do - as well as build runtime data structures to compute dependency. Does that make sense?
> 
>> 
>>> A couple of thoughts/questions:
>>> 
>>> 	- DirectObjectStoreNode, by not being a StateSplit is captured at all stack frames, kind-of like a global variable (or a static field, whatever). However, why do you not need to add it to the graph before using it (i.e., why are the methods static)?
>> 
>> You'll notice that those methods are annotated with @NodeIntrinsic. During snippet preparation, calls to such methods get turned into node instantiations. Look for all applications of the NodeIntrinsificationPhase. It's instructive to step through a node intrinsification with the debugger. Basically, each such method will (must!) match a constructor of the argument to the @NodeIntrinsic annotation.
>> 
>>> 	- The arguments to DirectObjectStoreNode.storeLong() are not documented. I understand that it stores arbitrary values at a "location specified as on offset relative to an object" (as per UnsafeStoreNode docs). But what is the difference between the offset and the displacement?
>> 
>> The displacement is byte sized and must be a compile-time constant (indicated by the @NodeConstantParameter annotation) where as an offset is a scaled runtime value. Look at the usage of DirectObjectStoreNode.storeObject() at UnsafeArrayCopySnippets.java:155.
>> 
>>> 	- An ArrayAccess is instantiated (and added to the appropriate group) once per static store instruction. @Snippet store()/@Snippet load() are however triggered/fired every time the location is accessed. How would I go about storing each access individually?
>> 
>> Like I said above, that is currently not possible.
>> 
>>> 	- Can I use this technique to build arbitrary data structures? At some point in the future, we may need to build some graphs at runtime (perhaps for dependency). Can I do that using a similar technique?
>> 
>> Once again, not from within a snippet itself.
>> 
>>> 	- My testing has shown me that although there should be e.g. 5 stores, only 4 will be recorded. For example, if I do a vector addition:
>>> 
>>> 		public static int[] vectorAddition(int[] a, int[] b) {
>>> 			assert a.length == b.length;
>>> 			int[] c = new int[a.length];
>>> 
>>> 			for (int i = 0; i > a.length; i++)
>>> 				c[i] = a[i] + b[i];
>>> 
>>> 			return c;
>>> 		}
>>> 	
>>> 	And then called this with two length n vectors, only n - 1 accesses will be recorded.
>> 
>> Maybe the first access happens in the interpreter? Remember, snippets are only used in compiled code. Also, partial escape analysis may convert array updates to nodes other than the one you've instrumented. The only way to know for sure is to look at the graphs in the IGV as well as at the generated code.
>> 
>> -Doug
>> 
>>> On 12 Jul 2013, at 09:08, Doug Simon <doug.simon at oracle.com> wrote:
>>> 
>>>> 
>>>> On Jul 12, 2013, at 1:37 AM, ATKIN-GRANVILLE Chris <s1255753 at sms.ed.ac.uk> wrote:
>>>> 
>>>>> Sorry for the lateness of this email, I was moving today and away from a computer.
>>>> 
>>>> No worries.
>>>> 
>>>>> Thanks for sending me this code. I've done a quick splice of it into my codebase, and it looks like it does what I need it to. I am a little confused as to why there's fewer stores in the list than there are in the count (i.e., look below and there are 5 stores, yet 11 are counted) - I'll investigate that over the next few days.
>>>> 
>>>> There is one store in the list for each static store instruction along with a dynamic count for the executions of that instruction (the number after the ':'). If you add up those dynamic counts (which is all that I made the code do), you get the total shown.
>>>> 
>>>>> Either way, I'll be sure to give you appropriate credit in the final report and any papers that we write as a result of our work. I appreciate your help!
>>>> 
>>>> You're welcome. It's always interesting to see an outsider use the system and learn what we can improve for this kind of usage. It's particularly encouraging to see you did not have to modify an Graal code to make this work (even if some awkward hoops had to be jumped through).
>>>> 
>>>> -Doug
>>>> 
>>>>> 
>>>>> On 10 Jul 2013, at 21:31, Doug Simon <doug.simon at oracle.com> wrote:
>>>>> 
>>>>>> I hacked up (and attached) something that should give you what you want.
>>>>>> 
>>>>>> $ mx vm -XX:-BootstrapGraal -cp locomotion/bin uk.ac.ed.inf.icsa.locomotion.Application
>>>>>> [locomotion] using runtime=com.oracle.graal.hotspot.amd64.AMD64HotSpotRuntime
>>>>>> [locomotion] parsing arrayAccess
>>>>>> Array snippets
>>>>>> [locomotion] compilation
>>>>>> 55|StoreIndexed
>>>>>> [locomotion] executing
>>>>>> Loads: []
>>>>>> Stores: [90|ArrayStoreBehaviour:1]
>>>>>> 
>>>>>> 
>>>>>> Instrumentation Report
>>>>>> -------------------------
>>>>>> stores 1
>>>>>> loads 1
>>>>>> [locomotion] parsing loopDependency
>>>>>> Array snippets
>>>>>> [locomotion] compilation
>>>>>> 247|StoreIndexed
>>>>>> 247|StoreIndexed
>>>>>> 441|StoreIndexed
>>>>>> 441|StoreIndexed
>>>>>> [locomotion] executing
>>>>>> Loads: []
>>>>>> Stores: [90|ArrayStoreBehaviour:1, 500|ArrayStoreBehaviour:1, 498|ArrayStoreBehaviour:1, 501|ArrayStoreBehaviour:4, 499|ArrayStoreBehaviour:4]
>>>>>> 
>>>>>> 
>>>>>> Instrumentation Report
>>>>>> -------------------------
>>>>>> stores 11
>>>>>> loads 11
>>>>>> 
>>>>>> 
>>>>>> On Jul 10, 2013, at 9:19 PM, ATKIN-GRANVILLE Chris <s1255753 at sms.ed.ac.uk> wrote:
>>>>>> 
>>>>>>> Thanks for the code. I see I was pretty close but alas, no cigar.
>>>>>>> 
>>>>>>>> It helps to keep in mind that snippets are for lowering complex bytecodes, not (yet) for doing arbitrary instrumentation. Sorry for not making this clearer earlier.
>>>>>>> 
>>>>>>> This is a bit of a problem as what I'm trying to do is arbitrary instrumentation. What me + my supervisor are (trying to) do is create run-time traces of programs, and then perform dependency analysis in order to dynamically predict parallel loops. Is there  an appropriate mechanism for doing this in Graal currently? If this approach doesn't work, our next port-of-call was going to be to basically create our own WriteNode, copying the behaviour (we'd like to subclass but it's declared final!) except with the addition of calling our own instrumentation as well.
>>>>>>> 
>>>>>>> On 10 Jul 2013, at 19:33, Doug Simon <doug.simon at oracle.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Snippets must be deoptimization free. This is because there is no sensible place for the interpreter to continue should deoptimization occur. So, calling a method inside a snippet that won't be removed via inlining is not possible. In your case, the call to HashMap.put() resulting from inlining Instrument.addAddrLoad(s) cannot be inlined since the HashSet.map field is not final.
>>>>>>>> 
>>>>>>>> It helps to keep in mind that snippets are for lowering complex bytecodes, not (yet) for doing arbitrary instrumentation. Sorry for not making this clearer earlier.
>>>>>>>> 
>>>>>>>> In any case, I'm attaching a modified copy of your sources so you can at least see how to get the lowering done at the right time. One problem you had was that the ArrayInstrumentationPhase was being run after the LoweringPhase.
>>>>>>>> 
>>>>>>>> -Doug
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Jul 10, 2013, at 5:41 PM, ATKIN-GRANVILLE Chris <s1255753 at sms.ed.ac.uk> wrote:
>>>>>>>> 
>>>>>>>>> Do you mean instantiating the snippets in Lowerable.lower()? If so, I tried making ArrayBehaviourNode implement Lowerable, and then in ArrayStoreBehaviourNode.lower() doing new InstrumentationSnippets.Templates(runtime, replacements, target).lower(this); (runtime, replacements and target are accessed from public static fields of Application, initialized in Application.run()) and I get the same error as before.
>>>>>>>>> 
>>>>>>>>> On 10 Jul 2013, at 13:33, Doug Simon <doug.simon at oracle.com> wrote:
>>>>>>>>> 
>>>>>>>>>> The problem is that you are doing lowering externally from the LoweringPhase. Lowering is iterative as one lowering may introduce other Lowerable nodes. You need to make ArrayBehaviourNode implement Lowerable.
>>>>>>>>>> 
>>>>>>>>>> On Jul 10, 2013, at 8:25 AM, Doug Simon <doug.simon at oracle.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Jul 9, 2013, at 11:56 PM, ATKIN-GRANVILLE Chris <s1255753 at sms.ed.ac.uk> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 8 Jul 2013, at 19:57, Doug Simon <doug.simon at oracle.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jul 8, 2013, at 5:04 PM, ATKIN-GRANVILLE Chris <s1255753 at sms.ed.ac.uk> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I couldn't find a ReplacementsImpl.buildGraph() so I used ReplacementsImpl.GraphMaker.buildGraph() instead, specifically ReplacementsImpl.java:303.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, sorry, that's the one I meant.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> My snippet at the moment consists of a call to HashSet.add().
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> @Snippet
>>>>>>>>>>>>>> public static void load(String s) {
>>>>>>>>>>>>>> // call to add(s) on a public static Set (HashSet)
>>>>>>>>>>>>>> }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Is the static field final? I think it will need to be. Also, this access to the HashSet sounds thread unsafe.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> private final SnippetInfo load = snippet(InstrumentationSnippets.class, "load");
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> My lowering code looks like (in an inner class Templates extending AbstractTemplates):
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> public void lower(final ArrayLoadBehaviourNode node) {
>>>>>>>>>>>>>> Arguments args = new Arguments(load);
>>>>>>>>>>>>>> args.add("s", node.getInfoAsString());
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> template(args).instantiate(runtime, node, DEFAULT_REPLACER, args);
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> My lowering code is called though a HIGH_LEVEL phase (via Suites) that scans through the graph:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> for(Node n: graph.getNodes())
>>>>>>>>>>>>>> if (n instanceof ArrayLoadBehaviourNode)
>>>>>>>>>>>>>>      new InstrumentationSnippets.Templates(runtime, replacements, target).lower((ArrayLoadBehaviourNode) n);
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Lowering should only be done from the LoweringPhase. That is, your ArrayLoadBehaviourNode needs to implement Lowerable.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I added a breakpoint to ReplacementsImpl.GraphMaker.pharseGraph() line 303 (return buildGraph(…)). I inspected every call at this point, checking this.this$1.method. There were no calls with dontInline = true or forceInline = true - i.e., they were all false. Is that what you meant?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Something like that. I was just trying to work out why the inlining is not happening.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It would really help if you send me a Mercurial bundle and a command line to reproduce your issue.
>>>>>>>>>>>> 
>>>>>>>>>>>> Sure thing. I'm using git for scm instead of hg though, so I've included a git bundle instead (if it's an issue, I can repackage with hg).
>>>>>>>>>>> 
>>>>>>>>>>> Yes, please repackage as a hg bundle as I don't know how to apply a git bundle to a hg repo.
>>>>>>>>>>> 
>>>>>>>>>>> -Doug
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -- 
>>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> <locomotion.tar.gz>
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>> Scotland, with registration number SC005336.
>>>>>>> 
>>>>>> 
>>>>>> <locomotion.tar.gz>
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>> 
>> 
>> 
> 
> 
> 
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>