From christian.thalinger at oracle.com Mon Mar 3 16:47:21 2014 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 3 Mar 2014 16:47:21 -0800 Subject: Substitution/Replacements problem In-Reply-To: <53150F7A.6020103@amd.com> References: <53150F7A.6020103@amd.com> Message-ID: On Mar 3, 2014, at 3:25 PM, Eric Caspole wrote: > Hi everybody, > We have a lot of lambda based tests that are not in the public repo yet, waiting for JDK 8 to come to Graal. In these tests, I found a problem with the way replacements are being done now. > > See this method: > > graal/com.oracle.graal.hotspot/src/com/oracle/graal/hotspot/HotSpotReplacementsImpl.java > > > 107 @Override > 108 public StructuredGraph getMethodSubstitution(ResolvedJavaMethod original) { > 109 for (GraphProducer gp : graphProducers) { > 110 StructuredGraph graph = gp.getGraphFor(original); > 111 if (graph != null) { > 112 return graph; > 113 } > 114 } > 115 return super.getMethodSubstitution(original); > 116 } > > > Here it loops over the backends until it gets a hit. In our tests, I found that while we are compiling an HSAIL kernel that is actually a Stream API lambda, when it goes into getIntrinsicGraph(), it will go into getMethodSubstitution() and look for substitutions in the PTX backend, see the "lambda$" method we are compiling and try to produce a PTX kernel of the thing we are in the middle of compiling for HSAIL, which was a shock :) > > Up til now, we have been using the replacements/inline mechanism for example AtomicInteger that end up as fence/load/fence type ops, and other uses, that get inlined into the kernel body and that is working well so far. > > I have a suitable PTX card in my box so I might be the only one in the group that might see this problem. The existing HSAIL KernelTester tests in the public repo do not get this problem since the harness sends an ordinary method to get HSAIL-compiled and they are not called "lambda$..." > > I think I see that the strategy for offloading for PTX so far is doing a "replacement" of a CPU method with a GPU kernel. But we also want to have some replacements/inlining inside the kernel. Hmm, interesting problem. Although I don?t have an answer to your question I want to put out this general question: How are we going to decide which methods to offload to which GPU given that there is more than one GPU in the system? > > What is the best way to fix this problem? > Thanks, > Eric > > From doug.simon at oracle.com Tue Mar 4 02:01:23 2014 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 4 Mar 2014 11:01:23 +0100 Subject: Substitution/Replacements problem In-Reply-To: References: <53150F7A.6020103@amd.com> Message-ID: <148CBF35-FB8F-4C30-BF24-B4E635579073@oracle.com> On Mar 4, 2014, at 1:47 AM, Christian Thalinger wrote: > > On Mar 3, 2014, at 3:25 PM, Eric Caspole wrote: > >> Hi everybody, >> We have a lot of lambda based tests that are not in the public repo yet, waiting for JDK 8 to come to Graal. In these tests, I found a problem with the way replacements are being done now. 
>> >> See this method: >> >> graal/com.oracle.graal.hotspot/src/com/oracle/graal/hotspot/HotSpotReplacementsImpl.java >> >> >> 107 @Override >> 108 public StructuredGraph getMethodSubstitution(ResolvedJavaMethod original) { >> 109 for (GraphProducer gp : graphProducers) { >> 110 StructuredGraph graph = gp.getGraphFor(original); >> 111 if (graph != null) { >> 112 return graph; >> 113 } >> 114 } >> 115 return super.getMethodSubstitution(original); >> 116 } >> >> >> Here it loops over the backends until it gets a hit. In our tests, I found that while we are compiling an HSAIL kernel that is actually a Stream API lambda, when it goes into getIntrinsicGraph(), it will go into getMethodSubstitution() and look for substitutions in the PTX backend, see the "lambda$" method we are compiling and try to produce a PTX kernel of the thing we are in the middle of compiling for HSAIL, which was a shock :) >> >> Up til now, we have been using the replacements/inline mechanism for example AtomicInteger that end up as fence/load/fence type ops, and other uses, that get inlined into the kernel body and that is working well so far. >> >> I have a suitable PTX card in my box so I might be the only one in the group that might see this problem. The existing HSAIL KernelTester tests in the public repo do not get this problem since the harness sends an ordinary method to get HSAIL-compiled and they are not called "lambda$..." >> >> I think I see that the strategy for offloading for PTX so far is doing a "replacement" of a CPU method with a GPU kernel. But we also want to have some replacements/inlining inside the kernel. This integration of GPU offloading into the normal compiler pipeline is a hack that should be removed. I put it in only so that Bharadwaj could easily test PTX offloading without having to do the Stream API interposition. The reason that it?s a hack is that it exactly ignores the policy problem of what to offload and to which available GPU. Bharadwaj, can you move to the Sumatra way of offloading soon? Once you?ve done this, I?ll remove this hack. > Hmm, interesting problem. Although I don?t have an answer to your question I want to put out this general question: > > How are we going to decide which methods to offload to which GPU given that there is more than one GPU in the system? Very good question. >> What is the best way to fix this problem? Use -XX:-GPUOffload for now. I don?t think Sumatra uses this option. -Doug From bharadwaj.yadavalli at oracle.com Tue Mar 4 07:23:44 2014 From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli) Date: Tue, 04 Mar 2014 10:23:44 -0500 Subject: Substitution/Replacements problem In-Reply-To: <148CBF35-FB8F-4C30-BF24-B4E635579073@oracle.com> References: <53150F7A.6020103@amd.com> <148CBF35-FB8F-4C30-BF24-B4E635579073@oracle.com> Message-ID: <5315F000.20301@oracle.com> On 03/04/2014 05:01 AM, Doug Simon wrote: > On Mar 4, 2014, at 1:47 AM, Christian Thalinger wrote: > >> On Mar 3, 2014, at 3:25 PM, Eric Caspole wrote: <...> >>> Up til now, we have been using the replacements/inline mechanism for example AtomicInteger that end up as fence/load/fence type ops, and other uses, that get inlined into the kernel body and that is working well so far. >>> >>> I have a suitable PTX card in my box so I might be the only one in the group that might see this problem. The existing HSAIL KernelTester tests in the public repo do not get this problem since the harness sends an ordinary method to get HSAIL-compiled and they are not called "lambda$..." 
>>> >>> I think I see that the strategy for offloading for PTX so far is doing a "replacement" of a CPU method with a GPU kernel. But we also want to have some replacements/inlining inside the kernel. > This integration of GPU offloading into the normal compiler pipeline is a hack that should be removed. I put it in only so that Bharadwaj could easily test PTX offloading without having to do the Stream API interposition. The reason that it?s a hack is that it exactly ignores the policy problem of what to offload and to which available GPU. Bharadwaj, can you move to the Sumatra way of offloading soon? Once you?ve done this, I?ll remove this hack. I'll work on making the needed changes. Bharadwaj From Eric.Caspole at amd.com Wed Mar 5 06:27:24 2014 From: Eric.Caspole at amd.com (Caspole, Eric) Date: Wed, 5 Mar 2014 14:27:24 +0000 Subject: Tests In-Reply-To: <5310CB05.20807@ed.ac.uk> References: <53106DD4.7090307@ed.ac.uk> , <5310CB05.20807@ed.ac.uk> Message-ID: Hi Juanjo, I put a webrev at http://cr.openjdk.java.net/~ecaspole/stream_tests/webrev/ that shows how to make a simple Stream API junit inside of Graal. We don't have any unit tests yet using reduce or other "typical" gpu workloads. In order to run this you can use Doug's patch or use the Sumatra JDK at http://hg.openjdk.java.net/sumatra/sumatra-dev/. We were able to publish that last week. It builds the same way as JDK 8. Regards, Eric ________________________________________ From: sumatra-dev-bounces at openjdk.java.net [sumatra-dev-bounces at openjdk.java.net] on behalf of Juan Jose Fumero [juan.fumero at ed.ac.uk] Sent: Friday, February 28, 2014 12:44 PM To: sumatra-dev at openjdk.java.net Subject: Re: Tests Hi Tom, The test that I am referring to is BasicSumatraTest.java in com.oracle.graal.compiler.hsail.test. This is a patch to be applied in Graal [1]. What I am looking for is more test with the stream API (Matrix-Vector multiplication, reductions, ... ). [1] http://cr.openjdk.java.net/~dnsimon/sumatra-on-graal/ Thanks Juanjo On 28/02/14 16:53, Deneau, Tom wrote: > Juanjo -- > > Which specific test are you referring to? > > -- Tom > >> -----Original Message----- >> From: sumatra-dev-bounces at openjdk.java.net [mailto:sumatra-dev- >> bounces at openjdk.java.net] On Behalf Of Juan Jose Fumero >> Sent: Friday, February 28, 2014 5:07 AM >> To: sumatra-dev at openjdk.java.net >> Subject: Tests >> >> Hi , >> In the package com.oracle.graal.compiler.hsail.test there is a test >> for Sumatra and Stream API. Are there more tests like this available? >> >> I would like to prepare a set of test for OpenCL using Stream API. >> >> Thanks >> Juanjo >> >> >> >> -- >> The University of Edinburgh is a charitable body, registered in >> Scotland, with registration number SC005336. >> > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From tom.deneau at amd.com Thu Mar 6 15:30:20 2014 From: tom.deneau at amd.com (Deneau, Tom) Date: Thu, 6 Mar 2014 23:30:20 +0000 Subject: suspcions about GC and HSAIL Deopt Message-ID: While preparing this webrev for the hsail deoptimization work we've been doing, I noticed some spurious failures when we run on HSA hardware. I have a theory of what's happening, let me know if this makes sense... 
First the big overview:

When we run a kernel, and it returns from the GPU each workitem can be in one of 3 states:

   a) finished normally
   b) deopted and saved its state (and set the deopt-happened flag)
   c) on entry, saw deopt-happened=true and so just exited early without running.

This last one exists because we don't want to have to allocate enough deopt save space so that each workitem has its own unique save space. Instead we only allocate enough for the number of concurrent workitems possible.

When we return from the GPU, if one or more workitems deopted we:

   a) for the workitems that finished normally, there is nothing to do

   b) for each deopted workitem, we want to run it thru the interpreter going first thru the special host trampoline code infrastructure that Gilles created. The trampoline host code takes a deoptId (sort of like a pc, telling where the deopt occurred in the hsail code) and a pointer to the saved hsail frame. We currently do this sequentially although other policies are possible.

   c) for each never ran workitem, we can just run it from the beginning of the kernel "method", just making sure we pass the arguments and the appropriate workitem id for each one. Again, we currently do this sequentially although other policies are possible.

When we enter the JVM to run the kernel, we transition to thread_in_vm mode. So while running on the GPU, no oops are moving (although of course GCs may be delayed).

When we start looking for workitems of type b or c above, we are still in thread_in_vm mode. However since both b and c above use the javaCall infrastructure, I believe they are transitioning to thread_in_java mode on each call, and oops can move.

So if for instance there are two deopting workitems, it is possible that after executing the first one the saved deopt state for the second one is no longer valid.

The junit tests on which I have seen the spurious failures are ones where lots of workitems deopt. When run in the hotspot debug build, we usually see SEGVs in interpreter code and the access is always to 0xbaadbabe.

Note that when Gilles was developing his infrastructure, the only test cases we had all had a single workitem deopting so would not show this. Also even with multi-deopting test cases, I believe the reason we don't see this on the simulator is that the concurrency is much less there, so the number of workitems of type b) above will be much less. On hardware, we can have thousands of workitems deopting.

I suppose the solution to this is to mark any oops in the deopt saved state in some way that GC can find them and fix them. What is the best way to do this? Or is there any way to execute javaCalls from thread_in_vm mode without allowing GCs to happen?

-- Tom

From doug.simon at oracle.com Fri Mar 7 02:26:49 2014
From: doug.simon at oracle.com (Doug Simon)
Date: Fri, 7 Mar 2014 11:26:49 +0100
Subject: suspcions about GC and HSAIL Deopt
In-Reply-To:
References:
Message-ID: <661FF05A-0F67-43EE-9281-85C1BA875994@oracle.com>

On Mar 7, 2014, at 12:30 AM, Deneau, Tom wrote:

> While preparing this webrev for the hsail deoptimization work we've been doing, I noticed some spurious failures when we run on HSA hardware. I have a theory of what's happening, let me know if this makes sense...
> > First the big overview: > > When we run a kernel, and it returns from the GPU each workitem can be in one of 3 states: > > a) finished normally > b) deopted and saved its state (and set the deopt-happened flag) > c) on entry, saw deopt-happened=true and so just exited early > without running. > > This last one exists because we don't want to have to allocate enough deopt save space so that each workitem has its own unique save space. > Instead we only allocate enough for the number of concurrent workitems possible. > > When we return from the GPU, if one or more workitems deopted we: > > a) for the workitems that finished normally, there is nothing to do > > b) for each deopted workitems, we want to run it thru the > interpreter going first thru the special host trampoline code > infrastructure that Gilles created. The trampoline host code > takes a deoptId (sort of like a pc, telling where the deopt > occurred in the hsail code) and a pointer to the saved hsail > frame. We currently do this sequentially although other > policies are possible. > > c) for each never ran workitem, we can just run it from the > beginning of the kernel "method", just making sure we pass the > arguments and the appropriate workitem id for each one. Again, > we currently do this sequentially although other policies are > possible. > > When we enter the JVM to run the kernel, we transition to thread_in_vm mode. So while running on the GPU, no oops are moving (although of course GCs may be delayed). > > When we start looking for workitems of type b or c above, we are still in thread_in_vm mode. However since both b and c above use the javaCall infrastructure, I believe they are transitioning to thread_in_java mode on each call, and oops can move. > > So if for instance there are two deopting workitems, it is possible that after executing the first one that the saved deopt state for the second one is no longer valid. > > The junit tests on which I have seen the spurious failures are ones where lots of workitems deopt. When run in the hotspot debug build, we usually see SEGVs in interpreter code and the access is always to 0xbaadbabe. > > Note that when Gilles was developing his infrastructure, the only test cases we had all had a single workitem deopting so would not show this. Also even with multi-deopting test cases, I believe the reason we don't see this on the simulator is that the concurrency is much less there so the number of workitems of type b) above will be much less. On hardware, we can have thousands of workitems deopting. > > I suppose the solution to this is to mark any oops in the deopt saved state in some way that GC can find them and fix them. What is the best way to do this? I?m not sure it?s the most optimal solution, but around each javaCall, you could convert each saved oop to a Handle and convert it back after the call. I?m not aware of other mechanisms in HotSpot for registering GC roots but that doesn?t mean they don?t exist. > Or is there any way to execute javaCalls from thread_in_vm mode without allowing GCs to happen? You are calling arbitrary Java code right? That means you cannot guarantee allocation won?t be performed which in turn means you cannot disable GC (even though there are mechanisms for doing so like GC_locker::lock_critical/GC_locker::unlock_critical). 
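
As a concrete illustration of the Handle idea, here is a minimal sketch against the HotSpot-internal headers of that era. HSAILFrame, saved_oop_at(), set_saved_oop_at() and the int+long argument layout of the deopt method are stand-in assumptions here, not the actual Graal/HSAIL sources:

#include "memory/resourceArea.hpp"
#include "runtime/handles.inline.hpp"
#include "runtime/javaCalls.hpp"
#include "utilities/growableArray.hpp"

class HSAILFrame;                                           // stand-in for the real save-area type
extern oop  saved_oop_at(HSAILFrame* f, int i);             // stand-in accessors for the oops in a save area
extern void set_saved_oop_at(HSAILFrame* f, int i, oop v);

// Wrap the oops stored in one workitem's save area in Handles before the
// javaCall and write the (possibly moved) values back afterwards, so a GC
// during the call cannot leave stale oops behind in the save area.
static void run_deopting_workitem(JavaThread* thread, methodHandle deopt_method,
                                  HSAILFrame* frame, int num_saved_oops, int deopt_id) {
  ResourceMark rm(thread);
  HandleMark hm(thread);
  GrowableArray<Handle> saved(num_saved_oops);
  for (int i = 0; i < num_saved_oops; i++) {
    saved.append(Handle(thread, saved_oop_at(frame, i)));   // each Handle is scanned as a GC root
  }

  JavaValue result(T_VOID);
  JavaCallArguments args;
  args.push_int(deopt_id);                                  // deoptId, per the trampoline description above
  args.push_long((jlong) (intptr_t) frame);                 // pointer to the saved hsail frame
  JavaCalls::call(&result, deopt_method, &args, thread);    // may allocate, so GC can run here

  for (int i = 0; i < num_saved_oops; i++) {
    set_saved_oop_at(frame, i, saved.at(i)());              // copy back the current oop values
  }
}

The Handles only need to cover the window in which the javaCall can trigger a GC; copying the values back matters only if the save area is read again after the call.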
-Doug From tom.deneau at amd.com Fri Mar 7 04:52:36 2014 From: tom.deneau at amd.com (Deneau, Tom) Date: Fri, 7 Mar 2014 12:52:36 +0000 Subject: suspcions about GC and HSAIL Deopt In-Reply-To: <661FF05A-0F67-43EE-9281-85C1BA875994@oracle.com> References: <661FF05A-0F67-43EE-9281-85C1BA875994@oracle.com> Message-ID: Doug -- Regarding your handle-based solution... would it be sufficient to convert all the saved oops (in all the workitem saved state areas) to Handles before the first javaCall (while we are still in thread_in_vm mode), and then before each javaCall just convert back the one save area that is being used in that javaCall? -- Tom > -----Original Message----- > From: Doug Simon [mailto:doug.simon at oracle.com] > Sent: Friday, March 07, 2014 4:27 AM > To: Deneau, Tom > Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net > Subject: Re: suspcions about GC and HSAIL Deopt > > > On Mar 7, 2014, at 12:30 AM, Deneau, Tom wrote: > > > While preparing this webrev for the hsail deoptimization work we've > been doing, I noticed some spurious failures when we run on HSA > hardware. I have a theory of what's happening, let me know if this > makes sense... > > > > First the big overview: > > > > When we run a kernel, and it returns from the GPU each workitem can be > in one of 3 states: > > > > a) finished normally > > b) deopted and saved its state (and set the deopt-happened flag) > > c) on entry, saw deopt-happened=true and so just exited early > > without running. > > > > This last one exists because we don't want to have to allocate enough > deopt save space so that each workitem has its own unique save space. > > Instead we only allocate enough for the number of concurrent workitems > possible. > > > > When we return from the GPU, if one or more workitems deopted we: > > > > a) for the workitems that finished normally, there is nothing to do > > > > b) for each deopted workitems, we want to run it thru the > > interpreter going first thru the special host trampoline code > > infrastructure that Gilles created. The trampoline host code > > takes a deoptId (sort of like a pc, telling where the deopt > > occurred in the hsail code) and a pointer to the saved hsail > > frame. We currently do this sequentially although other > > policies are possible. > > > > c) for each never ran workitem, we can just run it from the > > beginning of the kernel "method", just making sure we pass the > > arguments and the appropriate workitem id for each one. Again, > > we currently do this sequentially although other policies are > > possible. > > > > When we enter the JVM to run the kernel, we transition to thread_in_vm > mode. So while running on the GPU, no oops are moving (although of > course GCs may be delayed). > > > > When we start looking for workitems of type b or c above, we are still > in thread_in_vm mode. However since both b and c above use the javaCall > infrastructure, I believe they are transitioning to thread_in_java mode > on each call, and oops can move. > > > > So if for instance there are two deopting workitems, it is possible > that after executing the first one that the saved deopt state for the > second one is no longer valid. > > > > The junit tests on which I have seen the spurious failures are ones > where lots of workitems deopt. When run in the hotspot debug build, we > usually see SEGVs in interpreter code and the access is always to > 0xbaadbabe. 
> > > > Note that when Gilles was developing his infrastructure, the only test > cases we had all had a single workitem deopting so would not show this. > Also even with multi-deopting test cases, I believe the reason we don't > see this on the simulator is that the concurrency is much less there so > the number of workitems of type b) above will be much less. On > hardware, we can have thousands of workitems deopting. > > > > I suppose the solution to this is to mark any oops in the deopt saved > state in some way that GC can find them and fix them. What is the best > way to do this? > > I'm not sure it's the most optimal solution, but around each javaCall, > you could convert each saved oop to a Handle and convert it back after > the call. I'm not aware of other mechanisms in HotSpot for registering > GC roots but that doesn't mean they don't exist. > > > Or is there any way to execute javaCalls from thread_in_vm mode > without allowing GCs to happen? > > You are calling arbitrary Java code right? That means you cannot > guarantee allocation won't be performed which in turn means you cannot > disable GC (even though there are mechanisms for doing so like > GC_locker::lock_critical/GC_locker::unlock_critical). > > -Doug From doug.simon at oracle.com Fri Mar 7 05:21:47 2014 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 7 Mar 2014 14:21:47 +0100 Subject: suspcions about GC and HSAIL Deopt In-Reply-To: References: <661FF05A-0F67-43EE-9281-85C1BA875994@oracle.com> Message-ID: On Mar 7, 2014, at 1:52 PM, Deneau, Tom wrote: > Doug -- > > Regarding your handle-based solution... > > would it be sufficient to convert all the saved oops (in all the workitem saved state areas) to Handles before the first javaCall (while we are still in thread_in_vm mode), and then before each javaCall just convert back the one save area that is being used in that javaCall? This javaCall is to the special deopting nmethod if I understand correctly. And the save state area is used solely as input to a deopt instruction in which case there is no possibility of a GC between entering the javaCall and hitting the deopt instruction by which time all oops have been copied from the save state area (i.e., the hsailFrame) to slots in the special deopting method?s frame. At that point, the oops in the save state area are dead and standard GC root scanning knows where to find their copies. If this is all correct, then your suggestion should work. -Doug >> -----Original Message----- >> From: Doug Simon [mailto:doug.simon at oracle.com] >> Sent: Friday, March 07, 2014 4:27 AM >> To: Deneau, Tom >> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net >> Subject: Re: suspcions about GC and HSAIL Deopt >> >> >> On Mar 7, 2014, at 12:30 AM, Deneau, Tom wrote: >> >>> While preparing this webrev for the hsail deoptimization work we've >> been doing, I noticed some spurious failures when we run on HSA >> hardware. I have a theory of what's happening, let me know if this >> makes sense... >>> >>> First the big overview: >>> >>> When we run a kernel, and it returns from the GPU each workitem can be >> in one of 3 states: >>> >>> a) finished normally >>> b) deopted and saved its state (and set the deopt-happened flag) >>> c) on entry, saw deopt-happened=true and so just exited early >>> without running. >>> >>> This last one exists because we don't want to have to allocate enough >> deopt save space so that each workitem has its own unique save space. 
>>> Instead we only allocate enough for the number of concurrent workitems >> possible. >>> >>> When we return from the GPU, if one or more workitems deopted we: >>> >>> a) for the workitems that finished normally, there is nothing to do >>> >>> b) for each deopted workitems, we want to run it thru the >>> interpreter going first thru the special host trampoline code >>> infrastructure that Gilles created. The trampoline host code >>> takes a deoptId (sort of like a pc, telling where the deopt >>> occurred in the hsail code) and a pointer to the saved hsail >>> frame. We currently do this sequentially although other >>> policies are possible. >>> >>> c) for each never ran workitem, we can just run it from the >>> beginning of the kernel "method", just making sure we pass the >>> arguments and the appropriate workitem id for each one. Again, >>> we currently do this sequentially although other policies are >>> possible. >>> >>> When we enter the JVM to run the kernel, we transition to thread_in_vm >> mode. So while running on the GPU, no oops are moving (although of >> course GCs may be delayed). >>> >>> When we start looking for workitems of type b or c above, we are still >> in thread_in_vm mode. However since both b and c above use the javaCall >> infrastructure, I believe they are transitioning to thread_in_java mode >> on each call, and oops can move. >>> >>> So if for instance there are two deopting workitems, it is possible >> that after executing the first one that the saved deopt state for the >> second one is no longer valid. >>> >>> The junit tests on which I have seen the spurious failures are ones >> where lots of workitems deopt. When run in the hotspot debug build, we >> usually see SEGVs in interpreter code and the access is always to >> 0xbaadbabe. >>> >>> Note that when Gilles was developing his infrastructure, the only test >> cases we had all had a single workitem deopting so would not show this. >> Also even with multi-deopting test cases, I believe the reason we don't >> see this on the simulator is that the concurrency is much less there so >> the number of workitems of type b) above will be much less. On >> hardware, we can have thousands of workitems deopting. >>> >>> I suppose the solution to this is to mark any oops in the deopt saved >> state in some way that GC can find them and fix them. What is the best >> way to do this? >> >> I'm not sure it's the most optimal solution, but around each javaCall, >> you could convert each saved oop to a Handle and convert it back after >> the call. I'm not aware of other mechanisms in HotSpot for registering >> GC roots but that doesn't mean they don't exist. >> >>> Or is there any way to execute javaCalls from thread_in_vm mode >> without allowing GCs to happen? >> >> You are calling arbitrary Java code right? That means you cannot >> guarantee allocation won't be performed which in turn means you cannot >> disable GC (even though there are mechanisms for doing so like >> GC_locker::lock_critical/GC_locker::unlock_critical). >> >> -Doug > > From tom.deneau at amd.com Fri Mar 7 11:54:58 2014 From: tom.deneau at amd.com (Deneau, Tom) Date: Fri, 7 Mar 2014 19:54:58 +0000 Subject: gdb to native frame that made a javaCall Message-ID: Say you're debugging hotspot in gdb and you hit a SEGV or some other fault while running in the interpreter on behalf of making a javaCall that was invoked from the vm. Is there a way to get gdb back to the native frame that made the javaCall? 
-- Tom From duboscq at ssw.jku.at Mon Mar 10 12:58:19 2014 From: duboscq at ssw.jku.at (Gilles Duboscq) Date: Mon, 10 Mar 2014 13:58:19 +0100 Subject: suspcions about GC and HSAIL Deopt In-Reply-To: References: <661FF05A-0F67-43EE-9281-85C1BA875994@oracle.com> Message-ID: Using Handle and restoring the value should work. In the long term we may want to just have an opps_do on the save area and hook into JavaThread::oops_do. However even with the Handles version you need "oop maps" for the save areas. It shouldn't be very hard to extract them from the HSAIL compilation but currently they are just thrown away. -Gilles On Fri, Mar 7, 2014 at 2:21 PM, Doug Simon wrote: > > On Mar 7, 2014, at 1:52 PM, Deneau, Tom wrote: > >> Doug -- >> >> Regarding your handle-based solution... >> >> would it be sufficient to convert all the saved oops (in all the workitem saved state areas) to Handles before the first javaCall (while we are still in thread_in_vm mode), and then before each javaCall just convert back the one save area that is being used in that javaCall? > > This javaCall is to the special deopting nmethod if I understand correctly. And the save state area is used solely as input to a deopt instruction in which case there is no possibility of a GC between entering the javaCall and hitting the deopt instruction by which time all oops have been copied from the save state area (i.e., the hsailFrame) to slots in the special deopting method?s frame. At that point, the oops in the save state area are dead and standard GC root scanning knows where to find their copies. If this is all correct, then your suggestion should work. > > -Doug > >>> -----Original Message----- >>> From: Doug Simon [mailto:doug.simon at oracle.com] >>> Sent: Friday, March 07, 2014 4:27 AM >>> To: Deneau, Tom >>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net >>> Subject: Re: suspcions about GC and HSAIL Deopt >>> >>> >>> On Mar 7, 2014, at 12:30 AM, Deneau, Tom wrote: >>> >>>> While preparing this webrev for the hsail deoptimization work we've >>> been doing, I noticed some spurious failures when we run on HSA >>> hardware. I have a theory of what's happening, let me know if this >>> makes sense... >>>> >>>> First the big overview: >>>> >>>> When we run a kernel, and it returns from the GPU each workitem can be >>> in one of 3 states: >>>> >>>> a) finished normally >>>> b) deopted and saved its state (and set the deopt-happened flag) >>>> c) on entry, saw deopt-happened=true and so just exited early >>>> without running. >>>> >>>> This last one exists because we don't want to have to allocate enough >>> deopt save space so that each workitem has its own unique save space. >>>> Instead we only allocate enough for the number of concurrent workitems >>> possible. >>>> >>>> When we return from the GPU, if one or more workitems deopted we: >>>> >>>> a) for the workitems that finished normally, there is nothing to do >>>> >>>> b) for each deopted workitems, we want to run it thru the >>>> interpreter going first thru the special host trampoline code >>>> infrastructure that Gilles created. The trampoline host code >>>> takes a deoptId (sort of like a pc, telling where the deopt >>>> occurred in the hsail code) and a pointer to the saved hsail >>>> frame. We currently do this sequentially although other >>>> policies are possible. 
>>>> >>>> c) for each never ran workitem, we can just run it from the >>>> beginning of the kernel "method", just making sure we pass the >>>> arguments and the appropriate workitem id for each one. Again, >>>> we currently do this sequentially although other policies are >>>> possible. >>>> >>>> When we enter the JVM to run the kernel, we transition to thread_in_vm >>> mode. So while running on the GPU, no oops are moving (although of >>> course GCs may be delayed). >>>> >>>> When we start looking for workitems of type b or c above, we are still >>> in thread_in_vm mode. However since both b and c above use the javaCall >>> infrastructure, I believe they are transitioning to thread_in_java mode >>> on each call, and oops can move. >>>> >>>> So if for instance there are two deopting workitems, it is possible >>> that after executing the first one that the saved deopt state for the >>> second one is no longer valid. >>>> >>>> The junit tests on which I have seen the spurious failures are ones >>> where lots of workitems deopt. When run in the hotspot debug build, we >>> usually see SEGVs in interpreter code and the access is always to >>> 0xbaadbabe. >>>> >>>> Note that when Gilles was developing his infrastructure, the only test >>> cases we had all had a single workitem deopting so would not show this. >>> Also even with multi-deopting test cases, I believe the reason we don't >>> see this on the simulator is that the concurrency is much less there so >>> the number of workitems of type b) above will be much less. On >>> hardware, we can have thousands of workitems deopting. >>>> >>>> I suppose the solution to this is to mark any oops in the deopt saved >>> state in some way that GC can find them and fix them. What is the best >>> way to do this? >>> >>> I'm not sure it's the most optimal solution, but around each javaCall, >>> you could convert each saved oop to a Handle and convert it back after >>> the call. I'm not aware of other mechanisms in HotSpot for registering >>> GC roots but that doesn't mean they don't exist. >>> >>>> Or is there any way to execute javaCalls from thread_in_vm mode >>> without allowing GCs to happen? >>> >>> You are calling arbitrary Java code right? That means you cannot >>> guarantee allocation won't be performed which in turn means you cannot >>> disable GC (even though there are mechanisms for doing so like >>> GC_locker::lock_critical/GC_locker::unlock_critical). >>> >>> -Doug >> >> > From tom.deneau at amd.com Mon Mar 10 13:28:48 2014 From: tom.deneau at amd.com (Deneau, Tom) Date: Mon, 10 Mar 2014 13:28:48 +0000 Subject: suspcions about GC and HSAIL Deopt In-Reply-To: References: <661FF05A-0F67-43EE-9281-85C1BA875994@oracle.com> Message-ID: Gilles -- Update on this... Yes, I put in the code to save the oops maps, currently somewhat simplified in that only hsail $d registers can have oops and we are not saving stack slots yet. Using that I implemented a quickie solution that copied the detected oops into a regular java Object array before the first deopt, then reloaded them into the particular frame before each deopt. Logging code did show that there were times when the original value of the oop had changed to a new value and we no longer hit our spurious failures. I'm sure its inefficient when compared to an oops_do approach but it did seem to work. I will probably submit the webrev with this quickie solution and we can discuss how to make it use oops_do. 
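
Roughly along these lines (an illustrative sketch only; HSAILFrame, saved_oop_at(), set_saved_oop_at() and run_one_deopt() are stand-in names, not the code in the webrev):

#include "memory/oopFactory.hpp"
#include "oops/objArrayOop.hpp"
#include "runtime/handles.inline.hpp"

class HSAILFrame;                                           // stand-in for the real save-area type
extern oop  saved_oop_at(HSAILFrame* f, int i);             // stand-in accessors
extern void set_saved_oop_at(HSAILFrame* f, int i, oop v);
extern void run_one_deopt(HSAILFrame* f, TRAPS);            // stand-in for the javaCall into the trampoline

// Copy every detected oop into one ordinary Object[] before the first deopt;
// the objArrayHandle keeps the array visible to GC, so a collection updates
// the array elements. Reload each frame's oops just before its own deopt runs.
static void deopt_all_workitems(HSAILFrame* frames, int num_frames, int oops_per_frame, TRAPS) {
  objArrayOop raw = oopFactory::new_objectArray(num_frames * oops_per_frame, CHECK);
  objArrayHandle saved(THREAD, raw);

  for (int f = 0; f < num_frames; f++) {
    for (int i = 0; i < oops_per_frame; i++) {
      saved->obj_at_put(f * oops_per_frame + i, saved_oop_at(&frames[f], i));
    }
  }

  for (int f = 0; f < num_frames; f++) {
    for (int i = 0; i < oops_per_frame; i++) {
      // earlier javaCalls may have triggered a GC that moved these oops
      set_saved_oop_at(&frames[f], i, saved->obj_at(f * oops_per_frame + i));
    }
    run_one_deopt(&frames[f], CHECK);
  }
}

The Object[] itself stays reachable through the objArrayHandle, so the frames get the current addresses written back just before each deopt.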
-- Tom > -----Original Message----- > From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of > Gilles Duboscq > Sent: Monday, March 10, 2014 7:58 AM > To: Deneau, Tom > Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net > Subject: Re: suspcions about GC and HSAIL Deopt > > Using Handle and restoring the value should work. In the long term we > may want to just have an opps_do on the save area and hook into > JavaThread::oops_do. > > However even with the Handles version you need "oop maps" for the save > areas. It shouldn't be very hard to extract them from the HSAIL > compilation but currently they are just thrown away. > > -Gilles > > On Fri, Mar 7, 2014 at 2:21 PM, Doug Simon > wrote: > > > > On Mar 7, 2014, at 1:52 PM, Deneau, Tom wrote: > > > >> Doug -- > >> > >> Regarding your handle-based solution... > >> > >> would it be sufficient to convert all the saved oops (in all the > workitem saved state areas) to Handles before the first javaCall (while > we are still in thread_in_vm mode), and then before each javaCall just > convert back the one save area that is being used in that javaCall? > > > > This javaCall is to the special deopting nmethod if I understand > correctly. And the save state area is used solely as input to a deopt > instruction in which case there is no possibility of a GC between > entering the javaCall and hitting the deopt instruction by which time > all oops have been copied from the save state area (i.e., the > hsailFrame) to slots in the special deopting method?s frame. At that > point, the oops in the save state area are dead and standard GC root > scanning knows where to find their copies. If this is all correct, then > your suggestion should work. > > > > -Doug > > > >>> -----Original Message----- > >>> From: Doug Simon [mailto:doug.simon at oracle.com] > >>> Sent: Friday, March 07, 2014 4:27 AM > >>> To: Deneau, Tom > >>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net > >>> Subject: Re: suspcions about GC and HSAIL Deopt > >>> > >>> > >>> On Mar 7, 2014, at 12:30 AM, Deneau, Tom wrote: > >>> > >>>> While preparing this webrev for the hsail deoptimization work we've > >>> been doing, I noticed some spurious failures when we run on HSA > >>> hardware. I have a theory of what's happening, let me know if this > >>> makes sense... > >>>> > >>>> First the big overview: > >>>> > >>>> When we run a kernel, and it returns from the GPU each workitem can > >>>> be > >>> in one of 3 states: > >>>> > >>>> a) finished normally > >>>> b) deopted and saved its state (and set the deopt-happened flag) > >>>> c) on entry, saw deopt-happened=true and so just exited early > >>>> without running. > >>>> > >>>> This last one exists because we don't want to have to allocate > >>>> enough > >>> deopt save space so that each workitem has its own unique save > space. > >>>> Instead we only allocate enough for the number of concurrent > >>>> workitems > >>> possible. > >>>> > >>>> When we return from the GPU, if one or more workitems deopted we: > >>>> > >>>> a) for the workitems that finished normally, there is nothing to > >>>> do > >>>> > >>>> b) for each deopted workitems, we want to run it thru the > >>>> interpreter going first thru the special host trampoline code > >>>> infrastructure that Gilles created. The trampoline host code > >>>> takes a deoptId (sort of like a pc, telling where the deopt > >>>> occurred in the hsail code) and a pointer to the saved hsail > >>>> frame. 
We currently do this sequentially although other > >>>> policies are possible. > >>>> > >>>> c) for each never ran workitem, we can just run it from the > >>>> beginning of the kernel "method", just making sure we pass the > >>>> arguments and the appropriate workitem id for each one. Again, > >>>> we currently do this sequentially although other policies are > >>>> possible. > >>>> > >>>> When we enter the JVM to run the kernel, we transition to > >>>> thread_in_vm > >>> mode. So while running on the GPU, no oops are moving (although of > >>> course GCs may be delayed). > >>>> > >>>> When we start looking for workitems of type b or c above, we are > >>>> still > >>> in thread_in_vm mode. However since both b and c above use the > >>> javaCall infrastructure, I believe they are transitioning to > >>> thread_in_java mode on each call, and oops can move. > >>>> > >>>> So if for instance there are two deopting workitems, it is possible > >>> that after executing the first one that the saved deopt state for > >>> the second one is no longer valid. > >>>> > >>>> The junit tests on which I have seen the spurious failures are ones > >>> where lots of workitems deopt. When run in the hotspot debug build, > >>> we usually see SEGVs in interpreter code and the access is always to > >>> 0xbaadbabe. > >>>> > >>>> Note that when Gilles was developing his infrastructure, the only > >>>> test > >>> cases we had all had a single workitem deopting so would not show > this. > >>> Also even with multi-deopting test cases, I believe the reason we > >>> don't see this on the simulator is that the concurrency is much less > >>> there so the number of workitems of type b) above will be much less. > >>> On hardware, we can have thousands of workitems deopting. > >>>> > >>>> I suppose the solution to this is to mark any oops in the deopt > >>>> saved > >>> state in some way that GC can find them and fix them. What is the > >>> best way to do this? > >>> > >>> I'm not sure it's the most optimal solution, but around each > >>> javaCall, you could convert each saved oop to a Handle and convert > >>> it back after the call. I'm not aware of other mechanisms in HotSpot > >>> for registering GC roots but that doesn't mean they don't exist. > >>> > >>>> Or is there any way to execute javaCalls from thread_in_vm mode > >>> without allowing GCs to happen? > >>> > >>> You are calling arbitrary Java code right? That means you cannot > >>> guarantee allocation won't be performed which in turn means you > >>> cannot disable GC (even though there are mechanisms for doing so > >>> like GC_locker::lock_critical/GC_locker::unlock_critical). > >>> > >>> -Doug > >> > >> > > From duboscq at ssw.jku.at Mon Mar 10 15:14:29 2014 From: duboscq at ssw.jku.at (Gilles Duboscq) Date: Mon, 10 Mar 2014 16:14:29 +0100 Subject: suspcions about GC and HSAIL Deopt In-Reply-To: References: <661FF05A-0F67-43EE-9281-85C1BA875994@oracle.com> Message-ID: Ok, sounds good On Mon, Mar 10, 2014 at 2:28 PM, Tom Deneau wrote: > Gilles -- > > Update on this... > > Yes, I put in the code to save the oops maps, currently somewhat simplified in that only hsail $d registers can have oops and we are not saving stack slots yet. > > Using that I implemented a quickie solution that copied the detected oops into a regular java Object array before the first deopt, then reloaded them into the particular frame before each deopt. 
Logging code did show that there were times when the original value of the oop had changed to a new value and we no longer hit our spurious failures. I'm sure its inefficient when compared to an oops_do approach but it did seem to work. > > I will probably submit the webrev with this quickie solution and we can discuss how to make it use oops_do. > > -- Tom > > > > > >> -----Original Message----- >> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of >> Gilles Duboscq >> Sent: Monday, March 10, 2014 7:58 AM >> To: Deneau, Tom >> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net >> Subject: Re: suspcions about GC and HSAIL Deopt >> >> Using Handle and restoring the value should work. In the long term we >> may want to just have an opps_do on the save area and hook into >> JavaThread::oops_do. >> >> However even with the Handles version you need "oop maps" for the save >> areas. It shouldn't be very hard to extract them from the HSAIL >> compilation but currently they are just thrown away. >> >> -Gilles >> >> On Fri, Mar 7, 2014 at 2:21 PM, Doug Simon >> wrote: >> > >> > On Mar 7, 2014, at 1:52 PM, Deneau, Tom wrote: >> > >> >> Doug -- >> >> >> >> Regarding your handle-based solution... >> >> >> >> would it be sufficient to convert all the saved oops (in all the >> workitem saved state areas) to Handles before the first javaCall (while >> we are still in thread_in_vm mode), and then before each javaCall just >> convert back the one save area that is being used in that javaCall? >> > >> > This javaCall is to the special deopting nmethod if I understand >> correctly. And the save state area is used solely as input to a deopt >> instruction in which case there is no possibility of a GC between >> entering the javaCall and hitting the deopt instruction by which time >> all oops have been copied from the save state area (i.e., the >> hsailFrame) to slots in the special deopting method?s frame. At that >> point, the oops in the save state area are dead and standard GC root >> scanning knows where to find their copies. If this is all correct, then >> your suggestion should work. >> > >> > -Doug >> > >> >>> -----Original Message----- >> >>> From: Doug Simon [mailto:doug.simon at oracle.com] >> >>> Sent: Friday, March 07, 2014 4:27 AM >> >>> To: Deneau, Tom >> >>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net >> >>> Subject: Re: suspcions about GC and HSAIL Deopt >> >>> >> >>> >> >>> On Mar 7, 2014, at 12:30 AM, Deneau, Tom wrote: >> >>> >> >>>> While preparing this webrev for the hsail deoptimization work we've >> >>> been doing, I noticed some spurious failures when we run on HSA >> >>> hardware. I have a theory of what's happening, let me know if this >> >>> makes sense... >> >>>> >> >>>> First the big overview: >> >>>> >> >>>> When we run a kernel, and it returns from the GPU each workitem can >> >>>> be >> >>> in one of 3 states: >> >>>> >> >>>> a) finished normally >> >>>> b) deopted and saved its state (and set the deopt-happened flag) >> >>>> c) on entry, saw deopt-happened=true and so just exited early >> >>>> without running. >> >>>> >> >>>> This last one exists because we don't want to have to allocate >> >>>> enough >> >>> deopt save space so that each workitem has its own unique save >> space. >> >>>> Instead we only allocate enough for the number of concurrent >> >>>> workitems >> >>> possible. 
>> >>>> >> >>>> When we return from the GPU, if one or more workitems deopted we: >> >>>> >> >>>> a) for the workitems that finished normally, there is nothing to >> >>>> do >> >>>> >> >>>> b) for each deopted workitems, we want to run it thru the >> >>>> interpreter going first thru the special host trampoline code >> >>>> infrastructure that Gilles created. The trampoline host code >> >>>> takes a deoptId (sort of like a pc, telling where the deopt >> >>>> occurred in the hsail code) and a pointer to the saved hsail >> >>>> frame. We currently do this sequentially although other >> >>>> policies are possible. >> >>>> >> >>>> c) for each never ran workitem, we can just run it from the >> >>>> beginning of the kernel "method", just making sure we pass the >> >>>> arguments and the appropriate workitem id for each one. Again, >> >>>> we currently do this sequentially although other policies are >> >>>> possible. >> >>>> >> >>>> When we enter the JVM to run the kernel, we transition to >> >>>> thread_in_vm >> >>> mode. So while running on the GPU, no oops are moving (although of >> >>> course GCs may be delayed). >> >>>> >> >>>> When we start looking for workitems of type b or c above, we are >> >>>> still >> >>> in thread_in_vm mode. However since both b and c above use the >> >>> javaCall infrastructure, I believe they are transitioning to >> >>> thread_in_java mode on each call, and oops can move. >> >>>> >> >>>> So if for instance there are two deopting workitems, it is possible >> >>> that after executing the first one that the saved deopt state for >> >>> the second one is no longer valid. >> >>>> >> >>>> The junit tests on which I have seen the spurious failures are ones >> >>> where lots of workitems deopt. When run in the hotspot debug build, >> >>> we usually see SEGVs in interpreter code and the access is always to >> >>> 0xbaadbabe. >> >>>> >> >>>> Note that when Gilles was developing his infrastructure, the only >> >>>> test >> >>> cases we had all had a single workitem deopting so would not show >> this. >> >>> Also even with multi-deopting test cases, I believe the reason we >> >>> don't see this on the simulator is that the concurrency is much less >> >>> there so the number of workitems of type b) above will be much less. >> >>> On hardware, we can have thousands of workitems deopting. >> >>>> >> >>>> I suppose the solution to this is to mark any oops in the deopt >> >>>> saved >> >>> state in some way that GC can find them and fix them. What is the >> >>> best way to do this? >> >>> >> >>> I'm not sure it's the most optimal solution, but around each >> >>> javaCall, you could convert each saved oop to a Handle and convert >> >>> it back after the call. I'm not aware of other mechanisms in HotSpot >> >>> for registering GC roots but that doesn't mean they don't exist. >> >>> >> >>>> Or is there any way to execute javaCalls from thread_in_vm mode >> >>> without allowing GCs to happen? >> >>> >> >>> You are calling arbitrary Java code right? That means you cannot >> >>> guarantee allocation won't be performed which in turn means you >> >>> cannot disable GC (even though there are mechanisms for doing so >> >>> like GC_locker::lock_critical/GC_locker::unlock_critical). 
>> >>>
>> >>> -Doug
>> >>
>> >>
>> >
>

From doug.simon at oracle.com Mon Mar 10 15:58:04 2014
From: doug.simon at oracle.com (Doug Simon)
Date: Mon, 10 Mar 2014 16:58:04 +0100
Subject: Moving GPU offload policy into Java sources
In-Reply-To: <531DD592.2050806@oracle.com>
References: <531A347B.4030906@oracle.com> <581921E1-8E90-46FD-9635-A6BDBEE12A8C@oracle.com> <531A380B.7040608@oracle.com> <6ED7D40E-56A8-4EE3-A66D-A2132AB734EE@oracle.com> <531DD592.2050806@oracle.com>
Message-ID:

[opening up for broader discussion]

Bharadwaj,

I think this is really a discussion on whether or not it makes sense for a whole bytecode method to be offloaded to a GPU. I cannot imagine a scenario in which this would achieve faster execution than host CPU execution. In my understanding, for GPU execution to be a win, there needs to be parallel execution. This requires either some compiler-like analysis or some library API that implies parallel execution. The latter justifies the Sumatra approach. For the former, an ideal place to do the analysis is as part of compilation. The compiler may recognize a loop whose body can be offloaded to GPU. So, in that sense, you could say that offload is part of compilation. However, I still think this is different from being part of *compilation policy*. For the latter, all the analysis has to be done as part of the process deciding whether or not to compile something. Not only would this be very expensive, it would almost certainly require an analysis framework very similar to what the compiler already offers.

So, I think we agree on the worthy goal of automatic GPU offload. I just think this is best done within a compilation. Assuming you still think the required analysis is best done outside of compilation, can you describe how it can be done (efficiently) and what mechanisms it would use?

-Doug

On Mar 10, 2014, at 4:09 PM, S. Bharadwaj Yadavalli wrote:

>
> On 03/08/2014 03:50 AM, Doug Simon wrote:
>> Why? All the context needed for the decision can be accessed from Java code. In any case, it needs to be removed from the normal compilation policy mechanism.
>
> In my opinion, deciding which non-host target to compile and execute Java methods _is_ part of compilation policy - just like the current compilation policy decides which methods to compile and which to interpret. Enhancing the present policy to offload execution of appropriate portions of Java for better performance _transparently_ is what gives the ability to run Java applications on heterogeneous systems. Adding GPU-specific changes to JDK (similar to what AMD guys did for Streams) is at best an intermediate step. Taking that approach will require implementations of data structures such as Streams to be specialized for GPUs as well as other heterogeneous architectures like Intel's Phi. We will have to then specialize implementations of other data structures.
>
> I believe that non-host offload should be decided by the VM based on structure of the code in a compilation unit and the nature of data that unit manipulates. Any specialization/annotation in the library code should be to provide hints to the offload policy.
>
> Bharadwaj
>

From tom.deneau at amd.com Mon Mar 10 16:53:11 2014
From: tom.deneau at amd.com (Deneau, Tom)
Date: Mon, 10 Mar 2014 16:53:11 +0000
Subject: suspcions about GC and HSAIL Deopt
In-Reply-To:
References: <661FF05A-0F67-43EE-9281-85C1BA875994@oracle.com>
Message-ID:

Gilles, Doug --

I was wondering about this statement Doug made...
This javaCall is to the special deopting nmethod if I understand correctly. And the save state area is used solely as input to a deopt instruction in which case there is no possibility of a GC between entering the javaCall and hitting the deopt instruction by which time all oops have been copied from the save state area (i.e., the hsailFrame) to slots in the special deopting method?s frame. Is it true there is no possibility of GC between entering the nmethod and hitting the deopt call/instruction? How is that prevented? -- Tom > -----Original Message----- > From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of > Gilles Duboscq > Sent: Monday, March 10, 2014 10:14 AM > To: Deneau, Tom > Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net > Subject: Re: suspcions about GC and HSAIL Deopt > > Ok, sounds good > > On Mon, Mar 10, 2014 at 2:28 PM, Tom Deneau > wrote: > > Gilles -- > > > > Update on this... > > > > Yes, I put in the code to save the oops maps, currently somewhat > simplified in that only hsail $d registers can have oops and we are not > saving stack slots yet. > > > > Using that I implemented a quickie solution that copied the detected > oops into a regular java Object array before the first deopt, then > reloaded them into the particular frame before each deopt. Logging code > did show that there were times when the original value of the oop had > changed to a new value and we no longer hit our spurious failures. > I'm sure its inefficient when compared to an oops_do approach but it did > seem to work. > > > > I will probably submit the webrev with this quickie solution and we > can discuss how to make it use oops_do. > > > > -- Tom > > > > > > > > > > > >> -----Original Message----- > >> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of > >> Gilles Duboscq > >> Sent: Monday, March 10, 2014 7:58 AM > >> To: Deneau, Tom > >> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net > >> Subject: Re: suspcions about GC and HSAIL Deopt > >> > >> Using Handle and restoring the value should work. In the long term we > >> may want to just have an opps_do on the save area and hook into > >> JavaThread::oops_do. > >> > >> However even with the Handles version you need "oop maps" for the > >> save areas. It shouldn't be very hard to extract them from the HSAIL > >> compilation but currently they are just thrown away. > >> > >> -Gilles > >> > >> On Fri, Mar 7, 2014 at 2:21 PM, Doug Simon > > >> wrote: > >> > > >> > On Mar 7, 2014, at 1:52 PM, Deneau, Tom > wrote: > >> > > >> >> Doug -- > >> >> > >> >> Regarding your handle-based solution... > >> >> > >> >> would it be sufficient to convert all the saved oops (in all the > >> workitem saved state areas) to Handles before the first javaCall > >> (while we are still in thread_in_vm mode), and then before each > >> javaCall just convert back the one save area that is being used in > that javaCall? > >> > > >> > This javaCall is to the special deopting nmethod if I understand > >> correctly. And the save state area is used solely as input to a deopt > >> instruction in which case there is no possibility of a GC between > >> entering the javaCall and hitting the deopt instruction by which time > >> all oops have been copied from the save state area (i.e., the > >> hsailFrame) to slots in the special deopting method?s frame. At that > >> point, the oops in the save state area are dead and standard GC root > >> scanning knows where to find their copies. 
If this is all correct, > >> then your suggestion should work. > >> > > >> > -Doug > >> > > >> >>> -----Original Message----- > >> >>> From: Doug Simon [mailto:doug.simon at oracle.com] > >> >>> Sent: Friday, March 07, 2014 4:27 AM > >> >>> To: Deneau, Tom > >> >>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net > >> >>> Subject: Re: suspcions about GC and HSAIL Deopt > >> >>> > >> >>> > >> >>> On Mar 7, 2014, at 12:30 AM, Deneau, Tom > > wrote: > >> >>> > >> >>>> While preparing this webrev for the hsail deoptimization work > >> >>>> we've > >> >>> been doing, I noticed some spurious failures when we run on HSA > >> >>> hardware. I have a theory of what's happening, let me know if > >> >>> this makes sense... > >> >>>> > >> >>>> First the big overview: > >> >>>> > >> >>>> When we run a kernel, and it returns from the GPU each workitem > >> >>>> can be > >> >>> in one of 3 states: > >> >>>> > >> >>>> a) finished normally > >> >>>> b) deopted and saved its state (and set the deopt-happened > >> >>>> flag) > >> >>>> c) on entry, saw deopt-happened=true and so just exited early > >> >>>> without running. > >> >>>> > >> >>>> This last one exists because we don't want to have to allocate > >> >>>> enough > >> >>> deopt save space so that each workitem has its own unique save > >> space. > >> >>>> Instead we only allocate enough for the number of concurrent > >> >>>> workitems > >> >>> possible. > >> >>>> > >> >>>> When we return from the GPU, if one or more workitems deopted > we: > >> >>>> > >> >>>> a) for the workitems that finished normally, there is nothing > >> >>>> to do > >> >>>> > >> >>>> b) for each deopted workitems, we want to run it thru the > >> >>>> interpreter going first thru the special host trampoline > code > >> >>>> infrastructure that Gilles created. The trampoline host > code > >> >>>> takes a deoptId (sort of like a pc, telling where the deopt > >> >>>> occurred in the hsail code) and a pointer to the saved hsail > >> >>>> frame. We currently do this sequentially although other > >> >>>> policies are possible. > >> >>>> > >> >>>> c) for each never ran workitem, we can just run it from the > >> >>>> beginning of the kernel "method", just making sure we pass > the > >> >>>> arguments and the appropriate workitem id for each one. > Again, > >> >>>> we currently do this sequentially although other policies > are > >> >>>> possible. > >> >>>> > >> >>>> When we enter the JVM to run the kernel, we transition to > >> >>>> thread_in_vm > >> >>> mode. So while running on the GPU, no oops are moving (although > >> >>> of course GCs may be delayed). > >> >>>> > >> >>>> When we start looking for workitems of type b or c above, we are > >> >>>> still > >> >>> in thread_in_vm mode. However since both b and c above use the > >> >>> javaCall infrastructure, I believe they are transitioning to > >> >>> thread_in_java mode on each call, and oops can move. > >> >>>> > >> >>>> So if for instance there are two deopting workitems, it is > >> >>>> possible > >> >>> that after executing the first one that the saved deopt state for > >> >>> the second one is no longer valid. > >> >>>> > >> >>>> The junit tests on which I have seen the spurious failures are > >> >>>> ones > >> >>> where lots of workitems deopt. When run in the hotspot debug > >> >>> build, we usually see SEGVs in interpreter code and the access is > >> >>> always to 0xbaadbabe. 
> >> >>>> > >> >>>> Note that when Gilles was developing his infrastructure, the > >> >>>> only test > >> >>> cases we had all had a single workitem deopting so would not show > >> this. > >> >>> Also even with multi-deopting test cases, I believe the reason we > >> >>> don't see this on the simulator is that the concurrency is much > >> >>> less there so the number of workitems of type b) above will be > much less. > >> >>> On hardware, we can have thousands of workitems deopting. > >> >>>> > >> >>>> I suppose the solution to this is to mark any oops in the deopt > >> >>>> saved > >> >>> state in some way that GC can find them and fix them. What is > >> >>> the best way to do this? > >> >>> > >> >>> I'm not sure it's the most optimal solution, but around each > >> >>> javaCall, you could convert each saved oop to a Handle and > >> >>> convert it back after the call. I'm not aware of other mechanisms > >> >>> in HotSpot for registering GC roots but that doesn't mean they > don't exist. > >> >>> > >> >>>> Or is there any way to execute javaCalls from thread_in_vm mode > >> >>> without allowing GCs to happen? > >> >>> > >> >>> You are calling arbitrary Java code right? That means you cannot > >> >>> guarantee allocation won't be performed which in turn means you > >> >>> cannot disable GC (even though there are mechanisms for doing so > >> >>> like GC_locker::lock_critical/GC_locker::unlock_critical). > >> >>> > >> >>> -Doug > >> >> > >> >> > >> > > > From doug.simon at oracle.com Mon Mar 10 17:03:12 2014 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 10 Mar 2014 18:03:12 +0100 Subject: suspcions about GC and HSAIL Deopt In-Reply-To: References: <661FF05A-0F67-43EE-9281-85C1BA875994@oracle.com> Message-ID: <759DBD3C-1037-4D2F-BB24-F93719E21A66@oracle.com> It?s based on my understanding of what the special deopting method does which is something like: void deoptFromHSAIL(int id, HSAILFrame frame) { if (id == 0) { // copy info out of frame into registers/stack slots Deoptimize(); } else if (id == 1) { // copy info out of frame into registers/stack slots Deoptimize(); } else if ... Gilles can confirm/correct. -Doug On Mar 10, 2014, at 5:53 PM, Deneau, Tom wrote: > Gilles, Doug -- > > > > I was wondering about this statement Doug made... > > > > This javaCall is to the special deopting nmethod if I understand correctly. And the save state area is used solely as input to a deopt instruction in which case there is no possibility of a GC between entering the javaCall and hitting the deopt instruction by which time all oops have been copied from the save state area (i.e., the hsailFrame) to slots in the special deopting method?s frame. > > > > > > Is it true there is no possibility of GC between entering the nmethod and hitting the deopt call/instruction? How is that prevented? > > > > -- Tom > > > >> -----Original Message----- > >> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of > >> Gilles Duboscq > >> Sent: Monday, March 10, 2014 10:14 AM > >> To: Deneau, Tom > >> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net > >> Subject: Re: suspcions about GC and HSAIL Deopt > >> > >> Ok, sounds good > >> > >> On Mon, Mar 10, 2014 at 2:28 PM, Tom Deneau > wrote: > >>> Gilles -- > >>> > >>> Update on this... > >>> > >>> Yes, I put in the code to save the oops maps, currently somewhat > >> simplified in that only hsail $d registers can have oops and we are not > >> saving stack slots yet. 
> >>> > >>> Using that I implemented a quickie solution that copied the detected > >> oops into a regular java Object array before the first deopt, then > >> reloaded them into the particular frame before each deopt. Logging code > >> did show that there were times when the original value of the oop had > >> changed to a new value and we no longer hit our spurious failures. > >> I'm sure its inefficient when compared to an oops_do approach but it did > >> seem to work. > >>> > >>> I will probably submit the webrev with this quickie solution and we > >> can discuss how to make it use oops_do. > >>> > >>> -- Tom > >>> > >>> > >>> > >>> > >>> > >>>> -----Original Message----- > >>>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of > >>>> Gilles Duboscq > >>>> Sent: Monday, March 10, 2014 7:58 AM > >>>> To: Deneau, Tom > >>>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net > >>>> Subject: Re: suspcions about GC and HSAIL Deopt > >>>> > >>>> Using Handle and restoring the value should work. In the long term we > >>>> may want to just have an opps_do on the save area and hook into > >>>> JavaThread::oops_do. > >>>> > >>>> However even with the Handles version you need "oop maps" for the > >>>> save areas. It shouldn't be very hard to extract them from the HSAIL > >>>> compilation but currently they are just thrown away. > >>>> > >>>> -Gilles > >>>> > >>>> On Fri, Mar 7, 2014 at 2:21 PM, Doug Simon > > >>>> wrote: > >>>>> > >>>>> On Mar 7, 2014, at 1:52 PM, Deneau, Tom > wrote: > >>>>> > >>>>>> Doug -- > >>>>>> > >>>>>> Regarding your handle-based solution... > >>>>>> > >>>>>> would it be sufficient to convert all the saved oops (in all the > >>>> workitem saved state areas) to Handles before the first javaCall > >>>> (while we are still in thread_in_vm mode), and then before each > >>>> javaCall just convert back the one save area that is being used in > >> that javaCall? > >>>>> > >>>>> This javaCall is to the special deopting nmethod if I understand > >>>> correctly. And the save state area is used solely as input to a deopt > >>>> instruction in which case there is no possibility of a GC between > >>>> entering the javaCall and hitting the deopt instruction by which time > >>>> all oops have been copied from the save state area (i.e., the > >>>> hsailFrame) to slots in the special deopting method?s frame. At that > >>>> point, the oops in the save state area are dead and standard GC root > >>>> scanning knows where to find their copies. If this is all correct, > >>>> then your suggestion should work. > >>>>> > >>>>> -Doug > >>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Doug Simon [mailto:doug.simon at oracle.com] > >>>>>>> Sent: Friday, March 07, 2014 4:27 AM > >>>>>>> To: Deneau, Tom > >>>>>>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net > >>>>>>> Subject: Re: suspcions about GC and HSAIL Deopt > >>>>>>> > >>>>>>> > >>>>>>> On Mar 7, 2014, at 12:30 AM, Deneau, Tom > > >> wrote: > >>>>>>> > >>>>>>>> While preparing this webrev for the hsail deoptimization work > >>>>>>>> we've > >>>>>>> been doing, I noticed some spurious failures when we run on HSA > >>>>>>> hardware. I have a theory of what's happening, let me know if > >>>>>>> this makes sense... 
> >>>>>>>> > >>>>>>>> First the big overview: > >>>>>>>> > >>>>>>>> When we run a kernel, and it returns from the GPU each workitem > >>>>>>>> can be > >>>>>>> in one of 3 states: > >>>>>>>> > >>>>>>>> a) finished normally > >>>>>>>> b) deopted and saved its state (and set the deopt-happened > >>>>>>>> flag) > >>>>>>>> c) on entry, saw deopt-happened=true and so just exited early > >>>>>>>> without running. > >>>>>>>> > >>>>>>>> This last one exists because we don't want to have to allocate > >>>>>>>> enough > >>>>>>> deopt save space so that each workitem has its own unique save > >>>> space. > >>>>>>>> Instead we only allocate enough for the number of concurrent > >>>>>>>> workitems > >>>>>>> possible. > >>>>>>>> > >>>>>>>> When we return from the GPU, if one or more workitems deopted > >> we: > >>>>>>>> > >>>>>>>> a) for the workitems that finished normally, there is nothing > >>>>>>>> to do > >>>>>>>> > >>>>>>>> b) for each deopted workitems, we want to run it thru the > >>>>>>>> interpreter going first thru the special host trampoline > >> code > >>>>>>>> infrastructure that Gilles created. The trampoline host > >> code > >>>>>>>> takes a deoptId (sort of like a pc, telling where the deopt > >>>>>>>> occurred in the hsail code) and a pointer to the saved hsail > >>>>>>>> frame. We currently do this sequentially although other > >>>>>>>> policies are possible. > >>>>>>>> > >>>>>>>> c) for each never ran workitem, we can just run it from the > >>>>>>>> beginning of the kernel "method", just making sure we pass > >> the > >>>>>>>> arguments and the appropriate workitem id for each one. > >> Again, > >>>>>>>> we currently do this sequentially although other policies > >> are > >>>>>>>> possible. > >>>>>>>> > >>>>>>>> When we enter the JVM to run the kernel, we transition to > >>>>>>>> thread_in_vm > >>>>>>> mode. So while running on the GPU, no oops are moving (although > >>>>>>> of course GCs may be delayed). > >>>>>>>> > >>>>>>>> When we start looking for workitems of type b or c above, we are > >>>>>>>> still > >>>>>>> in thread_in_vm mode. However since both b and c above use the > >>>>>>> javaCall infrastructure, I believe they are transitioning to > >>>>>>> thread_in_java mode on each call, and oops can move. > >>>>>>>> > >>>>>>>> So if for instance there are two deopting workitems, it is > >>>>>>>> possible > >>>>>>> that after executing the first one that the saved deopt state for > >>>>>>> the second one is no longer valid. > >>>>>>>> > >>>>>>>> The junit tests on which I have seen the spurious failures are > >>>>>>>> ones > >>>>>>> where lots of workitems deopt. When run in the hotspot debug > >>>>>>> build, we usually see SEGVs in interpreter code and the access is > >>>>>>> always to 0xbaadbabe. > >>>>>>>> > >>>>>>>> Note that when Gilles was developing his infrastructure, the > >>>>>>>> only test > >>>>>>> cases we had all had a single workitem deopting so would not show > >>>> this. > >>>>>>> Also even with multi-deopting test cases, I believe the reason we > >>>>>>> don't see this on the simulator is that the concurrency is much > >>>>>>> less there so the number of workitems of type b) above will be > >> much less. > >>>>>>> On hardware, we can have thousands of workitems deopting. > >>>>>>>> > >>>>>>>> I suppose the solution to this is to mark any oops in the deopt > >>>>>>>> saved > >>>>>>> state in some way that GC can find them and fix them. What is > >>>>>>> the best way to do this? 
> >>>>>>> > >>>>>>> I'm not sure it's the most optimal solution, but around each > >>>>>>> javaCall, you could convert each saved oop to a Handle and > >>>>>>> convert it back after the call. I'm not aware of other mechanisms > >>>>>>> in HotSpot for registering GC roots but that doesn't mean they > >> don't exist. > >>>>>>> > >>>>>>>> Or is there any way to execute javaCalls from thread_in_vm mode > >>>>>>> without allowing GCs to happen? > >>>>>>> > >>>>>>> You are calling arbitrary Java code right? That means you cannot > >>>>>>> guarantee allocation won't be performed which in turn means you > >>>>>>> cannot disable GC (even though there are mechanisms for doing so > >>>>>>> like GC_locker::lock_critical/GC_locker::unlock_critical). > >>>>>>> > >>>>>>> -Doug > >>>>>> > >>>>>> > >>>>> > >>> > > From tom.deneau at amd.com Mon Mar 10 17:10:12 2014 From: tom.deneau at amd.com (Deneau, Tom) Date: Mon, 10 Mar 2014 17:10:12 +0000 Subject: suspcions about GC and HSAIL Deopt In-Reply-To: <759DBD3C-1037-4D2F-BB24-F93719E21A66@oracle.com> References: <661FF05A-0F67-43EE-9281-85C1BA875994@oracle.com> <759DBD3C-1037-4D2F-BB24-F93719E21A66@oracle.com> Message-ID: Ah, I was worried about the (admittedly small) window between entering the special deopting method and getting those values safely into register/stack slots, but now I realize there are no safepoints in that window (I hope) so no GC can happen. -- Tom > -----Original Message----- > From: Doug Simon [mailto:doug.simon at oracle.com] > Sent: Monday, March 10, 2014 12:03 PM > To: Deneau, Tom > Cc: Gilles Duboscq; sumatra-dev at openjdk.java.net; graal- > dev at openjdk.java.net > Subject: Re: suspcions about GC and HSAIL Deopt > > It's based on my understanding of what the special deopting method does > which is something like: > > void deoptFromHSAIL(int id, HSAILFrame frame) { > if (id == 0) { > // copy info out of frame into registers/stack slots > Deoptimize(); > } else if (id == 1) { > // copy info out of frame into registers/stack slots > Deoptimize(); > } else if ... > > Gilles can confirm/correct. > > -Doug > > On Mar 10, 2014, at 5:53 PM, Deneau, Tom wrote: > > > Gilles, Doug -- > > > > > > > > I was wondering about this statement Doug made... > > > > > > > > This javaCall is to the special deopting nmethod if I understand > correctly. And the save state area is used solely as input to a deopt > instruction in which case there is no possibility of a GC between > entering the javaCall and hitting the deopt instruction by which time > all oops have been copied from the save state area (i.e., the > hsailFrame) to slots in the special deopting method's frame. > > > > > > > > > > > > Is it true there is no possibility of GC between entering the nmethod > and hitting the deopt call/instruction? How is that prevented? > > > > > > > > -- Tom > > > > > > > >> -----Original Message----- > > > >> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of > > > >> Gilles Duboscq > > > >> Sent: Monday, March 10, 2014 10:14 AM > > > >> To: Deneau, Tom > > > >> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net > > > >> Subject: Re: suspcions about GC and HSAIL Deopt > > > >> > > > >> Ok, sounds good > > > >> > > > >> On Mon, Mar 10, 2014 at 2:28 PM, Tom Deneau > > wrote: > > > >>> Gilles -- > > > >>> > > > >>> Update on this... 
> > > >>> > > > >>> Yes, I put in the code to save the oops maps, currently somewhat > > > >> simplified in that only hsail $d registers can have oops and we are > not > > > >> saving stack slots yet. > > > >>> > > > >>> Using that I implemented a quickie solution that copied the detected > > > >> oops into a regular java Object array before the first deopt, then > > > >> reloaded them into the particular frame before each deopt. Logging > code > > > >> did show that there were times when the original value of the oop had > > > >> changed to a new value and we no longer hit our spurious failures. > > > >> I'm sure its inefficient when compared to an oops_do approach but it > did > > > >> seem to work. > > > >>> > > > >>> I will probably submit the webrev with this quickie solution and we > > > >> can discuss how to make it use oops_do. > > > >>> > > > >>> -- Tom > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > > >>>> -----Original Message----- > > > >>>> From: gilwooden at gmail.com > [mailto:gilwooden at gmail.com] On Behalf Of > > > >>>> Gilles Duboscq > > > >>>> Sent: Monday, March 10, 2014 7:58 AM > > > >>>> To: Deneau, Tom > > > >>>> Cc: graal-dev at openjdk.java.net; > sumatra-dev at openjdk.java.net > > > >>>> Subject: Re: suspcions about GC and HSAIL Deopt > > > >>>> > > > >>>> Using Handle and restoring the value should work. In the long term > we > > > >>>> may want to just have an opps_do on the save area and hook into > > > >>>> JavaThread::oops_do. > > > >>>> > > > >>>> However even with the Handles version you need "oop maps" for the > > > >>>> save areas. It shouldn't be very hard to extract them from the > HSAIL > > > >>>> compilation but currently they are just thrown away. > > > >>>> > > > >>>> -Gilles > > > >>>> > > > >>>> On Fri, Mar 7, 2014 at 2:21 PM, Doug Simon > > > > > >>>> wrote: > > > >>>>> > > > >>>>> On Mar 7, 2014, at 1:52 PM, Deneau, Tom > > wrote: > > > >>>>> > > > >>>>>> Doug -- > > > >>>>>> > > > >>>>>> Regarding your handle-based solution... > > > >>>>>> > > > >>>>>> would it be sufficient to convert all the saved oops (in all the > > > >>>> workitem saved state areas) to Handles before the first javaCall > > > >>>> (while we are still in thread_in_vm mode), and then before each > > > >>>> javaCall just convert back the one save area that is being used in > > > >> that javaCall? > > > >>>>> > > > >>>>> This javaCall is to the special deopting nmethod if I understand > > > >>>> correctly. And the save state area is used solely as input to a > deopt > > > >>>> instruction in which case there is no possibility of a GC between > > > >>>> entering the javaCall and hitting the deopt instruction by which > time > > > >>>> all oops have been copied from the save state area (i.e., the > > > >>>> hsailFrame) to slots in the special deopting method's frame. At > that > > > >>>> point, the oops in the save state area are dead and standard GC > root > > > >>>> scanning knows where to find their copies. If this is all correct, > > > >>>> then your suggestion should work. 
> > > >>>>> > > > >>>>> -Doug > > > >>>>> > > > >>>>>>> -----Original Message----- > > > >>>>>>> From: Doug Simon [mailto:doug.simon at oracle.com] > > > >>>>>>> Sent: Friday, March 07, 2014 4:27 AM > > > >>>>>>> To: Deneau, Tom > > > >>>>>>> Cc: graal-dev at openjdk.java.net dev at openjdk.java.net>; sumatra-dev at openjdk.java.net dev at openjdk.java.net> > > > >>>>>>> Subject: Re: suspcions about GC and HSAIL Deopt > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> On Mar 7, 2014, at 12:30 AM, Deneau, Tom > > > > > >> wrote: > > > >>>>>>> > > > >>>>>>>> While preparing this webrev for the hsail deoptimization work > > > >>>>>>>> we've > > > >>>>>>> been doing, I noticed some spurious failures when we run on HSA > > > >>>>>>> hardware. I have a theory of what's happening, let me know if > > > >>>>>>> this makes sense... > > > >>>>>>>> > > > >>>>>>>> First the big overview: > > > >>>>>>>> > > > >>>>>>>> When we run a kernel, and it returns from the GPU each workitem > > > >>>>>>>> can be > > > >>>>>>> in one of 3 states: > > > >>>>>>>> > > > >>>>>>>> a) finished normally > > > >>>>>>>> b) deopted and saved its state (and set the deopt-happened > > > >>>>>>>> flag) > > > >>>>>>>> c) on entry, saw deopt-happened=true and so just exited early > > > >>>>>>>> without running. > > > >>>>>>>> > > > >>>>>>>> This last one exists because we don't want to have to allocate > > > >>>>>>>> enough > > > >>>>>>> deopt save space so that each workitem has its own unique save > > > >>>> space. > > > >>>>>>>> Instead we only allocate enough for the number of concurrent > > > >>>>>>>> workitems > > > >>>>>>> possible. > > > >>>>>>>> > > > >>>>>>>> When we return from the GPU, if one or more workitems deopted > > > >> we: > > > >>>>>>>> > > > >>>>>>>> a) for the workitems that finished normally, there is nothing > > > >>>>>>>> to do > > > >>>>>>>> > > > >>>>>>>> b) for each deopted workitems, we want to run it thru the > > > >>>>>>>> interpreter going first thru the special host trampoline > > > >> code > > > >>>>>>>> infrastructure that Gilles created. The trampoline host > > > >> code > > > >>>>>>>> takes a deoptId (sort of like a pc, telling where the deopt > > > >>>>>>>> occurred in the hsail code) and a pointer to the saved hsail > > > >>>>>>>> frame. We currently do this sequentially although other > > > >>>>>>>> policies are possible. > > > >>>>>>>> > > > >>>>>>>> c) for each never ran workitem, we can just run it from the > > > >>>>>>>> beginning of the kernel "method", just making sure we pass > > > >> the > > > >>>>>>>> arguments and the appropriate workitem id for each one. > > > >> Again, > > > >>>>>>>> we currently do this sequentially although other policies > > > >> are > > > >>>>>>>> possible. > > > >>>>>>>> > > > >>>>>>>> When we enter the JVM to run the kernel, we transition to > > > >>>>>>>> thread_in_vm > > > >>>>>>> mode. So while running on the GPU, no oops are moving (although > > > >>>>>>> of course GCs may be delayed). > > > >>>>>>>> > > > >>>>>>>> When we start looking for workitems of type b or c above, we > are > > > >>>>>>>> still > > > >>>>>>> in thread_in_vm mode. However since both b and c above use the > > > >>>>>>> javaCall infrastructure, I believe they are transitioning to > > > >>>>>>> thread_in_java mode on each call, and oops can move. > > > >>>>>>>> > > > >>>>>>>> So if for instance there are two deopting workitems, it is > > > >>>>>>>> possible > > > >>>>>>> that after executing the first one that the saved deopt state > for > > > >>>>>>> the second one is no longer valid. 
> > > >>>>>>>> > > > >>>>>>>> The junit tests on which I have seen the spurious failures are > > > >>>>>>>> ones > > > >>>>>>> where lots of workitems deopt. When run in the hotspot debug > > > >>>>>>> build, we usually see SEGVs in interpreter code and the access > is > > > >>>>>>> always to 0xbaadbabe. > > > >>>>>>>> > > > >>>>>>>> Note that when Gilles was developing his infrastructure, the > > > >>>>>>>> only test > > > >>>>>>> cases we had all had a single workitem deopting so would not > show > > > >>>> this. > > > >>>>>>> Also even with multi-deopting test cases, I believe the reason > we > > > >>>>>>> don't see this on the simulator is that the concurrency is much > > > >>>>>>> less there so the number of workitems of type b) above will be > > > >> much less. > > > >>>>>>> On hardware, we can have thousands of workitems deopting. > > > >>>>>>>> > > > >>>>>>>> I suppose the solution to this is to mark any oops in the deopt > > > >>>>>>>> saved > > > >>>>>>> state in some way that GC can find them and fix them. What is > > > >>>>>>> the best way to do this? > > > >>>>>>> > > > >>>>>>> I'm not sure it's the most optimal solution, but around each > > > >>>>>>> javaCall, you could convert each saved oop to a Handle and > > > >>>>>>> convert it back after the call. I'm not aware of other > mechanisms > > > >>>>>>> in HotSpot for registering GC roots but that doesn't mean they > > > >> don't exist. > > > >>>>>>> > > > >>>>>>>> Or is there any way to execute javaCalls from thread_in_vm mode > > > >>>>>>> without allowing GCs to happen? > > > >>>>>>> > > > >>>>>>> You are calling arbitrary Java code right? That means you cannot > > > >>>>>>> guarantee allocation won't be performed which in turn means you > > > >>>>>>> cannot disable GC (even though there are mechanisms for doing so > > > >>>>>>> like GC_locker::lock_critical/GC_locker::unlock_critical). > > > >>>>>>> > > > >>>>>>> -Doug > > > >>>>>> > > > >>>>>> > > > >>>>> > > > >>> > > > > > From duboscq at ssw.jku.at Mon Mar 10 17:45:34 2014 From: duboscq at ssw.jku.at (Gilles Duboscq) Date: Mon, 10 Mar 2014 18:45:34 +0100 Subject: suspcions about GC and HSAIL Deopt In-Reply-To: References: <661FF05A-0F67-43EE-9281-85C1BA875994@oracle.com> <759DBD3C-1037-4D2F-BB24-F93719E21A66@oracle.com> Message-ID: On Mon, Mar 10, 2014 at 6:10 PM, Tom Deneau wrote: > Ah, I was worried about the (admittedly small) window between entering the special deopting method and getting those values safely into register/stack slots, but now I realize there are no safepoints in that window (I hope) so no GC can happen. Yes exactly > > -- Tom > >> -----Original Message----- >> From: Doug Simon [mailto:doug.simon at oracle.com] >> Sent: Monday, March 10, 2014 12:03 PM >> To: Deneau, Tom >> Cc: Gilles Duboscq; sumatra-dev at openjdk.java.net; graal- >> dev at openjdk.java.net >> Subject: Re: suspcions about GC and HSAIL Deopt >> >> It's based on my understanding of what the special deopting method does >> which is something like: >> >> void deoptFromHSAIL(int id, HSAILFrame frame) { >> if (id == 0) { >> // copy info out of frame into registers/stack slots >> Deoptimize(); >> } else if (id == 1) { >> // copy info out of frame into registers/stack slots >> Deoptimize(); >> } else if ... >> >> Gilles can confirm/correct. >> >> -Doug >> >> On Mar 10, 2014, at 5:53 PM, Deneau, Tom wrote: >> >> > Gilles, Doug -- >> > >> > >> > >> > I was wondering about this statement Doug made... 
>> > >> > >> > >> > This javaCall is to the special deopting nmethod if I understand >> correctly. And the save state area is used solely as input to a deopt >> instruction in which case there is no possibility of a GC between >> entering the javaCall and hitting the deopt instruction by which time >> all oops have been copied from the save state area (i.e., the >> hsailFrame) to slots in the special deopting method's frame. >> > >> > >> > >> > >> > >> > Is it true there is no possibility of GC between entering the nmethod >> and hitting the deopt call/instruction? How is that prevented? >> > >> > >> > >> > -- Tom >> > >> > >> > >> >> -----Original Message----- >> > >> >> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of >> > >> >> Gilles Duboscq >> > >> >> Sent: Monday, March 10, 2014 10:14 AM >> > >> >> To: Deneau, Tom >> > >> >> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net >> > >> >> Subject: Re: suspcions about GC and HSAIL Deopt >> > >> >> >> > >> >> Ok, sounds good >> > >> >> >> > >> >> On Mon, Mar 10, 2014 at 2:28 PM, Tom Deneau >> > wrote: >> > >> >>> Gilles -- >> > >> >>> >> > >> >>> Update on this... >> > >> >>> >> > >> >>> Yes, I put in the code to save the oops maps, currently somewhat >> > >> >> simplified in that only hsail $d registers can have oops and we are >> not >> > >> >> saving stack slots yet. >> > >> >>> >> > >> >>> Using that I implemented a quickie solution that copied the detected >> > >> >> oops into a regular java Object array before the first deopt, then >> > >> >> reloaded them into the particular frame before each deopt. Logging >> code >> > >> >> did show that there were times when the original value of the oop had >> > >> >> changed to a new value and we no longer hit our spurious failures. >> > >> >> I'm sure its inefficient when compared to an oops_do approach but it >> did >> > >> >> seem to work. >> > >> >>> >> > >> >>> I will probably submit the webrev with this quickie solution and we >> > >> >> can discuss how to make it use oops_do. >> > >> >>> >> > >> >>> -- Tom >> > >> >>> >> > >> >>> >> > >> >>> >> > >> >>> >> > >> >>> >> > >> >>>> -----Original Message----- >> > >> >>>> From: gilwooden at gmail.com >> [mailto:gilwooden at gmail.com] On Behalf Of >> > >> >>>> Gilles Duboscq >> > >> >>>> Sent: Monday, March 10, 2014 7:58 AM >> > >> >>>> To: Deneau, Tom >> > >> >>>> Cc: graal-dev at openjdk.java.net; >> sumatra-dev at openjdk.java.net >> > >> >>>> Subject: Re: suspcions about GC and HSAIL Deopt >> > >> >>>> >> > >> >>>> Using Handle and restoring the value should work. In the long term >> we >> > >> >>>> may want to just have an opps_do on the save area and hook into >> > >> >>>> JavaThread::oops_do. >> > >> >>>> >> > >> >>>> However even with the Handles version you need "oop maps" for the >> > >> >>>> save areas. It shouldn't be very hard to extract them from the >> HSAIL >> > >> >>>> compilation but currently they are just thrown away. >> > >> >>>> >> > >> >>>> -Gilles >> > >> >>>> >> > >> >>>> On Fri, Mar 7, 2014 at 2:21 PM, Doug Simon >> > >> > >> >>>> wrote: >> > >> >>>>> >> > >> >>>>> On Mar 7, 2014, at 1:52 PM, Deneau, Tom >> > wrote: >> > >> >>>>> >> > >> >>>>>> Doug -- >> > >> >>>>>> >> > >> >>>>>> Regarding your handle-based solution... 
>> > >> >>>>>> >> > >> >>>>>> would it be sufficient to convert all the saved oops (in all the >> > >> >>>> workitem saved state areas) to Handles before the first javaCall >> > >> >>>> (while we are still in thread_in_vm mode), and then before each >> > >> >>>> javaCall just convert back the one save area that is being used in >> > >> >> that javaCall? >> > >> >>>>> >> > >> >>>>> This javaCall is to the special deopting nmethod if I understand >> > >> >>>> correctly. And the save state area is used solely as input to a >> deopt >> > >> >>>> instruction in which case there is no possibility of a GC between >> > >> >>>> entering the javaCall and hitting the deopt instruction by which >> time >> > >> >>>> all oops have been copied from the save state area (i.e., the >> > >> >>>> hsailFrame) to slots in the special deopting method's frame. At >> that >> > >> >>>> point, the oops in the save state area are dead and standard GC >> root >> > >> >>>> scanning knows where to find their copies. If this is all correct, >> > >> >>>> then your suggestion should work. >> > >> >>>>> >> > >> >>>>> -Doug >> > >> >>>>> >> > >> >>>>>>> -----Original Message----- >> > >> >>>>>>> From: Doug Simon [mailto:doug.simon at oracle.com] >> > >> >>>>>>> Sent: Friday, March 07, 2014 4:27 AM >> > >> >>>>>>> To: Deneau, Tom >> > >> >>>>>>> Cc: graal-dev at openjdk.java.net> dev at openjdk.java.net>; sumatra-dev at openjdk.java.net> dev at openjdk.java.net> >> > >> >>>>>>> Subject: Re: suspcions about GC and HSAIL Deopt >> > >> >>>>>>> >> > >> >>>>>>> >> > >> >>>>>>> On Mar 7, 2014, at 12:30 AM, Deneau, Tom >> > >> > >> >> wrote: >> > >> >>>>>>> >> > >> >>>>>>>> While preparing this webrev for the hsail deoptimization work >> > >> >>>>>>>> we've >> > >> >>>>>>> been doing, I noticed some spurious failures when we run on HSA >> > >> >>>>>>> hardware. I have a theory of what's happening, let me know if >> > >> >>>>>>> this makes sense... >> > >> >>>>>>>> >> > >> >>>>>>>> First the big overview: >> > >> >>>>>>>> >> > >> >>>>>>>> When we run a kernel, and it returns from the GPU each workitem >> > >> >>>>>>>> can be >> > >> >>>>>>> in one of 3 states: >> > >> >>>>>>>> >> > >> >>>>>>>> a) finished normally >> > >> >>>>>>>> b) deopted and saved its state (and set the deopt-happened >> > >> >>>>>>>> flag) >> > >> >>>>>>>> c) on entry, saw deopt-happened=true and so just exited early >> > >> >>>>>>>> without running. >> > >> >>>>>>>> >> > >> >>>>>>>> This last one exists because we don't want to have to allocate >> > >> >>>>>>>> enough >> > >> >>>>>>> deopt save space so that each workitem has its own unique save >> > >> >>>> space. >> > >> >>>>>>>> Instead we only allocate enough for the number of concurrent >> > >> >>>>>>>> workitems >> > >> >>>>>>> possible. >> > >> >>>>>>>> >> > >> >>>>>>>> When we return from the GPU, if one or more workitems deopted >> > >> >> we: >> > >> >>>>>>>> >> > >> >>>>>>>> a) for the workitems that finished normally, there is nothing >> > >> >>>>>>>> to do >> > >> >>>>>>>> >> > >> >>>>>>>> b) for each deopted workitems, we want to run it thru the >> > >> >>>>>>>> interpreter going first thru the special host trampoline >> > >> >> code >> > >> >>>>>>>> infrastructure that Gilles created. The trampoline host >> > >> >> code >> > >> >>>>>>>> takes a deoptId (sort of like a pc, telling where the deopt >> > >> >>>>>>>> occurred in the hsail code) and a pointer to the saved hsail >> > >> >>>>>>>> frame. We currently do this sequentially although other >> > >> >>>>>>>> policies are possible. 
>> > >> >>>>>>>> >> > >> >>>>>>>> c) for each never ran workitem, we can just run it from the >> > >> >>>>>>>> beginning of the kernel "method", just making sure we pass >> > >> >> the >> > >> >>>>>>>> arguments and the appropriate workitem id for each one. >> > >> >> Again, >> > >> >>>>>>>> we currently do this sequentially although other policies >> > >> >> are >> > >> >>>>>>>> possible. >> > >> >>>>>>>> >> > >> >>>>>>>> When we enter the JVM to run the kernel, we transition to >> > >> >>>>>>>> thread_in_vm >> > >> >>>>>>> mode. So while running on the GPU, no oops are moving (although >> > >> >>>>>>> of course GCs may be delayed). >> > >> >>>>>>>> >> > >> >>>>>>>> When we start looking for workitems of type b or c above, we >> are >> > >> >>>>>>>> still >> > >> >>>>>>> in thread_in_vm mode. However since both b and c above use the >> > >> >>>>>>> javaCall infrastructure, I believe they are transitioning to >> > >> >>>>>>> thread_in_java mode on each call, and oops can move. >> > >> >>>>>>>> >> > >> >>>>>>>> So if for instance there are two deopting workitems, it is >> > >> >>>>>>>> possible >> > >> >>>>>>> that after executing the first one that the saved deopt state >> for >> > >> >>>>>>> the second one is no longer valid. >> > >> >>>>>>>> >> > >> >>>>>>>> The junit tests on which I have seen the spurious failures are >> > >> >>>>>>>> ones >> > >> >>>>>>> where lots of workitems deopt. When run in the hotspot debug >> > >> >>>>>>> build, we usually see SEGVs in interpreter code and the access >> is >> > >> >>>>>>> always to 0xbaadbabe. >> > >> >>>>>>>> >> > >> >>>>>>>> Note that when Gilles was developing his infrastructure, the >> > >> >>>>>>>> only test >> > >> >>>>>>> cases we had all had a single workitem deopting so would not >> show >> > >> >>>> this. >> > >> >>>>>>> Also even with multi-deopting test cases, I believe the reason >> we >> > >> >>>>>>> don't see this on the simulator is that the concurrency is much >> > >> >>>>>>> less there so the number of workitems of type b) above will be >> > >> >> much less. >> > >> >>>>>>> On hardware, we can have thousands of workitems deopting. >> > >> >>>>>>>> >> > >> >>>>>>>> I suppose the solution to this is to mark any oops in the deopt >> > >> >>>>>>>> saved >> > >> >>>>>>> state in some way that GC can find them and fix them. What is >> > >> >>>>>>> the best way to do this? >> > >> >>>>>>> >> > >> >>>>>>> I'm not sure it's the most optimal solution, but around each >> > >> >>>>>>> javaCall, you could convert each saved oop to a Handle and >> > >> >>>>>>> convert it back after the call. I'm not aware of other >> mechanisms >> > >> >>>>>>> in HotSpot for registering GC roots but that doesn't mean they >> > >> >> don't exist. >> > >> >>>>>>> >> > >> >>>>>>>> Or is there any way to execute javaCalls from thread_in_vm mode >> > >> >>>>>>> without allowing GCs to happen? >> > >> >>>>>>> >> > >> >>>>>>> You are calling arbitrary Java code right? That means you cannot >> > >> >>>>>>> guarantee allocation won't be performed which in turn means you >> > >> >>>>>>> cannot disable GC (even though there are mechanisms for doing so >> > >> >>>>>>> like GC_locker::lock_critical/GC_locker::unlock_critical). >> > >> >>>>>>> >> > >> >>>>>>> -Doug >> > >> >>>>>> >> > >> >>>>>> >> > >> >>>>> >> > >> >>> >> > >> > >> > > From bharadwaj.yadavalli at oracle.com Tue Mar 11 15:58:11 2014 From: bharadwaj.yadavalli at oracle.com (S. 
Bharadwaj Yadavalli) Date: Tue, 11 Mar 2014 11:58:11 -0400 Subject: Moving GPU offload policy into Java sources In-Reply-To: References: <531A347B.4030906@oracle.com> <581921E1-8E90-46FD-9635-A6BDBEE12A8C@oracle.com> <531A380B.7040608@oracle.com> <6ED7D40E-56A8-4EE3-A66D-A2132AB734EE@oracle.com> <531DD592.2050806@oracle.com> Message-ID: <531F3293.5040701@oracle.com> Doug, On 03/10/2014 11:58 AM, Doug Simon wrote: > So, I think we agree on the worthy goal of automatic GPU offload. I just think this is best done within a compilation. Assuming you still think the required analysis is best done outside of compilation, can you describe how it can be done (efficiently) and what mechanisms it would use? I do not yet have the full algorithm / technique chalked out. GPU/non-host offload decision making at runtime is an area that I have been trying to experiment with and have been trying to understand in the context of a JVM. Roughly speaking, the idea is to recognize parallel application of lambda methods and offload such application to GPU - when deemed beneficial. More concretely, I am currently looking at the possibility of recognizing the characteristics of a stream pipeline by the VM runtime (assuming current Streams.parallel() - the parallel streams pipeline - implementation can be rendered for SIMD execution). I would like to see if I can use information such as size of data, composability of functions being applied in the pipeline (may be others, I do not know, yet) can be used to make the offload decision. Bharadwaj From doug.simon at oracle.com Tue Mar 11 16:15:46 2014 From: doug.simon at oracle.com (Doug Simon) Date: Tue, 11 Mar 2014 17:15:46 +0100 Subject: Moving GPU offload policy into Java sources In-Reply-To: <531F3293.5040701@oracle.com> References: <531A347B.4030906@oracle.com> <581921E1-8E90-46FD-9635-A6BDBEE12A8C@oracle.com> <531A380B.7040608@oracle.com> <6ED7D40E-56A8-4EE3-A66D-A2132AB734EE@oracle.com> <531DD592.2050806@oracle.com> <531F3293.5040701@oracle.com> Message-ID: On Mar 11, 2014, at 4:58 PM, S. Bharadwaj Yadavalli wrote: > Doug, > > On 03/10/2014 11:58 AM, Doug Simon wrote: >> So, I think we agree on the worthy goal of automatic GPU offload. I just think this is best done within a compilation. Assuming you still think the required analysis is best done outside of compilation, can you describe how it can be done (efficiently) and what mechanisms it would use? > > I do not yet have the full algorithm / technique chalked out. GPU/non-host offload decision making at runtime is an area that I have been trying to experiment with and have been trying to understand in the context of a JVM. Roughly speaking, the idea is to recognize parallel application of lambda methods and offload such application to GPU - when deemed beneficial. More concretely, I am currently looking at the possibility of recognizing the characteristics of a stream pipeline by the VM runtime (assuming current Streams.parallel() - the parallel streams pipeline - implementation can be rendered for SIMD execution). I would like to see if I can use information such as size of data, composability of functions being applied in the pipeline (may be others, I do not know, yet) can be used to make the offload decision. Thanks for the explanation. So, if I understand correctly, the decision making you describe would be done: 1. in modified library code (i.e., current Sumatra approach), 2. during compilation, or 3. 
class file loading One place it should not be done is when deciding whether or not to compile a method as it would be too slow and would involve duplicating machinery used by one of the above. As such, we should still proceed with removing the GPU offloading decision logic from the logic of deciding whether or not to compile a method. -Doug From doug.simon at oracle.com Mon Mar 17 10:43:10 2014 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 17 Mar 2014 11:43:10 +0100 Subject: Graal and Clojure In-Reply-To: <5326354D.8010204@yahoo.com> References: <1394063789.8387.YahooMailNeo@web122406.mail.ne1.yahoo.com> <5326354D.8010204@yahoo.com> Message-ID: Hi Julian, In terms of what to test, I?ll leave it up to the Sumatra developers at AMD and Oracle to comment on what may be interesting. As far as I know, no one in the Graal team is that familiar with Clojure. This actually makes the value of having Clojure tests integrated into our gate system somewhat problematic - debugging/fixing failures may be difficult. I?d suggest you report failures your discover on the sumatra-dev and graal-dev lists and we can resolve them that way for now. In addition, for Graal we?re always interested in benchmark suites for JVM hosted languages. Can you recommend any well known and trusted Clojure benchmarks we should consider? -Doug On Mar 17, 2014, at 12:35 AM, Jules Gosnell wrote: > Doug, > > Sorry it has taken me a while to get back - I've been busy playing with code. > > I currently have a small clojure testsuite up and running. I plan to broaden this and then think about providing it in a junit-compatible way to your project. > > I still haven't made up my mind exactly what I should be testing. It seems unnecessary to duplicate all your tests in Clojure, since you are already guarding against regression in these areas - I think I will start out with a fairly general set of tests as I probe for particular problem areas for Clojure, into which I may dive for more detail - time will tell... if you have any ideas, please let me know. > > I've checked my stuff in at github, if anyone is interested - here is the project: > > https://github.com/JulesGosnell/clumatra > > here is some clojure that I reverse engineered from GraalKernelTester: > > https://github.com/JulesGosnell/clumatra/blob/master/src/clumatra/core.clj > > and here are the first successful tests: > > https://github.com/JulesGosnell/clumatra/blob/master/test/clumatra/core_test.clj > > not very pretty yet - I am still feeling my way. > > cheers > > > Jules > > > > > On 06/03/14 10:24, Doug Simon wrote: >> Hi Julian, >> >> This looks very interesting and will be an good alternative testing vector for the Sumatra work as it matures. If Clojure tests can somehow be made to run from Junit, then I?m sure we can try integrating it into our testing. >> >> -Doug >> >> On Mar 6, 2014, at 12:56 AM, Julian Gosnell wrote: >> >>> Guys, >>> >>> I just built the Java8 / Graal / Okra stack and managed to run some very simple Clojure copying the contents of one int array into another one on Okra. >>> >>> https://groups.google.com/forum/#!topic/clojure/JpjK__NTR5Y >>> >>> >>> I find the ramifications of this very exciting :-) >>> >>> I understand that fn-ality is limited at the moment - but I am keen to keep testing and to help ensure early visibility of Clojure related issues to both communities - perhaps even the submission of a Clojure testsuite if Graal developers thought that useful. >>> >>> I'd be very interested in your thoughts on Graal / Clojure. 
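
For readers less familiar with the Clojure side, the kernel being exercised here amounts to an ordinary array copy over a parallel int stream. The snippet below is plain JDK 8 Java, not Clumatra or Okra code; under the Sumatra model the lambda body is the kind of code that would be turned into an HSAIL kernel, while run as-is it simply executes on the CPU.

    import java.util.stream.IntStream;

    // Plain JDK 8 version of the "copy one int array into another" kernel.
    public class CopyKernelExample {
        public static void main(String[] args) {
            int[] in = IntStream.range(0, 1024).toArray();
            int[] out = new int[in.length];

            // The lambda body is the per-workitem work; here it just runs on the CPU.
            IntStream.range(0, in.length).parallel().forEach(i -> out[i] = in[i]);

            System.out.println(out[0] + " .. " + out[out.length - 1]); // 0 .. 1023
        }
    }
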
>>> >>> regards, >>> >>> >>> Jules >>> >> >> > From eric.caspole at amd.com Mon Mar 17 22:20:34 2014 From: eric.caspole at amd.com (Eric Caspole) Date: Mon, 17 Mar 2014 18:20:34 -0400 Subject: Graal and Clojure In-Reply-To: References: <1394063789.8387.YahooMailNeo@web122406.mail.ne1.yahoo.com> <5326354D.8010204@yahoo.com> Message-ID: <53277532.3030807@amd.com> Hi everybody, I was setting up a new HSA system today and I installed Clumatra and got it working under the simulator. I have a real HSA system not too different from this: http://code.google.com/p/aparapi/wiki/SettingUpLinuxHSAMachineForAparapi In the clojure/lein/maven based system there is so much harness I can't immediately see how to switch in the real hardware version of Okra instead of the simulator Okra that gets installed into maven and exploded in /tmp when the tests run. Jules, could you show how to bypass the harness etc to switch the Okra? Can I just run 1 test with a simple java command line? This makes it a lot easier to use a debugger which I will definitely want to do. Otherwise this is a cool project and we already found 1 or 2 issues we can fix in HSAIL Graal related to your clojure tests. Regards, Eric On 03/17/2014 06:43 AM, Doug Simon wrote: > Hi Julian, > > In terms of what to test, I?ll leave it up to the Sumatra developers at AMD and Oracle to comment on what may be interesting. > > As far as I know, no one in the Graal team is that familiar with Clojure. This actually makes the value of having Clojure tests integrated into our gate system somewhat problematic - debugging/fixing failures may be difficult. I?d suggest you report failures your discover on the sumatra-dev and graal-dev lists and we can resolve them that way for now. > > In addition, for Graal we?re always interested in benchmark suites for JVM hosted languages. Can you recommend any well known and trusted Clojure benchmarks we should consider? > > -Doug > > On Mar 17, 2014, at 12:35 AM, Jules Gosnell wrote: > >> Doug, >> >> Sorry it has taken me a while to get back - I've been busy playing with code. >> >> I currently have a small clojure testsuite up and running. I plan to broaden this and then think about providing it in a junit-compatible way to your project. >> >> I still haven't made up my mind exactly what I should be testing. It seems unnecessary to duplicate all your tests in Clojure, since you are already guarding against regression in these areas - I think I will start out with a fairly general set of tests as I probe for particular problem areas for Clojure, into which I may dive for more detail - time will tell... if you have any ideas, please let me know. >> >> I've checked my stuff in at github, if anyone is interested - here is the project: >> >> https://github.com/JulesGosnell/clumatra >> >> here is some clojure that I reverse engineered from GraalKernelTester: >> >> https://github.com/JulesGosnell/clumatra/blob/master/src/clumatra/core.clj >> >> and here are the first successful tests: >> >> https://github.com/JulesGosnell/clumatra/blob/master/test/clumatra/core_test.clj >> >> not very pretty yet - I am still feeling my way. >> >> cheers >> >> >> Jules >> >> >> >> >> On 06/03/14 10:24, Doug Simon wrote: >>> Hi Julian, >>> >>> This looks very interesting and will be an good alternative testing vector for the Sumatra work as it matures. If Clojure tests can somehow be made to run from Junit, then I?m sure we can try integrating it into our testing. 
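
One possible shape for such a JUnit entry point, assuming the tests stay in the clumatra.core-test namespace used by the project above: require the namespace through the Clojure runtime, run it with clojure.test/run-tests, and fail the JUnit test if the summary reports any failures or errors. This is only an illustrative bridge, not something that exists in Clumatra or Graal today.

    import clojure.lang.RT;
    import clojure.lang.Symbol;
    import org.junit.Assert;
    import org.junit.Test;

    // Illustrative JUnit wrapper around a Clojure test namespace.
    public class ClumatraJUnitBridge {

        @Test
        public void runClojureCoreTests() {
            // (require 'clumatra.core-test)
            RT.var("clojure.core", "require").invoke(Symbol.intern("clumatra.core-test"));

            // (clojure.test/run-tests 'clumatra.core-test) returns a summary map
            Object summary = RT.var("clojure.test", "run-tests").invoke(Symbol.intern("clumatra.core-test"));

            Object failKw = RT.var("clojure.core", "keyword").invoke("fail");
            Object errorKw = RT.var("clojure.core", "keyword").invoke("error");
            long fail = ((Number) RT.var("clojure.core", "get").invoke(summary, failKw)).longValue();
            long error = ((Number) RT.var("clojure.core", "get").invoke(summary, errorKw)).longValue();

            Assert.assertEquals("clojure.test failures", 0, fail);
            Assert.assertEquals("clojure.test errors", 0, error);
        }
    }

A class like this can then be picked up by an existing JUnit-based gate without the gate needing to know anything about Leiningen or Clojure.
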
>>> >>> -Doug >>> >>> On Mar 6, 2014, at 12:56 AM, Julian Gosnell wrote: >>> >>>> Guys, >>>> >>>> I just built the Java8 / Graal / Okra stack and managed to run some very simple Clojure copying the contents of one int array into another one on Okra. >>>> >>>> https://groups.google.com/forum/#!topic/clojure/JpjK__NTR5Y >>>> >>>> >>>> I find the ramifications of this very exciting :-) >>>> >>>> I understand that fn-ality is limited at the moment - but I am keen to keep testing and to help ensure early visibility of Clojure related issues to both communities - perhaps even the submission of a Clojure testsuite if Graal developers thought that useful. >>>> >>>> I'd be very interested in your thoughts on Graal / Clojure. >>>> >>>> regards, >>>> >>>> >>>> Jules >>>> >>> >>> >> > > From juan.fumero at ed.ac.uk Wed Mar 19 14:16:18 2014 From: juan.fumero at ed.ac.uk (Juan Jose Fumero) Date: Wed, 19 Mar 2014 14:16:18 +0000 Subject: Nodes in Graal: ParameterNode In-Reply-To: References: <1395163053.29018.17.camel@gofio> Message-ID: <1395238578.29018.44.camel@gofio> Hello, Thanks Gilles for your explanation. As you said, map in stream.util is IntUnaryOperator. What I am using now is BiFunction. Maybe I should change to a specific one as IntBinaryOperator. More specifically I am trying is this (this is an example): // Vector Addition Stream stream = new Stream<>(); stream.map((x,y) -> x + y).run(input); This is now running in CPU with my library. This map does not have side effects and each operation returns a new Stream. When the run is called, the pipeline is executed. What I am trying is to generate OpenCL code from the map lambda expression, but I need to "transform" or replace Parameters x and y to Arrays for example. Is there other ways available in Graal? Thanks Juanjo On Wed, 2014-03-19 at 11:49 +0100, Gilles Duboscq wrote: > Hello, > > The graph you present is the code for: > > Integer foo(Integer x, Integer y) { > return x + y; > } > > There are no arrays involved and you can not force the parameters to > be arrays, that would just not work. > > If you want to work with integer streams you can use IntStream which > will allow you to work without the boxing (even for the lambdas). > I'm not sure what the example should do exactly since map seems to > only accept unary functions. > In any case, this lambda would be the kernel which is the "what to > run" and thus does not/can not contain any information about "what to > run *on*". > This information can only be available in the code which is applying > this lambda to some specific data. > > Maybe you can give a small example of what you would like to achieve > from java code to OpenCL code? > > -Gilles > > On Tue, Mar 18, 2014 at 6:17 PM, Juan Jose Fumero wrote: > > Hello, > > I am working with lambda expression on JDK8 and Graal. My question is > > related with the creation of new nodes in the Graph to update or change > > the information. > > > > > > Given this lambda expression: > > > > stream.map((x,y) -> x + y); > > > > The StructuredGraph is the following: > > > > 0|StartNode > > 1|Parameter(0) > > 2|Parameter(1) > > 3|FrameState at 0 > > 4|MethodCallTarget > > 5|Invoke#intValue > > 6|FrameState at 4 > > 8|MethodCallTarget > > 9|Invoke#intValue > > 10|FrameState at 8 > > 12|+ > > 13|MethodCallTarget > > 14|Invoke#valueOf > > 15|FrameState at 12 > > 17|Return > > 13|Invoke#valueOf > > > > > > Parameter(0) and Parameter(1) are Objects in the moment that I get the > > lambda expression. 
But later on, I know that could be Integer[], > > Double[], etc. > > > > I would like to rewrite this part of the Graph with the new information. > > Is there any utility in Graal to do that? > > > > > > For instance: if I get the parameterNode from the previous graph: > > > > if (((ObjectStamp) parameterNode.stamp()).type().isArray()) { > > ... > > } > > > > > > The nodes Parameter(0) and Parameter(1) in the lambda expression are not > > arrays. What I want to do is to change or update these nodes. What I am > > using now is a new node (paramNode): > > > > > > IntegerStamp integerStamp = new IntegerStamp(Kind.Int); > > ParameterNode paramNode = new ParameterNode(index, integerStamp); > > newGraph.unique(paramNode); > > > > But I need also to store the array information (size and dimension). The > > aim is facilitates the OpenCL code generation from this expression. > > > > > > Many thanks > > Juanjo > > > > > > -- > > The University of Edinburgh is a charitable body, registered in > > Scotland, with registration number SC005336. > > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jules_gosnell at yahoo.com Tue Mar 18 22:47:42 2014 From: jules_gosnell at yahoo.com (Jules Gosnell) Date: Tue, 18 Mar 2014 22:47:42 +0000 Subject: Graal and Clojure In-Reply-To: <5328BA07.6030100@yahoo.com> References: <1394063789.8387.YahooMailNeo@web122406.mail.ne1.yahoo.com> <5326354D.8010204@yahoo.com> <53277532.3030807@amd.com> <5328BA07.6030100@yahoo.com> Message-ID: <5328CD0E.2040206@yahoo.com> Eric, I've just added a -Dsumatra.verbose flag - it is set in the project.clj. When set, Okra is asked to be verbose about the HSAIL generation and I also print the disassembled bytecode that Okra is about to compile. I have updated the readme accordingly. Okra also logs whether it is running in simulated or native mode. I'll give the JUnit / launch from CLI some thought now. regards Jules On 18/03/14 21:26, Jules Gosnell wrote: > Eric, > > Thanks for checking out Clumatra :-) : > > https://github.com/JulesGosnell/clumatra > > I'm only just getting it off the ground at the moment, so I'm afraid > that there is not much there yet. > > Its aim is to promote Clojure/Sumatra/Graal friendliness and I hope, > ensure that Clojure can take advantage of all the benefits of HSA as > soon as Java9 comes out. > > OK - that's the plug over with - now to your questions. > > 1. enabling a real 'finalizer' instead of the Okra simulator - Hmmm... I > haven't bought my HSA box yet - it is still a wishlist on Amazon. So I > have not been able to try this yet. However, I get the feeling from > looking at the Okra src, that it might detect if the finalizer is > available and switch over for you. The OkraContext has an isSimulator() > method on it. So it must be able to run in simulated and some other mode > ... I think the HSAIL->GPU finalizer is available in one of the AMD d/ls > - you probably have it already... I'll log something about whether we > are on a simulator or not in the test harness. - There might be a -X > flag to the jvm to enable a finalizer... > > 2. I'll have to dig around in the Clojure test harness to see if it can > be integrated with junit - otherwise I'll have to switch over. I can see > the obvious benefit in being able to kick these tests off in your > debugger. I'll look into it. > > 3. I'm glad that it is helping uncover issues. I have added a few more > test today and will keep building. 
I'm looking forward to getting more > and more of the stack runnning. > > I have a few thoughts / questions - if I may :-) > > Is there a web page on which you guys are ticking off each bytecode > instruction as it is implemented ? This would be a help to me when I > have a test that fails. I can look at the page to see if it should work > or not. > > What might I reasonably expect to get working eventually ? I'm assuming > that there are limits to what can be done on the GPU - I'm assuming that > i/o for instance is not feasible etc... Is there some sort of a road map > published anywhere that I could take a peek at ? > > I think that is enough for the moment, > > Thanks again, Eric, for your interest in Clumatra - I look forward to > working with you guys to get Clojure and Graal to work well together. > > > Jules > > > > > On 17/03/14 22:20, Eric Caspole wrote: >> Hi everybody, >> I was setting up a new HSA system today and I installed Clumatra and got >> it working under the simulator. I have a real HSA system not too >> different from this: >> >> http://code.google.com/p/aparapi/wiki/SettingUpLinuxHSAMachineForAparapi >> >> In the clojure/lein/maven based system there is so much harness I can't >> immediately see how to switch in the real hardware version of Okra >> instead of the simulator Okra that gets installed into maven and >> exploded in /tmp when the tests run. >> >> Jules, could you show how to bypass the harness etc to switch the Okra? >> Can I just run 1 test with a simple java command line? This makes it a >> lot easier to use a debugger which I will definitely want to do. >> >> Otherwise this is a cool project and we already found 1 or 2 issues we >> can fix in HSAIL Graal related to your clojure tests. >> Regards, >> Eric >> >> >> >> On 03/17/2014 06:43 AM, Doug Simon wrote: >>> Hi Julian, >>> >>> In terms of what to test, I?ll leave it up to the Sumatra developers >>> at AMD and Oracle to comment on what may be interesting. >>> >>> As far as I know, no one in the Graal team is that familiar with >>> Clojure. This actually makes the value of having Clojure tests >>> integrated into our gate system somewhat problematic - >>> debugging/fixing failures may be difficult. I?d suggest you report >>> failures your discover on the sumatra-dev and graal-dev lists and we >>> can resolve them that way for now. >>> >>> In addition, for Graal we?re always interested in benchmark suites for >>> JVM hosted languages. Can you recommend any well known and trusted >>> Clojure benchmarks we should consider? >>> >>> -Doug >>> >>> On Mar 17, 2014, at 12:35 AM, Jules Gosnell >>> wrote: >>> >>>> Doug, >>>> >>>> Sorry it has taken me a while to get back - I've been busy playing >>>> with code. >>>> >>>> I currently have a small clojure testsuite up and running. I plan to >>>> broaden this and then think about providing it in a junit-compatible >>>> way to your project. >>>> >>>> I still haven't made up my mind exactly what I should be testing. It >>>> seems unnecessary to duplicate all your tests in Clojure, since you >>>> are already guarding against regression in these areas - I think I >>>> will start out with a fairly general set of tests as I probe for >>>> particular problem areas for Clojure, into which I may dive for more >>>> detail - time will tell... if you have any ideas, please let me know. 
>>>> >>>> I've checked my stuff in at github, if anyone is interested - here is >>>> the project: >>>> >>>> https://github.com/JulesGosnell/clumatra >>>> >>>> here is some clojure that I reverse engineered from GraalKernelTester: >>>> >>>> https://github.com/JulesGosnell/clumatra/blob/master/src/clumatra/core.clj >>>> >>>> >>>> >>>> and here are the first successful tests: >>>> >>>> https://github.com/JulesGosnell/clumatra/blob/master/test/clumatra/core_test.clj >>>> >>>> >>>> >>>> not very pretty yet - I am still feeling my way. >>>> >>>> cheers >>>> >>>> >>>> Jules >>>> >>>> >>>> >>>> >>>> On 06/03/14 10:24, Doug Simon wrote: >>>>> Hi Julian, >>>>> >>>>> This looks very interesting and will be an good alternative testing >>>>> vector for the Sumatra work as it matures. If Clojure tests can >>>>> somehow be made to run from Junit, then I?m sure we can try >>>>> integrating it into our testing. >>>>> >>>>> -Doug >>>>> >>>>> On Mar 6, 2014, at 12:56 AM, Julian Gosnell >>>>> wrote: >>>>> >>>>>> Guys, >>>>>> >>>>>> I just built the Java8 / Graal / Okra stack and managed to run some >>>>>> very simple Clojure copying the contents of one int array into >>>>>> another one on Okra. >>>>>> >>>>>> https://groups.google.com/forum/#!topic/clojure/JpjK__NTR5Y >>>>>> >>>>>> >>>>>> I find the ramifications of this very exciting :-) >>>>>> >>>>>> I understand that fn-ality is limited at the moment - but I am keen >>>>>> to keep testing and to help ensure early visibility of Clojure >>>>>> related issues to both communities - perhaps even the submission of >>>>>> a Clojure testsuite if Graal developers thought that useful. >>>>>> >>>>>> I'd be very interested in your thoughts on Graal / Clojure. >>>>>> >>>>>> regards, >>>>>> >>>>>> >>>>>> Jules >>>>>> >>>>> >>>>> >>>> >>> >>> >> >> > From jules_gosnell at yahoo.com Thu Mar 20 21:27:17 2014 From: jules_gosnell at yahoo.com (Jules Gosnell) Date: Thu, 20 Mar 2014 21:27:17 +0000 Subject: Graal and Clojure In-Reply-To: <53277532.3030807@amd.com> References: <1394063789.8387.YahooMailNeo@web122406.mail.ne1.yahoo.com> <5326354D.8010204@yahoo.com> <53277532.3030807@amd.com> Message-ID: <532B5D35.4010108@yahoo.com> Eric, I just checked support to Clumatra for running individual unit tests from the CLI - should be enough to get you set up with your debugger. All details at the bottom of the README. https://github.com/JulesGosnell/clumatra I'm afraid you will need to install maven if you don't already have it - for some reason lein does not seem to have a test-jar target. It is still a little rough around the edges, but better than nothing :-) I also just bought my HSA h/w - should be up and running next week sometime. Did you figure out how to enable native-mode Okra ? catch you later, Jules On 17/03/14 22:20, Eric Caspole wrote: > Hi everybody, > I was setting up a new HSA system today and I installed Clumatra and got > it working under the simulator. I have a real HSA system not too > different from this: > > http://code.google.com/p/aparapi/wiki/SettingUpLinuxHSAMachineForAparapi > > In the clojure/lein/maven based system there is so much harness I can't > immediately see how to switch in the real hardware version of Okra > instead of the simulator Okra that gets installed into maven and > exploded in /tmp when the tests run. > > Jules, could you show how to bypass the harness etc to switch the Okra? > Can I just run 1 test with a simple java command line? This makes it a > lot easier to use a debugger which I will definitely want to do. 
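
For the "simple java command line" case, the stock JUnit 4 runner is usually enough, either directly as java -cp <classpath> org.junit.runner.JUnitCore <test class>, or wrapped in a tiny launcher like the sketch below (classpath and any Graal/Okra flags omitted); nothing here is Clumatra-specific, which makes it convenient to run a single test under a native debugger.

    import org.junit.runner.JUnitCore;
    import org.junit.runner.Result;
    import org.junit.runner.notification.Failure;

    // Minimal launcher for running one JUnit 4 test class from a plain java command line.
    public class SingleTestLauncher {
        public static void main(String[] args) throws Exception {
            Class<?> testClass = Class.forName(args[0]); // fully qualified test class name
            Result result = JUnitCore.runClasses(testClass);
            for (Failure failure : result.getFailures()) {
                System.out.println(failure);
            }
            System.out.println(result.wasSuccessful() ? "OK" : "FAILED");
        }
    }
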
>
> Otherwise this is a cool project and we already found 1 or 2 issues we can fix in HSAIL Graal related to your clojure tests.
> Regards,
> Eric
>
>
> On 03/17/2014 06:43 AM, Doug Simon wrote:
>> Hi Julian,
>>
>> In terms of what to test, I'll leave it up to the Sumatra developers at AMD and Oracle to comment on what may be interesting.
>>
>> As far as I know, no one in the Graal team is that familiar with Clojure. This actually makes the value of having Clojure tests integrated into our gate system somewhat problematic - debugging/fixing failures may be difficult. I'd suggest you report failures you discover on the sumatra-dev and graal-dev lists and we can resolve them that way for now.
>>
>> In addition, for Graal we're always interested in benchmark suites for JVM hosted languages. Can you recommend any well known and trusted Clojure benchmarks we should consider?
>>
>> -Doug
>>
>> On Mar 17, 2014, at 12:35 AM, Jules Gosnell wrote:
>>
>>> Doug,
>>>
>>> Sorry it has taken me a while to get back - I've been busy playing with code.
>>>
>>> I currently have a small clojure testsuite up and running. I plan to broaden this and then think about providing it in a junit-compatible way to your project.
>>>
>>> I still haven't made up my mind exactly what I should be testing. It seems unnecessary to duplicate all your tests in Clojure, since you are already guarding against regression in these areas - I think I will start out with a fairly general set of tests as I probe for particular problem areas for Clojure, into which I may dive for more detail - time will tell... if you have any ideas, please let me know.
>>>
>>> I've checked my stuff in at github, if anyone is interested - here is the project:
>>>
>>> https://github.com/JulesGosnell/clumatra
>>>
>>> here is some clojure that I reverse engineered from GraalKernelTester:
>>>
>>> https://github.com/JulesGosnell/clumatra/blob/master/src/clumatra/core.clj
>>>
>>> and here are the first successful tests:
>>>
>>> https://github.com/JulesGosnell/clumatra/blob/master/test/clumatra/core_test.clj
>>>
>>> not very pretty yet - I am still feeling my way.
>>>
>>> cheers
>>>
>>>
>>> Jules
>>>
>>>
>>> On 06/03/14 10:24, Doug Simon wrote:
>>>> Hi Julian,
>>>>
>>>> This looks very interesting and will be a good alternative testing vector for the Sumatra work as it matures. If Clojure tests can somehow be made to run from Junit, then I'm sure we can try integrating it into our testing.
>>>>
>>>> -Doug
>>>>
>>>> On Mar 6, 2014, at 12:56 AM, Julian Gosnell wrote:
>>>>
>>>>> Guys,
>>>>>
>>>>> I just built the Java8 / Graal / Okra stack and managed to run some very simple Clojure copying the contents of one int array into another one on Okra.
>>>>>
>>>>> https://groups.google.com/forum/#!topic/clojure/JpjK__NTR5Y
>>>>>
>>>>> I find the ramifications of this very exciting :-)
>>>>>
>>>>> I understand that fn-ality is limited at the moment - but I am keen to keep testing and to help ensure early visibility of Clojure related issues to both communities - perhaps even the submission of a Clojure testsuite if Graal developers thought that useful.
>>>>>
>>>>> I'd be very interested in your thoughts on Graal / Clojure.
>>>>>
>>>>> regards,
>>>>>
>>>>>
>>>>> Jules
>>>>>

From juan.fumero at ed.ac.uk Mon Mar 31 17:41:25 2014
From: juan.fumero at ed.ac.uk (Juan Jose Fumero)
Date: Mon, 31 Mar 2014 18:41:25 +0100
Subject: Graph for lambda expression
Message-ID: <1396287685.4089.26.camel@gofio>

Hello,

I am working with some transformations in the lambda graph. I am trying to create a new object in the lambda expression and return it. The input of the lambda expression, let's say, is of type A and the return type is different (type B). The thing is that I get different graphs if I declare and create at least one object of type B before the lambda.

Here is a little testcase.

 7 package com.gpu.stream.test;
 8
 9 public class TestCase {
10
11     public static void main(String []args) {
12
13         Tuple1[] tuple1 = new Tuple1[N];
14         // tuple1[0] = new Tuple1<>();   // --> here
15
16         Stream streamTuple1 = new Stream();
17         tuple1 = streamTuple1.map(
18             x -> {
19                 int i = x;
20                 Tuple1 t1 = new Tuple1<>();
21                 t1._1(x);
22                 return t1;
23             }).run(v1);
24     }
25 }

Stream.map is a method which receives a Function in jdk8. The input of this function is an Integer and the output should be a Tuple1 object created in the lambda expression.

The Tuple1 class is very simple:

 7 package com.gpu.stream.test;
 8 public class Tuple1<T> {
 9     public T _1;
10
11     public T _1() {
12         return this._1;
13     }
14
15     public void _1(T _1) {
16         this._1 = _1;
17     }
18 }

If I run the program TestCase and print the graph (note that line 14 is commented), I get the following graph for the lambda expression:

0|StartNode
1|Parameter(0)
2|FrameState@0
3|Deopt
4|IsNull
5|GuardingPi

I expected at least the NewInstanceNode and the ReturnNode. If I uncomment line 14 in TestCase, the graph is the following:

0|StartNode
1|Parameter(0)
2|FrameState@0
3|NewInstance
4|Return
5|IsNull
6|GuardingPi
7|StoreField#_1
8|FrameState@5
9|FrameState@15

In this case I would say the graph is correct. It contains the NewInstanceNode, ReturnNode and StoreFieldNodes.

Is this the right behaviour? Why do I need to create at least one object?

Note: I am calling the optimisations in Graal like inlining, but if I do not call the optimisations, the result is the same.

Any idea?

Thanks
Juanjo


--
PhD Student
University of Edinburgh


The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

From duboscq at ssw.jku.at Mon Mar 31 22:05:45 2014
From: duboscq at ssw.jku.at (Gilles Duboscq)
Date: Tue, 1 Apr 2014 00:05:45 +0200
Subject: Graph for lambda expression
In-Reply-To:
References: <1396287685.4089.26.camel@gofio>
Message-ID:

Hello Juan,

I agree with Kris' diagnosis. The most likely problem is that some type is not loaded yet when you compile this method.

You can try to run the method a few times normally before compiling it to be sure everything is loaded. You can also look at how we do our unit tests (especially in GraalCompilerTest and JTTTest). In those tests we configure the GraphBuilderPhase such that it eagerly loads classes to avoid this kind of problem.

-Gilles

On Mon, Mar 31, 2014 at 7:58 PM, Krystal Mok wrote:
> Hi Juan,
>
> Out of total ignorance of Graal and Sumatra, this looks like you're hitting a class initialization issue. If a class has not been initialized at the time Graal compiles a method, then any use of that class will result in a Deopt in Graal.
>
> If you create an instance of Tuple1 before the stream.map(), it'll ensure the Tuple1 class is initialized.
> You can use other means to force initialization of Tuple1, too, e.g. Class.forName(Tuple1.class.getName()). Try it and see if it works the same. If it works, then that should be it.
>
> BTW, new Tuple1<>[] will NOT force initialization of the Tuple1 class.
>
> - Kris
>
>
> On Mon, Mar 31, 2014 at 10:41 AM, Juan Jose Fumero wrote:
>> Hello,
>>
>> I am working with some transformations in the lambda graph. I am trying to create a new object in the lambda expression and return it. The input of the lambda expression, let's say, is of type A and the return type is different (type B). The thing is that I get different graphs if I declare and create at least one object of type B before the lambda.
>>
>> Here is a little testcase.
>>
>>  7 package com.gpu.stream.test;
>>  8
>>  9 public class TestCase {
>> 10
>> 11     public static void main(String []args) {
>> 12
>> 13         Tuple1[] tuple1 = new Tuple1[N];
>> 14         // tuple1[0] = new Tuple1<>();   // --> here
>> 15
>> 16         Stream streamTuple1 = new Stream();
>> 17         tuple1 = streamTuple1.map(
>> 18             x -> {
>> 19                 int i = x;
>> 20                 Tuple1 t1 = new Tuple1<>();
>> 21                 t1._1(x);
>> 22                 return t1;
>> 23             }).run(v1);
>> 24     }
>> 25 }
>>
>> Stream.map is a method which receives a Function in jdk8. The input of this function is an Integer and the output should be a Tuple1 object created in the lambda expression.
>>
>> The Tuple1 class is very simple:
>>
>>  7 package com.gpu.stream.test;
>>  8 public class Tuple1<T> {
>>  9     public T _1;
>> 10
>> 11     public T _1() {
>> 12         return this._1;
>> 13     }
>> 14
>> 15     public void _1(T _1) {
>> 16         this._1 = _1;
>> 17     }
>> 18 }
>>
>> If I run the program TestCase and print the graph (note that line 14 is commented), I get the following graph for the lambda expression:
>>
>> 0|StartNode
>> 1|Parameter(0)
>> 2|FrameState@0
>> 3|Deopt
>> 4|IsNull
>> 5|GuardingPi
>>
>> I expected at least the NewInstanceNode and the ReturnNode. If I uncomment line 14 in TestCase, the graph is the following:
>>
>> 0|StartNode
>> 1|Parameter(0)
>> 2|FrameState@0
>> 3|NewInstance
>> 4|Return
>> 5|IsNull
>> 6|GuardingPi
>> 7|StoreField#_1
>> 8|FrameState@5
>> 9|FrameState@15
>>
>> In this case I would say the graph is correct. It contains the NewInstanceNode, ReturnNode and StoreFieldNodes.
>>
>> Is this the right behaviour? Why do I need to create at least one object?
>>
>> Note: I am calling the optimisations in Graal like inlining, but if I do not call the optimisations, the result is the same.
>>
>> Any idea?
>>
>> Thanks
>> Juanjo
>>
>>
>> --
>> PhD Student
>> University of Edinburgh
>>
>>
>> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

From Eric.Caspole at amd.com Mon Mar 31 22:40:35 2014
From: Eric.Caspole at amd.com (Caspole, Eric)
Date: Mon, 31 Mar 2014 22:40:35 +0000
Subject: Graal and Clojure
In-Reply-To: <532B5D35.4010108@yahoo.com>
References: <1394063789.8387.YahooMailNeo@web122406.mail.ne1.yahoo.com> <5326354D.8010204@yahoo.com> <53277532.3030807@amd.com>,<532B5D35.4010108@yahoo.com>
Message-ID:

Hi Jules,
I updated everything and got Clumatra running on the real hardware today.

In project.clj, I used this:

:jvm-opts ["-server" "-ea" "-esa" "-Xms1g" "-Xmx1g" "-verbosegc" "-G:Log=CodeGen" "-XX:+GPUOffload" "-XX:+TraceGPUInteraction"]

Then make sure your /path/to/okra/dist/bin is in the PATH and LD_LIBRARY_PATH.
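
A quick way to double-check the environment before running the tests is a tiny throwaway class along the following lines. This is only a sanity-check sketch, not part of Clumatra or Okra; the class name and the okra path are assumptions to adjust for your own install:

 // CheckOkraEnv.java - hypothetical helper, not part of Clumatra/Okra.
 // Prints the variables that decide which native Okra library the JVM will pick up.
 import java.io.File;

 public class CheckOkraEnv {
     public static void main(String[] args) {
         String dist = "/path/to/okra/dist/bin"; // assumption: adjust to your install
         System.out.println("PATH              = " + System.getenv("PATH"));
         System.out.println("LD_LIBRARY_PATH   = " + System.getenv("LD_LIBRARY_PATH"));
         System.out.println("java.library.path = " + System.getProperty("java.library.path"));
         boolean found = new File(dist, "libokra_x86_64.so").isFile();
         System.out.println("libokra_x86_64.so found in " + dist + ": " + found);
     }
 }

If the library is present in that directory and both variables include it, the hardware output shown next is what you would then expect to see rather than the simulator line.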
When it is using the simulator, you will see this:

[HSAIL] using _OKRA_SIM_LIB_PATH_=/tmp/okraresource.dir_2595501614013009358/libokra_x86_64.so

With hardware or external okra you will see this:

[HSAIL] library is libokra_x86_64.so

Regards,
Eric

________________________________________
From: Jules Gosnell [jules_gosnell at yahoo.com]
Sent: Thursday, March 20, 2014 5:27 PM
To: Caspole, Eric; graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net
Subject: Re: Graal and Clojure

Eric,

I just checked in support to Clumatra for running individual unit tests from the CLI - should be enough to get you set up with your debugger. All details at the bottom of the README.

https://github.com/JulesGosnell/clumatra

I'm afraid you will need to install maven if you don't already have it - for some reason lein does not seem to have a test-jar target.

It is still a little rough around the edges, but better than nothing :-)

I also just bought my HSA h/w - should be up and running next week sometime.

Did you figure out how to enable native-mode Okra ?

catch you later,

Jules


On 17/03/14 22:20, Eric Caspole wrote:
> Hi everybody,
> I was setting up a new HSA system today and I installed Clumatra and got it working under the simulator. I have a real HSA system not too different from this:
>
> http://code.google.com/p/aparapi/wiki/SettingUpLinuxHSAMachineForAparapi
>
> In the clojure/lein/maven based system there is so much harness I can't immediately see how to switch in the real hardware version of Okra instead of the simulator Okra that gets installed into maven and exploded in /tmp when the tests run.
>
> Jules, could you show how to bypass the harness etc to switch the Okra? Can I just run 1 test with a simple java command line? This makes it a lot easier to use a debugger which I will definitely want to do.
>
> Otherwise this is a cool project and we already found 1 or 2 issues we can fix in HSAIL Graal related to your clojure tests.
> Regards,
> Eric
>
>
> On 03/17/2014 06:43 AM, Doug Simon wrote:
>> Hi Julian,
>>
>> In terms of what to test, I'll leave it up to the Sumatra developers at AMD and Oracle to comment on what may be interesting.
>>
>> As far as I know, no one in the Graal team is that familiar with Clojure. This actually makes the value of having Clojure tests integrated into our gate system somewhat problematic - debugging/fixing failures may be difficult. I'd suggest you report failures you discover on the sumatra-dev and graal-dev lists and we can resolve them that way for now.
>>
>> In addition, for Graal we're always interested in benchmark suites for JVM hosted languages. Can you recommend any well known and trusted Clojure benchmarks we should consider?
>>
>> -Doug
>>
>> On Mar 17, 2014, at 12:35 AM, Jules Gosnell wrote:
>>
>>> Doug,
>>>
>>> Sorry it has taken me a while to get back - I've been busy playing with code.
>>>
>>> I currently have a small clojure testsuite up and running. I plan to broaden this and then think about providing it in a junit-compatible way to your project.
>>>
>>> I still haven't made up my mind exactly what I should be testing. It seems unnecessary to duplicate all your tests in Clojure, since you are already guarding against regression in these areas - I think I will start out with a fairly general set of tests as I probe for particular problem areas for Clojure, into which I may dive for more detail - time will tell... if you have any ideas, please let me know.
>>>
>>> I've checked my stuff in at github, if anyone is interested - here is the project:
>>>
>>> https://github.com/JulesGosnell/clumatra
>>>
>>> here is some clojure that I reverse engineered from GraalKernelTester:
>>>
>>> https://github.com/JulesGosnell/clumatra/blob/master/src/clumatra/core.clj
>>>
>>> and here are the first successful tests:
>>>
>>> https://github.com/JulesGosnell/clumatra/blob/master/test/clumatra/core_test.clj
>>>
>>> not very pretty yet - I am still feeling my way.
>>>
>>> cheers
>>>
>>>
>>> Jules
>>>
>>>
>>> On 06/03/14 10:24, Doug Simon wrote:
>>>> Hi Julian,
>>>>
>>>> This looks very interesting and will be a good alternative testing vector for the Sumatra work as it matures. If Clojure tests can somehow be made to run from Junit, then I'm sure we can try integrating it into our testing.
>>>>
>>>> -Doug
>>>>
>>>> On Mar 6, 2014, at 12:56 AM, Julian Gosnell wrote:
>>>>
>>>>> Guys,
>>>>>
>>>>> I just built the Java8 / Graal / Okra stack and managed to run some very simple Clojure copying the contents of one int array into another one on Okra.
>>>>>
>>>>> https://groups.google.com/forum/#!topic/clojure/JpjK__NTR5Y
>>>>>
>>>>> I find the ramifications of this very exciting :-)
>>>>>
>>>>> I understand that fn-ality is limited at the moment - but I am keen to keep testing and to help ensure early visibility of Clojure related issues to both communities - perhaps even the submission of a Clojure testsuite if Graal developers thought that useful.
>>>>>
>>>>> I'd be very interested in your thoughts on Graal / Clojure.
>>>>>
>>>>> regards,
>>>>>
>>>>>
>>>>> Jules
>>>>>

From rednaxelafx at gmail.com Mon Mar 31 17:58:24 2014
From: rednaxelafx at gmail.com (Krystal Mok)
Date: Mon, 31 Mar 2014 10:58:24 -0700
Subject: Graph for lambda expression
In-Reply-To: <1396287685.4089.26.camel@gofio>
References: <1396287685.4089.26.camel@gofio>
Message-ID:

Hi Juan,

Out of total ignorance of Graal and Sumatra, this looks like you're hitting a class initialization issue. If a class has not been initialized at the time Graal compiles a method, then any use of that class will result in a Deopt in Graal.

If you create an instance of Tuple1 before the stream.map(), it'll ensure the Tuple1 class is initialized. You can use other means to force initialization of Tuple1, too, e.g. Class.forName(Tuple1.class.getName()). Try it and see if it works the same. If it works, then that should be it. (A minimal sketch of both approaches follows after the quoted message below.)

BTW, new Tuple1<>[] will NOT force initialization of the Tuple1 class.

- Kris


On Mon, Mar 31, 2014 at 10:41 AM, Juan Jose Fumero wrote:
> Hello,
>
> I am working with some transformations in the lambda graph. I am trying to create a new object in the lambda expression and return it. The input of the lambda expression, let's say, is of type A and the return type is different (type B). The thing is that I get different graphs if I declare and create at least one object of type B before the lambda.
>
> Here is a little testcase.
>
>  7 package com.gpu.stream.test;
>  8
>  9 public class TestCase {
> 10
> 11     public static void main(String []args) {
> 12
> 13         Tuple1[] tuple1 = new Tuple1[N];
> 14         // tuple1[0] = new Tuple1<>();   // --> here
> 15
> 16         Stream streamTuple1 = new Stream();
> 17         tuple1 = streamTuple1.map(
> 18             x -> {
> 19                 int i = x;
> 20                 Tuple1 t1 = new Tuple1<>();
> 21                 t1._1(x);
> 22                 return t1;
> 23             }).run(v1);
> 24     }
> 25 }
>
> Stream.map is a method which receives a Function in jdk8.
> The input of this function is an Integer and the output should be a Tuple1 object created in the lambda expression.
>
> The Tuple1 class is very simple:
>
>  7 package com.gpu.stream.test;
>  8 public class Tuple1<T> {
>  9     public T _1;
> 10
> 11     public T _1() {
> 12         return this._1;
> 13     }
> 14
> 15     public void _1(T _1) {
> 16         this._1 = _1;
> 17     }
> 18 }
>
> If I run the program TestCase and print the graph (note that line 14 is commented), I get the following graph for the lambda expression:
>
> 0|StartNode
> 1|Parameter(0)
> 2|FrameState@0
> 3|Deopt
> 4|IsNull
> 5|GuardingPi
>
> I expected at least the NewInstanceNode and the ReturnNode. If I uncomment line 14 in TestCase, the graph is the following:
>
> 0|StartNode
> 1|Parameter(0)
> 2|FrameState@0
> 3|NewInstance
> 4|Return
> 5|IsNull
> 6|GuardingPi
> 7|StoreField#_1
> 8|FrameState@5
> 9|FrameState@15
>
> In this case I would say the graph is correct. It contains the NewInstanceNode, ReturnNode and StoreFieldNodes.
>
> Is this the right behaviour? Why do I need to create at least one object?
>
> Note: I am calling the optimisations in Graal like inlining, but if I do not call the optimisations, the result is the same.
>
> Any idea?
>
> Thanks
> Juanjo
>
>
> --
> PhD Student
> University of Edinburgh
>
>
> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
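
To make Kris's and Gilles' suggestions concrete, here is a minimal, self-contained sketch of forcing class initialization and warming the lambda up before a compiler looks at it. The Tuple1 below is only a stand-in for the class in the test case above (the Stream/run plumbing from the test case is not reproduced), so treat it as an illustration rather than a drop-in fix:

 // InitWarmup.java - sketch: force class initialization and warm up the lambda
 // so that no Deopt is emitted for an uninitialized class when it is compiled.
 import java.util.function.Function;

 public class InitWarmup {

     // Stand-in for the Tuple1 in the test case above.
     static class Tuple1<T> {
         public T _1;
         public void _1(T v) { this._1 = v; }
     }

     public static void main(String[] args) throws Exception {
         // Option 1: trigger initialization explicitly.
         // Class.forName(String) initializes the class by default.
         Class.forName(Tuple1.class.getName());

         // Option 2 (same effect): create a throwaway instance once.
         new Tuple1<Integer>();

         Function<Integer, Tuple1<Integer>> f = x -> {
             Tuple1<Integer> t1 = new Tuple1<>();
             t1._1(x);
             return t1;
         };

         // Warm up: run the lambda a few times so everything it touches is
         // loaded and initialized before a compiler ever sees it.
         for (int i = 0; i < 10_000; i++) {
             f.apply(i);
         }
     }
 }

Whether this removes the Deopt in the graph above still depends on what else the lambda touches, so it is only a starting point for the experiment Kris and Gilles describe.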