From tom.deneau at amd.com  Thu Nov  7 13:22:41 2013
From: tom.deneau at amd.com (Deneau, Tom)
Date: Thu, 7 Nov 2013 21:22:41 +0000
Subject: non-foreign-call  tlab refill from hsail
In-Reply-To: <33F60C70-33F8-40F6-AE82-DD817293856C@oracle.com>
References: <BC97738F8E7C8742BABED7F06FB9DF915323A4FB@SATLEXDAG01.amd.com>
	<33F60C70-33F8-40F6-AE82-DD817293856C@oracle.com>
Message-ID: <BC97738F8E7C8742BABED7F06FB9DF9153240941@SATLEXDAG01.amd.com>

Doug --

Trying to see if I understand how these pieces fit together.

NewObjectSnippets.allocateInstance makes a call to
NewInstanceStubCall.call if the current tlab does not have enough
room.

NewInstanceStubCall.call looks up the ForeignCallLinkage and finds
that it is not a simple foreign call to a specific foreign call
address (its address is 0) but instead has a stub associated with it.
I think this association came from the call to

        link(new NewInstanceStub(providers, target, registerStubCall(NEW_INSTANCE, REEXECUTABLE, NOT_LEAF, ANY_LOCATION)));

in HotSpotHostForeignCallsProvider.java.

So when we try to finalizeAddress for the ForeignCallLinkage we end up
compiling this stub.

The stub is a SnippetStub implemented with the snippet called
"newInstance" in NewInstanceStub.java and tries to get a new tlab
using CAS operations.  If this stub cannot get a new tlab it makes a
"real" foreign call using
        newInstanceC(NEW_INSTANCE_C, thread(), hub);

which ends up going to GraalRuntime::new_instance.
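
The three tiers described above can be modelled in a few lines of plain Java. This is only a toy sketch under stated assumptions, not Graal or HotSpot code: the class TieredAllocationSketch, the Path enum, TLAB_SIZE and the byte sizes are all hypothetical, and the method names newInstanceStub and newInstanceC are borrowed from the thread purely as labels for stand-in bodies.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of the three allocation tiers; not Graal or HotSpot code. */
public class TieredAllocationSketch {
    enum Path { FAST_TLAB, STUB_REFILL, RUNTIME_CALL }

    static final long TLAB_SIZE = 64;          // hypothetical TLAB size in bytes
    final AtomicLong heapTop = new AtomicLong(0);
    final long heapEnd;
    long tlabTop = 0;                          // thread-local in a real VM,
    long tlabEnd = 0;                          // so no atomics are needed here

    TieredAllocationSketch(long heapSize) {
        this.heapEnd = heapSize;
    }

    /** Tier 1: the check that would be inlined at every allocation site. */
    Path allocate(long size) {
        if (tlabTop + size <= tlabEnd) {
            tlabTop += size;
            return Path.FAST_TLAB;
        }
        return newInstanceStub(size);
    }

    /** Tier 2: out-of-line stub that tries to claim a fresh TLAB with CAS. */
    Path newInstanceStub(long size) {
        while (true) {
            long oldTop = heapTop.get();
            long newTop = oldTop + TLAB_SIZE;
            if (size > TLAB_SIZE || newTop > heapEnd) {
                return newInstanceC(size);     // give up and call the runtime
            }
            if (heapTop.compareAndSet(oldTop, newTop)) {
                tlabTop = oldTop + size;       // object sits at the start of the new TLAB
                tlabEnd = newTop;
                return Path.STUB_REFILL;
            }
            // lost the race for this chunk; reload heapTop and retry
        }
    }

    /** Tier 3: stand-in for the real runtime call, which may trigger a GC. */
    Path newInstanceC(long size) {
        return Path.RUNTIME_CALL;
    }

    public static void main(String[] args) {
        TieredAllocationSketch heap = new TieredAllocationSketch(128);
        for (int i = 0; i < 10; i++) {
            System.out.println(i + ": " + heap.allocate(16));
        }
    }
}
```

With a 128-byte heap and 16-byte objects, the first allocation and every fourth one after it take the stub path, and once the heap is exhausted every request falls through to the runtime stand-in.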


-----Original Message-----
From: Doug Simon [mailto:doug.simon at oracle.com] 
Sent: Tuesday, October 22, 2013 4:42 AM
To: Deneau, Tom
Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net
Subject: Re: non-foreign-call tlab refill from hsail


On Oct 22, 2013, at 12:18 AM, "Deneau, Tom" <tom.deneau at amd.com> wrote:

> We are experimenting with object (and array) allocation from an HSA device
> (using graal for the HSAIL codegen).  Where we are now:
>
>   * the hsa workitems are using TLABs from "donor threads" who exist
>     just to supply TLABs and don't do any allocation themselves.
>
>   * To reduce the number of donor threads required, a TLAB can be
>     used by more than one workitem, in which case the workitems use
>     HSAIL atomic_add instructions to bump the tlab top pointer.
>
>   * the HSAIL backend has its own fastpath allocation snippets
>     which generate the HSAIL atomic_add instructions which
>     override the snippets in NewObjectSnippets.java
>
> Some junit tests have been written and pass which allocate objects, or arrays of primitives, or arrays of objects.
>
> All the above only works for the fastpath case, i.e., if there is indeed enough space in the donor TLAB.  We realize there are other cases:
>
>   a) not enough space in current TLAB but ability to allocate a new TLAB.
>
>   b) not able to allocate a new TLAB, GC required.
>
> For only case a) above, we would like to experiment with grabbing the new TLAB from HSAIL without making a "foreign call" to the VM.  From the hotspot code, I assume the logic required is what one sees in
>
>    MutableSpace::cas_allocate(size_t size), at least for the non-G1 case.

When the NewInstanceStub fails to allocate a new TLAB, it calls out to GraalRuntime::new_instance (in graalRuntime.cpp).
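
For readers following along, the two lock-free bump-pointer styles mentioned in this exchange (a CAS retry loop in the spirit of cas_allocate, and the fetch-and-add used by workitems sharing a TLAB) can be contrasted in a small Java model. This is a sketch under assumptions, not the VM's implementation: BumpPointerSketch, FAILED, and the sizes are hypothetical names and values chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Two lock-free ways to bump a shared top pointer; a model, not VM code. */
public class BumpPointerSketch {
    static final long FAILED = -1;   // hypothetical "no space" marker
    final AtomicLong top;
    final long end;

    BumpPointerSketch(long start, long end) {
        this.top = new AtomicLong(start);
        this.end = end;
    }

    /** CAS retry loop: top never moves past end, so a smaller request may still fit later. */
    long casAllocate(long size) {
        while (true) {
            long oldTop = top.get();
            long newTop = oldTop + size;
            if (newTop > end) {
                return FAILED;
            }
            if (top.compareAndSet(oldTop, newTop)) {
                return oldTop;
            }
            // another thread won the race; reload top and retry
        }
    }

    /** Fetch-and-add: one atomic operation and no retry, but a failed request leaves top past end. */
    long atomicAddAllocate(long size) {
        long oldTop = top.getAndAdd(size);
        return (oldTop + size <= end) ? oldTop : FAILED;
    }

    public static void main(String[] args) throws InterruptedException {
        BumpPointerSketch space = new BumpPointerSketch(0, 1024);
        AtomicLong successes = new AtomicLong();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 100; i++) {
                    if (space.casAllocate(8) != FAILED) {
                        successes.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        // 1024 bytes / 8 bytes per request: exactly 128 requests can succeed.
        System.out.println(successes.get());
    }
}
```

The trade-off is visible in the two methods: the fetch-and-add variant needs no loop, but once a request overshoots the end, later requests fail too, which is acceptable only if overshoot is treated as the signal to refill.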

> Some of this non-foreign-call allocation logic appears to exist in the Snippet called NewInstanceStub.newInstance (as opposed to the NewObjectSnippets.allocateInstance snippet, which is what we are currently overriding).  The comments for this snippet say
>
>   "Re-attempts allocation after an initial TLAB allocation failed or
>   was skipped (e.g., due to -XX:-UseTLAB)."
>
> Is this NewInstanceStub.newInstance snippet actually used anywhere in the current graal framework?

Yes, you can see a call to NewInstanceStubCall in NewObjectSnippets.allocateInstance.

>  Is this a starting point we could use to get a non-foreign-call TLAB refill working?

Yes. Note that this call *is* a foreign call (see the javadoc for ForeignCallDescriptor).

-Doug


From doug.simon at oracle.com  Thu Nov  7 14:03:37 2013
From: doug.simon at oracle.com (Doug Simon)
Date: Thu, 7 Nov 2013 23:03:37 +0100
Subject: non-foreign-call  tlab refill from hsail
In-Reply-To: <BC97738F8E7C8742BABED7F06FB9DF9153240941@SATLEXDAG01.amd.com>
References: <BC97738F8E7C8742BABED7F06FB9DF915323A4FB@SATLEXDAG01.amd.com>
	<33F60C70-33F8-40F6-AE82-DD817293856C@oracle.com>
	<BC97738F8E7C8742BABED7F06FB9DF9153240941@SATLEXDAG01.amd.com>
Message-ID: <D286B71F-BA73-488E-B5A1-54DF766C0CFB@oracle.com>

That is a very correct summary of the way it works!



From tom.deneau at amd.com  Thu Nov  7 14:13:49 2013
From: tom.deneau at amd.com (Deneau, Tom)
Date: Thu, 7 Nov 2013 22:13:49 +0000
Subject: non-foreign-call  tlab refill from hsail
In-Reply-To: <D286B71F-BA73-488E-B5A1-54DF766C0CFB@oracle.com>
References: <BC97738F8E7C8742BABED7F06FB9DF915323A4FB@SATLEXDAG01.amd.com>
	<33F60C70-33F8-40F6-AE82-DD817293856C@oracle.com>
	<BC97738F8E7C8742BABED7F06FB9DF9153240941@SATLEXDAG01.amd.com>
	<D286B71F-BA73-488E-B5A1-54DF766C0CFB@oracle.com>
Message-ID: <BC97738F8E7C8742BABED7F06FB9DF9153240A4C@SATLEXDAG01.amd.com>

So I was trying to understand why the NewInstanceStub.newInstance stub code
was not just included in the original NewObjectSnippets.allocateInstance snippet.

-- Tom




From doug.simon at oracle.com  Thu Nov  7 14:15:12 2013
From: doug.simon at oracle.com (Doug Simon)
Date: Thu, 7 Nov 2013 23:15:12 +0100
Subject: non-foreign-call  tlab refill from hsail
In-Reply-To: <BC97738F8E7C8742BABED7F06FB9DF9153240A4C@SATLEXDAG01.amd.com>
References: <BC97738F8E7C8742BABED7F06FB9DF915323A4FB@SATLEXDAG01.amd.com>
	<33F60C70-33F8-40F6-AE82-DD817293856C@oracle.com>
	<BC97738F8E7C8742BABED7F06FB9DF9153240941@SATLEXDAG01.amd.com>
	<D286B71F-BA73-488E-B5A1-54DF766C0CFB@oracle.com>
	<BC97738F8E7C8742BABED7F06FB9DF9153240A4C@SATLEXDAG01.amd.com>
Message-ID: <36423EAD-CDB9-4F3B-BDD4-E210C5EE3FB5@oracle.com>

Because it is slow (well, medium) path code that we don't want to inline at every allocation site.

On Nov 7, 2013, at 11:13 PM, Deneau, Tom <tom.deneau at amd.com> wrote:

> So I was trying to understand why the NewInstanceStub.newInstance Stub code
> was not just included in the original NewObjectSnippet.allocateInstance snippet.
> 
> -- Tom


From tom.deneau at amd.com  Thu Nov  7 14:21:40 2013
From: tom.deneau at amd.com (Deneau, Tom)
Date: Thu, 7 Nov 2013 22:21:40 +0000
Subject: non-foreign-call  tlab refill from hsail
In-Reply-To: <36423EAD-CDB9-4F3B-BDD4-E210C5EE3FB5@oracle.com>
References: <BC97738F8E7C8742BABED7F06FB9DF915323A4FB@SATLEXDAG01.amd.com>
	<33F60C70-33F8-40F6-AE82-DD817293856C@oracle.com>
	<BC97738F8E7C8742BABED7F06FB9DF9153240941@SATLEXDAG01.amd.com>
	<D286B71F-BA73-488E-B5A1-54DF766C0CFB@oracle.com>
	<BC97738F8E7C8742BABED7F06FB9DF9153240A4C@SATLEXDAG01.amd.com>
	<36423EAD-CDB9-4F3B-BDD4-E210C5EE3FB5@oracle.com>
Message-ID: <BC97738F8E7C8742BABED7F06FB9DF9153240A77@SATLEXDAG01.amd.com>

Are snippets required to inline all their calls?  
Or alternatively is there no way to annotate that a method should not be inlined?

-- Tom




From doug.simon at oracle.com  Thu Nov  7 14:38:14 2013
From: doug.simon at oracle.com (Doug Simon)
Date: Thu, 7 Nov 2013 23:38:14 +0100
Subject: non-foreign-call  tlab refill from hsail
In-Reply-To: <BC97738F8E7C8742BABED7F06FB9DF9153240A77@SATLEXDAG01.amd.com>
References: <BC97738F8E7C8742BABED7F06FB9DF915323A4FB@SATLEXDAG01.amd.com>
	<33F60C70-33F8-40F6-AE82-DD817293856C@oracle.com>
	<BC97738F8E7C8742BABED7F06FB9DF9153240941@SATLEXDAG01.amd.com>
	<D286B71F-BA73-488E-B5A1-54DF766C0CFB@oracle.com>
	<BC97738F8E7C8742BABED7F06FB9DF9153240A4C@SATLEXDAG01.amd.com>
	<36423EAD-CDB9-4F3B-BDD4-E210C5EE3FB5@oracle.com>
	<BC97738F8E7C8742BABED7F06FB9DF9153240A77@SATLEXDAG01.amd.com>
Message-ID: <C29CF685-913A-408F-9263-C007E8DFB7F0@oracle.com>


On Nov 7, 2013, at 11:21 PM, Deneau, Tom <tom.deneau at amd.com> wrote:

> Are snippets required to inline all their calls?  

Generally speaking, yes.

> Or alternatively is there no way to annotate that a method should not be inlined?

You can use the Snippet.SnippetInliningPolicy class to control inlining during snippet preparation.

-Doug



From jjfumero at gmail.com  Thu Nov 14 07:54:09 2013
From: jjfumero at gmail.com (=?ISO-8859-1?Q?Juan_Jos=E9_Fumero_Alfonso?=)
Date: Thu, 14 Nov 2013 15:54:09 +0000
Subject: Inlining in Graal
Message-ID: <CAPLUgPF7TfbJYMCo7gFnkoRWdOttsqE5=2jB5xTHY50R2wZ8Ug@mail.gmail.com>

Hi,
  I am working on inlining with Graal in my backend. Using these logging
options:

-XX:+BootstrapGraal -G:Log=InliningDecisions -XX:+PrintCompilation

I get this:

[thread:1] scope:
  [thread:1] scope: Inlining
    [thread:1] scope: Inlining.Inlining
      [thread:1] scope: Inlining.Inlining.InliningDecisions

inlining MapEngineOpenCL.kernelComputation@11: exact
com.edinburgh.parallel.map.MapEngineOpenCL.checkInline(int):int: trivial
(relevance=1.000000, probability=1.000000, bonus=1.000000, nodes=6)

With this code:

    // Testing
    public void kernelComputation(int[] input, int[] output) {
        for (int i = 0; i < input.length; i++) {
            output[i] = checkInline(input[i]);
        }
    }

    private int checkInline(int a) {
        return 10 + a;
    }

In this case the inlining works and my program produces the correct
result. What I actually want to do is the following:


    public void kernelComputation(T[] input, T[] output) {
        for (int i = 0; i < input.length; i++) {
            output[i] = mInterface.f(input[i]);
        }
    }


And the method f is defined dynamically by the user:


    Integer[] result = Map.apply(data, new MapInterface<Integer>() {
        @Override
        public Integer f(Integer data) {
            return data * data;
        }
    }).executeOpenCL();


I want the f method, which the user defines dynamically, to be inlined.
Instead, the log shows:

[thread:1] scope:
  [thread:1] scope: Inlining
    [thread:1] scope: Inlining.InliningDecisions
not inlining MapEngineOpenCL.kernelComputation@14:
com.edinburgh.parallel.map.MapInterface.f(Object):Object (0 bytes): no type
profile exists


I do not understand why the log says "no type profile exists". Any ideas? I
am setting up the inlining the way the Graal tests do
(com.oracle.graal.compiler.test.inlining).

Thank you very much
Juanjo

From jjfumero at gmail.com  Fri Nov 15 03:14:54 2013
From: jjfumero at gmail.com (Juan José Fumero Alfonso)
Date: Fri, 15 Nov 2013 11:14:54 +0000
Subject: Inlining in Graal
In-Reply-To: <5285FE5D0200000600105E13@gwia.im.jku.at>
References: <CAPLUgPF7TfbJYMCo7gFnkoRWdOttsqE5=2jB5xTHY50R2wZ8Ug@mail.gmail.com>
	<5285FE5D0200000600105E13@gwia.im.jku.at>
Message-ID: <CAPLUgPH_GekhyeVzfvA93_0OfZnWqSg9Pc2w-1iiCbu=zPO3HA@mail.gmail.com>

Hi Lukas,
  I do not understand your second approach. At what point is the inlining
applied if I replace the parameter with a constant?

Is it possible to force the inlining? I mean, is there a way to disable the
heuristic temporarily and inline the call anyway?

Thanks
Juanjo


2013/11/15 Lukas Stadler <Lukas.Stadler at jku.at>

> Hi Juan,
>
>
> in your working example, the checkInline call is a private method that
> can statically be bound to a specific target, without the help of
> profiling information. Thus, it can be inlined.
> The non-working example, however, contains a virtual/interface call,
> which means that the compiler needs a hint from somewhere as to which
> method could be the target of this call.
> The inlining heuristics in Graal assume that all important code has been
> running in the interpreter for a while (i.e. thousands of times).
> I guess that you are compiling code that has never been executed in the
> interpreter?
>
>
> One solution would be to warm up the method in the interpreter. However,
> the profiling information accumulates, so that the call site will become
> polymorphic and again stop inlining at some point.
> In your case, the closure that does the computation is known at the time
> of compilation (I guess). You could replace the parameter with a
> constant during your compilation process, which should allow the
> compiler to devirtualize the call.
> Is that a strategy that could work for you?
>
>
> - Lukas
>

From Lukas.Stadler at jku.at  Fri Nov 15 01:58:37 2013
From: Lukas.Stadler at jku.at (Lukas Stadler)
Date: Fri, 15 Nov 2013 10:58:37 +0100
Subject: Inlining in Graal
In-Reply-To: <CAPLUgPF7TfbJYMCo7gFnkoRWdOttsqE5=2jB5xTHY50R2wZ8Ug@mail.gmail.com>
References: <CAPLUgPF7TfbJYMCo7gFnkoRWdOttsqE5=2jB5xTHY50R2wZ8Ug@mail.gmail.com>
Message-ID: <5285FE5D0200000600105E13@gwia.im.jku.at>

Hi Juan,


in your working example, the checkInline call is a private method that
can statically be bound to a specific target, without the help of
profiling information. Thus, it can be inlined.
The non-working example, however, contains a virtual/interface call,
which means that the compiler needs a hint from somewhere as to which
method could be the target of this call.
The inlining heuristics in Graal assume that all important code has been
running in the interpreter for a while (i.e. thousands of times).
I guess that you are compiling code that has never been executed in the
interpreter?
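
To make the idea of a "hint" concrete, here is a small Java sketch of a
receiver-type profile for one call site. It is a toy, not Graal's or
HotSpot's profiling data structure; the class name, the method names, and
the returned strings are assumptions chosen for illustration. It shows the
three situations this thread touches on: no profile at all, a single
observed receiver type, and several observed receiver types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy receiver-type profile for a single call site; hypothetical names throughout. */
class TypeProfileSketch {
    // How often each receiver class has been seen at this call site.
    private final Map<Class<?>, Integer> counts = new LinkedHashMap<>();

    /** What an interpreter would do each time the call site executes. */
    void record(Object receiver) {
        counts.merge(receiver.getClass(), 1, Integer::sum);
    }

    /** The question a compiler asks before inlining a virtual or interface call. */
    String describe() {
        if (counts.isEmpty()) {
            return "no type profile exists";
        }
        if (counts.size() == 1) {
            Class<?> only = counts.keySet().iterator().next();
            return "monomorphic: " + only.getSimpleName();
        }
        return "polymorphic: " + counts.size() + " receiver types";
    }

    public static void main(String[] args) {
        TypeProfileSketch site = new TypeProfileSketch();
        System.out.println(site.describe()); // never executed: nothing recorded
        site.record("first receiver");
        System.out.println(site.describe());
        site.record(Integer.valueOf(42));
        System.out.println(site.describe());
    }
}
```

A method that is compiled without ever running in the interpreter is in the
first state, which matches the message in the log above.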


One solution would be to warm up the method in the interpreter. However,
the profiling information accumulates, so that the call site will become
polymorphic and again stop inlining at some point.
In your case, the closure that does the computation is known at the time
of compilation (I guess). You could replace the parameter with a
constant during your compilation process, which should allow the
compiler to devirtualize the call.
Is that a strategy that could work for you?
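
As a rough analogy for why a constant receiver helps, the sketch below uses
plain Java reflection (not the Graal API) to show that once the receiver
object itself is known, its exact class determines a single concrete target
for the interface method. The interface only mirrors the shape of the
MapInterface from this thread; Square and resolveTarget are hypothetical
names introduced for the example.

```java
import java.lang.reflect.Method;

/** Analogy only: resolving an interface method against a known receiver. */
class ConstantReceiverSketch {
    /** Stand-in with the same shape as the MapInterface in this thread. */
    interface MapInterface<T> {
        T f(T x);
    }

    /** Hypothetical user function. */
    static final class Square implements MapInterface<Integer> {
        @Override
        public Integer f(Integer x) {
            return x * x;
        }
    }

    /** With the receiver in hand, its exact class fixes the call target. */
    static Method resolveTarget(MapInterface<?> receiver) {
        try {
            // After erasure the interface signature is f(Object):Object,
            // so this lookup finds the implementation's method for it.
            return receiver.getClass().getMethod("f", Object.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MapInterface<Integer> constantReceiver = new Square();
        Method target = resolveTarget(constantReceiver);
        System.out.println("call target declared in: " + target.getDeclaringClass().getName());
    }
}
```

A compiler that sees the receiver as a constant can do the equivalent
lookup at compile time, so it needs no profile to pick the method to inline.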


- Lukas
