From: christophe.harle at amd.com (Harle, Christophe)
Date: Thu, 8 Nov 2012 14:18:40 +0000
Subject: Syncup call: review contribution proposals and technical issues

All:

We propose setting up an initial Sumatra project sync-up call next week to review near-term contribution plans and discuss outstanding technical issues. If you are interested in attending this call, reply to this thread indicating which of the following proposed dates/times you prefer: Nov 13 8-10am PST or Nov 14 8-10am PST; we will choose the date that gathers the largest number of votes.

In the spirit of seeding the discussion for this initial call, the AMD team got together and would like to propose the following initial milestone for Sumatra.

This proposed milestone involves moving enough of Aparapi's implementation down into the Project Lambda-enabled JVM to allow Lambda-enabled Aparapi demos/workloads to execute directly (no Aparapi.dll). Whilst this effort would constrain demos/examples to Aparapi's existing programming-model constraints (parallel primitive arrays only), it would act as a 'tracer bullet' through the code base, allowing us to familiarize ourselves with the various major JVM components that we think will be impacted by a Sumatra implementation.

Maybe others might like to use this discussion thread to offer other milestone proposals which we could also discuss in the upcoming call.

Regards,

Christophe

From: Gary.Frost at amd.com (Frost, Gary)
Date: Thu, 8 Nov 2012 18:39:36 +0000
Subject: Syncup call: review contribution proposals and technical issues

+1 Nov 13 8-10am PST

This is a great way to sync up and swap ideas for initial project milestones and research activities.

Gary

On Thursday, November 08, 2012 8:19 AM, Harle, Christophe wrote:
> We propose setting up an initial Sumatra project sync-up call next week [...]
From: Ryan.LaMothe at pnnl.gov (LaMothe, Ryan R)
Date: Thu, 8 Nov 2012 11:13:48 -0800
Subject: Syncup call: review contribution proposals and technical issues

I am only available on 11/14.

Ryan LaMothe

From: bharadwaj.yadavalli at oracle.com (Bharadwaj Yadavalli)
Date: Thu, 08 Nov 2012 14:47:15 -0500
Subject: Syncup call: review contribution proposals and technical issues

Either day will work for me.

Thanks,

Bharadwaj

On 11/8/2012 9:18 AM, Harle, Christophe wrote:
> We propose setting up an initial Sumatra project sync-up call next week [...]
From: doug.simon at oracle.com (Doug Simon @ Oracle)
Date: Thu, 8 Nov 2012 15:46:52 +0100
Subject: Syncup call: review contribution proposals and technical issues

I'd be interested in listening in on the call, as we're also interested in compiling for the GPU in the Graal [1] project. Either date or time works for me.

-Doug

[1] http://openjdk.java.net/projects/graal/

On Nov 8, 2012, at 3:27 PM, Thomas Wuerthinger wrote:
> On Nov 8, 2012, at 3:18 PM, "Harle, Christophe" wrote:
>> We propose setting up an initial Sumatra project sync-up call next week [...]

From: eric.caspole at amd.com (Eric Caspole)
Date: Thu, 8 Nov 2012 15:01:40 -0500
Subject: Syncup call: review contribution proposals and technical issues

Either day is OK with me.

Eric
From: Gary.Frost at amd.com (Frost, Gary)
Date: Thu, 8 Nov 2012 20:04:41 +0000
Subject: Syncup call: review contribution proposals and technical issues

Doug,

It would indeed be very interesting to see this from a Graal perspective. I have been mulling the idea of using Graal to prototype the proposed milestone (Aparapi functionality pushed down into the JVM). I would like to hear how you think this might pan out.

Gary

On Thursday, November 08, 2012 8:47 AM, Doug Simon wrote:
> I'd be interested in listening in on the call, as we're also interested in compiling for the GPU in the Graal project. [...]
From: david.r.chase at oracle.com (David Chase)
Date: Thu, 8 Nov 2012 15:21:28 -0500
Subject: Syncup call: review contribution proposals and technical issues

14th is preferred, 13th is possible. Not sure how much I have to contribute yet; I'm still coming up to speed on infrastructure.

David

From: Vasanth.Venkatachalam at amd.com (Venkatachalam, Vasanth)
Date: Thu, 8 Nov 2012 20:25:16 +0000
Subject: Syncup call: review contribution proposals and technical issues

Either day is fine with me.

Vasanth

From: Lee.Howes at amd.com (Howes, Lee)
Date: Thu, 8 Nov 2012 20:53:43 +0000
Subject: Syncup call: review contribution proposals and technical issues

I'd rather do the 14th if possible.

Lee
From: Laurent.Morichetti at amd.com (Morichetti, Laurent)
Date: Thu, 8 Nov 2012 23:14:11 +0000
Subject: Syncup call: review contribution proposals and technical issues

I am only available on 11/14.

-Laurent
From: Mark.Nutter at amd.com (Nutter, Mark)
Date: Fri, 9 Nov 2012 16:18:58 +0000
Subject: Syncup call: review contribution proposals and technical issues

I prefer the 14th if possible.

--mark
From: tom.deneau at amd.com (Deneau, Tom)
Date: Fri, 9 Nov 2012 16:22:19 +0000
Subject: Syncup call: review contribution proposals and technical issues

I am fine with either the 13th or 14th.

-- Tom

From: christophe.harle at amd.com (Harle, Christophe)
Date: Sun, 11 Nov 2012 20:15:58 +0000
Subject: Sumatra sync-up call: Wednesday Nov 14, 8-10am PST

All:

Based on responses to the thread below, we have more participants available for a Sumatra sync-up call on Wed Nov 14 8-10am PST (see objective in the thread below). Please consider this e-mail to be your meeting invite and mark your calendar if you plan to attend.

Logistics for this call:

* Date: Wednesday, November 14, 2012
* Time: 8-10am PST (10am-noon CST / 11am-1pm EST)
* Phone bridge: from the US or Canada: 1-(877) 336-1283, ID 2570385; from outside the US: 1-(404) 443-6389, ID 2570385

Regards,

Christophe

On Thursday, November 08, 2012 8:19 AM, Harle, Christophe wrote:
> We propose setting up an initial Sumatra project sync-up call next week [...]
From: Ryan.LaMothe at pnnl.gov (LaMothe, Ryan R)
Date: Tue, 13 Nov 2012 12:44:44 -0800
Subject: Syncup call: review contribution proposals and technical issues

We've been using and developing with Aparapi extensively over the past year on a number of high-performance computing projects. Is this thread open to a list of our initial impressions with both Aparapi and Java, which I think we should discuss tomorrow?

Sorry for the late reply; it has been a hectic couple of weeks.

Ryan LaMothe

From: Gary.Frost at amd.com (Frost, Gary)
Date: Tue, 13 Nov 2012 21:21:16 +0000
Subject: Syncup call: review contribution proposals and technical issues

Ryan,

If you believe that this experience with Aparapi can help us define direction/goals/milestones for Sumatra, then that input will be very valuable. My guess is we can learn from users/implementers of many of the existing OpenCL/CUDA bindings.

Gary
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 13 Nov 2012 16:21:32 -0800
Subject: Syncup call: review contribution proposals and technical issues

I agree. Please post your experiences or pointers thereto.

-- John (on my iPhone)

On Nov 13, 2012, at 1:21 PM, "Frost, Gary" wrote:
> If you believe that this experience with Aparapi can help us define direction/goals/milestones for Sumatra, then that input will be very valuable. [...]

From: john.r.rose at oracle.com (John Rose)
Date: Tue, 13 Nov 2012 16:24:31 -0800
Subject: Sumatra sync-up call: Wednesday Nov 14, 8-10am PST

Sadly, I am not available this week. Will respond separately to the milestones, which are basically spot-on.

-- John (on my iPhone)

From: john.r.rose at oracle.com (John Rose)
Date: Tue, 13 Nov 2012 16:53:50 -0800
Subject: Milestones

On Nov 8, 2012, at 6:18 AM, "Harle, Christophe" wrote:
> This proposed milestone involves moving enough of Aparapi's implementation down into the Project Lambda-enabled JVM to allow Lambda-enabled Aparapi demos/workloads to execute directly (no Aparapi.dll). [...] it would act as a 'tracer bullet' through the code base [...]

The "tracer bullet" is good.

Do we really need lambda for a first victory? I think online code generation, just by itself, even without execution, would be huge.

Other milestones:
- First use of CompileBroker to trigger GPU codegen.
- First use of C2 IR rendered to GPU code. (Note: codegen does not imply execution. Could be OptoNoExecute mode.)
- First run of GPU code from online generation.
- First run of GPU code from (one-off) lambda collection library.
- First run of GPU code from (enhanced) standard parallel collection.
- First profitable run of ditto.

On the data front:
- First use of pointer-chained Java data structure from GPU.
- First GC tracing through GPU task data.
- First call and return of GPU method with zero copying. (Already done?) Here, "zero" is any O(#GPUs), which is less than the bulk data size.

Not sure exactly how to order these.

I am convinced that there are about 3 fronts that need roughly simultaneous advance:
- Code: how to get code into the GPUs. Should be an online hot-spot reindeer by a JIT.
- Data: how to get data in and out. Should contain managed pointers, but not as many as classic Java data. I.e. flatter, with more "little structs". See Arrays 2.0 and value types.
- Libraries: teach parallel collections to generate GPU commands on enabled systems. Translate lambda bodies and loop kernels to GPU code.

Best wishes,
-- John

From: john.r.rose at oracle.com (John Rose)
Date: Tue, 13 Nov 2012 17:03:50 -0800
Subject: Milestones

Ah, please ignore the reindeer. That's a corporate trade secret weapon. Pretend I said "rendered".

-- John (on my iPhone)

On Nov 13, 2012, at 4:53 PM, John Rose wrote:
> - Code: how to get code into the GPUs. Should be an online hot-spot reindeer by a JIT. [...]
From: Ryan.LaMothe at pnnl.gov (LaMothe, Ryan R)
Date: Tue, 13 Nov 2012 21:25:25 -0800
Subject: Syncup call: review contribution proposals and technical issues

This feedback will be mostly high-level, based on day-to-day usage of Aparapi and Java to perform GPU-based computation acceleration (large-scale data analytics and high-performance computing). We currently have a number of researchers and software engineers working with Aparapi.

Having said that, here is a summary (brain dump) of our experiences thus far, in no particular order. This information is intended to spark discussion about research areas and ideas going forward:

- Java's Collections API is extremely popular and powerful. Unfortunately, Java Collections do not accept primitive types, only boxed types. Using Java arrays directly is, from the Java developer's perspective, extremely unpopular, especially when the arrays have to be created, populated and managed independently of the original data structures just for use with Aparapi. This results in a forced mixture of application-level development with systems-level development in order to achieve results.

- An additional point about Collections: there is a distinct lack of a Matrix collection type. We've ended up using Google's Guava for these kinds of collections when we're not using Java arrays.

- Java's multi-dimensional arrays are non-contiguous. This is currently a huge problem for us, because we are usually dealing with very large multi-dimensional matrix data (please see my talk at 2012 AFDS). This turned out to also be a limitation of OpenCL, as we unfortunately cannot pass arrays of pointers. Currently, in application code we end up copying multi-dimensional data into a single-dimensional array, passing the single-dimensional array to OpenCL, and then copying all of the OpenCL results back into multi-dimensional data structures for later processing. These are extremely expensive and wasteful operations. We currently have researchers working on a Buffer object concept in Aparapi to manage the OpenCL-level multi-dimensional array access transparently to the user. In case you are wondering why we do not use single-dimensional arrays to begin with, see my points above: developers either do not like using arrays directly when Collections are available, or our data is given to us to process in non-array objects.
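To make the copying cost concrete, here is a minimal sketch of the flatten/unflatten round trip we end up hand-writing today (helper names are illustrative, not Aparapi API):

    // Java's double[][] is an array of row references, so it must be
    // flattened into one contiguous buffer before handing it to OpenCL.
    static double[] flatten(double[][] m) {
        int rows = m.length, cols = m[0].length;   // assumes rectangular data
        double[] flat = new double[rows * cols];
        for (int r = 0; r < rows; r++)
            System.arraycopy(m[r], 0, flat, r * cols, cols);
        return flat;
    }

    // Inside the kernel, element (r, c) is then addressed as flat[r * cols + c],
    // and after the kernel completes everything is copied back out again:
    static double[][] unflatten(double[] flat, int rows, int cols) {
        double[][] m = new double[rows][cols];
        for (int r = 0; r < rows; r++)
            System.arraycopy(flat, r * cols, m[r], 0, cols);
        return m;
    }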
- Lambdas can be interesting and powerful but, as you can see, a theme is developing here: developers prefer less verbosity and increased elegance over extra verbosity and potential clunkiness. Please do not take that as a swipe at Java Lambdas or Gary's example code below; it is just a sentiment in general. For example, while we demonstrate and discuss Aparapi internally, both today's version and tomorrow's JDK 8 mock-ups, the general consensus is that it is painful and overly verbose to have to extend a base class, implement a bunch of interfaces (possibly), create host kernel-management code, pass code as arguments (lambdas), etc., just to parallelize something like a for-loop. The most common question is "why not use OpenMP or OpenACC?". In other words, why not allow developers to add @Parallel (or similar) to basic for-loops and loop blocks and have the JIT compiler be smart enough to parallelize the code to CPUs and GPUs automatically? This certainly works in OpenMP and OpenACC using pragmas and directives, respectively. I was also informed recently that JSR 308 is not going to make it into JDK 8? That is a real shame if that is true.

Example:

Lambdas:

    allBodies.parallel().forEach( b -> {
    } );

JSR 308:

    @StartParallel
    for (...) {
    }
    @EndParallel

- Here is where I think we get into the meat and potatoes of Aparapi. Right now, computation in Aparapi is performed in a synchronous, data-parallel fashion. Kernels and KernelRunners are tightly coupled: there is exactly one KernelRunner per Kernel. Kernel execution takes place in a synchronous, fully synchronized, single-threaded model (please correct me if I am mistaken, Gary). While there appear to be reasons why this was done, most likely to work around certain limitations within the JVM and JNI, it does limit the potential performance gains achievable over full-duplex buses like PCIe. In particular, whenever we have matrices that are larger than the available GPU memory and we have to stripe/sub-block/etc. the matrices, the synchronous execution model causes a serious performance hit.
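For anyone on the list who hasn't written one, a minimal sketch of the pattern being described, using class names from Aparapi's public API (the squares kernel itself is illustrative):

    import com.amd.aparapi.Kernel;
    import com.amd.aparapi.Range;

    public class Squares {
        public static void main(String[] args) {
            final float[] in = new float[4096];
            final float[] out = new float[4096];

            Kernel kernel = new Kernel() {
                @Override public void run() {
                    int i = getGlobalId();      // one work-item per element
                    out[i] = in[i] * in[i];
                }
            };

            // Synchronous: execute() does not return until the whole range
            // has run, so this thread cannot overlap the next stripe's
            // transfer with the current stripe's compute.
            kernel.execute(Range.create(in.length));

            kernel.dispose();
        }
    }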
What would be ideal is if there were a concept of task parallelization as well as data parallelization. This could imply the following (very simple list):

- The capability of performing computations asynchronously, where there may be one KernelRunner for multiple Kernels (for example) and application-level code would receive a callback when computation was complete for each kernel. This is generally known as "overlapping compute" or "overlapping data transfers with compute". We've started discussing how we'd implement callbacks with Aparapi, possibly using an EventBus concept very similar to Google's Guava EventBus (i.e. publish/subscribe annotations on methods), but currently have no firm ideas how we'd implement asynchronous execution.

- A single KernelRunner would imply a centralized resource manager that could support in-order and out-of-order queuing models, resource allocation, data (pre)fetching, task scheduling, hardware scheduling, etc.

- The capability of performing computation on a first GPU (one Kernel), transferring the results directly to a second GPU (another Kernel or a separate entry point on the first Kernel) and beginning a new computation on the first GPU. This kind of execution model is supported under CUDA, for example.

- The capability to target parallel execution models which may not involve GPUs.

- To the last sub-bullet above: if we are truly targeting heterogeneous and hybrid multi-core architectures, do we really want to limit ourselves to only GPUs and GPU execution models? By this I mean, could we intelligently target OpenMP, OpenCL/HSA, CUDA and MPI as needed? Frameworks currently exist that support this already and work very well; please see StarPU for an excellent example (A Unified Runtime System for Heterogeneous Multicore Architectures). If we start to think about Sumatra computation execution as parallel tasks operating on parallel data right from the start, I think that using or creating a system like StarPU at the JVM level could be very powerful and flexible as we move forward.

- A very basic example of this is what we are currently seeing with APUs. Current APUs are not powerful enough for all of our computational needs compared with discrete GPUs, so we're seeing systems with both APUs and discrete GPUs, which makes a lot of sense. Going forward, we're going to see a lot more Intel MIC processors in our clusters and will be keenly interested in also targeting that hardware with no code changes, if possible. If we could tap into our supercomputing clusters' MPI infrastructure, then the sky's the limit.

- All of this should be as easy (i.e. annotation-driven) and transparent (i.e. libraries, libraries, libraries :) as possible.

Anything I am forgetting?

Ryan LaMothe

On 11/13/12 4:21 PM, "John Rose" wrote:
> I agree. Please post your experiences or pointers thereto. [...]

From: Gary.Frost at amd.com (Frost, Gary)
Date: Wed, 14 Nov 2012 14:18:07 +0000
Subject: Syncup call: review contribution proposals and technical issues

Ryan,

There is some great information here. Sumatra can clearly benefit from your feedback from using Aparapi 'in the trenches'. I am on the way to the airport right now, but hope that we can pin these to a wall somewhere ;) and include these thoughts in our research and development going forward.

I think you are really going to like the Lambda extensions to the collections APIs (if you have not played with them so far, I would highly recommend it). Whilst the self-imposed constraint we have placed on ourselves (to work within the Java 8 language specification) will limit us, I do think this is a healthy/pragmatic decision which will help us determine where GPU offload will fit in, rather than jumping immediately to language extensions which may not fit the bigger picture.
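For a flavor of what that looks like, a minimal sketch of the kind of collection pipeline being discussed (the stream API is still evolving in the JDK 8 lambda previews, so treat the exact class and method names as illustrative):

    import java.util.Arrays;
    import java.util.stream.IntStream;

    public class DotProduct {
        public static void main(String[] args) {
            double[] a = new double[1_000_000];
            double[] b = new double[1_000_000];
            Arrays.fill(a, 2.0);
            Arrays.fill(b, 3.0);

            // A data-parallel reduction expressed against the library rather
            // than as a hand-written loop; a Sumatra-enabled JVM could offload
            // the lambda body to a GPU without this source changing.
            double dot = IntStream.range(0, a.length)
                                  .parallel()
                                  .mapToDouble(i -> a[i] * b[i])
                                  .sum();

            System.out.println(dot);   // 2.0 * 3.0 * 1,000,000 = 6,000,000.0
        }
    }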
I know John has been looking at arrays, so I do think there will be some opportunities to discuss array layout, multi-dim array layout and (my favorite concern :) ) memory layout for arrays of objects.

I don't mind the annotation style (I know we have discussed it before offline), but of course the current annotation limitations will not permit annotations on arbitrary blocks of code. I would like to see this (in the future) but don't feel strongly enough to push for it here.

You are right to point out that Aparapi essentially blocks until the GPU compute has returned; this was necessary due to limitations on pinning memory from the GC. Once we have finer control of memory (via Sumatra, under the covers in the JVM), these limitations (and many others) should be surmountable, and we should be able to provide a much richer execution model. However, I think (once again) the opportunities for some simpler and more optimal models will become obvious as we dive into the Collections+Lambda APIs.
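To make "richer execution model" concrete, a purely hypothetical sketch of the callback shape Ryan described, built from nothing but java.util.concurrent — the KernelCallback and AsyncKernelRunner types are invented for illustration and do not exist in Aparapi:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Hypothetical callback interface -- not part of Aparapi.
    interface KernelCallback {
        void onComplete(float[] results);
    }

    // Hypothetical runner: submit() returns immediately, so the host thread
    // can enqueue the next stripe while the current one computes.
    class AsyncKernelRunner {
        private final ExecutorService pool = Executors.newSingleThreadExecutor();

        Future<?> submit(final float[] stripe, final KernelCallback callback) {
            return pool.submit(new Runnable() {
                public void run() {
                    // ... a synchronous kernel.execute(...) would run here ...
                    callback.onComplete(stripe);   // notify on completion
                }
            });
        }

        void shutdown() { pool.shutdown(); }
    }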
Ok, I am being summoned to put bags in the car :) I hope some of these points come up in the call today. This is a great discussion.

Gary

On 11/13/12 11:25 PM, "LaMothe, Ryan R" wrote:
> This feedback will be mostly high-level, based on day-to-day usage of Aparapi and Java to perform GPU-based computation acceleration [...]
From: jowens at ece.ucdavis.edu (John Owens)
Date: Wed, 14 Nov 2012 10:23:19 -0800
Subject: looking for research problems

Greetings,

I enjoyed the discussion on the concall this morning. I promised to follow up with a note explaining our interest.

My name is John Owens and I'm an associate professor at UC Davis, where my group focuses on GPU computing. We have substantial experience in the field, with particular focus on fundamental data structures and algorithms, multi-GPU computing, and applications. I've appended pointers to some of our work at the end of this note. We have been sponsored by all three of AMD, Intel, and NVIDIA, though for different projects, as well as NSF and the DOE. I'm also an NVIDIA CUDA Fellow and slated to be on the upcoming academic advisory board for OpenCL.

Opening up Java to GPUs is a huge task, and there's an enormous amount of engineering effort and compiler monkeying that is really important but that we're not well suited to do. I'm happy that many of you have already started this work! What we want to find is the hard research problems.

Where we've had a lot of success is "mapping X to a GPU programming model", where "X" is complex algorithms and/or data structures. My broad read of other work that's been done with languages and GPUs is that it's mostly been done from the language side, where there's clearly a lot of interesting work to be done, but at the end of the day the GPU is used to run "map"-style computation that is straightforward to parallelize. We'd like to broaden the scope of what should be run on a GPU: more irregular computation that is harder to parallelize, a broader set of data structures, and so on. For instance, we've recently done some work on hash tables on GPUs. We're currently starting a project on large graph algorithms. The scan primitives are rarely core components of language-mapping efforts, though they should be. These are the sorts of tasks that aren't usually addressed in the GPU-language efforts that I've seen.

NSF has an upcoming call for proposals in which I am quite interested:

http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504842&org=CISE&from=home

I am considering two areas of focus, both inspired by my current sabbatical at Twitter (where we're working in Scala atop the JVM):

1) Looking at mapping a higher-level language to the GPU. I think this is exactly the focus of Sumatra, though the functional nature of Scala makes it attractive as well.

2) Looking at making GPUs useful in the data center. Twitter uses lots of large open-source components like memcached or various databases or Hadoop. Where does the compute go in a data center, and can we make it more efficient / faster by mapping it to the GPU?

This CFP requires that I submit proposals with at least one (academic) partner; they don't accept sole-PI proposals, so I'll be looking for folks who have language and data center expertise, respectively.
NSF has an upcoming call for proposals in which I am quite interested.

http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504842&org=CISE&from=home

I am considering two areas of focus, both inspired by my current sabbatical at Twitter (where we're working in Scala atop the JVM):

1) Looking at mapping a higher-level language to the GPU. I think this is exactly the focus of Sumatra, though the functional nature of Scala makes it attractive as well.

2) Looking at making GPUs useful in the data center. Twitter uses lots of large open-source components like memcached or various databases or Hadoop. Where does the compute go in a data center, and can we make it more efficient / faster by mapping it to the GPU?

This CFP requires that I submit proposals with at least one (academic) partner; they don't accept sole-PI proposals, so I'll be looking for folks who have language and data center expertise, respectively.

I am knowledgeable about the GPU computing literature in general (though not an expert on Java-specific GPU work like Aparapi, Rootbeer, etc., where others on this list are better suited) and am happy to address any questions on the research side. I hope we can find some common ground for future collaboration!

(Please note that I'm only on the daily digest for this email list, so please copy me on any replies so I can address them more quickly.)

JDO

======

http://www.ece.ucdavis.edu/~jowens/
http://www.ece.ucdavis.edu/~jowens/research.html
http://www.ece.ucdavis.edu/~jowens/pubs.html

Notable work:

- comparison-based sort (e.g. strings)
  http://www.idav.ucdavis.edu/publications/print_pub?pub_id=1085
- data compression (bzip2)
  http://www.idav.ucdavis.edu/publications/print_pub?pub_id=1087
- hash tables
  http://www.sciencedirect.com/science/article/pii/B9780123859631000046
  http://www.idav.ucdavis.edu/publications/print_pub?pub_id=973
- task parallelism on GPU:
  http://www.idav.ucdavis.edu/publications/print_pub?pub_id=1091
  http://www.idav.ucdavis.edu/publications/print_pub?pub_id=1036
- MapReduce (first multi-GPU MR, first out-of-core MR)
  http://www.idav.ucdavis.edu/publications/print_pub?pub_id=1051
- the scan primitives (first O(n) scan, first segmented scan, SpMV, quicksort)
  http://www.idav.ucdavis.edu/publications/print_pub?pub_id=1041
  http://www.idav.ucdavis.edu/publications/print_pub?pub_id=915
- MPI (first GPU MPI work)
  http://www.idav.ucdavis.edu/publications/print_pub?pub_id=959

--
John Owens
Associate Professor, Electrical and Computer Engineering
University of California, Davis
One Shields Avenue, Davis, CA 95616
http://www.ece.ucdavis.edu/~jowens/

From alex.buckley at oracle.com Wed Nov 14 10:53:55 2012
From: alex.buckley at oracle.com (Alex Buckley)
Date: Wed, 14 Nov 2012 10:53:55 -0800
Subject: Syncup call: review contribution proposals and technical issues
In-Reply-To:
References:
Message-ID: <50A3E8C3.7040402@oracle.com>

Ryan,

On 11/13/2012 9:25 PM, LaMothe, Ryan R wrote:
> In other words, why not allow developers to add @Parallel (or similar) to basic for-loops and loop blocks and have the JIT compiler be smart enough to parallelize the code to CPUs and GPUs automatically? This certainly works in OpenMP and OpenACC using pragmas and directives, respectively. I was also informed recently that JSR 308 is not going to make it into JDK 8? That is a real shame if that is true.

As co-spec lead for JSR 308, let me clarify two points:

- JSR 308 is part of Java SE 8. This is stated in JSR 337, the umbrella JSR for Java SE 8 [1]. JDK 8 milestone 6 in January will include JEP 104, the reference implementation of JSR 308 [2]. Please see the OpenJDK Type Annotations project for further details [3].

- JSR 308 is about annotations on _types_, not statements or expressions. Type annotations are the foundation for pluggable type checking whose primary goal is giving additional safety guarantees to the programmer. In contrast, statement and expression annotations tend to have as their primary goal a new semantics for the annotated statement or expression, different from the semantics in the JLS. Our mantra has always been "one language, the same everywhere", so new semantics for statements and expressions have not been welcome.
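To make the distinction concrete, here is an illustrative fragment (@Parallel is hypothetical, borrowed from Ryan's suggestion; @NonNull is the canonical JSR 308 example):

  import java.lang.annotation.ElementType;
  import java.lang.annotation.Target;
  import java.util.List;

  @Target(ElementType.TYPE_USE)  // JSR 308: annotations on any use of a type
  @interface NonNull {}

  class TypeAnnotationDemo {
      // JSR 308 style: the annotation qualifies the type String, and a
      // pluggable checker can enforce the property at compile time.
      List<@NonNull String> names;

      void scale(int[] data) {
          // A statement annotation like the following is NOT JSR 308;
          // it asks for new execution semantics, and there is no legal
          // place to write it, by design:
          // @Parallel
          for (int i = 0; i < data.length; i++) {
              data[i] *= 2;
          }
      }
  }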
Also, annotating statements and expressions may not be ideal because a source compiler may transform a statement or expression into whatever bytecode it likes, provided it respects the semantics in the JLS; moreover, there is no support in the java.lang.reflect or javax.lang.model APIs for reflection over statements and expressions.

Alex

[1] http://jcp.org/en/jsr/detail?id=337
[2] http://openjdk.java.net/projects/jdk8/milestones
[3] http://openjdk.java.net/projects/type-annotations/

From gerard.ziemski at oracle.com Wed Nov 14 11:45:43 2012
From: gerard.ziemski at oracle.com (Gerard Ziemski)
Date: Wed, 14 Nov 2012 13:45:43 -0600
Subject: Feedback/questions from today's call
In-Reply-To: <3A8BA43A-2FFD-45DE-9CFA-0558446650AD@oracle.com>
References: <3A8BA43A-2FFD-45DE-9CFA-0558446650AD@oracle.com>
Message-ID: <50A3F4E7.5070206@oracle.com>

hi all,

Forgive me if I'm addressing the elephant in the room, but wouldn't it have been useful to invite NVIDIA to participate in these calls? It seems to me that today's discussion covered a lot of general topics that would apply to the other GPU team, and even the platform-specific questions to the AMD folks would benefit the project if answered by the NVIDIA team at the same time.

A couple of general questions:

#1 What kind of performance counters does the GPU provide that could possibly help the JVM figure out the most optimal implementation when compiling Java code for it? Is it possible to have some sort of feedback loop from the GPU telling the JVM how well its code is executing?

#2 Is it possible to debug/analyze code executing on the GPU? Can it be exposed in a Java debugger in any way?

#3 Can a GPU program crash? What are some examples that would cause it to crash, and how is that handled?

#4 For a client system, which I will define as a system where a process is just one of many processes running concurrently that may want to use the same resources (i.e. for the GPU, other apps such as a window manager or browser), or even the same process using the GPU for computation and also for rendering visuals (i.e. JavaFX, which uses the GPU to render its UI), is it a concern that the JVM could starve other parts of the system if it starts using the GPU aggressively?

cheers

From bernard.traversat at oracle.com Wed Nov 14 13:24:45 2012
From: bernard.traversat at oracle.com (Bernard Traversat)
Date: Wed, 14 Nov 2012 13:24:45 -0800
Subject: Feedback/questions from today's call
In-Reply-To: <50A3F4E7.5070206@oracle.com>
References: <3A8BA43A-2FFD-45DE-9CFA-0558446650AD@oracle.com> <50A3F4E7.5070206@oracle.com>
Message-ID: <66A7BD45-BB26-4103-9013-A1FDB5658472@oracle.com>

Gerard,

OpenJDK project participation is open. You don't need a special invite to participate. Everyone is welcome to subscribe and participate. We mentioned this project to NVIDIA as well as other potentially interested Java community members.

Cheers,

B.

On Nov 14, 2012, at 11:45 AM, Gerard Ziemski wrote:

> hi all,
>
> Forgive me if I'm addressing the elephant in the room, but wouldn't it have been useful to invite NVIDIA to participate in these calls? It seems to me that today's discussion covered a lot of general topics that would apply to the other GPU team, and even the platform-specific questions to the AMD folks would benefit the project if answered by the NVIDIA team at the same time.
>
> A couple of general questions:
>
> #1 What kind of performance counters does the GPU provide that could possibly help the JVM figure out the most optimal implementation when compiling Java code for it?
> Is it possible to have some sort of feedback loop from the GPU telling the JVM how well its code is executing?
>
> #2 Is it possible to debug/analyze code executing on the GPU? Can it be exposed in a Java debugger in any way?
>
> #3 Can a GPU program crash? What are some examples that would cause it to crash, and how is that handled?
>
> #4 For a client system, which I will define as a system where a process is just one of many processes running concurrently that may want to use the same resources (i.e. for the GPU, other apps such as a window manager or browser), or even the same process using the GPU for computation and also for rendering visuals (i.e. JavaFX, which uses the GPU to render its UI), is it a concern that the JVM could starve other parts of the system if it starts using the GPU aggressively?
>
> cheers

From doug.simon at oracle.com Fri Nov 9 09:04:48 2012
From: doug.simon at oracle.com (Doug Simon @ Oracle)
Date: Fri, 9 Nov 2012 18:04:48 +0100
Subject: Syncup call: review contribution proposals and technical issues
In-Reply-To:
References: <873EA866-2024-4589-AB05-C4F54581F6AC@oracle.com> <539F6CE1-A2F2-4460-824C-CFD1C7E5A83C@oracle.com>
Message-ID: <164EA090-B1C2-45D8-A208-7BD6509BC3A4@oracle.com>

On Nov 8, 2012, at 9:04 PM, "Frost, Gary" wrote:

> Doug,
>
> It would indeed be very interesting to see this from a Graal perspective. I have been mulling the idea of using Graal to prototype the proposed milestone (Aparapi functionality pushed down into the JVM). I would like to hear how you think this might pan out.

Sure. I'll admit up front that I'm not very familiar with compiling for the GPU. As such, I'll brush up on Aparapi between now and the call next week!

-Doug

From witold.bolt at hope.art.pl Wed Nov 14 14:05:48 2012
From: witold.bolt at hope.art.pl (Witold Bołt)
Date: Wed, 14 Nov 2012 23:05:48 +0100
Subject: Feedback/questions from today's call
In-Reply-To: <50A3F4E7.5070206@oracle.com>
References: <3A8BA43A-2FFD-45DE-9CFA-0558446650AD@oracle.com> <50A3F4E7.5070206@oracle.com>
Message-ID: <80F89286-60DC-4337-8D37-A29283B91700@hope.art.pl>

Hi.

I have partial answers to your questions.

#2: AFAIK this is difficult and vendor-dependent. AMD (http://developer.amd.com/tools/heterogeneous-computing/amd-gdebugger/), NVIDIA (http://www.nvidia.com/object/nsight.html) and Intel (http://software.intel.com/en-us/articles/programming-with-the-intel-sdk-for-opencl-applications-development-tools) have some tools that help to analyze/debug kernels, but nothing "standardized", open and accessible to 3rd parties. Also, from some small experience with Aparapi I found that using such tools is really hard in some cases and not possible at all in others. Maybe it has all changed during the last couple of months, but when I tried it, it was really disappointing. The only "standard" thing that is present is a basic profiling API defined within the OpenCL standard (http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetEventProfilingInfo.html), which allows execution-time queries for different OpenCL operations.

#3: Obviously fatal errors are possible - like dividing by zero, accessing buffers outside of the defined scope and so on. This is quite an interesting topic. For now Aparapi doesn't help much in this area - see this:

"Runtime Exceptions
- When run on the GPU, array accesses will not generate an ArrayIndexOutOfBoundsException. Instead the behavior will be unspecified.
- When run on the GPU, ArithmeticExceptions will not be generated, for example with integer division by zero. Instead the behavior will be unspecified."
(http://code.google.com/p/aparapi/wiki/JavaKernelGuidelines)

So it means that you can get "some" results from faulty code execution and you might not get informed about those faults.
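To make that concrete, here is a sketch using Aparapi's public Kernel/Range API (the buggy kernel body is of course made up): in JTP (Java thread pool) fallback mode this throws ArrayIndexOutOfBoundsException, while on the GPU the same code silently produces unspecified results:

  import com.amd.aparapi.Kernel;
  import com.amd.aparapi.Range;

  public class OutOfBounds {
      public static void main(String[] args) {
          final int n = 512;
          final int[] in = new int[n];
          final int[] out = new int[n];
          Kernel kernel = new Kernel() {
              @Override public void run() {
                  int i = getGlobalId();
                  // Bug: reads one element past the end when i == n - 1.
                  // Java execution: ArrayIndexOutOfBoundsException.
                  // GPU execution: unspecified behavior, no exception.
                  out[i] = in[i + 1];
              }
          };
          kernel.execute(Range.create(n));
          kernel.dispose();
      }
  }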
Interestingly, the OpenCL specification doesn't provide any means to get such detailed info on kernel execution. For example, the errors possible with clEnqueueNDRangeKernel (the function that enqueues your kernel for execution) are limited to some core stuff like "out of memory", "not defined kernel" and so on (see here: http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueNDRangeKernel.html). It seems that this will be the hard part.

Br,

Witold Bołt | witold.bolt at hope.art.pl | +48 600 374 215

On 14 Nov 2012, at 20:45, Gerard Ziemski wrote:

> hi all,
>
> Forgive me if I'm addressing the elephant in the room, but wouldn't it have been useful to invite NVIDIA to participate in these calls? It seems to me that today's discussion covered a lot of general topics that would apply to the other GPU team, and even the platform-specific questions to the AMD folks would benefit the project if answered by the NVIDIA team at the same time.
>
> A couple of general questions:
>
> #1 What kind of performance counters does the GPU provide that could possibly help the JVM figure out the most optimal implementation when compiling Java code for it? Is it possible to have some sort of feedback loop from the GPU telling the JVM how well its code is executing?
>
> #2 Is it possible to debug/analyze code executing on the GPU? Can it be exposed in a Java debugger in any way?
>
> #3 Can a GPU program crash? What are some examples that would cause it to crash, and how is that handled?
>
> #4 For a client system, which I will define as a system where a process is just one of many processes running concurrently that may want to use the same resources (i.e. for the GPU, other apps such as a window manager or browser), or even the same process using the GPU for computation and also for rendering visuals (i.e. JavaFX, which uses the GPU to render its UI), is it a concern that the JVM could starve other parts of the system if it starts using the GPU aggressively?
>
> cheers

From bharadwaj.yadavalli at oracle.com Thu Nov 15 11:28:25 2012
From: bharadwaj.yadavalli at oracle.com (Bharadwaj Yadavalli)
Date: Thu, 15 Nov 2012 14:28:25 -0500
Subject: Compiling to heterogeneous core ISA
Message-ID: <50A54259.5050104@oracle.com>

On a heterogeneous system, one can imagine various ways in which the (Java or any other language) JIT compiler can decide the architecture (GPU or CPU) to target. Techniques such as language extensions (or some form of hints provided by the application developer) and automatic decision making by the compiler (such as automatic parallelization) - along with combinations of the two - come to mind.

However, I am interested in exploring the possibility of generating code for GPUs from programs written using only the currently defined Java language constructs and the existing JVM specification. I believe that this exploration will expose the missing pieces, if any - either in the language or the JVM - that are needed to achieve the goal of generating code to target heterogeneous cores, while inflicting as little pain as possible on language users, yet maximizing the utilization of the compute resources.

As we all know, Lambda expressions for Java are taking shape and are planned to be supported in the very near future. I am wondering if we could use a lambda function definition as the hint for the compiler to generate code for the lambda function to run on the GPU (provided any necessary conditions are met). At the same time, the compiler will also need to generate code to map the data consumed by the lambda function onto the appropriate memory space.
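To illustrate the kind of code shape I have in mind, here is a minimal sketch. IntKernel and parallelFor are made-up names standing in for whatever the libraries eventually provide; today this is just a sequential loop, but the lambda passed in - side-effect-free except through its own index - is a natural unit of compilation for a GPU-targeting JIT:

  // Hypothetical helper: semantically a plain loop; a Sumatra-enabled
  // JIT could recognize this shape and offload the lambda to the GPU.
  interface IntKernel { void apply(int i); }

  class LambdaHint {
      static void parallelFor(int n, IntKernel k) {
          for (int i = 0; i < n; i++) {
              k.apply(i);
          }
      }

      public static void main(String[] args) {
          final int n = 1 << 20;
          final float[] a = new float[n], b = new float[n], c = new float[n];
          parallelFor(n, i -> c[i] = a[i] + b[i]);
      }
  }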
This model will conceptually allow for visualizing and implementing a HotSpot compiler (C1 or C2) backend targeting GPUs - one of the areas I plan to work in. By starting with compiling lambda functions to GPUs, I believe we can get a better understanding of how to make online decisions about work scheduling between the various heterogeneous cores, and use those lessons to potentially extend compilation of standard (non-lambda) class methods targeting GPUs.

I realize there are several people on this mailing list working on projects such as Aparapi, and I am hoping that I can get comments and suggestions from them and from others who have had experience writing code using such frameworks.

Thanks,

Bharadwaj

From john.r.rose at oracle.com Thu Nov 15 14:16:20 2012
From: john.r.rose at oracle.com (John Rose)
Date: Thu, 15 Nov 2012 14:16:20 -0800
Subject: Fwd: hg: lambda/lambda/jdk: (weak) parallel version of SortedOp
References: <50A550F5.1060101@oracle.com>
Message-ID: <03211544-AE4F-4022-B939-E3101A9C545C@oracle.com>

FYI: This Lambda milestone prepares the ground for aggressive parallel implementations of bulk data ops. -- John

Begin forwarded message:

> From: Brian Goetz
> Date: November 15, 2012, 12:30:45 PM PST
> To: lambda-dev at openjdk.java.net
> Subject: Re: hg: lambda/lambda/jdk: (weak) parallel version of SortedOp
>
> This is a minor milestone -- all implemented Stream operations now have parallel implementations (previously, some parallel implementations were just the serial implementation.) They don't all have *good* parallel implementations, but we'll work on that.
>
> On 11/15/2012 2:27 PM, brian.goetz at oracle.com wrote:
>> Changeset: 3832536fbed9
>> Author: briangoetz
>> Date: 2012-11-15 14:27 -0500
>> URL: http://hg.openjdk.java.net/lambda/lambda/jdk/rev/3832536fbed9
>>
>> (weak) parallel version of SortedOp
>>
>> ! src/share/classes/java/util/streams/ops/SortedOp.java
>
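P.S. For a sense of why this matters to Sumatra: once the bulk operations have real parallel implementations, ordinary side-effect-free pipelines like the sketch below become candidates for a GPU backend. (A caveat: the stream API in the lambda repo is still churning, so treat the names here as the current shape of the discussion rather than a commitment.)

  import java.util.Arrays;

  public class BulkOps {
      public static void main(String[] args) {
          int[] data = { 5, 3, 8, 1, 9, 2 };
          // A bulk pipeline with no side effects: a parallel (and
          // eventually GPU-backed) implementation is free to partition
          // and reorder the work however it likes.
          int sum = Arrays.stream(data)
                          .filter(x -> x % 2 == 1)
                          .map(x -> x * x)
                          .sum();
          System.out.println(sum); // 25 + 9 + 1 + 81 = 116
      }
  }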
From Gary.Frost at amd.com Fri Nov 16 08:11:40 2012
From: Gary.Frost at amd.com (Frost, Gary)
Date: Fri, 16 Nov 2012 16:11:40 +0000
Subject: Milestones
In-Reply-To: <0104898D-FFE7-41F4-AE45-03CAFC6CA49A@oracle.com>
Message-ID:

John,

Thanks for the feedback. Sorry for the delay following up - I have been traveling.

Now I want to call this initial 'Sumatra/Aparapi tracer bullet milestone' 'Reindeer' ;)

You ask whether we need to use Lambda in this first milestone. We could obviously skip it, but we have already coded up Aparapi changes to create OpenCL from bytecode created by lambdas, and we intend to (in parallel) lambdafy Aparapi, so this is not too much of a distraction at this time. It does imply that we fork the initial code from the 'Project Lambda' source - unless the lambda/OpenJDK 8 merge is imminent.

Thanks for the list of suggested other milestones. This is a great list. I am on vacation for a week or so, but I do plan to pull together this list of milestones, as well as the research items listed, in project 'Reindeer' - I wasn't joking!-) on the Sumatra project wiki.

Gary

On 11/13/12 7:03 PM, "John Rose" wrote:

>Ah, please ignore the reindeer. That's a corporate trade secret weapon.
>
>Pretend I said "rendered".
>
>-- John (on my iPhone)
>
>On Nov 13, 2012, at 4:53 PM, John Rose wrote:
>
>> On Nov 8, 2012, at 6:18 AM, "Harle, Christophe" wrote:
>>
>>> This proposed milestone involves moving enough of Aparapi's implementation down into the Project Lambda enabled JVM to allow Lambda enabled Aparapi demos/workloads to execute directly (no Aparapi.dll). Whilst this effort would constrain demos/examples to Aparapi's existing programming model constraints (parallel primitive arrays only), it would act as a 'tracer bullet' through the code base, allowing us to familiarize ourselves with the various major JVM components that we think will be impacted by a Sumatra implementation.
>>
>> The "tracer bullet" is good.
>>
>> Do we really need lambda for a first victory? I think online code generation, just by itself, even without execution, would be huge.
>>
>> Other milestones:
>> - First use of CompileBroker to trigger GPU codegen.
>> - First use of C2 IR rendered to GPU code. (Note: Codegen does not imply execution. Could be OptoNoExecute mode.)
>> - First run of GPU code from online generation.
>> - First run of GPU code from a (one-off) lambda collection library.
>> - First run of GPU code from an (enhanced) standard parallel collection.
>> - First profitable run of ditto.
>>
>> On the data front:
>> - First use of a pointer-chained Java data structure from the GPU.
>> - First GC tracing through GPU task data.
>> - First call and return of a GPU method with zero copying. (Already done?) Here, "zero" is anything O(#GPUs), which is less than the bulk data size.
>>
>> Not sure exactly how to order these.
>>
>> I am convinced that there are about 3 fronts that need roughly simultaneous advance:
>> - code: how to get code into the GPUs. Should be an online hot-spot reindeer by a JIT.
>> - data: how to get data in and out. Should contain managed pointers, but not as many as classic Java data. I.e. flatter, with more "little structs". See Arrays 2.0 and value types.
>> - Libraries: teach parallel collections to generate GPU commands on enabled systems. Translate lambda bodies and loop kernels to GPU code.
>>
>> Best wishes,
>> -- John