From bharadwaj.yadavalli at oracle.com Fri Jan 17 14:26:22 2014
From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli)
Date: Fri, 17 Jan 2014 17:26:22 -0500
Subject: Execution of a simple Lambda Method on GPU
Message-ID: <52D9AE0E.7010903@oracle.com>

A quick update on reaching the next milestone in executing Lambda methods on GPUs as a step towards the goals of Project Sumatra.

The JVM in the Graal repo now has the ability to recognize a simple Lambda method defined in a class's main method, schedule compilation of such a method to target the PTX backend, offload execution of the generated PTX code to supported nVidia GPU hardware, and get the result back to the VM.

This was made possible by recent enhancements and code reorganization in the Graal compiler to improve support for GPU backends.

The current implementation is at an experimental stage. For example, executing the Java class [1] results in [2]. Additional refinements to choose candidate Java methods for offloading to the GPU are planned. We look forward to any open source community involvement that will help us move faster towards the next milestones of Project Sumatra.

Regards,

Bharadwaj

[1]
interface BinaryOperation {
    int apply(int a, int b);
}

class FindSumL {
    public static void main(String args[]) {
        BinaryOperation add = (x, y) -> x + y;
        int result = add.apply(8, 12);
        System.out.println("Sum is " + result);
    }
}

[2]
$ ./mx.sh vmg -XX:+TraceGPUInteraction -XX:-BootstrapGraal -G:Threads=1 FindSumL
Found supported nVidia GPU device vendor : 0x10de device 0x06dd
gpu_linux::probe_gpu(): 1
[CUDA] Success: library linkage
CUDA driver initialization: Success
[CUDA] Number of compute-capable devices found: 2
[CUDA] Got the handle of first compute-device
[CUDA] Unified addressing support on device 0: 1
[CUDA] Using GeForce GTX 780
Compiling Lambda method FindSumL::lambda$main$0 to PTX
[CUDA] Success: Created context for device: 0
[CUDA] Success: Set current context for device: 0
[CUDA] PTX Kernel
.version 3.0
.target sm_30
.entry lambda$main$0 (
    .param .s32 param1,
    .param .s32 param2,
    .param .u64 param0
) {
    .reg .s32 %r3;
    .reg .s32 %r4;
    .reg .s32 %r5;
    .reg .u64 %r6;
L0:
    ld.param.s32 %r3, [param1 + 0];
    ld.param.s32 %r4, [param2 + 0];
    add.s32 %r5, %r4, %r3;
    ld.param.u64 %r6, [param0 + 0];
    st.global.s32 [%r6 + 0], %r5;
    ret;
}

[CUDA] Function name : lambda$main$0
[CUDA] Loaded data for PTX Kernel
[CUDA] Got function handle for lambda$main$0 kernel address 0x7fba686bfd80
[CUDA] Generated kernel
External method:FindSumL.lambda$main$0(II)I
installCode0: ExternalCompilationResult
[CUDA] launching kernel
[CUDA] Success: Kernel Launch: X: 1 Y: 1 Z: 1
[CUDA] Success: Synchronized launch kernel
[CUDA] Success: Freed device memory of return value
[CUDA] Success: Destroy context
Sum is 20
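To relate the [CUDA] trace lines in [2] to the underlying driver calls, the following is a rough standalone sketch of the same launch written directly against the CUDA driver API, with the PTX text copied verbatim from [2]. It only illustrates the sequence the trace reports; it is not the code Graal runs, and error handling and argument marshalling are deliberately collapsed.

#include <cuda.h>
#include <cstdio>

// PTX text exactly as emitted in [2] above.
static const char* ptx_source = R"(
.version 3.0
.target sm_30
.entry lambda$main$0 (
    .param .s32 param1,
    .param .s32 param2,
    .param .u64 param0
) {
    .reg .s32 %r3;
    .reg .s32 %r4;
    .reg .s32 %r5;
    .reg .u64 %r6;
L0:
    ld.param.s32 %r3, [param1 + 0];
    ld.param.s32 %r4, [param2 + 0];
    add.s32 %r5, %r4, %r3;
    ld.param.u64 %r6, [param0 + 0];
    st.global.s32 [%r6 + 0], %r5;
    ret;
}
)";

int main() {
  cuInit(0);                                    // "CUDA driver initialization: Success"
  CUdevice dev;  cuDeviceGet(&dev, 0);          // "Got the handle of first compute-device"
  CUcontext ctx; cuCtxCreate(&ctx, 0, dev);     // "Created context for device: 0"

  CUmodule mod;  cuModuleLoadData(&mod, ptx_source);              // "Loaded data for PTX Kernel"
  CUfunction fn; cuModuleGetFunction(&fn, mod, "lambda$main$0");  // "Got function handle"

  int a = 8, b = 12, result = 0;
  CUdeviceptr out; cuMemAlloc(&out, sizeof(int));  // device slot for the return value
  void* args[] = { &a, &b, &out };                 // param1, param2, param0 of the kernel

  cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, 0, args, 0);  // "Kernel Launch: X: 1 Y: 1 Z: 1"
  cuCtxSynchronize();                                   // "Synchronized launch kernel"

  cuMemcpyDtoH(&result, out, sizeof(int));
  cuMemFree(out);                                       // "Freed device memory of return value"
  cuCtxDestroy(ctx);                                    // "Destroy context"

  printf("Sum is %d\n", result);                        // "Sum is 20"
  return 0;
}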
From Gary.Frost at amd.com Fri Jan 17 15:58:51 2014
From: Gary.Frost at amd.com (Frost, Gary)
Date: Fri, 17 Jan 2014 23:58:51 +0000
Subject: Execution of a simple Lambda Method on GPU
In-Reply-To: <52D9AE0E.7010903@oracle.com>
References: <52D9AE0E.7010903@oracle.com>
Message-ID:

Bharadwaj,

Nice work. It is great to see the Graal GPU backend and the Sumatra JVM enhancements required to dispatch to the GPU progressing like this.

We noted the recent checkins and knew something good was coming! ;)

Gary

-----Original Message-----
From: sumatra-dev-bounces at openjdk.java.net [mailto:sumatra-dev-bounces at openjdk.java.net] On Behalf Of S. Bharadwaj Yadavalli
Sent: Friday, January 17, 2014 4:26 PM
To: sumatra-dev at openjdk.java.net; graal-dev at openjdk.java.net
Subject: Execution of a simple Lambda Method on GPU

A quick update on reaching the next milestone in executing Lambda methods on GPUs as a step towards the goals of Project Sumatra.

The JVM in the Graal repo now has the ability to recognize a simple Lambda method defined in a class's main method, schedule compilation of such a method to target the PTX backend, offload execution of the generated PTX code to supported nVidia GPU hardware, and get the result back to the VM.

This was made possible by recent enhancements and code reorganization in the Graal compiler to improve support for GPU backends.

The current implementation is at an experimental stage. For example, executing the Java class [1] results in [2]. Additional refinements to choose candidate Java methods for offloading to the GPU are planned. We look forward to any open source community involvement that will help us move faster towards the next milestones of Project Sumatra.

Regards,

Bharadwaj

[1]
interface BinaryOperation {
    int apply(int a, int b);
}

class FindSumL {
    public static void main(String args[]) {
        BinaryOperation add = (x, y) -> x + y;
        int result = add.apply(8, 12);
        System.out.println("Sum is " + result);
    }
}

[2]
$ ./mx.sh vmg -XX:+TraceGPUInteraction -XX:-BootstrapGraal -G:Threads=1 FindSumL
Found supported nVidia GPU device vendor : 0x10de device 0x06dd
gpu_linux::probe_gpu(): 1
[CUDA] Success: library linkage
CUDA driver initialization: Success
[CUDA] Number of compute-capable devices found: 2
[CUDA] Got the handle of first compute-device
[CUDA] Unified addressing support on device 0: 1
[CUDA] Using GeForce GTX 780
Compiling Lambda method FindSumL::lambda$main$0 to PTX
[CUDA] Success: Created context for device: 0
[CUDA] Success: Set current context for device: 0
[CUDA] PTX Kernel
.version 3.0
.target sm_30
.entry lambda$main$0 (
    .param .s32 param1,
    .param .s32 param2,
    .param .u64 param0
) {
    .reg .s32 %r3;
    .reg .s32 %r4;
    .reg .s32 %r5;
    .reg .u64 %r6;
L0:
    ld.param.s32 %r3, [param1 + 0];
    ld.param.s32 %r4, [param2 + 0];
    add.s32 %r5, %r4, %r3;
    ld.param.u64 %r6, [param0 + 0];
    st.global.s32 [%r6 + 0], %r5;
    ret;
}

[CUDA] Function name : lambda$main$0
[CUDA] Loaded data for PTX Kernel
[CUDA] Got function handle for lambda$main$0 kernel address 0x7fba686bfd80
[CUDA] Generated kernel
External method:FindSumL.lambda$main$0(II)I
installCode0: ExternalCompilationResult
[CUDA] launching kernel
[CUDA] Success: Kernel Launch: X: 1 Y: 1 Z: 1
[CUDA] Success: Synchronized launch kernel
[CUDA] Success: Freed device memory of return value
[CUDA] Success: Destroy context
Sum is 20

From doychin at dsoft-bg.com Sat Jan 18 07:04:07 2014
From: doychin at dsoft-bg.com (Doychin Bondzhev)
Date: Sat, 18 Jan 2014 17:04:07 +0200
Subject: Any instructions how to test current code on HSA enabled APU?
Message-ID: <52DA97E7.3030603@dsoft-bg.com>

Hi, guys

I'm looking for some instructions on how I can test the current code from Graal on a Kaveri APU.

I already have a VM with Ubuntu Linux that I used to build and run the Graal tests, but I want to try it on real hardware and write some sample code. I plan to buy an A10-7850 next week and use it for playing with these new features.

thanks in advance
-- 
Doychin Bondzhev
dSoft-Bulgaria Ltd.
PowerPro - billing & provisioning solution for Service providers
PowerStor - Warehouse & POS
http://www.dsoft-bg.com/
Mobile: +359888243116
From tom.deneau at amd.com Mon Jan 20 08:45:46 2014
From: tom.deneau at amd.com (Deneau, Tom)
Date: Mon, 20 Jan 2014 16:45:46 +0000
Subject: Execution of a simple Lambda Method on GPU
In-Reply-To: <52D9AE0E.7010903@oracle.com>
References: <52D9AE0E.7010903@oracle.com>
Message-ID:

Bharadwaj --

There is a bug in Method::is_lambda() in that it does not return false
if methodPrefix == 0. Actually I don't understand why the gcc
compiler would not complain about this.

Also, I noticed the CompilationPolicy::must_be_compiled, is_lambda path
is not hit in the debug build. Is this by design?

-- Tom

> -----Original Message-----
> From: sumatra-dev-bounces at openjdk.java.net [mailto:sumatra-dev-bounces at openjdk.java.net] On Behalf Of S. Bharadwaj Yadavalli
> Sent: Friday, January 17, 2014 4:26 PM
> To: sumatra-dev at openjdk.java.net; graal-dev at openjdk.java.net
> Subject: Execution of a simple Lambda Method on GPU
>
>
> A quick update on reaching the next milestone in executing Lambda
> methods on GPUs as a step towards the goals of Project Sumatra.
>
> The JVM in the Graal repo now has the ability to recognize a simple
> Lambda method defined in a class's main method, schedule compilation of
> such a method to target the PTX backend, offload execution of the
> generated PTX code to supported nVidia GPU hardware, and get the result
> back to the VM.
>
> This was made possible by recent enhancements and code reorganization
> in the Graal compiler to improve support for GPU backends.
>
> The current implementation is at an experimental stage. For example,
> executing the Java class [1] results in [2]. Additional refinements to
> choose candidate Java methods for offloading to the GPU are planned. We
> look forward to any open source community involvement that will help us
> move faster towards the next milestones of Project Sumatra.
>
> Regards,
>
> Bharadwaj
>
> [1]
> interface BinaryOperation {
>     int apply(int a, int b);
> }
>
> class FindSumL {
>     public static void main(String args[]) {
>         BinaryOperation add = (x, y) -> x + y;
>         int result = add.apply(8, 12);
>         System.out.println("Sum is " + result);
>     }
> }
>
> [2]
> $ ./mx.sh vmg -XX:+TraceGPUInteraction -XX:-BootstrapGraal -G:Threads=1 FindSumL
> Found supported nVidia GPU device vendor : 0x10de device 0x06dd
> gpu_linux::probe_gpu(): 1
> [CUDA] Success: library linkage
> CUDA driver initialization: Success
> [CUDA] Number of compute-capable devices found: 2
> [CUDA] Got the handle of first compute-device
> [CUDA] Unified addressing support on device 0: 1
> [CUDA] Using GeForce GTX 780
> Compiling Lambda method FindSumL::lambda$main$0 to PTX
> [CUDA] Success: Created context for device: 0
> [CUDA] Success: Set current context for device: 0
> [CUDA] PTX Kernel
> .version 3.0
> .target sm_30
> .entry lambda$main$0 (
>     .param .s32 param1,
>     .param .s32 param2,
>     .param .u64 param0
> ) {
>     .reg .s32 %r3;
>     .reg .s32 %r4;
>     .reg .s32 %r5;
>     .reg .u64 %r6;
> L0:
>     ld.param.s32 %r3, [param1 + 0];
>     ld.param.s32 %r4, [param2 + 0];
>     add.s32 %r5, %r4, %r3;
>     ld.param.u64 %r6, [param0 + 0];
>     st.global.s32 [%r6 + 0], %r5;
>     ret;
> }
>
> [CUDA] Function name : lambda$main$0
> [CUDA] Loaded data for PTX Kernel
> [CUDA] Got function handle for lambda$main$0 kernel address 0x7fba686bfd80
> [CUDA] Generated kernel
> External method:FindSumL.lambda$main$0(II)I
> installCode0: ExternalCompilationResult
> [CUDA] launching kernel
> [CUDA] Success: Kernel Launch: X: 1 Y: 1 Z: 1
> [CUDA] Success: Synchronized launch kernel
> [CUDA] Success: Freed device memory of return value
> [CUDA] Success: Destroy context
> Sum is 20
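For readers without the tree handy, the early-out Tom is describing amounts to the following. This is a hypothetical reconstruction rather than the actual Method::is_lambda() from the Graal repository; the name lookup and the matching details are assumptions, and only the methodPrefix == 0 guard is the point.

#include <cstring>

// Hypothetical sketch, not HotSpot source: javac names lambda bodies
// "lambda$<enclosing>$<n>", so the test reduces to a prefix match.
static bool is_lambda_name(const char* method_name) {
  const char* methodPrefix = strstr(method_name, "lambda$");
  if (methodPrefix == 0) {
    return false;                      // the missing check: no "lambda$" in the name
  }
  return methodPrefix == method_name;  // must be a prefix, not an infix match
}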
From doug.simon at oracle.com Mon Jan 20 09:18:01 2014
From: doug.simon at oracle.com (Doug Simon)
Date: Mon, 20 Jan 2014 18:18:01 +0100
Subject: Execution of a simple Lambda Method on GPU
In-Reply-To:
References: <52D9AE0E.7010903@oracle.com>
Message-ID: <4DA6845C-1649-4264-A87C-AEDAA904A968@oracle.com>

Tom,

You can now file bug reports for Graal via JBS: https://bugs.openjdk.java.net/browse/GRAAL

For smaller issues such as the one below, email may still be a better option.

-Doug

On Jan 20, 2014, at 5:45 PM, Deneau, Tom wrote:

> Bharadwaj --
>
> There is a bug in Method::is_lambda() in that it does not return false
> if methodPrefix == 0. Actually I don't understand why the gcc
> compiler would not complain about this.
>
> Also, I noticed the CompilationPolicy::must_be_compiled, is_lambda path
> is not hit in the debug build. Is this by design?
>
> -- Tom
>
>> -----Original Message-----
>> From: sumatra-dev-bounces at openjdk.java.net [mailto:sumatra-dev-bounces at openjdk.java.net] On Behalf Of S. Bharadwaj Yadavalli
>> Sent: Friday, January 17, 2014 4:26 PM
>> To: sumatra-dev at openjdk.java.net; graal-dev at openjdk.java.net
>> Subject: Execution of a simple Lambda Method on GPU
>>
>>
>> A quick update on reaching the next milestone in executing Lambda
>> methods on GPUs as a step towards the goals of Project Sumatra.
>>
>> The JVM in the Graal repo now has the ability to recognize a simple
>> Lambda method defined in a class's main method, schedule compilation of
>> such a method to target the PTX backend, offload execution of the
>> generated PTX code to supported nVidia GPU hardware, and get the result
>> back to the VM.
>>
>> This was made possible by recent enhancements and code reorganization
>> in the Graal compiler to improve support for GPU backends.
>>
>> The current implementation is at an experimental stage. For example,
>> executing the Java class [1] results in [2]. Additional refinements to
>> choose candidate Java methods for offloading to the GPU are planned. We
>> look forward to any open source community involvement that will help us
>> move faster towards the next milestones of Project Sumatra.
>>
>> Regards,
>>
>> Bharadwaj
>>
>> [1]
>> interface BinaryOperation {
>>     int apply(int a, int b);
>> }
>>
>> class FindSumL {
>>     public static void main(String args[]) {
>>         BinaryOperation add = (x, y) -> x + y;
>>         int result = add.apply(8, 12);
>>         System.out.println("Sum is " + result);
>>     }
>> }
>>
>> [2]
>> $ ./mx.sh vmg -XX:+TraceGPUInteraction -XX:-BootstrapGraal -G:Threads=1 FindSumL
>> Found supported nVidia GPU device vendor : 0x10de device 0x06dd
>> gpu_linux::probe_gpu(): 1
>> [CUDA] Success: library linkage
>> CUDA driver initialization: Success
>> [CUDA] Number of compute-capable devices found: 2
>> [CUDA] Got the handle of first compute-device
>> [CUDA] Unified addressing support on device 0: 1
>> [CUDA] Using GeForce GTX 780
>> Compiling Lambda method FindSumL::lambda$main$0 to PTX
>> [CUDA] Success: Created context for device: 0
>> [CUDA] Success: Set current context for device: 0
>> [CUDA] PTX Kernel
>> .version 3.0
>> .target sm_30
>> .entry lambda$main$0 (
>>     .param .s32 param1,
>>     .param .s32 param2,
>>     .param .u64 param0
>> ) {
>>     .reg .s32 %r3;
>>     .reg .s32 %r4;
>>     .reg .s32 %r5;
>>     .reg .u64 %r6;
>> L0:
>>     ld.param.s32 %r3, [param1 + 0];
>>     ld.param.s32 %r4, [param2 + 0];
>>     add.s32 %r5, %r4, %r3;
>>     ld.param.u64 %r6, [param0 + 0];
>>     st.global.s32 [%r6 + 0], %r5;
>>     ret;
>> }
>>
>> [CUDA] Function name : lambda$main$0
>> [CUDA] Loaded data for PTX Kernel
>> [CUDA] Got function handle for lambda$main$0 kernel address 0x7fba686bfd80
>> [CUDA] Generated kernel
>> External method:FindSumL.lambda$main$0(II)I
>> installCode0: ExternalCompilationResult
>> [CUDA] launching kernel
>> [CUDA] Success: Kernel Launch: X: 1 Y: 1 Z: 1
>> [CUDA] Success: Synchronized launch kernel
>> [CUDA] Success: Freed device memory of return value
>> [CUDA] Success: Destroy context
>> Sum is 20
>>
>>

From bharadwaj.yadavalli at oracle.com Mon Jan 20 16:34:24 2014
From: bharadwaj.yadavalli at oracle.com (S. Bharadwaj Yadavalli)
Date: Mon, 20 Jan 2014 19:34:24 -0500
Subject: Execution of a simple Lambda Method on GPU
In-Reply-To:
References: <52D9AE0E.7010903@oracle.com>
Message-ID: <52DDC090.2060200@oracle.com>

Hi Tom,

Thank you for looking at the changes.

On 01/20/2014 11:45 AM, Deneau, Tom wrote:
> There is a bug in Method::is_lambda() in that it does not return false
> if methodPrefix == 0. Actually I don't understand why the gcc
> compiler would not complain about this.

Yes, that was a coding error. I notice that a fix has already been pushed for this.

> Also, I noticed the CompilationPolicy::must_be_compiled, is_lambda path
> is not hit in the debug build. Is this by design?

I am not sure I understand your observation/question. Using a debug build, I am able to see that is_lambda() is executed in the debugger, e.g., for a lambda method as well as for a non-lambda method [1]. However, the test is currently too restrictive and only picks lambda methods in the main method. There is no real reason to be that restrictive. The code is just a first cut and I expect to improve on the criteria to select lambda methods to be offloaded to GPUs.

Thanks,

Bharadwaj

[1]
....
(gdb) b 117
Breakpoint 3 at 0x7ffff68186e5: file /home/bharadwaj/graal/src/share/vm/runtime/compilationPolicy.cpp, line 117.
(gdb) c
Continuing.
CUDA driver initialization: Success
[CUDA] Number of compute-capable devices found: 2
[CUDA] Got the handle of first compute-device
[CUDA] Unified addressing support on device 0: 1
[CUDA] Using GeForce GTX 780
Compiling Lambda method FindSumL::lambda$main$0
Breakpoint 3, CompilationPolicy::must_be_compiled (m=..., comp_level=-1) at /home/bharadwaj/graal/src/share/vm/runtime/compilationPolicy.cpp:117
117         switch (gpu::get_target_il_type()) {
(gdb) n
119             tty->print_cr(" to PTX");
(gdb)
 to PTX
120             break;
(gdb) n
128         return true;
(gdb) b 110
Breakpoint 4 at 0x7ffff6818652: file /home/bharadwaj/graal/src/share/vm/runtime/compilationPolicy.cpp, line 110.
(gdb) c
Continuing.

Breakpoint 4, CompilationPolicy::must_be_compiled (m=..., comp_level=-1) at /home/bharadwaj/graal/src/share/vm/runtime/compilationPolicy.cpp:110
110         if (m->is_lambda()) {
(gdb) n
133         if (!can_be_compiled(m, comp_level)) return false;
(gdb) call m->print_value()
{method} {0x00007fffe9e264f0} 'compileMethod' '(JIZ)V' in 'com/oracle/graal/hotspot/bridge/VMToCompilerImpl'
(gdb) ...
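The gdb session above pins down the shape of the lambda path in CompilationPolicy::must_be_compiled(). As a reading aid, the following reconstructs that shape from the source lines echoed by gdb (compilationPolicy.cpp lines 110-133 in that build); everything not visible in the session, including the case labels and the surrounding policy code, is an assumption and not the actual file.

// Reconstructed sketch, not the checked-in code.
bool CompilationPolicy::must_be_compiled(methodHandle m, int comp_level) {
  if (m->is_lambda()) {                               // line 110
    // (a "Compiling Lambda method <holder>::<name>" trace is printed here)
    switch (gpu::get_target_il_type()) {              // line 117
    case gpu::PTX:                                    // case label is assumed
      tty->print_cr(" to PTX");                       // line 119
      break;                                          // line 120
    default:
      break;
    }
    return true;                                      // line 128
  }
  if (!can_be_compiled(m, comp_level)) return false;  // line 133
  // ... rest of the normal compilation policy, not shown in the session
  return false;
}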
From tom.deneau at amd.com Tue Jan 21 13:51:57 2014
From: tom.deneau at amd.com (Deneau, Tom)
Date: Tue, 21 Jan 2014 21:51:57 +0000
Subject: question on TLABs
Message-ID:

A hotspot-related question about java heap buffers...

A few months ago we had experimented with allowing the hsail kernel to do java heap allocations. At that time, we had one or more inactive "donor threads", which were normal Java threads that didn't do any allocations but whose TLABs were used by the hsail kernel. (A single TLAB can be shared among multiple workitems.)

Since that time our hsail runtime routines have been more tightly integrated into the JVM (as opposed to going thru JNI as they used to), so that gives us more flexibility in this area.

My question is, what is the preferred way to get some heap memory that the hsail kernel can allocate into? Do we need to have a do-nothing java thread backing up each TLAB? Can the GC be made aware of TLABs that are not associated with threads?

-- Tom

From doug.simon at oracle.com Wed Jan 22 12:24:16 2014
From: doug.simon at oracle.com (Doug Simon)
Date: Wed, 22 Jan 2014 21:24:16 +0100
Subject: Upcoming GPU backend changes
Message-ID:

Hi,

I've just created a JBS issue for supporting multiple co-existing GPU backends: https://bugs.openjdk.java.net/browse/GRAAL-1

Prior to making the C++ changes needed for this, I want to bring the HSAIL Java code more into alignment with the recently revised PTX backend. Namely:

1. All compilation harness logic will be moved into HSAILHotSpotBackend (out of HSAILCompilationResult).
2. Create the equivalent of PTXWrapperBuilder for HSAIL. This is a generated stub that handles the transition from CPU to GPU and back. It also removes the need for the C++ HSAILKernelArguments class.

If you have any questions or concerns about this, please let me know asap. I'm happy to have a Skype call as well if that is desirable.
-Doug

From tom.deneau at amd.com Wed Jan 22 13:06:14 2014
From: tom.deneau at amd.com (Deneau, Tom)
Date: Wed, 22 Jan 2014 21:06:14 +0000
Subject: Upcoming GPU backend changes
In-Reply-To:
References:
Message-ID:

Doug --

It would be good if AMD could get access to this before it gets checked in so we can give some feedback. In particular, there are hsail-hardware based targets which are not publicly available and which we can test on.

-- Tom

> -----Original Message-----
> From: graal-dev-bounces at openjdk.java.net [mailto:graal-dev-bounces at openjdk.java.net] On Behalf Of Doug Simon
> Sent: Wednesday, January 22, 2014 2:24 PM
> To: sumatra-dev at openjdk.java.net; graal-dev at openjdk.java.net
> Subject: Upcoming GPU backend changes
>
> Hi,
>
> I've just created a JBS issue for supporting multiple co-existing GPU
> backends: https://bugs.openjdk.java.net/browse/GRAAL-1
>
> Prior to making the C++ changes needed for this, I want to bring the HSAIL
> Java code more into alignment with the recently revised PTX backend.
> Namely:
>
> 1. All compilation harness logic will be moved into HSAILHotSpotBackend
> (out of HSAILCompilationResult).
> 2. Create the equivalent of PTXWrapperBuilder for HSAIL. This is a
> generated stub that handles the transition from CPU to GPU and back. It
> also removes the need for the C++ HSAILKernelArguments class.
>
> If you have any questions or concerns about this, please let me know
> asap. I'm happy to have a Skype call as well if that is desirable.
>
> -Doug

From doug.simon at oracle.com Wed Jan 22 13:10:37 2014
From: doug.simon at oracle.com (Doug Simon)
Date: Wed, 22 Jan 2014 22:10:37 +0100
Subject: Upcoming GPU backend changes
In-Reply-To:
References:
Message-ID:

Sure. I'll post a webrev when I have one.

-Doug

On Jan 22, 2014, at 10:06 PM, Deneau, Tom wrote:

> Doug --
>
> It would be good if AMD could get access to this before it gets checked in
> so we can give some feedback. In particular, there are hsail-hardware based
> targets which are not publicly available and which we can test on.
>
> -- Tom
>
>> -----Original Message-----
>> From: graal-dev-bounces at openjdk.java.net [mailto:graal-dev-bounces at openjdk.java.net] On Behalf Of Doug Simon
>> Sent: Wednesday, January 22, 2014 2:24 PM
>> To: sumatra-dev at openjdk.java.net; graal-dev at openjdk.java.net
>> Subject: Upcoming GPU backend changes
>>
>> Hi,
>>
>> I've just created a JBS issue for supporting multiple co-existing GPU
>> backends: https://bugs.openjdk.java.net/browse/GRAAL-1
>>
>> Prior to making the C++ changes needed for this, I want to bring the HSAIL
>> Java code more into alignment with the recently revised PTX backend.
>> Namely:
>>
>> 1. All compilation harness logic will be moved into HSAILHotSpotBackend
>> (out of HSAILCompilationResult).
>> 2. Create the equivalent of PTXWrapperBuilder for HSAIL. This is a
>> generated stub that handles the transition from CPU to GPU and back. It
>> also removes the need for the C++ HSAILKernelArguments class.
>>
>> If you have any questions or concerns about this, please let me know
>> asap. I'm happy to have a Skype call as well if that is desirable.
>>
>> -Doug
>
>

From doug.simon at oracle.com Mon Jan 27 07:26:28 2014
From: doug.simon at oracle.com (Doug Simon)
Date: Mon, 27 Jan 2014 16:26:28 +0100
Subject: Upcoming GPU backend changes
In-Reply-To:
References:
Message-ID:

Here's a webrev for the first part of the changes described below:

http://cr.openjdk.java.net/~dnsimon/remove-HSAILCompilationResult/

-Doug

On Jan 22, 2014, at 10:10 PM, Doug Simon wrote:

> Sure. I'll post a webrev when I have one.
>
> -Doug
>
> On Jan 22, 2014, at 10:06 PM, Deneau, Tom wrote:
>
>> Doug --
>>
>> It would be good if AMD could get access to this before it gets checked in
>> so we can give some feedback. In particular, there are hsail-hardware based
>> targets which are not publicly available and which we can test on.
>>
>> -- Tom
>>
>>> -----Original Message-----
>>> From: graal-dev-bounces at openjdk.java.net [mailto:graal-dev-bounces at openjdk.java.net] On Behalf Of Doug Simon
>>> Sent: Wednesday, January 22, 2014 2:24 PM
>>> To: sumatra-dev at openjdk.java.net; graal-dev at openjdk.java.net
>>> Subject: Upcoming GPU backend changes
>>>
>>> Hi,
>>>
>>> I've just created a JBS issue for supporting multiple co-existing GPU
>>> backends: https://bugs.openjdk.java.net/browse/GRAAL-1
>>>
>>> Prior to making the C++ changes needed for this, I want to bring the HSAIL
>>> Java code more into alignment with the recently revised PTX backend.
>>> Namely:
>>>
>>> 1. All compilation harness logic will be moved into HSAILHotSpotBackend
>>> (out of HSAILCompilationResult).
>>> 2. Create the equivalent of PTXWrapperBuilder for HSAIL. This is a
>>> generated stub that handles the transition from CPU to GPU and back. It
>>> also removes the need for the C++ HSAILKernelArguments class.
>>>
>>> If you have any questions or concerns about this, please let me know
>>> asap. I'm happy to have a Skype call as well if that is desirable.
>>>
>>> -Doug

From doug.simon at oracle.com Thu Jan 30 08:15:36 2014
From: doug.simon at oracle.com (Doug Simon)
Date: Thu, 30 Jan 2014 17:15:36 +0100
Subject: Upcoming GPU backend changes
In-Reply-To:
References:
Message-ID: <4EED9455-B028-4A46-BFF5-857D1A449092@oracle.com>

I've now pushed the remaining changes for supporting co-existing GPUs:

http://hg.openjdk.java.net/graal/graal/rev/49db2c1e3bee

The major change is the removal of CompilerToGPU and its associated C++ code. Instead, each GPU backend has only the native methods it needs and directly implements them in the C++ layer. Also, all device initialization (linking device/simulator functions plus any device specific initialization) is *only* done in the constructors of PTXHotSpotBackend and HSAILHotSpotBackend.

The Okra library is still correctly provisioned from okra-1.6-with-sim.jar if the latter is on the class path. Otherwise, Okra (still) relies on PATH and LD_LIBRARY_PATH being configured.

To demonstrate this, and to ensure the Sumatra prototype still works, I've created a webrev that includes the Sumatra patch [1] directly in the Graal source code. This simplifies experimentation as one does not have to build a patched JDK:

http://cr.openjdk.java.net/~dnsimon/sumatra-on-graal/

The webrev includes a top level sumatra.sh script for running the demo.

I tested all this as much as I can, but please let me know if it has broken anything.

-Doug

[1] http://cr.openjdk.java.net/~ecaspole/sumatrajdk.01/webrev/

On Jan 22, 2014, at 9:24 PM, Doug Simon wrote:

> Hi,
>
> I've just created a JBS issue for supporting multiple co-existing GPU backends: https://bugs.openjdk.java.net/browse/GRAAL-1
>
> Prior to making the C++ changes needed for this, I want to bring the HSAIL Java code more into alignment with the recently revised PTX backend. Namely:
>
> 1. All compilation harness logic will be moved into HSAILHotSpotBackend (out of HSAILCompilationResult).
> 2. Create the equivalent of PTXWrapperBuilder for HSAIL. This is a generated stub that handles the transition from CPU to GPU and back. It also removes the need for the C++ HSAILKernelArguments class.
>
> If you have any questions or concerns about this, please let me know asap. I'm happy to have a Skype call as well if that is desirable.
>
> -Doug
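As background for the "linking device/simulator functions" step mentioned above (and the "[CUDA] Success: library linkage" line earlier in the thread), runtime linkage of a GPU driver or simulator usually comes down to dlopen/dlsym. The sketch below shows that pattern for one CUDA driver entry point; the library name and the single symbol chosen are illustrative assumptions, and this is not the code in Doug's push.

#include <dlfcn.h>
#include <cstdio>

// cuInit has the signature CUresult cuInit(unsigned int); CUresult is an
// int-sized enum with CUDA_SUCCESS == 0.
typedef int (*cuInit_func_t)(unsigned int flags);

static cuInit_func_t resolve_cuInit() {
  void* handle = dlopen("libcuda.so", RTLD_LAZY);   // library name is an assumption
  if (handle == NULL) {
    fprintf(stderr, "could not load libcuda.so: %s\n", dlerror());
    return NULL;
  }
  // dlsym returns the raw symbol address; cast it to the expected signature.
  return (cuInit_func_t) dlsym(handle, "cuInit");
}

int main() {
  cuInit_func_t cu_init = resolve_cuInit();
  if (cu_init != NULL && cu_init(0) == 0) {
    printf("CUDA driver initialization: Success\n");
  } else {
    printf("CUDA driver initialization: Failed\n");
  }
  return 0;
}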
From tom.rodriguez at oracle.com Tue Jan 21 14:39:15 2014
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 21 Jan 2014 22:39:15 -0000
Subject: question on TLABs
In-Reply-To:
References:
Message-ID: <9FC2AC40-B6B2-4261-A136-E5870BE2904E@oracle.com>

On Jan 21, 2014, at 1:51 PM, Deneau, Tom wrote:

> A hotspot-related question about java heap buffers...
>
> A few months ago we had experimented with allowing the hsail kernel to do java heap allocations.
> At that time, we had one or more inactive "donor threads", which were normal Java threads
> that didn't do any allocations but whose TLABs were used by the hsail kernel. (A single
> TLAB can be shared among multiple workitems.)
>
> Since that time our hsail runtime routines have been more tightly integrated into the JVM
> (as opposed to going thru JNI as they used to), so that gives us more flexibility in this area.
>
> My question is, what is the preferred way to get some heap memory that the hsail kernel
> can allocate into? Do we need to have a do-nothing java thread backing up each TLAB?
tom > > -- Tom >