From tom.deneau at amd.com  Mon Oct 21 15:18:11 2013
From: tom.deneau at amd.com (Deneau, Tom)
Date: Mon, 21 Oct 2013 22:18:11 +0000
Subject: non-foreign-call  tlab refill from hsail
Message-ID: <BC97738F8E7C8742BABED7F06FB9DF915323A4FB@SATLEXDAG01.amd.com>

We are experimenting with object (and array) allocation from an HSA device
(using graal for the HSAIL codegen).  Where we are now:

   * the hsa workitems are using TLABs from "donor threads" who exist
     just to supply TLABs and don't do any allocation themselves.

   * To reduce the number of donor threads required, a TLAB can be
     used by more than one workitem, in which case the workitems use
     HSAIL atomic_add instructions to bump the tlab top pointer.

      * the HSAIL backend has its own fastpath allocation snippets
        which generate the HSAIL atomic_add instructions which
        override the snippets in NewObjectSnippets.java
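
As a rough illustration of the shared-TLAB fast path described above, here
is a minimal Java sketch.  It is not the actual HSAIL snippet code: the
class and method names (SharedTlabSketch, SharedTlab, allocate) are
hypothetical, addresses are modeled as plain long values, and
AtomicLong.getAndAdd stands in for the HSAIL atomic_add instruction.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SharedTlabSketch {
    // Hypothetical marker for "did not fit"; a real fast path would branch to a slow path.
    static final long FAILED = -1L;

    // Hypothetical model of one donor thread's TLAB shared by several workitems.
    static final class SharedTlab {
        final AtomicLong top; // next free address, bumped atomically
        final long end;       // first address past the TLAB

        SharedTlab(long start, long end) {
            this.top = new AtomicLong(start);
            this.end = end;
        }

        // Unconditionally bumps top by size (the atomic_add analogue) and then
        // checks whether the claimed range still lies inside the TLAB.
        long allocate(long size) {
            long oldTop = top.getAndAdd(size);
            return (oldTop + size <= end) ? oldTop : FAILED;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedTlab tlab = new SharedTlab(0, 1024);
        // Two "workitems" bump the same TLAB concurrently.
        Runnable workitem = () -> {
            for (int i = 0; i < 8; i++) {
                tlab.allocate(16);
            }
        };
        Thread a = new Thread(workitem);
        Thread b = new Thread(workitem);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("top after both workitems: " + tlab.top.get());
    }
}
```

One property of this scheme worth noting: because the add is unconditional,
a failed allocation still moves top past end, so every later allocation
from that TLAB also fails until the TLAB is replaced.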


Some junit tests have been written and pass which allocate objects, or arrays of primitives, or arrays of objects.


All the above only works for the fastpath case, i.e., if there is indeed enough space in the donor TLAB.  We realize there are other cases:


   a) not enough space in current TLAB but ability to allocate a new TLAB.


   b) not able to allocate a new TLAB, GC required.


For case a) only, we would like to experiment with grabbing the new TLAB from HSAIL without making a "foreign call" to the VM.  From the hotspot code, I assume the logic required is what one sees in

    MutableSpace::cas_allocate(size_t size), at least for the non-G1 case.
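
For reference, the general shape of a compare-and-swap bump allocation on
a shared space can be sketched in Java as below.  This is only a sketch of
the technique, not a transcription of the HotSpot source: the names
(CasAllocateSketch, Space, casAllocate, NULL_ADDRESS) are hypothetical and
addresses are modeled as long values.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasAllocateSketch {
    // Hypothetical "null" result; the space below never starts at address 0.
    static final long NULL_ADDRESS = 0L;

    // Hypothetical model of a shared space with a top pointer and a fixed end.
    static final class Space {
        final AtomicLong top;
        final long end;

        Space(long start, long end) {
            this.top = new AtomicLong(start);
            this.end = end;
        }

        // Tries to claim size bytes; retries if another thread moved top first.
        long casAllocate(long size) {
            while (true) {
                long obj = top.get();
                if (end - obj < size) {
                    return NULL_ADDRESS; // not enough room; top is left untouched
                }
                long newTop = obj + size;
                if (top.compareAndSet(obj, newTop)) {
                    return obj; // this thread owns [obj, newTop)
                }
                // Lost the race: loop and re-read top.
            }
        }
    }

    public static void main(String[] args) {
        Space eden = new Space(1024, 1280);
        System.out.println("new TLAB starts at " + eden.casAllocate(128));
    }
}
```

Unlike an unconditional atomic add, a failed attempt here leaves top where
it was, so a smaller request may still succeed afterwards.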


Some of this non-foreign-call allocation logic appears to exist in the Snippet called NewInstanceStub.newInstance (as opposed to the NewObjectSnippets.allocateInstance snippet, which is what we are currently overriding).  The comments for this snippet say

   "Re-attempts allocation after an initial TLAB allocation failed or
   was skipped (e.g., due to -XX:-UseTLAB)."


Is this NewInstanceStub.newInstance snippet actually used anywhere in the current graal framework?  Is this a starting point we could use to get a non-foreign-call TLAB refill working?


Or is this a path we should not even try to go down?


-- Tom


From doug.simon at oracle.com  Tue Oct 22 02:41:44 2013
From: doug.simon at oracle.com (Doug Simon)
Date: Tue, 22 Oct 2013 11:41:44 +0200
Subject: non-foreign-call  tlab refill from hsail
In-Reply-To: <BC97738F8E7C8742BABED7F06FB9DF915323A4FB@SATLEXDAG01.amd.com>
References: <BC97738F8E7C8742BABED7F06FB9DF915323A4FB@SATLEXDAG01.amd.com>
Message-ID: <33F60C70-33F8-40F6-AE82-DD817293856C@oracle.com>


On Oct 22, 2013, at 12:18 AM, "Deneau, Tom" <tom.deneau at amd.com> wrote:

> We are experimenting with object (and array) allocation from an HSA device (using graal for
> 
> the HSAIL codegen).  Where we are now:
> 
> 
> 
>   * the hsa workitems are using TLABs from "donor threads" who exist
> 
>     just to supply TLABs and don't do any allocation themselves.
> 
> 
> 
>   * To reduce the number of donor threads required, a TLAB can be
> 
>     used by more than one workitem, in which case the workitems use
> 
>     HSAIL atomic_add instructions to bump the tlab top pointer.
> 
> 
> 
>      * the HSAIL backend has its own fastpath allocation snippets
> 
>        which generate the HSAIL atomic_add instructions which
> 
>        override the snippets in NewObjectSnippets.java
> 
> 
> 
> Some junit tests have been written and pass which allocate objects, or arrays of primitives, or arrays of objects.
> 
> 
> 
> All the above only works for the fastpath case, i.e., if there is indeed enough space in the donor TLAB.  We realize there are other cases:
> 
> 
> 
>   a) not enough space in current TLAB but ability to allocate a new TLAB.
> 
> 
> 
>   b) not able to allocate a new TLAB, GC required.
> 
> 
> 
> For only case a) above, we would like to experiment with grabbing the new TLAB from HSAIL without making a "foreign call" to the VM.  From the hotspot code, I assume the logic required is what one sees in
> 
>    mutableSpace::cas_allocate(size_t size) at least for the non-G1 case.

When the NewInstanceStub fails to allocate a new TLAB, it calls out to GraalRuntime::new_instance (in graalRuntime.cpp).

> Some of this non-foreign-call allocation logic appears to exist in the Snippet called NewInstanceStub.newInstance (as opposed to NewObjectSnippets.allocateInstance snippet which is what we are currently overriding).  This comments for this snippet say
> 
>   "Re-attempts allocation after an initial TLAB allocation failed or
> 
>   was skipped (e.g., due to * -XX:-UseTLAB)."
> 
> 
> 
> Is this NewInstanceStub.newInstance snippet actually used anywhere in the current graal framework.

Yes, you can see a call to NewInstanceStubCall in NewObjectSnippets.allocateInstance.

>  Is this a starting point we could use to get a non-foreign-call TLAB refill working?

Yes. Note that this call *is* a foreign call (see the javadoc for ForeignCallDescriptor).
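
The fallthrough order discussed in this thread (inline TLAB fast path,
then a TLAB refill attempt, then a call into the runtime) can be sketched
roughly as follows.  This is purely illustrative and does not mirror the
actual NewInstanceStub code: all names and the fixed TLAB_SIZE are
hypothetical, and the "runtime call" is represented only by an enum value.

```java
import java.util.concurrent.atomic.AtomicLong;

public class TieredAllocationSketch {
    // Which tier satisfied (or gave up on) the request.
    enum Path { FAST_PATH, TLAB_REFILL, RUNTIME_CALL }

    static final long TLAB_SIZE = 64; // hypothetical fixed TLAB size

    final AtomicLong edenTop; // shared space that new TLABs are carved from
    final long edenEnd;
    long tlabTop;             // current TLAB; starts out empty
    long tlabEnd;

    TieredAllocationSketch(long edenStart, long edenEnd) {
        this.edenTop = new AtomicLong(edenStart);
        this.edenEnd = edenEnd;
        this.tlabTop = edenStart;
        this.tlabEnd = edenStart;
    }

    Path allocate(long size) {
        // Tier 1: bump the current TLAB.
        if (tlabEnd - tlabTop >= size) {
            tlabTop += size;
            return Path.FAST_PATH;
        }
        // Tier 2: grab a fresh TLAB and allocate from it.
        // Simplification: the unused tail of the old TLAB is simply dropped.
        if (size <= TLAB_SIZE && refillTlab()) {
            tlabTop += size;
            return Path.TLAB_REFILL;
        }
        // Tier 3: no TLAB available; a real VM would call into the runtime
        // here (possibly triggering a GC).
        return Path.RUNTIME_CALL;
    }

    // Claims TLAB_SIZE bytes from the shared space with a CAS retry loop.
    private boolean refillTlab() {
        while (true) {
            long start = edenTop.get();
            if (edenEnd - start < TLAB_SIZE) {
                return false;
            }
            if (edenTop.compareAndSet(start, start + TLAB_SIZE)) {
                tlabTop = start;
                tlabEnd = start + TLAB_SIZE;
                return true;
            }
        }
    }

    public static void main(String[] args) {
        TieredAllocationSketch heap = new TieredAllocationSketch(0, 128);
        for (long size : new long[] {48, 16, 16, 64}) {
            System.out.println(size + " bytes -> " + heap.allocate(size));
        }
    }
}
```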

-Doug

From bharadwaj.yadavalli at oracle.com  Thu Oct 24 12:55:49 2013
From: bharadwaj.yadavalli at oracle.com (Bharadwaj Yadavalli)
Date: Thu, 24 Oct 2013 15:55:49 -0400
Subject: PTX backend development status
Message-ID: <52697B45.2030506@oracle.com>

A PTX backend that generates code for simple Java methods is taking 
shape in the graal sources (hg clone 
http://hg.openjdk.java.net/graal/graal).

https://wiki.openjdk.java.net/display/Graal/Main provides necessary 
information about building and running graal.

The current implementation can generate and execute PTX code on 
CUDA-capable hardware (if present) for simple Java methods. Passing 
primitive-type Java method arguments and capturing the return value 
work. Passing object-type arguments still needs work.

There are a few PTX tests in 
graal/com.oracle.graal.compiler.ptx.test/src/com/oracle/graal/compiler/ptx/test 
that can be run.

The generated PTX code can be printed to stdout by passing the 
command-line option -XX:+TraceGPUInteraction to ./mx.sh

An initial goal is to be able to recognize Java constructs suitable for 
GPU execution at runtime.

Looking forward to contributions from PTX experts to enhance this 
OpenJDK project.

Thanks,

Bharadwaj
