CFV: Project Trinity

Mon Apr 24 19:54:54 UTC 2017

Hi Volker,

On 4/24/2017 12:30 PM, Volker Simonis wrote:
> This certainly sounds very ambitious! I'm not an expert in this area,
> but I don't think there's even a good C/C++ API which covers this
> broad range of "accelerators".
It may look ambitious, but if we restrict ourselves to the particular 
domain of bulk data processing and treat this as a domain-specific Java 
library that offers a standard interface to multiple accelerator 
back-ends, it is still a plausible goal. Such an API will have the best 
chance of adoption in the "portable" Java world.

> What we should certainly avoid is
> providing an API which only works with accelerator XXX of vendor YYY.
Indeed, portability will be a key design goal.
> If the goal of this project is to eventually provide a standard Java
> API, it should at least support a wide range of available
> "accelerators" which, to repeat myself, makes it quite ambitious.
The biggest problem with accelerator offload is detecting code patterns 
that are suitable for translation to the accelerator. With a dedicated 
API and operations in a specific domain, we significantly simplify this 
problem. Based on some prototyping work we have done using DAX, we have 
clearly seen the merits of this approach.

We are only signing up to provide a better interface and implementation 
for offload and acceleration in this domain than the current Streams 
API, which existing offload-related projects already use. We are not 
targeting dynamic code generation for these back-ends, which would be 
redundant given existing projects like Sumatra and Aparapi.
>
> That said, how is this new library supposed to work? Will it be mainly
> implemented in Java with various native C/C++ back-ends or do you plan
> to still use VM (aka. HotSpot) support via intrinsics and various
> other sorts of JIT compiler optimizations?
It is something that we would like to explore as part of this project 
with input from previous and ongoing offload-related projects. Based on 
some of our initial experiments with both intrinsics and JNI, it does 
not have to be one or the other, and we are open to both. Given the 
artifacts offered by Panama, this will be especially interesting to 
explore further.

Thanks,
Karthik

>
>> Inspired by the existing Streams API that brings succinct functional
>> programming to Java using lambdas, this project will try to retain such rich
>> features, significantly simplifying programming in Java for the performance
>> oriented developers focusing on bulk data processing.
>>
>> Regards,
>>
>> Karthik
>>
>>
>>
>> On 4/24/2017 4:09 AM, Doug Simon wrote:
>>>> On 24 Apr 2017, at 10:50, Volker Simonis <volker.simonis at gmail.com>
>>>> wrote:
>>>>
>>>> On Sun, Apr 23, 2017 at 1:39 PM, Doug Simon <doug.simon at oracle.com>
>>>> wrote:
>>>>>> On 21 Apr 2017, at 23:54, Christian Thalinger <cthalinger at twitter.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>> On Apr 21, 2017, at 11:41 AM, Karthik Ganesan
>>>>>>> <karthik.ganesan at oracle.com> wrote:
>>>>>>>
>>>>>>> Hi Christian,
>>>>>>>
>>>>>>> Thanks for your interest. This question was brought up previously in
>>>>>>> the discussion email thread for this project:
>>>>>>>
>>>>>>> Project Sumatra was aimed at translation of Java byte code to execute
>>>>>>> on
>>>>>>> GPU, which was an ambitious goal and a challenging task to take up. In
>>>>>>> this
>>>>>>> project, we aim to come up with APIs targeting the most common
>>>>>>> Analytics
>>>>>>> operations that can be readily offloaded to accelerators
>>>>>>> transparently. Most
>>>>>>> of the information needed for offload to the accelerator is expected
>>>>>>> to be
>>>>>>> readily provided by the API semantics, thereby simplifying the
>>>>>>> need to
>>>>>>> do tedious byte code analysis.
>>>>>> I disagree.  The first paragraph on the Sumatra project page says:
>>>>>>
>>>>>> "This primary goal of this project is to enable Java applications to
>>>>>> take advantage of graphics processing units (GPUs) and accelerated
>>>>>> processing units (APUs)--whether they are discrete devices or integrated
>>>>>> with a CPU--to improve performance.”
>>>>>>
>>>>>> while you state:
>>>>>>
>>>>>> "This Project would explore enhanced execution of bulk
>>>>>> aggregate calculations over Streams through offloading
>>>>>> calculations to hardware accelerators.”
>>>>>>
>>>>>> It’s the same thing.  I just don’t see the need to spin up yet-another
>>>>>> OpenJDK project that aims at the same goal.
>>>>> Maybe this is just a discrepancy between the officially stated aims. I
>>>>> understood Sumatra to be about *automatic* offloading work for existing APIs
>>>>> (such as the Streams API) to a GPU whereas Trinity seems to be more about
>>>>> designing an explicit API for GPU offloading.
>>>>>
>>>> So if this is about an explicit API for GPU offloading, will this be a
>>>> Java implementation/wrapper for already existing C/C++ APIs like
>>>> CUDA/OpenCL? Designing a completely new, Java-specific API seems to be
>>>> not very promising to me.
>>> I agree.
>>>
>>> Karthik, maybe you could discuss the differences/similarities between
>>> Trinity and the Aparapi project (https://github.com/aparapi/aparapi).
>>>
>>> -Doug
>>