RFR: 8187033: [PPC] Improve performance of ObjectStreamClass.getClassDataLayout()

Wed Sep 20 13:47:55 UTC 2017

Hi Ogata,

On 09/20/2017 12:12 PM, Kazunori Ogata wrote:
> Hi Peter,
>
> The benchmark is GradientBoostingTree of Intel HiBench [1].  HiBench is a
> suite of programs using Hadoop or Spark, and GradientBoostingTree is a
> Spark program.  The source code (in Scala) is [2].  To build the code, you
> need Apache Spark.
>
> The command line is equivalent to java -Xmx10g -D spark.master="local[4]"
> GradientBoostingTree <inputDir> 100, but what I actually use is a Java
> program that calls the main method and measures its execution time using
> currentTimeMillis().
>
> By the way, I'm running the benchmark on a POWER8 machine.  Removing
> volatile won't change the performance on x86.
>
>
> [1] https://github.com/intel-hadoop/HiBench
> [2]
> https://github.com/intel-hadoop/HiBench/blob/master/sparkbench/ml/src/main/scala/com/intel/sparkbench/ml/GradientBoostingTree.scala
>
>
> Regards,
> Ogata
>

Huh, I thought it would be something easier to run. Am I right that the
improvement we are expecting comes from Java serialization and
deserialization of some data structure? If you could extract from the
benchmark just the approximate shape of that data structure and the
typical values it contains, I could create a JMH benchmark that tests
just that part, which would be appropriate for tuning the serialization
code. Once the best variant is chosen, you could verify it by running
your test in your Spark setup. I think there is still room for
improvement. I have a few ideas I would like to test.
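
A minimal sketch of such an isolated round-trip test, in plain Java
rather than JMH, might look like the following. The Base/Sample classes
and their field values are hypothetical placeholders, since the real
data shape from the Spark benchmark is not known yet; the two-level
Serializable hierarchy is only assumed so that the class data layout has
more than one slot. The hand-rolled timing loop is no substitute for
JMH's warm-up and measurement control.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationRoundTrip {

    // Hypothetical payload (assumption): the real shape would be
    // extracted from the GradientBoostingTree benchmark.
    static class Base implements Serializable {
        private static final long serialVersionUID = 1L;
        int id;
        Base(int id) { this.id = id; }
    }

    static class Sample extends Base {
        private static final long serialVersionUID = 1L;
        double[] values;
        Sample(int id, double[] values) {
            super(id);
            this.values = values;
        }
    }

    // Serializes one object graph into a fresh byte array.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Reads back a single object from the given bytes.
    static Object deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Runs the given number of round trips; returns elapsed nanoseconds.
    static long timeRoundTrips(Object obj, int iterations)
            throws IOException, ClassNotFoundException {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            deserialize(serialize(obj));
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        Sample sample = new Sample(42, new double[] {1.0, 2.0, 3.0});
        int iterations = 100_000;
        timeRoundTrips(sample, iterations / 10);  // crude warm-up
        long nanos = timeRoundTrips(sample, iterations);
        System.out.println("ns per round trip: " + nanos / iterations);
    }
}
```

Each round trip creates new ObjectOutputStream and ObjectInputStream
instances, so the per-class descriptor lookup and the layout of each
class are exercised on every iteration rather than only once per stream.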

Regards, Peter