RFR: 8187033: [PPC] Improve performance of ObjectStreamClass.getClassDataLayout()
Peter Levart
peter.levart at gmail.com
Wed Sep 20 13:47:55 UTC 2017
Hi Ogata,
On 09/20/2017 12:12 PM, Kazunori Ogata wrote:
> Hi Peter,
>
> The benchmark is GradientBoostingTree of Intel HiBench [1]. HiBench is a
> suite of programs using Hadoop or Spark, and GradientBoostingTree is a
> Spark program. The source code (in Scala) is [2]. To build the code, you
> need Apache Spark.
>
> The command line is equivalent to java -Xmx10g -Dspark.master="local[4]"
> GradientBoostingTree <inputDir> 100, but what I actually use is a Java
> program that calls the main method and measures its execution time using
> System.currentTimeMillis().
>
> By the way, I'm running the benchmark on a POWER8 machine. Removing
> volatile won't change the performance on x86.
>
>
> [1] https://github.com/intel-hadoop/HiBench
> [2]
> https://github.com/intel-hadoop/HiBench/blob/master/sparkbench/ml/src/main/scala/com/intel/sparkbench/ml/GradientBoostingTree.scala
>
>
> Regards,
> Ogata
>
Huh, I thought it would be something easier to run. Am I right that the
improvement we are expecting comes from Java serialization and
deserialization of some data structure? If you could extract from the
benchmark just the approximate shape of that data structure and the
typical values it contains, I could create a JMH benchmark that tests
just that part. That would be a more appropriate tool for tuning the
serialization code. Once the best variant is chosen, you could verify it
by running your test in your Spark setup. I think there is still room
for improvement. I have a few ideas I would like to test.
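
To make the request concrete, here is a rough sketch of the kind of isolated
round trip I have in mind. It is plain Java rather than JMH so that it runs
without extra dependencies; in a real JMH benchmark the serialize and
deserialize calls would go into @Benchmark methods. The Node class, its fields
and the element counts are purely hypothetical placeholders, since I don't know
the actual shape of the data Spark serializes in this benchmark.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;

public class SerializationRoundTrip {

    // Hypothetical stand-in for the benchmark's data; the real shape is unknown.
    public static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final double[] features;

        Node(int id, double[] features) {
            this.id = id;
            this.features = features;
        }
    }

    // Builds a list of n placeholder nodes with two feature values each.
    public static ArrayList<Node> sampleData(int n) {
        ArrayList<Node> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(new Node(i, new double[] { i, i * 0.5 }));
        }
        return list;
    }

    // Serializes an object graph to a byte array using standard Java serialization.
    public static byte[] serialize(Object o) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Reads back an object graph previously produced by serialize().
    public static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<Node> data = sampleData(1000);
        int iterations = 1000; // arbitrary; no warmup handling, unlike JMH

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            deserialize(serialize(data));
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(iterations + " round trips took " + elapsedMs + " ms");
    }
}
```

A loop like this only gives a crude number, because it does not separate
warmup from measurement. That is the main reason I would prefer JMH once the
data shape is known.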
Regards, Peter