[OpenJDK Rasterizer] Marlin #4
Jim Graham
james.graham at oracle.com
Thu Sep 24 17:26:34 UTC 2015
Hi Laurent,

You are looking at the wrong loop. It's tough to explain...

The vis_*.c files are only ever compiled and used on Solaris. They
convince the compiler to emit VIS instructions, SPARC's counterpart to
MMX, and no other build compiles them at all.

You were probably confused because they look like the implementations of
the functions you were looking for, and you never saw any other
implementation of those functions. That's because all of the software
loops are actually constructed using a very complicated system of
macros. If you look at loops/IntArgbPre.c you will see a bunch of macro
calls at the top which expand to declarations of functions such as
"IntArgbPreSrcMaskFill". Then you will see a structure with a bunch of
macro invocations in it, which expand to a structure describing the
loops, with one entry per loop function. Then you will see a bunch more
macro invocations, one per line, which surprisingly expand to entire
functions, one for each of them.

You'd have to do some serious tracing of macros to see what the code
looks like, but most of the macros expand from either IntArgb.h or
LoopMacros.h...
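
To give a feel for the pattern, here is a minimal sketch. It is not the
actual java2d code: every macro, type and function name in it
(DECLARE_FILL, REGISTER_FILL, DEFINE_FILL, LoopEntry, DemoOpaqueFill,
...) is made up for the illustration. It only shows how one macro can
declare a function, another can produce a table entry for it, and a
third can expand to the entire function body, so that searching the
sources for the final function name finds no hand-written definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature shared by every loop in this sketch. */
typedef void FillFunc(uint32_t *dst, size_t count, uint32_t color);

/* One table entry describing a loop: its name and its function. */
typedef struct {
    const char *name;
    FillFunc   *func;
} LoopEntry;

/* 1. Expands to a declaration: DECLARE_FILL(DemoRaw) -> DemoRawFill. */
#define DECLARE_FILL(TYPE)   FillFunc TYPE##Fill

/* 2. Expands to one table entry, e.g. { "DemoRawFill", DemoRawFill }. */
#define REGISTER_FILL(TYPE)  { #TYPE "Fill", TYPE##Fill }

/* 3. Expands to an entire function; STORE is the per-type pixel macro. */
#define DEFINE_FILL(TYPE, STORE)                                    \
    void TYPE##Fill(uint32_t *dst, size_t count, uint32_t color)   \
    {                                                               \
        for (size_t i = 0; i < count; i++) {                        \
            dst[i] = STORE(color);                                  \
        }                                                           \
    }

/* Per-type pixel conversions (also made up for this example). */
#define STORE_OPAQUE(c)  ((c) | 0xFF000000u)  /* force alpha to 0xFF */
#define STORE_AS_IS(c)   (c)                  /* keep the value as given */

/* Macro calls at the top of the file: the declarations. */
DECLARE_FILL(DemoOpaque);
DECLARE_FILL(DemoRaw);

/* The structure describing the loops, one entry per loop function. */
static const LoopEntry demo_loops[] = {
    REGISTER_FILL(DemoOpaque),
    REGISTER_FILL(DemoRaw),
};

/* One macro invocation per line, each expanding to a whole function. */
DEFINE_FILL(DemoOpaque, STORE_OPAQUE)
DEFINE_FILL(DemoRaw, STORE_AS_IS)

/* Look a loop up by the name a profiler or debugger would report. */
FillFunc *find_loop(const char *name)
{
    for (size_t i = 0; i < sizeof demo_loops / sizeof demo_loops[0]; i++) {
        if (strcmp(demo_loops[i].name, name) == 0) {
            return demo_loops[i].func;
        }
    }
    return NULL;
}
```

In this sketch the symbol DemoOpaqueFill exists in the compiled object
and would show up in a profile, yet the text "DemoOpaqueFill" appears
nowhere in the source as a definition, because the name is only
assembled by the preprocessor from DemoOpaque and Fill. Running the
file through the preprocessor alone (for example with the compiler's -E
option) is one way to see the expanded functions.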
...jim
On 9/24/15 7:59 AM, Laurent Bourgès wrote:
> Sergey,
>
> I managed to create a new benchmark with JMH + perfasm profiler:
> http://cr.openjdk.java.net/~lbourges/jmh/ellipse_fill/
>
> See MyBenchMark.java that fills an ellipse with radius in {"100", "500",
> "900", "1400"}
>
> I tested with both Oracle JDK8 and Oracle JDK9 EA b81, i.e. using the
> Ductus rendering engine:
> http://cr.openjdk.java.net/~lbourges/jmh/ellipse_fill/bench_jdk8.log
> http://cr.openjdk.java.net/~lbourges/jmh/ellipse_fill/bench_jdk9.log
>
> JDK8:
> Benchmark                (size)  Mode  Cnt   Score    Error  Units
> MyBenchmark.fillEllipse     100  avgt    3   0,207 ±  0,034  ms/op
> MyBenchmark.fillEllipse     500  avgt    3   1,931 ±  0,112  ms/op
> MyBenchmark.fillEllipse     900  avgt    3   5,158 ±  0,346  ms/op
> MyBenchmark.fillEllipse    1400  avgt    3   9,628 ±  1,321  ms/op
>
> JDK9:
> Benchmark                (size)  Mode  Cnt   Score    Error  Units
> MyBenchmark.fillEllipse     100  avgt    3   0,223 ±  0,005  ms/op
> MyBenchmark.fillEllipse     500  avgt    3   2,069 ±  0,044  ms/op
> MyBenchmark.fillEllipse     900  avgt    3   5,393 ±  0,285  ms/op
> MyBenchmark.fillEllipse    1400  avgt    3  12,305 ±  0,104  ms/op
>
> JDK9 is slower in this test: roughly 5 to 8% for sizes 100 to 900, and
> about 28% for size 1400.
>
>
> I tried to interpret the profiler info but I just noticed the hotspots
> are located in native code (libawt.so):
>
> JDK8:
>
> ....[Hottest Regions]...............................................................................
> 48,53% 51,78% [0x7f78197f9ae1:0x7f78197f9b27] in IntArgbPreSrcMaskFill (libawt.so)
> 11,27% 11,68% [0x7f78197f9900:0x7f78197f9aa6] in IntArgbPreSrcMaskFill (libawt.so)
> 9,91% 11,58% [0x7f7813bc6527:0x7f7813bc65bd] in writeAlpha8 (libdcpr.so)
> 6,51% 2,73% [0x7f7813bc5471:0x7f7813bc560a] in processJumpBuffer; processSubBufferInTile (libdcpr.so)
> 2,13% 2,16% [0x7f7813bc6436:0x7f7813bc6506] in writeAlpha8 (libdcpr.so)
>
>
> JDK9:
> ...[Hottest Regions]...............................................................................
> 61,90% 66,72% [0x7f71ae7f5678:0x7f71ae7f5837] in IntArgbPreSrcMaskFill (libawt.so)
> 10,06% 5,40% [0x7f71acb0aa77:0x7f71acb0afa9] in processJumpBuffer; processSubBufferInTile; reset.isra.4 (libdcpr.so)
> 9,23% 10,45% [0x7f71acb0bb68:0x7f71acb0bc7d] in writeAlpha8 (libdcpr.so)
>
> So this test is using the software pixel loop [IntArgbPreSrcMaskFill].
>
> I looked at the source code and compared
> libawt/java2d/loops/vis_IntArgbPre_Mask.c between openjdk8 and
> openjdk9, but the two versions are the same!
>
> Could it be a JNI issue or a compilation issue (gcc settings ...) with
> that native code?
>
> Any ideas, Sergey?
>
> Thanks for the tips,
> Laurent
>
> 2015-09-24 4:17 GMT+02:00 Sergey Bylokhov <Sergey.Bylokhov at oracle.com>:
>
> On 22.09.15 0:15, Laurent Bourgès wrote:
>
> Conclusion:
> The new patch seems promising as it is very close to ductus
> performance.
> Filling ellipses seems slower on OpenJDK9 (492 / 437, i.e. about 13%
> slower)! Any MaskFill changes?
>
>
> For such checks I suggest using JMH with "-prof perfasm". It provides
> really good information per Java method (before/after compilation),
> including the assembly, and the log also includes the native methods.
> An example looks like this:
> http://cr.openjdk.java.net/~shade/jmh/perfasm-sample.log
>
> http://openjdk.java.net/projects/code-tools/jmh
>
> It is really useful in java2d because it is sometimes unclear where
> the problem occurs (Java code, native code, new objects, etc.), and any
> Java profiler can change the behavior of the application.
>
> --
> Best regards, Sergey.
>
>
More information about the graphics-rasterizer-dev
mailing list