<div dir="ltr">On Mon, Jun 17, 2013 at 1:40 PM, Laurent Bourgès <span dir="ltr">&lt;<a href="mailto:bourges.laurent@gmail.com" target="_blank">bourges.laurent@gmail.com</a>&gt;</span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Andrea,<br>thanks for taking the time to test my patch in a real benchmark!<br><br>I think the ratio of Pisces rendering to request processing is very low (a few percent); that's why the performance gains between L1 and L4 are so small.<br>


<br>How many CPU cores does your machine have?<br></blockquote><div><br></div><div style>It's a Core i7 860: it has 4 physical cores, but the OS sees 8 because of hyperthreading (the extra 4 HT units can only do integer math as far as I remember... I may be wrong about this)</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br><div class="gmail_quote"><div class="im"><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<div dir="ltr"><div><div>

<br></div><div>As you can see, L1 provides most of the benefit, although L4 managed to give another boost when the number of concurrent requests is higher.</div><div>The benchmarks have been run with the thread local storage option; I did not manage to run them with the concurrent linked queue approach (planning to do that next weekend).</div>


</div></div></blockquote></div><div><br>That would be very interesting, because CLQ mode is normally a bit slower than TL mode, but in a web server it avoids wasting about 1 MB of memory per thread (for 200 threads, about 200 to 300 MB)!<br>


<br>I still have to finalize some array sizing (initial capacity ...) of the renderer context to have a good compromise between performance and memory usage.<br></div></div></blockquote><div><br></div><div style>Yes, I see. I'll have a look.</div>
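<div>To make the TL versus CLQ trade-off above concrete, here is a minimal sketch, not the patch's actual code. <code>RendererContext</code>, <code>ContextPoolSketch</code> and the 1 MB scratch array are hypothetical stand-ins chosen to match the "about 1 MB per thread" figure. In TL mode each of the 200 server threads keeps its own context for as long as the thread lives (200 x 1 MB, roughly 200 MB), while in CLQ mode only as many contexts exist as were ever in use at the same time.</div>

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical stand-in for the renderer's reusable scratch state. */
final class RendererContext {
    final byte[] scratch = new byte[1 << 20]; // assumed ~1 MB of reusable arrays
}

final class ContextPoolSketch {
    // TL mode: one context per thread, retained for as long as the thread lives.
    private static final ThreadLocal<RendererContext> TL =
            ThreadLocal.withInitial(RendererContext::new);

    // CLQ mode: contexts are shared between threads through a lock-free queue.
    private static final ConcurrentLinkedQueue<RendererContext> QUEUE =
            new ConcurrentLinkedQueue<>();

    static RendererContext acquireThreadLocal() {
        return TL.get();
    }

    static RendererContext acquireFromQueue() {
        RendererContext ctx = QUEUE.poll(); // null when no idle context is queued
        return ctx != null ? ctx : new RendererContext();
    }

    static void releaseToQueue(RendererContext ctx) {
        QUEUE.offer(ctx); // make the context available to any other thread
    }

    public static void main(String[] args) {
        RendererContext a = acquireFromQueue();
        releaseToQueue(a);
        RendererContext b = acquireFromQueue();
        System.out.println("queue reused context: " + (a == b));
        System.out.println("thread-local reused context: "
                + (acquireThreadLocal() == acquireThreadLocal()));
    }
}
```

<div>The queue variant pays for a <code>poll()</code> and an <code>offer()</code> on every rendering call, which is one plausible reason for it being a bit slower than the thread-local lookup.</div>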

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_quote"><div> </div><div class="im"><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">


<div dir="ltr"><div>

<div><br></div><div>The remaining bottlenecks in the benchmarks are somewhat... funny? ;-)</div><div>Concurrency-wise, the major offender is now FreeTypeScaler.getLayoutTableCache() (the map has several labels); CPU-wise, CLibPNGImageWriter.write(...) is eating 75% of the overall request time... </div>


<div>This class comes with the JAI ImageIO native extension, and it's a major speedup compared to the one built into the JDK: if I make GeoServer use that one, the top performance goes down to 30 req/s, a really major drop.</div>


<div>Houston, we really need a faster PNG encoder! :-p</div></div></div></blockquote></div><div><br>So you implicitly confirm that Pisces only represents less than 25%, so let's say 10%, of the request processing time.<br></div>

</div></blockquote><div><br></div><div style>Yes, in the past data loading from the OS file system cache and rendering were similarly sized, so I guess it's fair to say</div><div style>the renderer is now using around 10-12% of the overall processing time.<br>

It's hard to be more precise since GeoServer is fully based on a streaming architecture: it reads a bunch of data,</div><div style>processes it, then reads another bunch, in a way that makes it rather hard to separate the two elements in a profile.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_quote"><div><br>Maybe you should submit a concurrency issue related to FreeTypeScaler.getLayoutTableCache()!<br>

</div></div></blockquote><div><br></div><div style>I had a look, but all it's doing is wrapping a native method call; it may well be that the underlying native library</div><div style>is not thread safe and the synchronization is actually required.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_quote"><div>

<br>Could you run the benchmark using another image format (BMP, JPEG or any faster encoding)?<br></div></div></blockquote><div><br></div><div style>Yes, I can have a look, although it's going to be an academic exercise: that kind of map (a typical road map</div>

<div style>with buildings and the like) is only ever requested in PNG: BMP is not compressed, and JPEG visibly ruins it.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="gmail_quote"><div>Again, it would be interesting to identify the performance bottleneck in the C library. Please look at the JAI bugs ...<br></div></div></blockquote><div><br></div><div style>Yes, I'm going to spend some time looking at it, maybe oprofile can help?</div>
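<div>For the format comparison requested earlier in this message (PNG versus BMP versus JPEG), a rough sketch of an in-memory encoder timing could look like the following. It uses only the JDK's built-in <code>javax.imageio</code> writers, not the JAI ImageIO native ones, and the class name, image size and line pattern are assumptions made for illustration; a real test would encode an actual rendered map and repeat the measurement after a warm-up.</div>

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

final class EncoderTimingSketch {
    /** Builds an opaque synthetic image; a stand-in for a rendered road map. */
    static BufferedImage sampleImage(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.setColor(Color.DARK_GRAY);
            for (int i = 0; i < width; i += 16) {
                g.drawLine(i, 0, width - i, height); // road-like line work
            }
        } finally {
            g.dispose();
        }
        return img;
    }

    /** Encodes the image in memory and returns the encoded size in bytes. */
    static int encodedSize(BufferedImage img, String format) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(img, format, out)) {
                throw new IllegalArgumentException("no writer for format " + format);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    public static void main(String[] args) {
        ImageIO.setUseCache(false); // keep the encoders off the disk cache
        BufferedImage img = sampleImage(768, 768);
        for (String format : new String[] {"png", "bmp", "jpg"}) {
            long start = System.nanoTime();
            int size = encodedSize(img, format);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(format + ": " + size + " bytes in " + micros + " us");
        }
    }
}
```

<div>A single untimed pass like this only gives an order of magnitude; it says nothing about the native CLIB encoder discussed in this thread.</div>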

<div style>The issue here is that the CLIB encoder is a native one, but where are the CLIB sources?</div><div style><br></div><div style>Cheers</div><div style>Andrea</div><div> </div></div>-- <br><div dir="ltr"><div><div>

==</div><div>Our support, Your Success! Visit <a href="http://opensdi.geo-solutions.it" target="_blank">http://opensdi.geo-solutions.it</a> for more information.</div><div>==</div></div><div><br></div><div>Ing. Andrea Aime <br>

</div><div>@geowolf</div><div>Technical Lead</div><div><br></div><div>GeoSolutions S.A.S.</div><div>Via Poggio alle Viti 1187</div><div>55054  Massarosa (LU)</div><div>Italy</div><div>phone: +39 0584 962313</div><div>fax: +39 0584 1660272</div>

<div>mob: +39  339 8844549</div><div><br></div><div><a href="http://www.geo-solutions.it" target="_blank">http://www.geo-solutions.it</a></div><div><a href="http://twitter.com/geosolutions_it" target="_blank">http://twitter.com/geosolutions_it</a></div>

<div><br></div><div>-------------------------------------------------------</div></div>

</div></div>