[OpenJDK 2D-Dev] [PATCH] JDK-4627340 : RFE: A way to improve text printing performance for postscript devices (Improved proposal)

Mon May 12 07:16:36 UTC 2014

Hi,
now that my OCA has been processed I would like to bring to your attention a 
proposal that I posted to this group earlier this year. Find below a 
repost of the most recent version of the patch.
I don't know if anyone remembers, but the idea of the patch is to use 
embedded Type 3 fonts instead of glyph vectors in calls to 
Graphics2D.drawString().
If required I can summarize the whole topic once more.
Kind regards,
Alex

-------- Original Message --------
Subject: 	[PATCH] JDK-4627340 : RFE: A way to improve text printing 
performance for postscript devices (Improved proposal)
Date: 	Tue, 04 Feb 2014 12:12:55 +0100
From: 	Alex Geller <ag at 4js.com>
To: 	2d-dev at openjdk.java.net

Hi,
my OCA submitted on January 10 didn't make its appearance and I am
guessing it is because I checked the "Vice president" option by mistake.
I have submitted a corrected version and I hope it is OK to continue
posting on this topic in spite of that.
After some minor adjustments to the patch I improved the test program so
that instead of printing a static test document it now reads external
files containing arbitrary attributed text. Based on the new version I
have created a couple of test documents, compared the output under
various settings and performed some benchmarks.

The changes to the patch are:
- For testing purposes the patch can be activated/deactivated by the
temporary system property "PSRenderType3"
- I removed the code that allowed choosing different byte encodings for
the purpose of yielding a more compact Postscript representation, mainly
because text extraction via tools like "pstotext" or "ps2pdf" can't work
correctly with anything other than Latin1 encoding anyway (disregarding
the option to use CID-keyed fonts).
- I spotted "Font.deriveFont(1000f)" as a source of slowness and
replaced it with code that scales the glyphs of the original font instead.
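
To illustrate the last point, here is a minimal sketch of scaling an outline
into a 1000 unit glyph space instead of deriving a 1000pt font. This is not
the code from the patch. The class and method names are hypothetical, the
1000 units per em value is the usual Postscript convention and is assumed
here, and a hand-built Path2D stands in for a real glyph outline (which
would normally come from something like GlyphVector.getGlyphOutline()) so
that the sketch runs without any installed fonts.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of the patch.
final class GlyphScaleSketch {
    // Assumption: glyphs are defined in a 1000 unit em square.
    static final double UNITS_PER_EM = 1000.0;

    // Scales an outline obtained at 'fontSize' points into glyph space,
    // avoiding a second font instance created via Font.deriveFont(1000f).
    static Shape toGlyphSpace(Shape outline, double fontSize) {
        double s = UNITS_PER_EM / fontSize;
        return AffineTransform.getScaleInstance(s, s)
                              .createTransformedShape(outline);
    }

    public static void main(String[] args) {
        // Stand-in for a glyph outline taken from a 12pt font.
        Path2D.Double outline = new Path2D.Double();
        outline.moveTo(0, 0);
        outline.lineTo(6, 0);
        outline.lineTo(6, -9);
        outline.lineTo(0, -9);
        outline.closePath();

        Rectangle2D bounds = toGlyphSpace(outline, 12.0).getBounds2D();
        System.out.println(bounds.getWidth() + " x " + bounds.getHeight());
    }
}
```

The transform is cheap compared to instantiating a new font, which is
presumably where the time went.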

The test program can now be invoked with some command line options as
listed below:
$java PSTest -help
Usage: java PSTest -help|[-inputFileName file name ("example.xml")]
                           [-latinFontName font name ("SansSerif")]
                           [-asianFontName font name ("WenQuanYi Zen Hei")]
                           [-renderingMethod DrawString|DrawGlyphVector|DrawTextLayout (DrawString)]
                           [-renderIntoBufferedImage true|false (false)]
                           [-useFractionalMetricsForLayoutComputation true|false (true)]
                           [-useFractionalMetricsForPainting true|false (true)]
                           [-useAntiAliasingForLayoutComputation true|false (false)]
                           [-useAntiAliasingForPainting true|false (false)]
                           [-numberOfPrintIterations number (3)]
                           [-paintExpectedStringSizeMarkers number (false)]
                           [-bufferedImageDPI number (300.0)]

I created the following test documents:

-oracle.xml: This is a conversion of the Oracle terms and conditions page
(http://www.oracle.com/us/legal/terms/index.html). The document has
about 20,000 characters printed on two pages.
About 1% of the characters are rendered using the default
rendering due to text colorization (FOREGROUND attribute).
-t-mobile.xml: This is a conversion of the T-Mobile terms and conditions
page
(http://www.t-mobile.com/Templates/Popup.aspx?PAsset=Ftr_Ftr_TermsAndConditions&print=true).
The document has about 56,000 characters printed on eight pages.
About 0.1% of the characters are rendered using the
default rendering due to underlining (UNDERLINE attribute).
-baidu.xml: This is a conversion of the Chinese Baidu terms and conditions
page (http://adm.baidu.com/contract.html). The document has about 4,000
characters printed on three pages.
There are 536 distinct characters in the text. All
characters can be rendered using Type 3 fonts.

-benchmark1.xml: A document containing a page with 64 lines of 100 'a'
characters. This represents the "best case" for the font embedding strategy.
-benchmark2.xml: A document containing 64 lines of 100 characters with
10 different characters.
-benchmark3.xml: A document containing 64 lines of 100 characters with
82 different characters.
-benchmark4.xml: A document containing a page with 64 lines of 99 'a'
characters and one Asian character. This is to test the bit set and to
force the usage of "glyphshow" instead of the
more compact "show" string representation.
-benchmark5.xml: A document containing a page with 12 lines of unique
characters where each line uses an entirely different font. This
represents the "worst case" for the font embedding strategy.
-benchmark6.xml: Same as benchmark5.xml but with just enough non-unique
characters added so that size and performance exceed outline drawing.
-example.xml: A two-page document that replaces the static document
from the previous version.

Other files:
- results.html: Test results. The results include Java execution time,
resulting Postscript file size and the time Ghostscript needed to
rasterize the result.
- results.txt: Detailed test results
- condense.awk: A script that condenses the data in "results.txt"
producing "results.html"
- runtests.sh: A shell script that produces "results.txt" and "results.html"
- Makefile: A makefile with the targets "run" and "clean"

Running the tests:
The tests are run via "make run" which compiles PSTest.java and then runs
the shell script "runtests.sh" which in turn creates the HTML result
file "results.html".
Two of the tests require a list of fonts. This list is located in
"runtests.sh" and should be adjusted before running.

Measuring rasterization time with Ghostscript:
If "gs" is installed then the script will measure the time Ghostscript
takes to render the document to a 600 DPI raster. As far as I can tell
there is no option to perform rasterization only. Instead one has to
select an output format where image encoding and file IO do not dominate
the results. After some tests I decided to use the "pngmono" option with
a scaledown of 3.

Regarding the Java performance measurements:
The values "Time for first run" and "Time for second run" are obtained
by calls to "System.nanoTime()" immediately before and after the code
that sets up and executes the print job.
The JVM startup, the loading of the document and the computation of the
layout are not included in this measurement. Each print job is executed
twice in the same JVM, hence the distinction between the values
"first run" and "second run".
The values "Performance for first/second in characters/s" are computed
from those time measurements and the document size.
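
The measurement scheme described above can be sketched roughly as follows.
This is only an illustration and not the code from PSTest.java. The class
and method names are hypothetical, and a trivial loop stands in for the
real print job.

```java
// Hypothetical sketch of the timing scheme, not taken from PSTest.java.
final class PrintTimingSketch {
    // Returns the wall clock time of one run of the job in nanoseconds.
    static long timeNanos(Runnable job) {
        long start = System.nanoTime();
        job.run();
        return System.nanoTime() - start;
    }

    // Converts a character count and an elapsed time into characters/s.
    static double charsPerSecond(long characterCount, long elapsedNanos) {
        return characterCount * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        final int characterCount = 56_000; // roughly the size of "t-mobile.xml"

        // Stand-in for the code that sets up and executes the print job.
        Runnable printJob = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < characterCount; i++) {
                sb.append('a');
            }
            if (sb.length() != characterCount) {
                throw new IllegalStateException("unexpected length");
            }
        };

        // Same job, same JVM, two runs: the second one benefits from warm-up.
        long first = timeNanos(printJob);
        long second = timeNanos(printJob);

        System.out.printf("first run:  %d ns, %.0f characters/s%n",
                first, charsPerSecond(characterCount, first));
        System.out.printf("second run: %d ns, %.0f characters/s%n",
                second, charsPerSecond(characterCount, second));
    }
}
```

Running the job twice in one JVM is what separates the "first run" value,
which includes class loading and JIT warm-up, from the "second run" value.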

Summary of results:
The rendering seems to be accurate and the fallback to outline rendering
when required works for all tested cases. Suggestions for additional
tests are very much appreciated since I have only superficial knowledge
in this area.

Rendering speed using Ghostscript:
Font embedding is always faster, even in the "worst case" scenario
"benchmark5.xml" where there is absolutely no character reuse.
Type 3 font embedding is more than 20 times faster on the Latin
documents "t-mobile.xml" and "oracle.xml" and even with the Chinese
document "baidu.xml" the gain is more than a factor of 6.

In absolute numbers this means that the 8-page document "t-mobile.xml"
is rendered in under 2 seconds while it takes about 40 seconds using the
current method.
Printing that document on my local DELL 2330dn takes 2 minutes using
embedded fonts while it takes over 40 minutes using the current method.

Rendering speed in Java:
There is a large improvement in speed between the first run and
subsequent runs using the new font embedding strategy, for which I have
not yet found an explanation; the effect is not observable when
using outline drawing.
However, even considering only the slower run, the new method outperforms
the existing method except in the "worst case" scenario, where it reaches
only about 80% of the current performance.
In all other cases the improvement is at least a factor of 3 and gets better
with growing document size. For the largest document "t-mobile.xml" the
gain is a factor of 16, so that it is rendered in less than 300 ms while the
current method takes over 4 seconds.
On a "long running" JVM the gain nearly reached a factor of 40, so that the
performance went from about 18 thousand characters per second to about
700 thousand characters per second.

Postscript file size:
The examples show an average of about 600 to 700 bytes per character
using outlines for Latin text and 1.5 KB per Chinese glyph.
Using font embedding the initial character definition has the same size
as an outline character but any subsequent character usage requires about
5 bytes for Latin and 20 bytes for Asian text (for non-fractional
metrics the values roughly double). The values could be reduced
a little further by using single character shortcuts for the "show",
"glyphshow" and "rmoveto" commands.
The change in file size ranges from a reduction by a factor of 400 ("best
case" document "benchmark1.xml") to a 20% increase ("worst case" document
"benchmark5.xml"). For the two-page Latin document "oracle.xml" the gain
is more than a factor of 40 and for the longer document "t-mobile.xml" the
gain is more than a factor of 120, so that the document is 300 KB instead of
35 MB. The Chinese document "baidu.xml" is reduced to about 20% of the
original size. The size difference increases after conversion to PDF, so
that for example the "t-mobile.xml" document shrinks from 22 MB to less than
100 KB.
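
As a rough cross-check of these numbers, the size model (one full definition
per distinct character plus a small cost per use) can be written down as
below. This is a back-of-the-envelope sketch only. The class and method
names are hypothetical, the inputs are the approximate "baidu.xml" figures
quoted in this mail with 1.5 KB taken as 1500 bytes, and fixed overhead such
as the prolog and the font dictionaries is ignored.

```java
// Hypothetical size estimate, not part of the patch or the test program.
final class PsSizeEstimateSketch {
    // Outline drawing: every character occurrence carries its full outline.
    static long outlineBytes(long characters, long bytesPerOutline) {
        return characters * bytesPerOutline;
    }

    // Type 3 embedding: each distinct character is defined once, and every
    // occurrence then costs only a few bytes.
    static long type3Bytes(long characters, long distinctCharacters,
                           long bytesPerDefinition, long bytesPerUse) {
        return distinctCharacters * bytesPerDefinition
                + characters * bytesPerUse;
    }

    public static void main(String[] args) {
        // Approximate figures for "baidu.xml".
        long characters = 4_000;
        long distinct = 536;
        long bytesPerGlyph = 1_500; // assumed: 1.5 KB taken as 1500 bytes
        long bytesPerUse = 20;

        long outline = outlineBytes(characters, bytesPerGlyph);
        long type3 = type3Bytes(characters, distinct, bytesPerGlyph, bytesPerUse);

        System.out.println("outline bytes: " + outline);
        System.out.println("type 3 bytes:  " + type3);
        System.out.printf("ratio: %.1f%%%n", 100.0 * type3 / outline);
    }
}
```

With these inputs the estimate comes out at roughly 15% of the outline
size, which is in the same range as the measured reduction to about 20%
once the ignored overhead is taken into account.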

Thanks,
Alex

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PSTest.zip
Type: application/zip
Size: 65412 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/2d-dev/attachments/20140512/ed3978bd/PSTest.zip>