[OpenJDK 2D-Dev] [PATCH] JDK-4627340 : RFE: A way to improve text printing performance for postscript devices (Improved proposal)

Alex Geller ag at 4js.com
Tue Feb 4 11:12:55 UTC 2014

My OCA, submitted on January 10, has not shown up yet, and I am 
guessing that this is because I checked the "Vice president" option by 
mistake. I have submitted a corrected version and I hope it is OK to 
continue posting on this topic in spite of that.
After some minor adjustments to the patch I improved the test program so 
that instead of printing a static test document it now reads external 
files containing arbitrary attributed text. Based on the new version I 
have created a couple of test documents, compared the output under 
various settings and performed some benchmarks.

The changes to the patch are:
- For testing purposes the patch can be activated/deactivated by the 
temporary system property "PSRenderType3"
- I removed the code that allowed choosing different byte encodings to 
yield a more compact Postscript representation, mainly because text 
extraction via tools like "pstotext" or "ps2pdf" can't work correctly 
with anything other than Latin1 encoding anyway (disregarding the 
option to use CID-keyed fonts).
- I identified "Font.deriveFont(1000f)" as a source of slowness and 
replaced it with code that scales the glyphs of the original font instead.
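
For readers who want to try the switch, here is a minimal sketch of how a boolean system property such as "PSRenderType3" could be read. It is not the code from the patch: the class name "RenderTypeSwitch", the method "useType3" and the assumption that the property holds "true" or "false" are all hypothetical.

```java
/** Sketch of a system-property toggle; this is not the patch's actual code. */
final class RenderTypeSwitch {
    /** Property name as given in the post. */
    static final String PROPERTY = "PSRenderType3";

    /**
     * Returns whether Type 3 font rendering should be used.
     * Assumption: the property holds "true" or "false"; when it is unset,
     * the supplied default applies.
     */
    static boolean useType3(boolean defaultValue) {
        String value = System.getProperty(PROPERTY);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " enabled: " + useType3(true));
    }
}
```

Under the same assumption, the property would be passed on the command line as "-DPSRenderType3=false" to compare the two rendering paths.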

The test program can now be invoked with the command line options 
listed below:
$java PSTest -help
Usage: java PSTest -help|[-inputFileName file name ("example.xml")]
                          [-latinFontName font name ("SansSerif")]
                          [-asianFontName font name ("WenQuanYi Zen Hei")]
DrawString|DrawGlyphVector|DrawTextLayout (DrawString)]
                          [-renderIntoBufferedImage true|false (false)]
true|false (true)]
                          [-useFractionalMetricsForPainting true|false 
true|false (false)]
                          [-useAntiAliasingForPainting true|false (false)]
                          [-numberOfPrintIterations number (3)]
                          [-paintExpectedStringSizeMarkers number (false)]
                          [-bufferedImageDPI number (300.0)]

I created the following test documents:

-oracle.xml: This is a conversion of the Oracle terms and conditions page 
(http://www.oracle.com/us/legal/terms/index.html). The document has 
about 20,000 characters printed on two pages.
               About 1% of the characters are rendered using the default 
rendering due to text colorization (FOREGROUND attribute).
-t-mobile.xml: This is a conversion of the T-Mobile terms and conditions 
page. The document has about 56,000 characters printed on eight pages.
               About 0.1% of the characters are rendered using the 
default rendering due to underlining (UNDERLINE attribute).
-baidu.xml: This is a conversion of the Chinese Baidu terms and conditions 
page (http://adm.baidu.com/contract.html). The document has about 4,000 
characters printed on three pages.
               There are 536 distinct characters in the text. All 
characters can be rendered using Type-3 fonts.

-benchmark1.xml: A document containing a page with 64 lines of 100 'a' 
characters. This represents the "best case" for the font embedding strategy.
-benchmark2.xml: A document containing 64 lines of 100 characters with 
10 different characters.
-benchmark3.xml: A document containing 64 lines of 100 characters with 
82 different characters.
-benchmark4.xml: A document containing a page with 64 lines of 99 'a' 
characters and one Asian character. This is to test the bit set and to 
force the usage of "glyphshow" instead of the more compact "show" 
string representation.
-benchmark5.xml: A document containing a page with 12 lines of unique 
characters where each line uses an entirely different font. This 
represents the "worst case" for the font embedding strategy.
-benchmark6.xml: Same as benchmark5.xml, but with just enough non-unique 
characters added so that size and performance exceed outline drawing.
-example.xml: A two-page document that replaces the static document 
from the previous version.

Other files:
- results.html: Test results. The results include Java execution time, 
resulting Postscript file size and the time Ghostscript needed to 
rasterize the result.
- results.txt: Detailed test results
- condense.awk: A script that condenses the data in "results.txt" 
producing "results.html"
- runtests.sh: A shell script that produces "results.txt" and "results.html"
- Makefile: A makefile with the targets "run" and "clean"

Running the tests:
The tests are run via "make run", which compiles PSTest.java and then 
runs the shell script "runtests.sh", which in turn creates the HTML 
result file "results.html".
Two of the tests require a list of fonts. This list is located in 
"runtests.sh" and should be adjusted before running.

Measuring rasterization time with Ghostscript:
If "gs" is installed, the script measures the time Ghostscript takes 
to render the document to a 600 DPI raster. As far as I can tell there 
is no option to perform rasterization only. Instead one has to select an 
output format where image encoding and file I/O do not dominate the 
results. After some tests I decided to use the "pngmono" option with a 
scaledown of 3.

Regarding the Java performance measurements:
The values "Time for first run" and  "Time for second run" are obtained 
by calls to "System.nanoTime()" immediately before and after the code 
that sets up and executes the print job.
The JVM startup, the loading of the document and the computation of the 
layout are not included in this measure. Each print job is executed 
twice from the same JVM hence the differentiation between the values 
"first run" and "second run".
The values "Performance for first/second in characters/s" are computed 
from those time measurements and the document size.
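
The measurement described above can be sketched roughly as follows. This is an illustration, not the code from PSTest.java: the class "PrintTiming", its methods and the stand-in workload are hypothetical, and the real test program times the set-up and execution of a print job instead.

```java
/** Sketch of the timing scheme; the names and the workload are hypothetical. */
final class PrintTiming {
    /** Runs the job once and returns the elapsed wall-clock time in nanoseconds. */
    static long timeNanos(Runnable job) {
        long start = System.nanoTime();
        job.run();
        return System.nanoTime() - start;
    }

    /** Converts a character count and an elapsed time into characters per second. */
    static double charsPerSecond(long characterCount, long elapsedNanos) {
        return characterCount * 1e9 / elapsedNanos;
    }

    public static void main(String[] args) {
        final int characterCount = 56_000; // roughly the size of t-mobile.xml
        // Stand-in workload; the real program would execute the print job here.
        Runnable printJob = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < characterCount; i++) {
                sb.append('a');
            }
            if (sb.length() != characterCount) {
                throw new IllegalStateException("unexpected length");
            }
        };
        // Two runs in the same JVM, reported separately as first and second run.
        for (int run = 1; run <= 2; run++) {
            long elapsed = Math.max(1, timeNanos(printJob)); // avoid division by zero
            System.out.printf("Run %d: %d ns, %.0f characters/s%n",
                    run, elapsed, charsPerSecond(characterCount, elapsed));
        }
    }
}
```

Reporting the two runs separately matters because the second run executes in an already warmed-up JVM.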

Summary of results:
The rendering seems to be accurate and the fallback to outline rendering 
when required works for all tested cases. Suggestions for additional 
tests are very much appreciated since I have only superficial knowledge 
in this area.

Rendering speed using Ghostscript:
Font embedding is always faster, even in the "worst case" scenario 
"benchmark5.xml" where there is absolutely no character reuse.
Type 3 font embedding is more than 20 times faster on the Latin 
documents "t-mobile.xml" and "oracle.xml", and even with the Chinese 
document "baidu.xml" the gain is more than a factor of 6.

In absolute numbers this means that the 8 page document "t-mobile.xml" 
is rendered in under 2 seconds while it takes about 40 seconds using the 
current method.
Printing that document on my local DELL 2330dn takes 2 minutes using 
embedded fonts while it takes over 40 minutes using the current method.

Rendering speed in Java:
With the new font embedding strategy there is a large improvement in 
speed between the first run and consecutive runs, for which I have not 
yet found an explanation. The effect is not observable when using 
outline drawing.
However, even when considering only the slower run, the new method 
outperforms the existing method, except for the "worst case" scenario 
where it reaches only about 80 % of the current performance.
In all other cases the improvement is at least a factor of 3 and gets 
better with growing document sizes. For the largest document 
"t-mobile.xml" the gain is a factor of 16, so that it is rendered in 
less than 300 ms while the current method takes over 4 seconds.
On a "long running" JVM the gain nearly reached a factor of 40, so that 
the performance went from about 18 thousand characters per second to 
about 700 thousand characters per second.

Postscript file size:
The examples show an average of about 600 to 700 bytes per character 
using outlines for Latin text and 1.5 KB per Chinese glyph.
Using font embedding, the initial character definition has the same size 
as an outline character, but any subsequent use of the character 
requires about 5 bytes for Latin and 20 bytes for Asian text (for 
non-fractional metrics the values roughly double). The values could be 
reduced a little further by using single-character shortcuts for the 
"show", "glyphshow" and "rmoveto" commands.
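
The per-character figures above suggest a rough size model, sketched below. It is only an estimate under stated assumptions: the class "PsSizeEstimate" is hypothetical, the 650 bytes per outline is the midpoint of the range quoted above, the count of 100 distinct glyphs is a made-up illustration value, and fixed overhead such as the Postscript prolog is ignored.

```java
/** Rough Postscript size model; all names and sample inputs are hypothetical. */
final class PsSizeEstimate {
    /** Outline drawing: every character occurrence carries a full outline. */
    static long outlineBytes(long totalChars, long bytesPerOutline) {
        return totalChars * bytesPerOutline;
    }

    /**
     * Font embedding: each distinct glyph is defined once at outline size,
     * and every occurrence then costs only a few bytes.
     */
    static long embeddedBytes(long totalChars, long distinctGlyphs,
                              long bytesPerOutline, long bytesPerUse) {
        return distinctGlyphs * bytesPerOutline + totalChars * bytesPerUse;
    }

    public static void main(String[] args) {
        long totalChars = 56_000;   // about the size of t-mobile.xml
        long distinctGlyphs = 100;  // assumed value, not taken from the post
        long bytesPerOutline = 650; // midpoint of the quoted 600 to 700 bytes
        long bytesPerUse = 5;       // quoted figure for Latin text

        long outline = outlineBytes(totalChars, bytesPerOutline);
        long embedded = embeddedBytes(totalChars, distinctGlyphs,
                bytesPerOutline, bytesPerUse);
        System.out.println("Outline estimate:  " + outline + " bytes");
        System.out.println("Embedded estimate: " + embedded + " bytes");
        System.out.printf("Estimated gain: factor %.0f%n", (double) outline / embedded);
    }
}
```

The model shows why the gain grows with document length: the one-time glyph definitions are amortized over more occurrences, while the outline cost grows linearly with the number of characters.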
The change in file size ranges from a reduction by a factor of 400 
("best case" document "benchmark1.xml") to a 20 % increase in size 
("worst case" document "benchmark5.xml"). For the two-page Latin 
document "oracle.xml" the gain is more than a factor of 40, and for the 
longer document "t-mobile.xml" the gain is more than a factor of 120, so 
that the document is 300 KB instead of 35 MB. The Chinese document 
"baidu.xml" is reduced to about 20 % of the original size. The size 
difference increases after conversion to PDF so 
that for example the "t-mobile" document shrinks from 22 MB to less than 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: PSTest.zip
Type: application/zip
Size: 65412 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/2d-dev/attachments/20140204/98104668/PSTest.zip>
