RFR(M): 8249963: Make the zlib implementation selectively configurable at startup
Volker Simonis
volker.simonis at gmail.com
Thu Jul 23 17:18:25 UTC 2020
Hi,
can I please get some reviews for the following small enhancement
which will allow you to configure different zlib implementations at VM
start-up and get up to 100% better throughput for compression and
about 50% better throughput for decompression (depending on the
selected zlib implementation). We're using a similar change
productively in Amazon since quite a while with good results on
compression/decompression heavy workloads.
As usual, my write-up is moch longer and much more complex than the
change itself :)
http://cr.openjdk.java.net/~simonis/webrevs/2020/8249963/
https://bugs.openjdk.java.net/browse/JDK-8249963
I've tested these changes locally on Linux/Windows/Mac without any
problems but got a Mach5 build error for Windows after sending them to
the submit repo (Job:
mach5-one-simonis-JDK-8249963-20200723-1236-12890347, BuildId:
2020-07-23-1235507.volker.simonis.source). Would be great if somebody
from Oracle could shed some light on that issue.
Thank you and best regards,
Volker
Following some background, benchmarks and implementation details for the change:
The JDK bundles the original zlib implementation [1] almost from the
very beginning, starting with zlib 1.0.4 in JDK 1.1. In the meantime,
the bundled zlib implementation was updated to the latest released
version 1.2.11. However, the innovation velocity in the original zlib
version has considerably slowed down in the last years.
Lately, new projects have picked up the original zlib implementation
and considerably improved its deflate and inflate speed. Testing shows
that compression throughput can be improved between 50 and 100% with
Cloudflare's zlib version [2] and decompression throughput between 30
and 60% using the zlib implementation from the Chromium project [3].
Overview
--------
This enhancement proposes some simple changes to the JDK's zlib
wrapper libzip which make it possible to choose alternative zlib
implementations selectively for deflation and/or inflation at VM
startup based on a system property. As an additional benefit it makes
it possible to use the system zlib version even if the JDK was
configured with a bundled zlib.
Already now, the JDK can be configured at build time to either use the
bundled zlib version or to dynamically link against the system zlib.
If linked dynamically, LD_LIBRARY_PATH or LD_PRELOAD can be used to
run the JDK with an alternative zlib implementation. This approach is
however not very flexible because:
- once configured with a "bundled" zlib it's not possible to use
another implementation any more. Some platforms like Windows don't
have a default, system provided zlib version so they are always built
with "--with-zlib=bundled" and therefore can't change the
implementation at runtime.
- even if configured with "--with-zlib=system", it is only possible a
single alternative version as a substitute for the system zlib. But
we've found (see benchmarks [4]) that there's no single zlib
implementation which improves both compression and decompression
equally well. It may therefore make sense to have the possibility to
selectively use one implementation for compression and another one for
decompression.
This enhancement proposes three new system properties:
org.openjdk.zlib.implementation
org.openjdk.zlib.implementation.inflate
org.openjdk.zlib.implementation.deflate
which can be used to either replace the full zlib functionality or
selectively just the deflate or inflate part with a new
implementation. As an argument they take an absolute or relative path
to an API-compatible, shared zlib library. E.g.
-Dorg.openjdk.zlib.implementation=zlib-cloudflare.so
-Dorg.openjdk.zlib.implementation.deflate=/Git/zlib-bench/lib/x86_64/zlib-cloudflare.dylib
-Dorg.openjdk.zlib.implementation.inflate='D:\Git\zlib-bench\lib\x86_64\zlib-chromium.dll`
An argument with the value "system" can be used to instruct the JDK to
load the default zlib version on the corresponding system. Notice that
this option now also works in the case where the JDK was configured
with a bundled zlib.
The naming of the properties is of course subject to discussion. If we
get an agreement on this change and the naming, I'll be happy to open
a CSR for the introduction of the new properties.
For testing purpose I've made some precompiled version of
zlib-cloudflare and zlib-chromium for Linux/x86_64, Linux/aarch64,
MacOS/x86_64 and Windows/x86_64 available in my GitHub repository [6].
The implementation of this whole enhancement is fairly simple. Instead
of calling right into zlib from libzip I've replaced all direct calls
by indirect calls through function pointers. By default, these
function pointers are pointing either to the corresponding functions
in the bundled zlib implementation or to the corresponding functions
in the dynamically linked system zlib. If none of the above system
properties will be used, the only change to the current implementation
is the replacement of some direct calls by indirect calls. Because
these function calls are usually made through JNI and because the
underlying code is usually quite compute intensive, I couldn't measure
any performance regression because of this change.
If one of the above system properties will be set, the selected shared
library will be dynamically loaded and the corresponding function
pointers will be redirected to point to the new implementation. I have
verified that this works on Linux, Windows and Mac OS X. You can find
some more implementation details at the end of this write-up.
Benchmarks
----------
I've run native benchmarks to compare the compression/decompression
throughput and the compression ratio at various compression levels for
zlib-adler [7], zlib-chromium [8], zlib-ng [9], zlib-ipp [10],
zlib-cloudflare [11] and the isa-l [12] library. The results and more
details can be found in my zlib-bench GitHub repository [5].
After that I've used precompiled versions of zlib-cloudflare,
zlib-chromium and the changes proposed by this enhancement to rerun
these benchmarks in Java on Linux/x86_64 and Linux/aarch64. I think
the results are quite impressive [4] showing around 100% higher
throughput for compression and around 50% higher throughput for
decompression on both architectures.
Implementation details
----------------------
Before change "8237750: Load libzip.so only if necessary" [13] libzip
(and transitively libz) was loaded early at VM startup in
"init_globals() -> ClassLoader::initialize()". For this change I've
re-enabled this early loading if and only if one of the new system
properties was given on the command line. I think these new properties
will only be used if heavy compression/decompression usage is expected
and in such a case it doesn't harm to load libzip early at startup.
Loading libzip later and on demand would otherwise make the
synchronization of the function pointer initialization unnecessary
hard because libzip can be loaded independently from the VM as well as
from the class library.
An alternative approach would be to use __attribute(constructor) (on
Linux) or DLLMain() (on Windows) to perform the function pointer
initialization once when libzip gets loaded. But first, this approach
is quite system dependent and I'm not sure it is available on all
platforms (I'm pretty sure it doesn't work on AIX :). Second, I think
it is not possible to dynamically load DLLs from DLLMain() on Windows
which would be required in this case.
A third possibility would be to define a JNI OnLoad function for
libzip. But unfortunately, libzip can not only be loaded from
java.util.zip where we have a JNIEnv but also from HotSpot or directly
with dlopen() from libjimage.
I've also removed the following, OpenJDK-specific patch to "zconf.h"
in the bundled zlib version which seems to have been there "since
ever" but which I don't think is required any more:
src/java.base/share/native/libzip/zlib/zconf.h
-#ifdef _LP64
-typedef unsigned int uLong; /* 32 bits or more */
-#else
typedef unsigned long uLong; /* 32 bits or more */
-#endif
The change is needed to make the bundled zlib compatible with the
system or the other zlib implementations which all do not have this
change. Notice that if we're building with the system zlib, we are
already using this setting today because in that case "zconf.h" is
taken from the system include path.
Finally, it has to be noticed that although we are loading libzip
early there are still two call sites of zlib functionality which won't
be able to benefit from the new implementations because they either
statically link in parts of the bundled zlib version or directly and
dynamically link against the system zlib version. That's
libsplashscreen and libjli. Both are only used at startup and both
only make limited use of zlib (if they use it at all) so I don't think
that they are relevant.
Risks
-----
Unfortunately ZipInputStream/ZipOutputStream have been designed such
that it is very easy to use them in an unsupported way. It is in
general not possible to read a ZipEntry from a ZipInputStream and
write that exact ZipEntry into a ZipOutputStream followed by the
inflated and re-deflated data of that entry. However this naive
approach is sometimes used to copy zip entries from a zip input file
into another zip output file. This approach will only work in the case
where the exact same deflate implementation and compression was used
for the initial compression like for the recompression.
This is because the ZipEntry will contain the compressed size of the
data belonging to that entry. But the Deflate format doesn't mandate a
specific, fixed compression algorithm which will always result in the
same compressed size. Instead, different implementations are free to
use different approaches for compression which can result in valid
entries but potentially with a different compressed size.
Actually, even if just a single implementation is used for both
compression and decompression, it is still not possible to detect the
compression level from the compressed data. It is therefore possible
that decompressing and re-compressing that data into a ZipOutputStream
will result in a different compressed size. However, ZipOutputStream
will throw an exception if the compressed size entry in a ZipEntry is
different from the actual size of the compressed data.
I've recently fixed two such cases in the OpenJDK (JDK-8240333 [14],
JDK-8240235 [15]) itself, but there may be other such use cases in the
wild which may throw an "ZipException: invalid entry compressed size
(expected 66 but got 67 bytes)". If that happens, the user can either
fix his code (which is trivial and advised because of the problems
explained before) or simply remove the new system properties in order
to fall back to the bundled or system implementation.
[1] https://www.zlib.net/
[2] https://github.com/cloudflare/zlib
[3] https://chromium.googlesource.com/chromium/src/third_party/zlib/
[4] https://github.com/simonis/zlib-bench/blob/master/Results-ojdk.md
[5] https://github.com/simonis/zlib-bench/blob/master/Results.md
[6] https://github.com/simonis/zlib-bench/tree/master/lib
[7] https://github.com/madler/zlib
[8] https://chromium.googlesource.com/chromium/src/third_party/zlib/
[9] https://github.com/zlib-ng/zlib-ng
[10] https://software.intel.com/en-us/articles/intel-ipp-zlib-coding-functions
[11] https://github.com/cloudflare/zlib
[12] https://github.com/intel/isa-l
[13] https://bugs.openjdk.java.net/browse/JDK-8237750
[14] https://bugs.openjdk.java.net/browse/JDK-8240333
[15] https://bugs.openjdk.java.net/browse/JDK-8240235
More information about the core-libs-dev
mailing list