Jigsaw EA feedback for elasticsearch

Robert Muir rcmuir at gmail.com
Fri Sep 11 05:07:11 UTC 2015


I just got elasticsearch working on the jigsaw EA build (thanks again,
this is really helpful).

Most fixes were very minor, but #5 and #6 are actual loss of
features/functionality to users.

1. compilation issues I encountered are not related to jigsaw, but
happen for all 9-ea builds. Looks like there is a maven build issue
for 9-ea if you try to use -Werror (and we want to fail on compiler
warnings). We just haven't noticed because, due to stability issues,
the 9-ea builds have not been running in jenkins.
2. we have a "jar hell detector" that threw an
UnsupportedOperationException, because classloader is no longer a
URLClassLoader, so we can't get the list of urls. This caused all
tests to fail. I changed the code to parse java.class.path.
3. we have a "jvm info" api that provides information about the jvm,
e.g. to assist our engineers in debugging different nodes in the
cluster. It was not prepared to handle UnsupportedOperationException
from RuntimeMXBean.getBootClassPath: I fixed it to fall back to
sun.boot.class.path, otherwise fall back to "unknown".
4. exception serialization tests failed, because we manually serialize
exceptions. We previously used java serialization, but it caused
serious trouble because of backwards compatibility breaks between even
minor jdk versions: this would strike when users tried to upgrade the
jvms for nodes in their cluster with a rolling restart. The tests fail
because the stacktrace "loses" stuff after deserialization (the module
version). For now I just disabled the tests on java 9, because without
more digging I don't yet know how we can support e.g. both java 8 and
java 9 and populate this stuff "optionally".
5. we have monitoring apis that provide basic system information,
similar to #3, for debugging purposes, and to feed monitoring tools so
people can track the health of the cluster. Previously, we used the
sigar library (JNI) for this, but it has bugs that caused crashes for
users. So we were forced to limit ourselves to what is provided with
the java management apis, which is much less, but we figure it has the
basics. For some very basic stats, this means we also look for
com.sun.management apis
(https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/package-summary.html)
and if they are available, we provide the stuff available there too,
like how much ram is on the machine, swap in use, number of open/max
file descriptors, and so on. We test what is available and what is not
based on platform so we can detect if something changes in the JDK,
like what happens with jigsaw, where they all become unavailable.
These stats are important for debugging: e.g. if lucene is not
behaving right, merging could fall behind and many open
filedescriptors is a sign of that. I disabled these tests for java 9,
but I think losing these stats is too much: we will be forced to write
our own JNA or /proc reading code for platforms we care about if this
is not available from the JDK, it's too important. I'd really really
really like to not have to do that. It seems a little odd to block
these apis but allow Unsafe IMO.
6. cluster snapshot/restore to amazon s3 does not work, because of
their use of internal ssl libraries. I've tried to get them to fix it
for a while now (https://github.com/aws/aws-sdk-java/pull/432). This
is also a serious loss of functionality: if they won't fix it, I guess
we have to fork the aws sdk.
7. we had some test bugs similar to the lucene case (static leaks in
tests and the same test framework tried to compute the size of some
internal class).
8. during testing I hit some kind of bug, where the thai break
iterator returned wrong information. This might be hotspot-related or
something else, and it never reproduced again. We use this check
(https://github.com/apache/lucene-solr/blob/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/th/ThaiTokenizer.java#L37-L47)
to see if we can "really" tokenize thai, otherwise we throw an
exception. At least in the past, some IBM JVM versions did not have a
breakiterator for thai. I guess it just goes to show the EA
build is really a prototype, and not yet ready to be added to our CI
servers and so on... which is the only way I can ensure this huge
codebase stays working with jigsaw.
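
To make the workaround in #2 concrete: a rough sketch of deriving the
list of urls from java.class.path instead of casting the classloader to
URLClassLoader. This is not the actual elasticsearch code; the class
name ClassPathUrls and the method parse are made up for illustration,
and treating an empty entry as the current directory is an assumption.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not the real elasticsearch "jar hell" code.
final class ClassPathUrls {

    /** Splits a class path string on the platform separator and turns each entry into a file URL. */
    static List<URL> parse(String classPath) {
        List<URL> urls = new ArrayList<>();
        if (classPath == null || classPath.isEmpty()) {
            return urls;
        }
        for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                // Assumption: an empty entry means the current directory.
                entry = ".";
            }
            try {
                urls.add(new File(entry).toURI().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalStateException("bad class path entry: " + entry, e);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Print every class path entry of the running jvm as a URL.
        for (URL url : parse(System.getProperty("java.class.path"))) {
            System.out.println(url);
        }
    }
}
```

Since this only reads a system property, it does not care what kind of
classloader the application was started with.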

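The fallback chain described in #3 could look something like the
following sketch: ask RuntimeMXBean first, then try the
sun.boot.class.path system property, then give up with "unknown". The
class name BootClassPath and its methods are hypothetical, and the real
fix may be structured differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

// Hypothetical helper illustrating the fallback order from #3.
final class BootClassPath {

    /** Returns the boot class path if the jvm exposes it, otherwise a fallback value. */
    static String describe() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        try {
            return runtime.getBootClassPath();
        } catch (UnsupportedOperationException e) {
            // The jvm does not support a boot class path through this api.
            return fallback(System.getProperty("sun.boot.class.path"));
        }
    }

    /** Uses the system property value when present, otherwise "unknown". */
    static String fallback(String propertyValue) {
        return propertyValue != null ? propertyValue : "unknown";
    }

    public static void main(String[] args) {
        System.out.println("boot class path: " + describe());
    }
}
```

The point is just that the info api always returns something printable
instead of propagating the exception.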

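For #5, probing the com.sun.management apis "if they are available"
can be done reflectively, so the code still loads when the interfaces
are missing or blocked. The sketch below is only an assumption about
how such a probe might be written: OsProbe and probe are invented
names, and -1 as the "not available" value is an arbitrary choice.

```java
import java.lang.management.ManagementFactory;
import java.lang.reflect.Method;

// Hypothetical probe for optional management getters, not the real elasticsearch code.
final class OsProbe {

    /**
     * Calls a no-arg numeric getter declared on the named interface,
     * returning -1 if the interface, the method, or access to it is unavailable.
     */
    static long probe(Object bean, String interfaceName, String methodName) {
        try {
            Class<?> iface = Class.forName(interfaceName);
            if (!iface.isInstance(bean)) {
                return -1;
            }
            // Look the method up on the public interface, not the implementation class.
            Method getter = iface.getMethod(methodName);
            return ((Number) getter.invoke(bean)).longValue();
        } catch (Exception | LinkageError e) {
            // Missing class or method, blocked access, or a non-numeric result.
            return -1;
        }
    }

    public static void main(String[] args) {
        Object os = ManagementFactory.getOperatingSystemMXBean();
        System.out.println("total ram: "
            + probe(os, "com.sun.management.OperatingSystemMXBean", "getTotalPhysicalMemorySize"));
        System.out.println("open fds: "
            + probe(os, "com.sun.management.UnixOperatingSystemMXBean", "getOpenFileDescriptorCount"));
        System.out.println("max fds: "
            + probe(os, "com.sun.management.UnixOperatingSystemMXBean", "getMaxFileDescriptorCount"));
    }
}
```

A test can then assert per platform which probes must not return -1,
which is how a change like the jigsaw one would show up.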