Improving ZipFile.getEntry performance using Bloom filters
Andrew Dinn
adinn at redhat.com
Wed Apr 15 09:01:53 UTC 2020
On 14/04/2020 23:34, Ioi Lam wrote:
> I am a little worried adding extra overhead unconditionally into the JAR
> reader that people may not need/want.
A reasonable worry. We should always try to avoid fixes that benefit a
majority if they 'punish' a minority . . .
> Maybe we should take a step back and see why there are so many misses?
> Is it because you have a long classpath with many JARs on it, and you
> end up searching a lot of JAR files unnecessarily?
Well, there are a fair few, as can be identified simply by Googling the pom
https://github.com/spring-projects/spring-petclinic/blob/master/pom.xml
and reading the dependencies section. n.b. that only shows top-level
dependencies but not recursive ones.
Unfortunately, this is going to be the reality for many existing and
new apps. Most production Java apps are built from many library jars.
Java is the biggest 'divide and conquer' programming eco-system we have
ever seen in the history of computing. That's a direct corollary of it
being the biggest eco-system we have ever seen -- scale /necessitates/
divide and conquer.
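
To make the cost of a long classpath concrete, here is a minimal sketch
of the probing pattern under discussion. It is not the JDK's class
loader code: the class ClasspathMissDemo, its helper methods, and the
entry names are all made up for illustration. It builds a handful of
tiny jars and counts how many ZipFile.getEntry calls come back null
before the one jar that holds the entry is reached.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of the JDK or of the proposed patch.
final class ClasspathMissDemo {

    // Create `count` small jars in a temp directory.
    // Jar i contains the single entry "lib<i>/Target.class".
    static List<Path> createJars(int count) throws IOException {
        Path dir = Files.createTempDirectory("cp-demo");
        List<Path> jars = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Path jar = dir.resolve("lib" + i + ".jar");
            try (OutputStream out = Files.newOutputStream(jar);
                 ZipOutputStream zip = new ZipOutputStream(out)) {
                zip.putNextEntry(new ZipEntry("lib" + i + "/Target.class"));
                zip.write(new byte[] {0}); // placeholder content
                zip.closeEntry();
            }
            jars.add(jar);
        }
        return jars;
    }

    // Probe the jars in classpath order and return the number of
    // getEntry calls that returned null before the first hit
    // (or the total number of jars if the entry is never found).
    static int countMisses(List<Path> classpath, String entryName)
            throws IOException {
        int misses = 0;
        for (Path jar : classpath) {
            try (ZipFile zf = new ZipFile(jar.toFile())) {
                if (zf.getEntry(entryName) != null) {
                    return misses;
                }
                misses++;
            }
        }
        return misses;
    }

    public static void main(String[] args) throws IOException {
        List<Path> classpath = createJars(20);
        int misses = countMisses(classpath, "lib19/Target.class");
        System.out.println(misses + " misses before the hit");
    }
}
```

With 20 jars and the entry living in the last one, every earlier jar is
searched in vain. A real app with a deep dependency tree repeats that
pattern for each class and resource it loads.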
> If this is the case, I think converting the app to modules will give you
> the speed up. A package can exist only in a single module so lookup is
> fast. You won't have misses unless you intentionally look for things
> that don't exist.
Well, yes but that also is just not going to fly for the majority of
Java developers for the reason given above. Most app developers are not
in a position to modularize their apps because the libraries they depend
on are not modularized, because the libraries /they/ depend on are not
modularized, because the libraries /they/ depend on are not modularized
... and so on. It's rarely going to be one group or organization with
one fixed goal that would need to schedule and implement such a change.
Now, you may lament that situation (or not) but it /is/ a brute fact and
is going to remain the status quo for a very long time. An eco-system of
the size of Java is like an ocean-liner. Which means the above advice is
going to whistle over the heads of many Java developers.
> Or, you can just package the app into one giant JAR file.
Again, that completely misses the reality of most developers' circumstances.
Now, I hope I don't come across like I am simply being negative here. I
have posted this reply because it's critically important that we, the
OpenJDK project devs, understand and keep in mind how most app
developers use (are able to use) Java. Suggestions that bypass the
realities and limitations of that usage say to me that we are at risk of
not meeting those users' needs.
regards,
Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill