[threeten-dev] SubstringTree
Xueming Shen
xueming.shen at oracle.com
Tue Dec 18 20:45:45 PST 2012
Stephen,
I had some doubts about the speed/memory performance of SubstringTree
used for
id parsing in DateTimeFormatterBuilder. Given the nature of HashMap +
String (after
the hash hit, you still need to do the char by char comparing for
equality, except in
the special case that they are the same String object), I doubt the
HashMap+subString
approach used in SubstringTree can really be "faster" than a normal
character by
character match.
So I wrote a simple version of RadixTree for the prefix-string match
http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/RadixTree.java
run a non-scientific bench-mark
http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/Parse.java
to compare the RadixTree approach against the SubstringTree appraoch
http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/SubstringTree.java
The result suggests that SubstringTree is slower in most cases except if you
feed in the zids (which is used to build the tree) directly, in which
case it's an
identity equal, so no character by character match needed. Even in this
case,
the "fastness" is marginal (I have not tried to feed in a non-String
CharSequence,
yet).
Worse is the memory usage in the SubstringTree matching, millions of bytes
of String objects got created during the run (substring for hashing). I
know it's
in young-gen, and GC should take well care of them. But if can avoid
generate
them. I guess this is also one of the reasons it is not as fast as
expected.
(we happened to have some internal meetings this week regarding the
"millions
of String objects in heap", 20%+ in some benchmarks. So we are sensitive of
Strings these days :-) )
here is a snapshot of the netbeans profile.
http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/SubStr.png
compared to the RadixTree case
http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/Radix.png
It appears it might be hard to tune the SubstringTree approach, given the
nature of the approach (to utilize hashmap for the substring).
I would suggest to replace it with the RadixTree (or any thing better),
if tuning
is really difficult as I expected.
-Sherman
More information about the threeten-dev
mailing list