[threeten-dev] SubstringTree

Xueming Shen xueming.shen at oracle.com
Tue Dec 18 20:45:45 PST 2012


Stephen,

I had some doubts about the speed/memory performance of SubstringTree 
used for
id parsing in DateTimeFormatterBuilder. Given the nature of HashMap + 
String (after
the hash hit, you still  need to do the char by char comparing for 
equality, except in
the special case that they are the same String object), I doubt the 
HashMap+subString
approach used in SubstringTree can really be "faster" than a normal 
character by
character match.

So I wrote a simple version of RadixTree for the prefix-string match

http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/RadixTree.java

run a non-scientific bench-mark

http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/Parse.java

to compare the RadixTree approach against the SubstringTree appraoch

http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/SubstringTree.java

The result suggests that SubstringTree is slower in most cases except if you
feed in the zids (which is used to build the tree) directly, in which 
case it's an
identity equal, so no character by character match needed. Even in this 
case,
the "fastness" is marginal (I have not tried to feed in a non-String 
CharSequence,
yet).

Worse is the memory usage in the SubstringTree matching,  millions of bytes
of String objects got created during the run (substring for hashing). I 
know it's
in young-gen, and GC should take well care of them. But if can avoid 
generate
them. I guess this is also one of the reasons it is not as fast as 
expected.
(we happened to have some internal meetings this week regarding the 
"millions
of String objects in heap", 20%+ in some benchmarks. So we are sensitive of
Strings these days :-) )

here is a snapshot of the netbeans profile.

http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/SubStr.png

compared to the RadixTree case

http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/Radix.png

It appears it might be hard to tune the SubstringTree approach, given the
nature of the approach (to utilize hashmap for the substring).

I would suggest to replace it with the RadixTree (or any thing better), 
if tuning
is really difficult as I expected.

-Sherman




More information about the threeten-dev mailing list