[threeten-dev] SubstringTree

Wed Dec 19 02:46:38 PST 2012

I'm very happy for you to replace SubstringTree with anything that
goes faster or has a smaller footprint. Push away!
Stephen

On 19 December 2012 04:45, Xueming Shen <xueming.shen at oracle.com> wrote:
> Stephen,
>
> I had some doubts about the speed/memory performance of SubstringTree used
> for
> id parsing in DateTimeFormatterBuilder. Given the nature of HashMap + String
> (after
> the hash hit, you still  need to do the char by char comparing for equality,
> except in
> the special case that they are the same String object), I doubt the
> HashMap+subString
> approach used in SubstringTree can really be "faster" than a normal
> character by
> character match.
>
> So I wrote a simple version of RadixTree for the prefix-string match
>
> http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/RadixTree.java
>
> run a non-scientific bench-mark
>
> http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/Parse.java
>
> to compare the RadixTree approach against the SubstringTree appraoch
>
> http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/SubstringTree.java
>
> The result suggests that SubstringTree is slower in most cases except if you
> feed in the zids (which is used to build the tree) directly, in which case
> it's an
> identity equal, so no character by character match needed. Even in this
> case,
> the "fastness" is marginal (I have not tried to feed in a non-String
> CharSequence,
> yet).
>
> Worse is the memory usage in the SubstringTree matching,  millions of bytes
> of String objects got created during the run (substring for hashing). I know
> it's
> in young-gen, and GC should take well care of them. But if can avoid
> generate
> them. I guess this is also one of the reasons it is not as fast as expected.
> (we happened to have some internal meetings this week regarding the
> "millions
> of String objects in heap", 20%+ in some benchmarks. So we are sensitive of
> Strings these days :-) )
>
> here is a snapshot of the netbeans profile.
>
> http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/SubStr.png
>
> compared to the RadixTree case
>
> http://cr.openjdk.java.net/~sherman/jdk8_threeten/radixtree/Radix.png
>
> It appears it might be hard to tune the SubstringTree approach, given the
> nature of the approach (to utilize hashmap for the substring).
>
> I would suggest to replace it with the RadixTree (or any thing better), if
> tuning
> is really difficult as I expected.
>
> -Sherman
>
>