Future for the JOni regex library
Hannes Wallnoefer
hannes.wallnoefer at oracle.com
Wed May 1 15:50:58 PDT 2013
Hi Charlie,
I feel a bit guilty for not getting (or keeping) in touch with you about
this. We recently switched to Joni as our default regexp engine and it's
working pretty well.
What we have in Nashorn now is still relatively close to the JRuby
codebase. Both share the same package structure, classes, and methods.
Our code is just simpler because it doesn't have to deal with different
encodings. My GitHub fork contains a "noencoding" branch that represents
the connection between the two:
https://github.com/hns/joni/tree/noencoding
However, there are some factors that might cause us to drift further
apart. One of them is code coverage. As it is, JavaScript uses a rather
limited subset of what Joni provides, and this means a lot of code is
neither used nor tested. Maintaining these bits doesn't seem to make
sense (as far as Nashorn is concerned).
It's a similar story with coding standards. We ran FindBugs over Joni
and it found a number of issues, including things like public final
arrays. Fixing these could require us to change the package structure or
make other structural changes. Not to mention missing Javadocs and
obscure naming, which would also drive us apart when fixed on our side.
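To illustrate the public final array issue, here is a minimal sketch. The
class and field names are hypothetical and not taken from Joni. It shows
that "final" only protects the array reference, not its contents, and why
hiding the array behind accessors changes the public API of a class.

```java
public class FinalArrayDemo {

    // Hypothetical lookup table exposed the way FindBugs complains about.
    static final class Exposed {
        public static final int[] TABLE = {1, 2, 3};
    }

    // One possible fix: keep the array private and expose accessors.
    // Callers that used to read TABLE directly would have to change.
    static final class Encapsulated {
        private static final int[] TABLE = {1, 2, 3};

        static int get(int index) {
            return TABLE[index];
        }

        static int[] copy() {
            // Defensive copy, so callers cannot modify the shared table.
            return TABLE.clone();
        }
    }

    public static void main(String[] args) {
        // Compiles and runs: any caller can overwrite the shared contents.
        Exposed.TABLE[0] = 99;
        System.out.println("exposed: " + Exposed.TABLE[0]);

        // Modifying the copy leaves the internal table untouched.
        int[] copy = Encapsulated.copy();
        copy[0] = 99;
        System.out.println("encapsulated: " + Encapsulated.get(0));
    }
}
```

Whether to use accessors, defensive copies, or narrower visibility is a
design choice. Any of them touches code that other classes or packages
depend on, which is where the structural changes come from.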
As Jim said, I also worked on ASM bytecode generation and got quite far
with it, except for some combinations of nested quantifiers and captures
I couldn't figure out. I've suspended the work for the time being since
it's not the highest priority thing to do, but here's the patch:
http://cr.openjdk.java.net/~hannesw/8012269/
I definitely think it would be a great idea to keep our versions of Joni
connected and evolving together. Right now this would still be
relatively easy, but it will become harder as time goes by.
Hannes
On 2013-05-01 22:10, Charles Oliver Nutter wrote:
> Hello!
>
> I saw a few weeks back that you guys have adopted JRuby's regex
> engine, JOni, modified to work only with Java's char[]. We're thrilled
> that you've found our engine useful enough to incorporate!
>
> However, I'm wondering about the future of these engines. We have
> planned improvements, patches that come in from time to time, and so
> on, and maintaining two separate copies will eventually lead to them
> diverging. But without any way to specialize our byte[]-based JOni for
> char[] easily, I'm not sure what can be done.
>
> Any thoughts on this? Just to tempt you... a few of the planned improvements:
>
> * JVM bytecode compiler, for more fastness
> * Thread interruptible execution, to kill off regex runs that don't complete
>
> It would be great if we could collaborate on such things.
>
> - Charlie