Future for the JOni regex library

Wed May 1 15:50:58 PDT 2013

Hi Charlie,

I feel a bit guilty for not getting (or keeping) in touch with you about 
this. We recently switched to Joni as our default regexp engine and it's 
working pretty well.

What we have in Nashorn now is still relatively close to the JRuby 
codebase. Both share the same package structure, classes, and methods. 
Our code is just simpler because it doesn't have to deal with different 
encodings. My github fork contains a "noencoding" branch that represents 
the connection between the two:

https://github.com/hns/joni/tree/noencoding

However, there are some forces that might push us further apart. One 
of them is code coverage. As it is, JavaScript uses a rather 
limited subset of what Joni provides, and this means a lot of code is 
neither used nor tested. Maintaining these bits doesn't seem to make 
sense (as far as Nashorn is concerned).

It's a similar story with coding standards. We ran FindBugs over Joni 
and it found a number of issues, including things like public final 
arrays. Fixing these could require us to change the package structure or 
make other structural changes. Not to mention missing Javadocs and 
obscure naming, which would also drive us apart when fixed on our side.
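
To illustrate the kind of finding I mean with public final arrays, here is a minimal sketch. The class and field names are made up for illustration and are not taken from Joni. The point is only that "final" protects the reference, not the array contents, so fixing it usually means changing how callers reach the data.

```java
import java.util.Arrays;

public class FinalArrayDemo {

    // Hypothetical lookup table exposed directly: the reference is final,
    // but any caller can still overwrite the elements.
    static final class Exposed {
        public static final int[] CODES = {1, 2, 3};
    }

    // One possible fix: keep the array private and hand out copies or
    // single elements. This changes the API that callers see.
    static final class Guarded {
        private static final int[] CODES = {1, 2, 3};

        public static int[] codes() {
            return CODES.clone(); // defensive copy
        }

        public static int codeAt(int index) {
            return CODES[index];
        }
    }

    public static void main(String[] args) {
        // Mutating the exposed table affects every other user of it.
        Exposed.CODES[0] = 99;
        System.out.println(Arrays.toString(Exposed.CODES));

        // Mutating a copy leaves the guarded table untouched.
        int[] copy = Guarded.codes();
        copy[0] = 99;
        System.out.println(Guarded.codeAt(0));
    }
}
```

A fix along these lines replaces field access with method calls, which is the sort of structural change that would make our copy look different from yours.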

As Jim said, I also worked on ASM bytecode generation and got quite far 
with it, except for some combinations of nested quantifiers and captures 
that I couldn't figure out. I've suspended the work for the time being, 
since it's not the highest priority, but here's the patch:

http://cr.openjdk.java.net/~hannesw/8012269/

I definitely think it would be a great idea to keep our versions of Joni 
connected and evolving together. Right now this would still be 
relatively easy, but it will become harder as time goes by.

Hannes

On 2013-05-01 22:10, Charles Oliver Nutter wrote:
> Hello!
>
> I saw a few weeks back that you guys have adopted JRuby's regex
> engine, JOni, modified to work only with Java's char[]. We're thrilled
> that you've found our engine useful enough to incorporate!
>
> However, I'm wondering about the future of these engines. We have
> planned improvements, patches that come in from time to time, and so
> on, and maintaining two separate copies will eventually lead to them
> diverging. But without any way to specialize our byte[]-based JOni for
> char[] easily, I'm not sure what can be done.
>
> Any thoughts on this? Just to tempt you... a few of the planned improvements:
>
> * JVM bytecode compiler, for more fastness
> * Thread interruptible execution, to kill off regex runs that don't complete
>
> It would be great if we could collaborate on such things.
>
> - Charlie