Future for the JOni regex library

Mon May 6 11:09:25 PDT 2013

Thanks for the updates, Hannes!

Wow, I didn't expect to hear you already had a regex compiler, but I
suppose I should have known :-) I wonder if we can port it back to
Joni and finish it...regex compilation has been on our want list for a
long time.

I understand what you mean about stripping stuff out. We have to use
Joni almost as-is because of the complexities of Ruby regex and
encoding logic, but there's not much need for you to do the same.
Sharing in the long term will probably be difficult.

I'm also really excited to hear that you will try to JEP this into
OpenJDK as the new regex backend. Have you been in contact with the
author of Joni, Marcin Mielzinsky? He would be very proud to know this
is in process, and obviously he deserves pretty much all the credit
for making this thing happen.

- Charlie

On Wed, May 1, 2013 at 5:50 PM, Hannes Wallnoefer
<hannes.wallnoefer at oracle.com> wrote:
> Hi Charlie,
>
> I feel a bit guilty for not getting (or keeping) in touch with you about
> this. We recently switched to Joni as our default regexp engine and it's
> working pretty well.
>
> What we have in Nashorn now is still relatively close to the JRuby codebase.
> Both share the same package structure, classes, and methods. Our code is
> just simpler because it doesn't have to deal with different encodings. My
> github fork contains a "noencoding" branch that represents the connection
> between the two:
>
> https://github.com/hns/joni/tree/noencoding
>
> However, there are some forces that might push us further apart.
> One of them is code coverage. As it is, JavaScript uses a rather limited
> subset of what Joni provides, and this means a lot of code is neither used
> nor tested. Maintaining these bits doesn't seem to make sense (as far as
> Nashorn is concerned).
>
> It's a similar story with coding standards. We ran FindBugs over Joni and it
> found a number of issues, including things like public final arrays. Fixing
> these could require us to change the package structure or make other
> structural changes. Not to mention missing Javadocs and obscure naming,
> which would also drive us apart when fixed on our side.
>
> As Jim said, I also worked on ASM bytecode generation and got quite far with
> it except for some combinations of nested quantifiers and captures I
> couldn't figure out. I've suspended the work for the time being since it's
> not the highest priority thing to do, but here's the patch:
>
> http://cr.openjdk.java.net/~hannesw/8012269/
>
> I definitely think it would be a great idea to keep our versions of Joni
> connected and evolving together. Right now this would still be relatively
> easy, but it will become harder as time goes by.
>
> Hannes
>
>
> On 2013-05-01 22:10, Charles Oliver Nutter wrote:
>
>> Hello!
>>
>> I saw a few weeks back that you guys have adopted JRuby's regex
>> engine, JOni, modified to work only with Java's char[]. We're thrilled
>> that you've found our engine useful enough to incorporate!
>>
>> However, I'm wondering about the future of these engines. We have
>> planned improvements, patches that come in from time to time, and so
>> on, and maintaining two separate copies will eventually lead to them
>> diverging. But without any way to specialize our byte[]-based JOni for
>> char[] easily, I'm not sure what can be done.
>>
>> Any thoughts on this? Just to tempt you... a few of the planned
>> improvements:
>>
>> * JVM bytecode compiler, for more fastness
>> * Thread-interruptible execution, to kill off regex runs that don't
>> complete
>>
>> It would be great if we could collaborate on such things.
>>
>> - Charlie
>
>
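
For anyone wondering what the "thread-interruptible execution" item from the
original message could look like in practice, here is a minimal sketch. It
does not use Joni's API at all: it assumes java.util.regex as a stand-in
engine and relies on the common trick of wrapping the input in a CharSequence
whose charAt() checks the thread's interrupt flag. The class and method names
(InterruptibleMatch, InterruptibleCharSequence, matches) are hypothetical and
exist only for this illustration. An engine with its own matching loop would
more likely check the flag inside that loop instead.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not part of Joni, JRuby, or Nashorn.
final class InterruptibleMatch {

    /** CharSequence wrapper that aborts matching once the current thread is interrupted. */
    static final class InterruptibleCharSequence implements CharSequence {
        private final CharSequence inner;

        InterruptibleCharSequence(CharSequence inner) {
            this.inner = inner;
        }

        @Override
        public char charAt(int index) {
            // The matcher reads its input through charAt, so this check runs
            // repeatedly during matching, including while backtracking.
            if (Thread.currentThread().isInterrupted()) {
                throw new IllegalStateException("regex match interrupted");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new InterruptibleCharSequence(inner.subSequence(start, end));
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Full-input match that throws IllegalStateException if the calling thread is interrupted. */
    static boolean matches(Pattern pattern, CharSequence input) {
        return pattern.matcher(new InterruptibleCharSequence(input)).matches();
    }

    public static void main(String[] args) throws InterruptedException {
        // Nested quantifiers with no possible match: a classic backtracking-heavy case.
        Pattern slow = Pattern.compile("(a+)+b");
        String input = new String(new char[40]).replace('\0', 'a');

        Thread worker = new Thread(() -> {
            try {
                System.out.println("matched: " + matches(slow, input));
            } catch (IllegalStateException e) {
                System.out.println("aborted: " + e.getMessage());
            }
        });
        worker.start();
        Thread.sleep(200);   // give the match a moment, then give up on it
        worker.interrupt();
        worker.join();
    }
}
```

Whether main prints the "aborted" line depends on how long the particular JDK
takes on that pattern. The interrupt flag is deliberately left set when the
exception is thrown, so the caller decides whether to clear it.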