Future for the JOni regex library
Hannes Wallnoefer
hannes.wallnoefer at oracle.com
Thu May 9 18:48:18 PDT 2013
Sorry for the late reply again. I've been at JavaOne India most of the
week and was a bit distracted.
All our changes to Joni were made under the original license, so in my
understanding there shouldn't be any problem in taking them back. As I
said, the patch is incomplete and probably won't make it into JDK8, but
of course if you could make something out of it that would be wonderful!
We've just started discussing our plans for JDK9. I haven't contacted
Marcin yet but I'll absolutely contact him and you about this.
Hannes
On 2013-05-06 23:39, Charles Oliver Nutter wrote:
> Thanks for the updates, Hannes!
>
> Wow, I didn't expect to hear you already had a regex compiler, but I
> suppose I should have known :-) I wonder if we can port it back to
> Joni and finish it...regex compilation has been on our want list for a
> long time.
>
> I understand what you mean about stripping stuff out. We have to use
> Joni almost as-is because of the complexities of Ruby regex and
> encoding logic, but there's not much need for you to do the same.
> Sharing in the long term will probably be difficult.
>
> I'm also really excited to hear that you will try to JEP this into
> OpenJDK as the new regex backend. Have you been in contact with the
> author of Joni, Marcin Mielzinsky? He would be very proud to know this
> is in process, and obviously he deserves pretty much all the credit
> for making this thing happen.
>
> - Charlie
>
> On Wed, May 1, 2013 at 5:50 PM, Hannes Wallnoefer
> <hannes.wallnoefer at oracle.com> wrote:
>> Hi Charlie,
>>
>> I feel a bit guilty for not getting (or keeping) in touch with you about
>> this. We recently switched to Joni as our default regexp engine and it's
>> working pretty well.
>>
>> What we have in Nashorn now is still relatively close to the JRuby codebase.
>> Both share the same package structure, classes, and methods. Our code is
>> just simpler because it doesn't have to deal with different encodings. My
>> github fork contains a "noencoding" branch that represents the connection
>> between the two:
>>
>> https://github.com/hns/joni/tree/noencoding
>>
>> However, there are some forces that might push us to drift further apart.
>> One of them is code coverage. As it is, JavaScript uses a rather limited
>> subset of what Joni provides, and this means a lot of code is neither used
>> nor tested. Maintaining these bits doesn't seem to make sense (as far as
>> Nashorn is concerned).
>>
>> It's a similar story with coding standards. We ran FindBugs over Joni and it
>> found a number of issues, including things like public final arrays. Fixing
>> these could require us to change the package structure or make other
>> structural changes. Not to mention missing Javadocs and obscure naming,
>> which would also drive us apart when fixed on our side.
>>
>> As Jim said, I also worked on ASM bytecode generation and got quite far with
>> it, except for some combinations of nested quantifiers and captures I
>> couldn't figure out. I've suspended the work for the time being since it's
>> not the highest priority thing to do, but here's the patch:
>>
>> http://cr.openjdk.java.net/~hannesw/8012269/
>>
>> I definitely think it would be a great idea to keep our versions of Joni
>> connected and evolving together. Right now this would still be relatively
>> easy, but it will become harder as time goes by.
>>
>> Hannes
>>
>>
>> On 2013-05-01 22:10, Charles Oliver Nutter wrote:
>>
>>> Hello!
>>>
>>> I saw a few weeks back that you guys have adopted JRuby's regex
>>> engine, JOni, modified to work only with Java's char[]. We're thrilled
>>> that you've found our engine useful enough to incorporate!
>>>
>>> However, I'm wondering about the future of these engines. We have
>>> planned improvements, patches that come in from time to time, and so
>>> on, and maintaining two separate copies will eventually lead to them
>>> diverging. But without any way to specialize our byte[]-based JOni for
>>> char[] easily, I'm not sure what can be done.
>>>
>>> Any thoughts on this? Just to tempt you... a few of the planned
>>> improvements:
>>>
>>> * JVM bytecode compiler, for more fastness
>>> * Thread interruptible execution, to kill off regex runs that don't
>>> complete
>>>
>>> It would be great if we could collaborate on such things.
>>>
>>> - Charlie
>>