encoding-agnostic byte[]-based regexp engine...interested?

charlie hunt charlie.hunt at sun.com
Thu Nov 15 13:08:22 UTC 2007


Hi Charlie,

I'm adding OpenJDK's Java SE core libraries list since that's where Java 
NIO lives. I doubt anything could be done at the class libraries level, 
since an API addition or enhancement would likely require JCP activity.  
But there may be some value in raising awareness at the class libraries 
level.

I'd like to hear others' reactions on this mailing list.  My initial 
reaction is that what you are describing sounds like something that 
could be very useful for a protocol parser.  The core of Grizzly is 
protocol independent, but this might be a useful thing to offer to those 
who are implementing the com.sun.grizzly.ProtocolParser<T> interface.  
ProtocolParser is part of core Grizzly / Grizzly Framework.
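
To make the idea concrete, here is a rough sketch of matching a pattern 
directly against a ByteBuffer without first decoding it into a String.  
It does not use Joni.  As a stand-in it uses java.util.regex over a 
hypothetical CharSequence view (ByteCharSequence, a name made up for this 
example) that exposes each byte as a char in the range 0..255.  The 
requestPath helper and the HTTP request-line pattern are also just 
assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ByteBufferRegexSketch {

    /** Hypothetical helper: read-only view of a ByteBuffer's bytes as chars 0..255. */
    static final class ByteCharSequence implements CharSequence {
        private final ByteBuffer buf;
        private final int start; // absolute index of the first visible byte
        private final int end;   // absolute index one past the last visible byte

        ByteCharSequence(ByteBuffer buf) {
            this(buf, buf.position(), buf.limit());
        }

        private ByteCharSequence(ByteBuffer buf, int start, int end) {
            this.buf = buf;
            this.start = start;
            this.end = end;
        }

        @Override public int length() {
            return end - start;
        }

        @Override public char charAt(int index) {
            // Absolute get: reads the byte without moving the buffer's position.
            return (char) (buf.get(start + index) & 0xFF);
        }

        @Override public CharSequence subSequence(int from, int to) {
            return new ByteCharSequence(buf, start + from, start + to);
        }

        @Override public String toString() {
            StringBuilder sb = new StringBuilder(length());
            for (int i = 0; i < length(); i++) {
                sb.append(charAt(i));
            }
            return sb.toString();
        }
    }

    // Illustrative pattern: an HTTP/1.x request line terminated by CR LF.
    private static final Pattern REQUEST_LINE =
            Pattern.compile("([A-Z]+) (\\S+) HTTP/1\\.[01]\r\n");

    /** Returns the request path, or null if no complete request line is buffered yet. */
    static String requestPath(ByteBuffer buf) {
        Matcher m = REQUEST_LINE.matcher(new ByteCharSequence(buf));
        return m.lookingAt() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        byte[] raw = "GET /index.html HTTP/1.1\r\nHost: x\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(requestPath(ByteBuffer.wrap(raw)));
    }
}
```

The wrapper only avoids the up-front copy into a char[] or String; each 
byte is still widened to a char as the matcher reads it, and bytes of 
0x80 and above are seen as individual chars rather than as parts of a 
multi-byte character.  An engine that walks the bytes itself would not 
need this kind of adapter.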

I think some additional exploration / investigation is worthwhile.  We 
are in the process of gathering new feature requests, and I think we 
should add this to that list.

Again, anyone else who has some comments / reactions, please feel free 
to jump in. :-)

charlie ...

Charles Oliver Nutter wrote:
> Oniguruma is a C-based regular expression engine starting to get some 
> attention. The key selling points are its speed and the fact that it 
> can be applied to string content with arbitrary encodings. It will be 
> the default regex engine in Ruby 1.9.
>
> JRuby 1.1 will ship with a port of Oniguruma dubbed "Joni". For us, 
> the benefit is that we'll finally have a fast regex engine that can 
> work with Ruby's encoding-free byte[]-based strings, where before we 
> had to convert to/from char[] for all regex engines. We expect to see 
> great gains in regex performance with JRuby 1.1 when we release the 
> final version in a Decemberish timeframe.
>
> But it has occurred to me there could be an even more interesting use 
> of Joni: as a regexp engine that could accept NIO bytebuffers 
> directly. Because it just walks byte[], no decoding is necessary. 
> Because it's encoding-agnostic, any arbitrary byte content could be 
> matched. So in theory it could easily be adapted to be a fast NIO 
> bytebuffer regex engine.
>
> Would there be interest in such a thing? I'm sure there are other 
> NIO-related lists that would be appropriate, but Grizzly is the first 
> actual project that springs to mind when I think of NIO, so I thought 
> I'd toss it out there.
>
> - Charlie

-- 

Charlie Hunt
Java Performance Engineer

<http://java.sun.com/docs/performance/>




More information about the core-libs-dev mailing list