Strings and things

Brian Goetz brian.goetz at oracle.com
Wed Oct 13 20:46:30 UTC 2021


> I love where this feature is going.

I like that you are excited about it, but maybe a little too excited; 
there's a real danger that it gets taken too far, and I think you're 
there.  Notes inline.

> There are a few use cases and intended effects (at least, I assume they are
> intended) that this feature would have which the document doesn't name. I'm
> not sure if it's fully on the radar of Jim and Brian.
>
> ## Regexp literals
>
> "Make the language support regexp literals natively" is a feature request
> that (in my experience) comes up _a lot_, and it was more or less
> determined to be part of the bathwater during the text-block discussion (in
> that 'raw strings' were ejected from that proposal). That was mostly about
> changing how java interpolates backslashes in string literals which this
> feature will not solve. However, surely:
>
> Pattern."^foo|bar$"

Meh.  Aside from the "long" name of "compile", this is no different 
than, and only trivially shorter than,

     Pattern.of("^foo|bar$");

What you want is _opportunistic constant folding_, as outlined here 
https://www.youtube.com/watch?v=iSEjlLFCS3E , where the pattern factory 
can be marked as "if you call me with constants, I can return a 
constant", and translated with condy.  (See the video for bonus points; 
it might even be evaluatable at compile time, and syntax errors in the 
regex could be turned into compilation errors.)  But all of this is 
orthogonal to the syntax.
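
To make the "syntax errors could become compilation errors" point 
concrete, here is a minimal sketch of the status quo: a malformed regex 
is only rejected when the compile call actually runs.  The class and 
method names are hypothetical, chosen just for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; not part of any proposal.
final class RegexErrorTiming {
    // Returns true if the regex parses, false if it is malformed.
    // Either way, the answer is only known at run time.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("^foo|bar$")); // well-formed
        System.out.println(compiles("(foo"));      // unclosed group
    }
}
```

Both strings are ordinary constants as far as javac is concerned; the 
second one only fails once `Pattern.compile` is executed.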

(Note too that putting a type on the LHS is not currently supported, and 
won't be until we have something like type classes.)

At this point, using this feature _just_ because you get constantization 
is kind of an abuse.  You could easily do the same with

     private static final Pattern PAT = Pattern.compile(...);

Sure, moving it to the point of use is cool, but that's not the goal.  
It's a cute hack, though!
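
For comparison, a small sketch of the two shapes being discussed, 
assuming the regex from the example above.  The class and method names 
are made up for illustration; only `Pattern.compile` and `Matcher.find` 
are real API.

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting a hoisted constant with inline use.
final class HoistedPattern {
    // Parsed once, when the class is initialized.
    private static final Pattern PAT = Pattern.compile("^foo|bar$");

    // Reuses the already-compiled pattern on every call.
    static boolean matchesHoisted(String input) {
        return PAT.matcher(input).find();
    }

    // Same result, but the regex is re-parsed on every call.
    static boolean matchesInline(String input) {
        return Pattern.compile("^foo|bar$").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesHoisted("foobaz"));
        System.out.println(matchesInline("foobaz"));
    }
}
```

The two methods behave identically; the difference is only where the 
parsing cost is paid, which is the part constantization would address.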

> ## A small issue: The type of the formatted string.
>
> Cay mentioned this as well (in the context of how other languages do it),
> but right now the proposal uses `String templateString` (in TemplatedString
> / TemplatePolicy), but it's not obvious what this string would contain.
> Presumably not just "Hello {name} you are {age} years old" - a policy would
> have no way to differentiate a 'hole' like "{name}" from a literal open
> brace character. I assume this is something to be worked on later?

There's a special Unicode character for "insertion point."
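
A rough sketch of how a policy might consume such a string.  Note the 
assumptions: the message above does not name the character, so U+FFFC 
(OBJECT REPLACEMENT CHARACTER) is used here purely as a stand-in, and 
the class and method are hypothetical, not the proposed API.

```java
import java.util.List;

// Hypothetical sketch; not the TemplatedString / TemplatePolicy API.
final class InsertionPointSketch {
    // ASSUMPTION: stand-in for the "insertion point" character.
    static final char HOLE = '\uFFFC';

    // Replaces each HOLE with the next value, left to right.
    // Literal braces pass through untouched, since they are not holes.
    static String interpolate(String template, List<?> values) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (c == HOLE) {
                if (next >= values.size()) {
                    throw new IllegalArgumentException("more holes than values");
                }
                out.append(values.get(next++));
            } else {
                out.append(c);
            }
        }
        if (next != values.size()) {
            throw new IllegalArgumentException("more values than holes");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "Hello " + HOLE + ", {age} is " + HOLE;
        System.out.println(interpolate(template, List.of("Duke", 26)));
    }
}
```

Because the hole is a dedicated character rather than `{name}`, a 
literal open brace in the template needs no escaping.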

> As long as that object is constructed only once in the runtime, that
> doesn't seem like a costly move, performance-wise. But, it would be nice if
> the code that actually runs to process that object + the `List<Object>` of
> params gets to also rely on some pre-processing "only once" without
> handrolling a lazy-initializing system. As Stephen indicated, some of the
> use cases of this feature have non-trivial preprocessing needs (and usually
> validation needs as well). Brian mentioned that a lot of the 'cost' in the
> java ecosystem's use of `String.format` is the overhead of reparsing that
> format string over and over again. This would eliminate the runtime cost
> down to once-per-literal which sounds like a worthwhile endeavour, no?

See the comments on amber-spec-experts today; we have this working 
already (the JEP alludes to this, but doesn't dwell on it much) but our 
approach is pretty different.  We get there without adding language 
complexity.

> A 'static method' in an interface (not a thing in java right now) seems
> like an answer here

This is one of those ideas that seems obviously sensible until you work 
the details and realize how `static` has messed up nearly every job it's 
ever been given, and we don't want to give it more jobs to mess up.  
(Our C# friends have exposed something like type classes with abstract 
statics in interfaces for C#10, but their language is different, and 
they're able to get away with it in ways we are not.)  There's an 
equally "obvious but wrong" path called "implements static", that 
dead-ends in the same place.  Really, you're waiting for type classes.

> * Right now, writing `Pattern.compile("foo|bar")`, or even
> `input.replaceAll("someregexp", "replacement")` anywhere in java code is a
> performance problem: That means the regexp needs to be parsed every time
> that code is executed.

I get it, but you're summoning the devil to make a bad bargain here.  
This isn't a feature about constantization of arbitrary expressions.  I 
get that today you're saying "it would be just good enough if I could 
have 10% of that feature", but I promise you, the other 90% will 
eventually whisper into your ear.  That's a whole feature area in itself 
-- and a much bigger one.




More information about the amber-dev mailing list