Proposal: optimization of Map.copyOf and Collectors.toUnmodifiableMap

Tue Jun 19 02:09:42 UTC 2018

On 6/17/18 4:23 PM, Peter Levart wrote:
> My attempt to optimize the Map.copyOf() was also using just public APIs (with 
> the addition of an internal class).

> Well, it does speed-up Map.copyOf() by 30%. But I agree there are other 
> approaches that could go even further. In particular when all intermediary 
> copies could be avoided.
> 
> Here's one approach (which still uses just public APIs) and avoids intermediary 
> copying:

You mentioned "using just public APIs" a couple times. Are you viewing that as a 
constraint? I don't think it is.

> http://cr.openjdk.java.net/~plevart/jdk-dev/UnmodifiableMap_copyOf/webrev.02/
> 
> Why choosing Map.forEach for dumping the content of the map instead of iteration 
> over entrySet? Because in some Map implementations this is the most direct 
> iteration without producing garbage (for example in IdentityHashMap), but in the 
> worst case (default Map method for example) it is delegated to iteration over 
> entrySet.

Sure, entrySet().toArray() does a lot of allocation and copying. Using forEach() 
can avoid this, but it doesn't necessarily preserve the right invariants. In 
particular, if the source is a concurrent map, the number of times forEach's 
BiConsumer is called might not match size(), if the map has changed size. So, 
the forEach approach has to deal with the possibility of it not being called 
exactly size() times, whereas we know that the array from toArray() can't change 
size (and that its contents are consistent, for some definition of consistent).

This change also publishes a reference to the Map under construction before its 
constructor has returned. I'm not sure of all the memory model implications of this.

This change also hands out an object that has access to the new Map instance's 
internals. You're aware of this, of course, because you snapshot the current 
thread and null it out later, saying

     // prevent naughty Map implementations from modifying MapN after
     // it is fully constructed

So there are potential security vulnerabilities here, which requires some 
serious thought. What you have *seems* reasonable, but my experience is that 
things that are subtle but that seem reasonable actually end up having security 
holes.

I'm trying to think of some alternatives.

The issue with the forEach() approach on an arbitrary map is that you have to 
hand out a BiConsumer, and malicious code can call it even after forEach() has 
returned. What's necessary is to have a way to shut off the BiConsumer before 
reading out what it's accumulated. I don't know of a simple, fast, and correct 
way to do this. (Choose any two!) The "enabled" state (whether a Thread instance 
or just a boolean) will probably have to be a volatile that must be checked 
every time. What are the performance implications? Anyway, while it seems 
promising, the forEach() approach seems like it leads off into the weeds.

Unless you can verify the trustworthiness of the source. Suppose you check the 
source map to see if its getClass() == HashMap.class. Then you can be reasonably 
assured that forEach() won't misuse the BiConsumer. You can probably also rely 
on size() being stable. (Probably also include LinkedHashMap.)

The toUnmodifiableMap collectors can benefit from this if they're changed to use 
Map::copyOf, as in your current webrev.

You could also provide a similar fast path for ConcurrentHashMap. It'd have to 
be resilient against size changes, but you can trust that it won't misuse the 
BiConsumer.

Finally, other map implementations will just have to suffer through the 
multi-copy slow path.

s'marks