Stream/IntermediateOp properties/flags

Paul Sandoz paul.sandoz at oracle.com
Tue Oct 2 05:19:38 PDT 2012


I updated the webrev:

  http://cr.openjdk.java.net/~psandoz/lambda/flags/webrev/

For some reason I previously used '-' instead of '_' to separate the digit groups of a number!

I flipped the switch for TreeUtils.collect to use TreeUtils.SizedCollectorTask (and fixed a bug in the task splitting), and implemented ToArrayOp.evaluateParallel.

It's hard to verify whether the array optimisation path is actually taken. I suppose performance tests could verify it, but it would be better to inject probes into certain methods while testing.

Paul.

On Oct 1, 2012, at 11:31 AM, Paul Sandoz <Paul.Sandoz at oracle.com> wrote:

> Hi,
> 
> Here is a patch that makes properties/flags for operations be invariant to the stream source:
> 
>  http://cr.openjdk.java.net/~psandoz/lambda/flags/webrev/
> 
> this is based on top of the previous patch for refactoring terminal operations [1].
> 
> This feature can:
> 
> - improve the invariance of the pipeline helper to the stream source;
> 
> - support detached pipelines, where the properties of the pipeline can be pre-calculated independently of the source properties;
> 
> - make the IntermediateOp interface cleaner. Implementations only need to declare the logical or of properties, rather than perform more bit twizzles which, while not complicated, are error-prone; and
> 
> - enable analysis of operations in the pipeline (e.g. removing redundant operations; UniqOp can operate differently if it is told, statically, that the upstream stream will be sorted).
> 
> --
> 
> To support properties independent of the stream source three pieces of information are required:
> 
> 1) the property is known e.g. the output stream is known to be sorted
> 2) the property is not known e.g. the output stream is not known to be sorted
>   (there is a subtle distinction between "not known to be" and "known not to be". An operation
>    may monkey around with the elements such that they may no longer be distinct, e.g. MapOp,
>    but certain inputs, and/or the map function, may result in certain outputs still being distinct).
> 3) the property is taken from upstream i.e. identity function
> 
> So two bits per property are required to represent this information. Given the number of properties we currently have (4) and a max of 16 properties fitting into an "int", I think there is ample space to cope with 2 bits instead of 1 bit.
> 
> A bit pattern of 0b01 represents a known property.
> A bit pattern of 0b10 represents a not known property.
> A bit pattern of 0b11 represents "take whatever is upstream".
> 
> Then one can use a standard "& with the masks" then "| with the values" pattern.
> 
> Given the bit patterns of properties it is possible to create a mask from the logical or of the properties themselves, thus enabling intermediate ops to simply declare the logical or of properties.
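
To make the encoding concrete, here is a minimal, self-contained sketch of the two-bit scheme described above. It is not the code in the webrev: the class name, the slot positions, and the body of maskFor are assumptions chosen for illustration, and only three properties are modelled. An operation leaves a slot at 0b00 when it has nothing to say about that property, which the derived mask turns into "keep whatever is upstream".

```java
final class FlagSketch {
    // Two bits per property. Slot positions are assumed for this sketch.
    static final int KNOWN = 0b01;      // the property is known to hold
    static final int NOT_KNOWN = 0b10;  // the property is not known to hold

    static final int IS_DISTINCT = KNOWN;
    static final int NOT_DISTINCT = NOT_KNOWN;
    static final int IS_SORTED = KNOWN << 2;
    static final int NOT_SORTED = NOT_KNOWN << 2;
    static final int IS_SIZED = KNOWN << 4;
    static final int NOT_SIZED = NOT_KNOWN << 4;

    static final int ALL_SLOTS = 0b11_11_11; // every slot is "take upstream"
    static final int LOW_BITS = 0b01_01_01;  // low bit of every slot
    static final int HIGH_BITS = 0b10_10_10; // high bit of every slot

    // Derive the mask from the declared flags alone: any slot the op
    // mentions is cleared (both bits), untouched slots pass through.
    static int maskFor(int opFlags) {
        int touched = opFlags
                | ((opFlags & LOW_BITS) << 1)
                | ((opFlags & HIGH_BITS) >> 1);
        return ALL_SLOTS & ~touched;
    }

    // "& with the masks" then "| with the values".
    static int combine(int upstreamFlags, int opFlags) {
        return (upstreamFlags & maskFor(opFlags)) | opFlags;
    }

    public static void main(String[] args) {
        int upstream = IS_SORTED | IS_SIZED | NOT_DISTINCT;
        int uniq = IS_DISTINCT | NOT_SIZED; // what a distinct-style op declares

        // Sorted is carried over, distinct becomes known, sized becomes not known.
        System.out.println(Integer.toBinaryString(combine(upstream, uniq))); // prints 100101

        // Detached pipeline: start from "take upstream" in every slot.
        System.out.println(Integer.toBinaryString(combine(ALL_SLOTS, uniq))); // prints 101101
    }
}
```

In the second call the sorted slot stays at 0b11, so the pre-calculated flags still say "take whatever the source provides" for that property.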
> 
> For now I have retained two methods on ops for comparison, e.g. in UniqueOp:
> 
>    @Override
>    public int getOpFlags() {
>        return StreamProperties.FLAG_IS_DISTINCT | StreamProperties.FLAG_NOT_SIZED;
>    }
> 
>    @Override
>    public int getStreamFlags(int upstreamFlags) {
>        // If the upstream is sorted, we need only cache last element
>        // If the upstream is unique, this is a no-op
>        return (upstreamFlags & StreamProperties.FLAG_MASK_SIZED & StreamProperties.FLAG_MASK_DISTINCT)
>               | StreamProperties.FLAG_IS_DISTINCT | StreamProperties.FLAG_NOT_SIZED;
>    }
> 
> 
> In AbstractPipeline, the properties are calculated from the operations and the pipeline as follows:
> 
>            int opsFlags = StreamProperties.FLAG_MASK;
>            for (int i = from; i < to; i++) {
>                isIntermediateShortCircuit |= ops[i].isShortCircuit();
> 
>                // @@@ Declarative implementation, if IntermediateOp#getStreamFlags is removed
> //                int opFlags = ops[i].getOpFlags();
> //                opsFlags = (opsFlags & StreamProperties.getStreamFlagsMask(opFlags)) | opFlags;
>                opsFlags = ops[i].getStreamFlags(opsFlags);
>            }
> 
>            ...  
>            this.flags = sourceFlags & opsFlags;
> 
> 
> I am inclined to go with the simplicity of the declarative approach for a slightly increased cost in the number of bit twizzles, which I suspect is likely to add very little to the fixed cost of constructing a pipeline. A default method can be added to IntermediateOp if we really want to allow the choice:
> 
>  int getStreamFlags(int upstreamFlags) default {
>    int flags = getOpFlags();
>    return (upstreamFlags & StreamProperties.getStreamFlagsMask(flags)) | flags;
>  }
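
For readers on a released JDK, the same idea can be sketched as follows. The snippet above uses the prototype syntax of the time; in Java 8 as shipped, the default keyword precedes the return type. Everything named here (IntermediateOpSketch, PipelineFlagsSketch, the constant values, and the body of maskFor, which stands in for StreamProperties.getStreamFlagsMask) is a hypothetical stand-in, not the webrev's code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for IntermediateOp: ops only declare their flags.
interface IntermediateOpSketch {
    int getOpFlags();

    // Declarative default; an op may still override it for special cases.
    default int getStreamFlags(int upstreamFlags) {
        int flags = getOpFlags();
        return (upstreamFlags & PipelineFlagsSketch.maskFor(flags)) | flags;
    }
}

final class PipelineFlagsSketch {
    // Assumed layout: two bits per property, 0b01 known, 0b10 not known.
    static final int IS_DISTINCT = 0b00_00_01;
    static final int IS_SORTED = 0b00_01_00;
    static final int NOT_SIZED = 0b10_00_00;
    static final int FLAG_MASK = 0b11_11_11; // "take upstream" everywhere

    // Assumed implementation: clear both bits of every slot the op sets.
    static int maskFor(int opFlags) {
        int touched = opFlags
                | ((opFlags & 0b01_01_01) << 1)
                | ((opFlags & 0b10_10_10) >> 1);
        return FLAG_MASK & ~touched;
    }

    // Fold the ops left to right, starting from "take upstream".
    static int fold(List<IntermediateOpSketch> ops) {
        int opsFlags = FLAG_MASK;
        for (IntermediateOpSketch op : ops) {
            opsFlags = op.getStreamFlags(opsFlags);
        }
        return opsFlags;
    }

    public static void main(String[] args) {
        IntermediateOpSketch sorted = () -> IS_SORTED;
        IntermediateOpSketch uniq = () -> IS_DISTINCT | NOT_SIZED;
        int flags = fold(Arrays.asList(sorted, uniq));
        System.out.println(Integer.toBinaryString(flags)); // prints 100101
    }
}
```

Because the interface has a single abstract method, each op in the example can be written as a lambda that returns its declared flags; the fold never consults the stream source.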
> 
> Paul.
> 
> [1] http://cr.openjdk.java.net/~psandoz/lambda/ophelper/webrev/
> 
> 



More information about the lambda-dev mailing list