Stream/IntermediateOp properties/flags
Paul Sandoz
paul.sandoz at oracle.com
Tue Oct 2 05:19:38 PDT 2012
I updated the webrev:
http://cr.openjdk.java.net/~psandoz/lambda/flags/webrev/
For some reason I previously used '-' instead of '_' to delineate digits of a number!
I flipped the switch for TreeUtils.collect to use TreeUtils.SizedCollectorTask (and fixed a bug in the task splitting), and implemented ToArrayOp.evaluateParallel.
It's hard to verify whether the array optimisation path is enabled. I suppose performance tests could verify this, but it would be better to inject probes into certain methods while testing.
Paul.
On Oct 1, 2012, at 11:31 AM, Paul Sandoz <Paul.Sandoz at oracle.com> wrote:
> Hi,
>
> Here is a patch that makes the properties/flags of operations invariant to the stream source:
>
> http://cr.openjdk.java.net/~psandoz/lambda/flags/webrev/
>
> This is based on top of the previous patch for refactoring terminal operations [1].
>
> This feature can:
>
> - improve the invariance of the pipeline helper to the stream source;
>
> - support detached pipelines, where the properties of the pipeline can be pre-calculated independently of the source properties;
>
> - make the IntermediateOp interface cleaner. Implementations only need to declare the logical or of properties, rather than perform more bit twizzles, which, while not complicated, are error-prone; and
>
> - enable analysis of operations in the pipeline (e.g. removing redundant operations; UniqOp can operate differently if it is told, statically, that the upstream stream will be sorted).
>
> --
>
> To support properties independent of the stream source, three pieces of information are required:
>
> 1) the property is known e.g. the output stream is known to be sorted
> 2) the property is not known e.g. the output stream is not known to be sorted
> (there is a subtle distinction between "not known to be" and "known not to be". An operation
> may monkey around with the elements such that they may no longer be distinct, e.g. MapOp,
> but certain inputs, and/or the map function, may result in certain outputs still being distinct).
> 3) the property is taken from upstream i.e. identity function
>
> So two bits are required to represent this information. Given the number of properties we currently have (4) and a max of 16 properties fitting into an "int", I think there is ample space to use 2 bits per property instead of 1 bit.
>
> A bit pattern of 0b01 represents a known property.
> A bit pattern of 0b10 represents a not known property.
> A bit pattern of 0b11 represents the "take whatever is upstream".
>
> Then one can use a standard "& with the masks" then "| with the values" pattern.
>
> Given the bit patterns of properties it is possible to create a mask from the logical or of the properties themselves, thus enabling intermediate ops to simply declare the logical or of properties.
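The two-bit encoding and the mask derivation described above can be sketched roughly as follows. This is only an illustration, not the code in the webrev: the class FlagSketch, its constant names, the bit positions, and the maskOf/combine helpers are all assumptions made for this example, as is the convention that a field of 0b00 in an op's declared flags means "nothing declared, keep upstream".

```java
// Hypothetical stand-in for StreamProperties; every name and bit position
// here is an assumption made for illustration.
final class FlagSketch {
    // Each property occupies a two-bit field:
    // 0b01 = known, 0b10 = not known, 0b11 = take whatever is upstream.
    // A field left at 0b00 in an op's flags means the op declares nothing.
    static final int IS_SORTED    = 0b01;
    static final int NOT_SORTED   = 0b10;
    static final int IS_DISTINCT  = 0b01 << 2;
    static final int NOT_DISTINCT = 0b10 << 2;
    static final int IS_SIZED     = 0b01 << 4;
    static final int NOT_SIZED    = 0b10 << 4;

    private static final int LOW_BITS  = 0x55555555; // low bit of every field
    private static final int HIGH_BITS = 0xAAAAAAAA; // high bit of every field

    // Derive the mask from the declared flags: any field with at least one
    // bit set is cleared (0b00) in the mask, untouched fields stay 0b11.
    static int maskOf(int opFlags) {
        int declared = opFlags
                | ((opFlags & LOW_BITS) << 1)
                | ((opFlags & HIGH_BITS) >>> 1);
        return ~declared;
    }

    // "& with the mask" then "| with the values".
    static int combine(int upstreamFlags, int opFlags) {
        return (upstreamFlags & maskOf(opFlags)) | opFlags;
    }

    public static void main(String[] args) {
        int upstream = IS_SORTED | NOT_DISTINCT | IS_SIZED;
        int uniqueOp = IS_DISTINCT | NOT_SIZED; // what the op declares
        int result = combine(upstream, uniqueOp);
        // Sortedness passes through; distinct and sized are overridden.
        System.out.println(result == (IS_SORTED | IS_DISTINCT | NOT_SIZED));
    }
}
```

With this layout an op only lists the properties it changes, and the mask that clears the corresponding upstream fields falls out of the declared bits.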
>
> For now I have retained two methods on ops for comparison, e.g. in UniqueOp:
>
>     @Override
>     public int getOpFlags() {
>         return StreamProperties.FLAG_IS_DISTINCT | StreamProperties.FLAG_NOT_SIZED;
>     }
>
>     @Override
>     public int getStreamFlags(int upstreamFlags) {
>         // If the upstream is sorted, we need only cache last element
>         // If the upstream is unique, this is a no-op
>         return (upstreamFlags & StreamProperties.FLAG_MASK_SIZED & StreamProperties.FLAG_MASK_DISTINCT)
>                 | StreamProperties.FLAG_IS_DISTINCT | StreamProperties.FLAG_NOT_SIZED;
>     }
>
>
> In AbstractPipeline, the properties are calculated from the operations in the pipeline as follows:
>
>     int opsFlags = StreamProperties.FLAG_MASK;
>     for (int i = from; i < to; i++) {
>         isIntermediateShortCircuit |= ops[i].isShortCircuit();
>
>         // @@@ Declarative implementation, if IntermediateOp#getStreamFlags is removed
>         // int opFlags = ops[i].getOpFlags();
>         // opsFlags = (opsFlags & StreamProperties.getStreamFlagsMask(opFlags)) | opFlags;
>         opsFlags = ops[i].getStreamFlags(opsFlags);
>     }
>
>     ...
>     this.flags = sourceFlags & opsFlags;
>
>
> I am inclined to go with the simplicity of the declarative approach for a slightly increased cost in the number of bit twizzles, which I suspect is likely to add very little to the fixed cost of constructing a pipeline. A default method can be added to IntermediateOp if we really want to allow the choice:
>
>     getStreamFlags(int upstreamFlags) default {
>         int flags = getOpFlags();
>         return (upstreamFlags & StreamProperties.getStreamFlagsMask(flags)) | flags;
>     }
>
> Paul.
>
> [1] http://cr.openjdk.java.net/~psandoz/lambda/ophelper/webrev/
>
>