Parallel `and` operations were recently discussed at length on my student Lindsey Kuper's blog. This is a follow-up to that post, with the purpose of laying out a taxonomy of conjunction operations that might occur in a parallel language. (The same discussion applies equally to disjunction.) Below I classify these operations into five levels, from basic to full-featured. A good context to imagine as we go through this is a computationally intensive search problem with lots of nested parallel conjunctions and disjunctions that expose opportunities for parallelism.

Before we get started, some ground rules. We need the operation to take some kind of deferred computations, or thunks. Further, we assume these can be turned into *futures* for exposing parallelism. In our case, we use Haskell and the `Par` monad, so this means that we expect something like this:

    result <- asyncAnd comp1 comp2

where `comp1` and `comp2` are monadic computations that, once executed, return a `Bool` value. But the distinctions below could just as well apply to a plain, non-monadic `and` operation in a lazy, pure language, for example Haskell's `(&&)` operator.

### Level One

**Strict**: Evaluate both `comp1` and `comp2` in parallel, and then return the logical and of the resulting booleans.
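As a minimal sketch, a level-one `and` can be written with plain `forkIO` and `MVar`s from `base` (the post itself uses the `Par` monad; `strictAnd` is an illustrative name, not from the post):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- Level-one sketch: both computations always run to completion
-- before their results are combined; there is no short-circuiting.
strictAnd :: IO Bool -> IO Bool -> IO Bool
strictAnd comp1 comp2 = do
  v1 <- newEmptyMVar
  v2 <- newEmptyMVar
  _ <- forkIO (comp1 >>= putMVar v1)
  _ <- forkIO (comp2 >>= putMVar v2)
  b1 <- takeMVar v1      -- wait for both results,
  b2 <- takeMVar v2
  return (b1 && b2)      -- then combine them
```

Note that even if the first result is `False`, we still block on the second before answering, which is exactly the work-inefficiency described next.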

This strict `and` is parallel, but it is also less work-efficient than the *sequential*, left-to-right, short-circuiting `&&` operation found in most programming languages: it always evaluates both computations fully. Thus, our next upgrade is short-circuiting.

### Level Two

Here it helps to generalize a bit and consider an `and` that takes a list of an arbitrary number of input computations:

**Short-circuiting**: For a list of input computations, evaluate them using a machine-specific degree of parallelism, but also *stop* launching new computations whenever a `False` is encountered in an already-completed computation.

A level-two `and` is *asymmetric*, much like left-to-right sequential `(&&)`. That is, some input computations have the ability to short-circuit (deschedule) others, but the relationship is not fair; it is not all-to-all.

You can build an N-way `and` meeting this limited criterion using (1) a level-one parallel `and` together with (2) a short-circuiting sequential `and`. The two operators could be combined by a heuristic strategy for determining how many parallel tasks are appropriate, "bottoming out" to the sequential version beyond that. Still, as long as the sequential `and` is used, inputs are handled asymmetrically; and as long as a level-one parallel `and` is used, we will run unnecessary computations that cannot be descheduled or canceled. Indeed, this is a real tension: work-efficiency vs. parallelism.

### Level Three
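One way to sketch such a level-two combination, again with plain `forkIO`/`MVar` rather than the post's `Par` monad (`shortCircuitAnd` and its `fanout` parameter are illustrative assumptions):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- Level-two sketch: launch up to `fanout` computations in parallel,
-- then fold their results left to right. A False stops the *launch*
-- of further batches, but already-launched computations still run to
-- completion, and the short-circuiting order is fixed (asymmetric).
shortCircuitAnd :: Int -> [IO Bool] -> IO Bool
shortCircuitAnd _      []    = return True
shortCircuitAnd fanout comps = do
  let (batch, rest) = splitAt fanout comps
  vars <- mapM (\c -> do v <- newEmptyMVar
                         _ <- forkIO (c >>= putMVar v)
                         return v)
               batch
  ok <- andResults vars
  if ok
    then shortCircuitAnd fanout rest   -- launch the next batch
    else return False                  -- short-circuit: launch nothing more
  where
    andResults []     = return True
    andResults (v:vs) = do b <- takeMVar v
                           if b then andResults vs else return False
```

The asymmetry is visible in `andResults`: an early `False` on the left descheduled nothing that was already forked, and a `False` on the right cannot preempt the waits to its left.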

**Short-circuiting, symmetric**: All input computations are potentially evaluated in parallel, and any one of them returning `False` quickly can potentially cause the result of the entire `and` operation to become available immediately.
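A level-three sketch under the same `forkIO`/`MVar` assumptions (again illustrative names, not the post's `Par`-based code): every worker races to fill a single result variable, so any `False` wins immediately, from any position.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (void, when)

-- Level-three sketch: symmetric short-circuiting. Any worker observing
-- False answers for the whole group; the last True to arrive completes
-- the conjunction. Note there is still NO cancellation: after an early
-- False, the remaining workers keep running to completion.
symmetricAnd :: [IO Bool] -> IO Bool
symmetricAnd []    = return True
symmetricAnd comps = do
  result    <- newEmptyMVar
  remaining <- newMVar (length comps)
  mapM_ (\c -> forkIO $ do
            b <- c
            if not b
              then void (tryPutMVar result False)
              else do n <- modifyMVar remaining (\n -> return (n - 1, n - 1))
                      when (n == 0) (void (tryPutMVar result True)))
        comps
  takeMVar result
```

The `tryPutMVar` calls make the race benign: only the first answer sticks, which is what makes this operator symmetric where level two was not.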

Level three still does not guarantee that when the `and` operation returns `False`, the remaining computations will be cancelled in bounded time. Thus our fourth level concerns cancellation.

### Level Four

**Symmetric with cancellation**: When an early answer is given, all other threads/tasks are told to stop computing, because their answers are no longer needed. There exists a bound on how much computation lingering child-computations of the `and` may do after this point in time.
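Extending the previous sketch with cancellation via `killThread` gives a rough level-four flavor, though only for the immediate children; this says nothing about *transitive* cancellation, which is the hard part discussed below. (`cancellingAnd` is an illustrative name.)

```haskell
import Control.Concurrent (forkIO, killThread)
import Control.Concurrent.MVar
import Control.Monad (void, when, unless)

-- Level-four sketch: like symmetricAnd, but on an early False we
-- actively kill the lingering workers (killThread on an already-
-- finished thread is a no-op). Cancellation here is only one level
-- deep: threads forked *by* the workers are not reached.
cancellingAnd :: [IO Bool] -> IO Bool
cancellingAnd []    = return True
cancellingAnd comps = do
  result    <- newEmptyMVar
  remaining <- newMVar (length comps)
  tids <- mapM (\c -> forkIO $ do
                  b <- c
                  if not b
                    then void (tryPutMVar result False)
                    else do n <- modifyMVar remaining (\n -> return (n - 1, n - 1))
                            when (n == 0) (void (tryPutMVar result True)))
               comps
  r <- takeMVar result
  unless r (mapM_ killThread tids)   -- cancel lingering workers
  return r
```

Whether the workers may safely receive an asynchronous exception mid-computation is exactly the side-effect question that level five takes up.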

Level-four `and`s are pretty rare in practical parallel languages and libraries, simply because [transitive] task cancellation is not a common feature of modern parallel runtime systems. I would be happy to be wrong about this! So please share any links to standard-library documentation that mentions level-four parallel `and`s out there in the wild.

### Level Five

Our final stop is a feature that is not strictly a property of the conjunction operator itself, but rather a complementary feature we find useful in some of our applications. It concerns the ability of canceled computations to do useful work, by storing partial results that are computed before cancellation occurs.

In an unrestricted parallel language, canceled computations may have arbitrary side effects, which, in general, *race* with the cancellation message itself. In a deterministic language such side effects cannot be allowed; rather, it is usually safe to cancel only "pure" computations. There is one exception, however: a *memoized pure function* provides a "side-effect-free" way of storing partial work.

**Cancellation + memoization**: Canceled threads never provide a final result, but they can contribute to other computations by adding to their memo-tables.
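The memo-table idea can be sketched with a shared map (using `Data.Map` from `containers`; `MemoTable` and `memoize` are illustrative names, not the API of our `Par` monad):

```haskell
import qualified Data.Map as Map
import Control.Concurrent.MVar

-- A shared memo table for a pure function. Even if the thread that
-- computed an entry is later cancelled, entries it already inserted
-- remain visible, so its partial work is not lost.
type MemoTable k v = MVar (Map.Map k v)

newMemoTable :: IO (MemoTable k v)
newMemoTable = newMVar Map.empty

memoize :: Ord k => MemoTable k v -> (k -> IO v) -> k -> IO v
memoize table f k = do
  cached <- Map.lookup k <$> readMVar table
  case cached of
    Just v  -> return v                         -- reuse earlier work
    Nothing -> do
      v <- f k                                  -- compute once
      modifyMVar_ table (return . Map.insert k v)
      return v
```

Because the memoized function is pure, inserting its result is observationally "side effect free": any race between a cancellation message and an insertion can only change *when* work is saved, never *what* answer is computed.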

Our recent draft paper on "taming the parallel effect zoo" discusses the combination of cancellation and memoization further.

Is there a level six? Is there anything else we might ask of our parallel logical operators?

## Final thoughts: Parallel conjunction/disjunction in GHC

GHC with "sparks" (futures) can do level-two `and`s, for example by sparking the arguments to `(&&)`. But short-circuiting must follow a predetermined order. Further, cancellation is not possible with sparks, because sparked computations cannot be removed from the spark pool.

If one could use internal GHC functions to determine whether an object is a THUNK or CONSTR at runtime, it would then be possible to do more flexible short-circuiting, for example sparking some elements of a `[Bool]` while executing others on the current thread, and, in between executions, polling for completion of the already-sparked futures. Yet this road seems unappealing for a number of reasons.

Alternatively, rather than using the spark mechanism, it is certainly possible to use `IO` and `MVar`s to implement parallel-and in Haskell, as Conal Elliott does in this post, using an `unamb` combinator. But `forkIO` has vastly higher overhead than sparks, with our own `Par` monad's futures falling somewhere in between. Moreover, our desire to include transitive cancellation and memoization features motivates us to build these operators on a custom `Par` monad rather than on `IO` or sparks directly.
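For concreteness, the spark-based level-two approach mentioned above can be written with `par` (available from `GHC.Conc` in `base`; `sparkAnd` is an illustrative name):

```haskell
import GHC.Conc (par)

-- Level-two via sparks: spark the right argument while the left is
-- evaluated. Short-circuiting still follows the predetermined
-- left-to-right order of (&&), and the spark cannot be cancelled.
sparkAnd :: Bool -> Bool -> Bool
sparkAnd a b = b `par` (a && b)
```

Note that without `-threaded` and `-N` the spark simply fizzles and this degrades to plain `(&&)`, which is part of why the limitations above matter.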
Cilk had an ad-hoc feature called "aborts" for realizing #4, which was added for CilkChess and cited in the writeup for the winning Cilk entry in the first-ever ICFP contest (http://people.csail.mit.edu/pousse/). If you squint right, it also was #5 to some extent: searches would continue to create transposition-table entries until they were terminated. Perhaps that counts as #6, but I think it really throws the orthogonality of memo-table update into sharp relief.

I attempted an implementation of #3 in pH. The runtime machinery wasn't especially pretty or generalizable. It was at least local, which Cilk-style aborts most certainly aren't (they require runtime support or compiled-in flag checks all over the place). The Eager Haskell implementation started as a lazily restartable abort mechanism (stop where you are and create thunks for whatever is left to do).

Looking at the original post, and your use cases here, it seems like it might be instructive to think of the parallel AND problem as one that cares about a *pair* of values. One is the count of arriving writers. The other is the boolean value, ordered T < F. We care about either the count being >= n or the value being >= F. Products should admit a natural lattice extension, so this *ought* to carry over gracefully, and I'm probably missing some crucial reason it doesn't.