FLoops: fold for humans™
FLoops.jl provides a macro @floop. It can be used to generate fast, generic sequential and parallel iteration over complex collections.
Furthermore, a loop written with @floop can be executed with any compatible executor. See FoldsThreads.jl for various thread-based executors that are optimized for different kinds of loops. FoldsCUDA.jl provides an executor for GPU. FLoops.jl also provides a simple distributed executor.
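
As a rough sketch of what swapping executors can look like, the snippet below runs the same loop with two different executors. It assumes that `SequentialEx` and `ThreadedEx` are available after `using FLoops` and that the executor is passed as the first argument to `@floop`; `sum_of_squares` is a hypothetical helper name used only for illustration.

```julia
using FLoops  # assumed to provide @floop, @reduce, SequentialEx, ThreadedEx

# Hypothetical helper: the loop body is written once and the executor
# is an ordinary function argument.
function sum_of_squares(xs, ex)
    @floop ex for x in xs
        y = x^2
        @reduce s += y
    end
    return s
end

seq = sum_of_squares(1:10, SequentialEx())  # single-threaded execution
par = sum_of_squares(1:10, ThreadedEx())    # thread-based execution
```

Because the loop body never mentions the executor, an executor from another package could in principle be passed in the same position without editing the loop.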
Update notes
FLoops.jl 0.2 defaults to a parallel loop; i.e., it uses a parallel executor (e.g., ThreadedEx) when the executor is not specified and the explicit sequential form @floop begin ... end is not used.
That is to say, an @floop loop without @reduce, such as
@floop for i in eachindex(ys, xs)
    ys[i] = f(xs[i])
end
is now executed in parallel by default.
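
If a loop relied on the old sequential behavior, one way to keep it is to name a sequential executor explicitly. The following is a minimal sketch, assuming `SequentialEx` is exported by FLoops; `f`, `xs`, `ys`, and `zs` are placeholder names.

```julia
using FLoops

f(x) = 2x + 1        # placeholder function
xs = collect(1:8)
ys = similar(xs)
zs = similar(xs)

# Parallel by default in FLoops.jl 0.2. This is safe here because each
# iteration writes to a distinct element ys[i].
@floop for i in eachindex(ys, xs)
    ys[i] = f(xs[i])
end

# The same loop with an explicitly sequential executor.
@floop SequentialEx() for i in eachindex(zs, xs)
    zs[i] = f(xs[i])
end
```

Both loops fill their output with the same values; only the execution strategy differs.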
Usage
Parallel loop
@floop is a superset of Threads.@threads (see below) and in particular supports complex reduction with the additional syntax @reduce:
julia> using FLoops  # exports @floop macro

julia> @floop for (x, y) in zip(1:3, 1:2:6)
           a = x + y
           b = x - y
           @reduce s += a
           @reduce t += b
       end
       (s, t)
(15, -3)
For more examples, see the parallel loops tutorial.
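
For reductions that do not fit the `acc += x` pattern, a `do`-block form of `@reduce` can carry several accumulators at once. The sketch below finds a maximum and its index; it assumes the `@reduce() do (acc = init; input), ...` syntax described in the FLoops.jl documentation, and the names `ymax` and `imax` are illustrative.

```julia
using FLoops

@floop for (i, v) in pairs([0, 1, 3, 2])
    y = 2v
    # Each `(acc = init; input)` pair declares an accumulator, its initial
    # value, and the per-iteration value that is fed into it.
    @reduce() do (ymax = typemin(Int); y), (imax = 0; i)
        if isless(ymax, y)
            ymax = y
            imax = i
        end
    end
end
```

After the loop, `ymax` holds the largest `y` and `imax` the index at which it occurred.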
Sequential (single-thread) loop
Simply wrap a for loop and its initialization part with @floop begin ... end:
julia> @floop begin
           s = 0
           for x in 1:3
               s += x
           end
       end
       s
6
For more examples, see the sequential loops tutorial.
Advantages over Threads.@threads
@floop is a superset of Threads.@threads and has a couple of advantages:
- @floop supports various input collection types including arrays, dicts, sets, strings, and many iterators from Base.Iterators such as zip and product. More precisely, @floop can generate high-performance parallel iterations for any collection that supports the SplittablesBase.jl interface.
  - With FoldsThreads.NondeterministicEx, @floop can even parallelize iterations over non-parallelizable input collections (although it is beneficial only for heavier workloads).
- FoldsThreads.jl provides multiple alternative thread-based executors (= loop execution backends) that can be used to tune the performance without touching the loop itself.
- FoldsCUDA.jl provides a simple GPU executor.
- The @reduce syntax supports complex reduction in a forward-compatible manner.
  - Note: threadid-based reduction (which is commonly used in conjunction with @threads) may not be forward-compatible with Julia versions that support migrating tasks across threads.
- There is a trick for "changing" the effective number of threads without restarting julia, using the basesize option.
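
The `basesize` trick from the last item can be sketched as follows. This assumes that `ThreadedEx` accepts a `basesize` keyword giving the approximate number of elements handled by one task, so a larger `basesize` yields fewer concurrent tasks; the input `xs` is a placeholder.

```julia
using FLoops

xs = 1:1000

# Assumption: with basesize set to half the input length, the input is
# split into about two chunks, so at most about two tasks run at once
# regardless of how many threads julia was started with.
@floop ThreadedEx(basesize = length(xs) ÷ 2) for x in xs
    @reduce s += x
end
```

Setting `basesize` to `length(xs)` or more would, under the same assumption, leave a single chunk and hence effectively one task.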
The relative disadvantages may be that @floop is much newer than Threads.@threads and has much more flexible internals. These points can contribute to undiscovered bugs.
How it works
@floop works by converting the native Julia for loop syntax to foldl defined by Transducers.jl. Unlike foldl defined in Base, foldl defined by Transducers.jl is powerful enough to cover the for loop semantics and more.
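
To give a feel for the loop-to-fold correspondence, here is a conceptual sketch that uses only `foldl` from Base. It is not the code that @floop generates, and it does not show the extra capabilities of the Transducers.jl `foldl`; the names `sum_loop`, `sum_fold`, and `fold_step` are made up for this illustration.

```julia
# A plain for loop with one accumulator ...
function sum_loop(xs)
    s = 0
    for x in xs
        s += x
    end
    return s
end

# ... and the same computation as a left fold: the loop body becomes a
# step function (accumulator, element) -> new accumulator.
sum_fold(xs) = foldl((s, x) -> s + x, xs; init = 0)

# Two accumulators, as in the parallel loop example, are carried as a tuple.
fold_step((s, t), (x, y)) = (s + (x + y), t + (x - y))
st = foldl(fold_step, zip(1:3, 1:2:6); init = (0, 0))
```

Here `st` plays the role of `(s, t)` from the earlier `@reduce` example: the state that a for loop mutates in place is threaded through the fold as a value.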