Domainslib - Nested-parallel programming
Domainslib provides support for nested-parallel programming. Domainslib provides async/await mechanism for spawning parallel tasks and awaiting their results. On top of this mechanism, domainslib provides parallel iteration functions. At its core, domainslib has an efficient implementation of work-stealing queue in order to efficiently share tasks with other domains.
Here is a sequential program that computes nth Fibonacci number using recursion:
(* fib.ml *)
let n = try int_of_string Sys.argv.(1) with _ -> 1
let rec fib n = if n < 2 then 1 else fib (n - 1) + fib (n - 2)
let main () =
let r = fib n in
Printf.printf "fib(%d) = %d\n%!" n r
let _ = main ()
We can parallelise this program using Domainslib:
(* fib_par.ml *)
let num_domains = try int_of_string Sys.argv.(1) with _ -> 1
let n = try int_of_string Sys.argv.(2) with _ -> 1
(* Sequential Fibonacci *)
let rec fib n =
if n < 2 then 1 else fib (n - 1) + fib (n - 2)
module T = Domainslib.Task
let rec fib_par pool n =
if n > 20 then begin
let a = T.async pool (fun _ -> fib_par pool (n-1)) in
let b = T.async pool (fun _ -> fib_par pool (n-2)) in
T.await pool a + T.await pool b
end else
(* Call sequential Fibonacci if the available work is small *)
fib n
let main () =
let pool = T.setup_pool ~num_domains:(num_domains - 1) () in
let res = T.run pool (fun _ -> fib_par pool n) in
T.teardown_pool pool;
Printf.printf "fib(%d) = %d\n" n res
let _ = main ()
The parallel program scales nicely compared to the sequential version. The results presented below were obtained on a 2.3 GHz Quad-Core Intel Core i7 MacBook Pro with 4 cores and 8 hardware threads.
$ hyperfine './fib.exe 42' './fib_par.exe 2 42' \
'./fib_par.exe 4 42' './fib_par.exe 8 42'
Benchmark 1: ./fib.exe 42
Time (mean ± sd): 1.217 s ± 0.018 s [User: 1.203 s, System: 0.004 s]
Range (min … max): 1.202 s … 1.261 s 10 runs
Benchmark 2: ./fib_par.exe 2 42
Time (mean ± sd): 628.2 ms ± 2.9 ms [User: 1243.1 ms, System: 4.9 ms]
Range (min … max): 625.7 ms … 634.5 ms 10 runs
Benchmark 3: ./fib_par.exe 4 42
Time (mean ± sd): 337.6 ms ± 23.4 ms [User: 1321.8 ms, System: 8.4 ms]
Range (min … max): 318.5 ms … 377.6 ms 10 runs
Benchmark 4: ./fib_par.exe 8 42
Time (mean ± sd): 250.0 ms ± 9.4 ms [User: 1877.1 ms, System: 12.6 ms]
Range (min … max): 242.5 ms … 277.3 ms 11 runs
Summary
'./fib_par2.exe 8 42' ran
1.35 ± 0.11 times faster than './fib_par.exe 4 42'
2.51 ± 0.10 times faster than './fib_par.exe 2 42'
4.87 ± 0.20 times faster than './fib.exe 42'
More example programs are available here.
Installation
You can install this library using OPAM
.
$ opam switch create 5.0.0+trunk --repo=default,alpha=git+https://github.com/kit-ty-kate/opam-alpha-repository.git
$ opam install domainslib
Development
If you are interested in hacking on the implementation, then opam pin
this repository:
$ opam switch create 5.0.0+trunk --repo=default,alpha=git+https://github.com/kit-ty-kate/opam-alpha-repository.git
$ git clone https://github.com/ocaml-multicore/domainslib
$ cd domainslib
$ opam pin add domainslib file://`pwd`