  • Stars: 248
  • Rank: 163,560 (Top 4%)
  • Language: Julia
  • License: MIT License
  • Created over 4 years ago
  • Updated over 1 year ago

Repository Details

Client-Daemon workflow to run faster scripts in Julia

DaemonMode


Introduction

Julia is a great language, but its Just-in-Time compiler means that loading a package can take considerable time; this is known as the first-plot problem.

It is true that this cost is paid only the first time (and there are mitigations, such as the Revise package). However, it is a serious disadvantage when we want to use Julia for small scripts.

This package solves that problem. Inspired by the daemon-mode of Emacs, it uses a server/client model. This lets Julia run scripts a lot faster, because loaded packages are kept in memory between runs (for example, when running the same script several times).

Mentioned at JuliaCon 2020

This package was mentioned at JuliaCon 2020. Thank you, Fredrik Ekre!

DaemonMode in JuliaCon

Presented at JuliaCon 2021

I gave a talk about the package and its advantages at JuliaCon 2021. The recording is now available online: Faster scripts in Julia with DaemonMode.jl.

Usage

  • The server, which is responsible for running all Julia scripts:

    julia --startup-file=no -e 'using DaemonMode; serve()'
  • A client, which sends the file to be run to the server and prints the output it receives (without --startup-file=no startup could be slow; use that option unless you specifically want your .julia/config/startup.jl file to run):

    julia --startup-file=no -e 'using DaemonMode; runargs()' program.jl <arguments>

    You can use an alias:

    alias juliaclient='julia --startup-file=no -e "using DaemonMode; runargs()"'

    Then, instead of julia program.jl, you can run juliaclient program.jl. The output should be the same, but it runs much faster.

Process

The process is the following:

  1. The client process sends the program program.jl with the required arguments to the server.

  2. The server receives the program name, runs it, and returns the output to the client process.

  3. The client process receives the output and prints it to the console.
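The three steps above can be sketched as a shell session (hello.jl and its arguments are hypothetical placeholders; the server must already be running, as shown in the Usage section):

```shell
# Start the server once, in the background.
julia --startup-file=no -e 'using DaemonMode; serve()' &

# Steps 1-3: the client sends hello.jl and its arguments to the server,
# the server runs it, and the client prints the returned output here.
julia --startup-file=no -e 'using DaemonMode; runargs()' hello.jl arg1 arg2
```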

Example

Suppose that we have the script test.jl

using CSV, DataFrames

fname = only(ARGS)
df = CSV.File(fname) |> DataFrame
println(first(df, 3))

The normal method is:

$ time julia test.jl tsp_50.csv
...
3×2 DataFrame
│ Row │ x        │ y          │
│     │ Float64  │ Float64    │
├─────┼──────────┼────────────┤
│ 1   │ 0.420169 │ 0.628786   │
│ 2   │ 0.892219 │ 0.673288   │
│ 3   │ 0.530688 │ 0.00151249 │

real	0m18.831s
user	0m18.670s
sys	    0m0.476s

Just loading CSV and DataFrames and reading a simple file takes 18 seconds on my computer, and every run of the program pays those 18 seconds again.

using DaemonMode:

$ julia --startup-file=no -e 'using DaemonMode; serve()' &
$ time juliaclient test.jl tsp_50.csv
3×2 DataFrames.DataFrame
│ Row │ x        │ y          │
│     │ Float64  │ Float64    │
├─────┼──────────┼────────────┤
│ 1   │ 0.420169 │ 0.628786   │
│ 2   │ 0.892219 │ 0.673288   │
│ 3   │ 0.530688 │ 0.00151249 │

real	0m18.596s
user	0m0.329s
sys	0m0.318s

But next time (and thereafter), it is a lot faster (I accept donations :-)):

$ time juliaclient test.jl tsp_50.csv
3×2 DataFrames.DataFrame
│ Row │ x        │ y          │
│     │ Float64  │ Float64    │
├─────┼──────────┼────────────┤
│ 1   │ 0.420169 │ 0.628786   │
│ 2   │ 0.892219 │ 0.673288   │
│ 3   │ 0.530688 │ 0.00151249 │

real	0m0.355s
user	0m0.336s
sys	0m0.317s

A reduction from 18s to 0.3s: each subsequent run takes only about 2% of the original time.

You can also change the file, and the performance is maintained:

test2.jl:

using CSV, DataFrames

fname = only(ARGS)
df = CSV.File(fname) |> DataFrame
println(last(df, 10))

$ time juliaclient test2.jl tsp_50.csv
10×2 DataFrames.DataFrame
│ Row │ x        │ y        │
│     │ Float64  │ Float64  │
├─────┼──────────┼──────────┤
│ 1   │ 0.25666  │ 0.405932 │
│ 2   │ 0.266308 │ 0.426364 │
│ 3   │ 0.865423 │ 0.232437 │
│ 4   │ 0.462485 │ 0.049489 │
│ 5   │ 0.994926 │ 0.887222 │
│ 6   │ 0.867568 │ 0.302558 │
│ 7   │ 0.475654 │ 0.607708 │
│ 8   │ 0.18198  │ 0.592476 │
│ 9   │ 0.327458 │ 0.354397 │
│ 10  │ 0.765927 │ 0.806685 │

real	0m0.372s
user	0m0.369s
sys	0m0.300s

Evaluate an expression on the server

Alternatively, a String can be passed to the server, which parses and evaluates it in its global scope.

using DaemonMode

runexpr("using CSV, DataFrames")

fname = "tsp_50.csv";

runexpr("""begin
      df = CSV.File("$fname") |> DataFrame
      println(last(df, 3))
  end""")

3×2 DataFrames.DataFrame
│ Row │ x        │ y          │
│     │ Float64  │ Float64    │
├─────┼──────────┼────────────┤
│ 1   │ 0.420169 │ 0.628786   │
│ 2   │ 0.892219 │ 0.673288   │
│ 3   │ 0.530688 │ 0.00151249 │

Avoid conflict of names

Function names defined in one script could conflict with variables or functions in later scripts, because functions are constants in Julia. To avoid any such problem, DaemonMode runs each file in its own module, so names never clash.

Thus, if we have two files like:

# conflict1.jl
f(x) = x + 1
@show f(1)

and

# conflict2.jl
f = 1
@show f + 1

the DaemonMode client can run them one after the other without any problem.
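Assuming the juliaclient alias from the Usage section, the two scripts can be sent back to back; each runs in a fresh module, so the function f and the integer f never clash:

```shell
juliaclient conflict1.jl   # f is a function inside its own module
juliaclient conflict2.jl   # f = 1 inside another, independent module
```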

Running several clients at the same time

In previous versions, the server ran clients sequentially. Since v0.1.5, DaemonMode can run clients in parallel. You can still pass async=false to serve to keep the previous behaviour:

$ julia -e 'using DaemonMode; serve(async=false)'

With the optional parameter async=true, the server runs each client in a new task:

$ julia -e 'using DaemonMode; serve(async=true)'

That command allows different clients to run in parallel, but it will use only one CPU.

If you want to use several threads, you can do:

$ julia -t auto -e 'using DaemonMode; serve(async=true)'

-t auto lets DaemonMode use all the processors of the computer, but you can also set -t 1, -t 2, ...

The async mode has several advantages:

  • You can run a new client without waiting for the previous one to finish.

  • If one process asks the daemon to close, it will wait until all clients have finished.

  • With several threads (indicated with -t), clients can run on different CPUs without slowing each other down. With only one thread, the processing time is divided between the clients.

The main drawback is that @show output and console logs may be sent to the last task.

Automatically reload the modified packages

DaemonMode executes the code that is sent to the server, so each time the code is updated you get up-to-date results. However, sometimes you may also be developing packages at the same time and want them reloaded when modified. You can use Revise together with DaemonMode for this purpose: just add using Revise before starting the DaemonMode server:

julia --startup-file=no -e 'using Revise; using DaemonMode; serve()'

Features

  • Performance: packages are kept in memory. This is especially interesting with heavy, commonly used packages like CSV.jl, DataFrames.jl, ...

  • The code is run using the current directory as the working directory.

  • Robust: if a script has an error, the server keeps working (it stops only your current script, not the others).

  • It accepts script arguments without problems.

  • Runs complete files and also specific code (expressions).

  • Runs each script in its own module to avoid name conflicts.

  • Error stack traces appear as if the script had been run directly.

  • Logging output to the console works nicely.

  • Returns exit code 1 when an error occurs.

  • Multi-threading version.

  • Fixed output redirection with several tasks.

  • Allows the exit function to be used in the client.

  • Updates isinteractive() to reflect how the script is run.

  • Compatible with Revise.

  • Supports the eval function (required for running MLJ code).
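Since an error makes the client return exit status 1, juliaclient composes with ordinary shell error handling. A small sketch (error.jl is a hypothetical script that throws):

```shell
# If error.jl throws, the client exits with status 1,
# so the right-hand side of || runs.
julia --startup-file=no -e 'using DaemonMode; runargs()' error.jl \
  || echo "script failed"
```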

TODO

  • Automatically detect whether the daemon is already running, to simplify usage.

  • Remote version (in which the server would run on a different computer from the client).

  • Automatic installation of required packages.
