A tutorial for setting a new machine with core data science tools

WIP 🚧 🏗 pre spellcheck

Hello 👋

After setting up or reinstalling a couple of machines from scratch in the last few months, I decided once and for all to document my default data science settings and the tools I typically use.

💡 A pro tip 👉🏼 avoid dropping a cup of ☕️ on your machine 🤦🏻‍♂️

That includes installing programming languages such as R, Julia, and Python and their supporting IDEs, RStudio and VScode. In addition, it covers setting up the terminal and git, and installing supporting tools such as iTerm2, oh-my-zsh, Docker, etc.

Update: This setting is up-to-date with macOS Ventura. However, most of the tools in this document should be OS agnostic (e.g., Windows, Linux, etc.) with some minor modifications.

Setting Git and SSH

This section focuses on the core git settings, such as global definitions and setting SSH with your Github account.

All the settings in the sections are done through the command line (unless mentioned otherwise).

Let's start by checking the git version by running the following:

git --version

If this is a new computer or you have not set it up before, a window should pop up asking whether you want to install the command line developer tools:

image

The command line developer tools are required to run git commands. Once they are installed, we can go back to the terminal and set the global git settings.

Set Git global options

Git enables setting both local and global options. The global options are used as the default settings any time you create a new repository with the git init command. You can override the global settings on a specific repo by using local settings. Below, we will define the following global settings:

  • Git user name
  • Git user email
  • Default branch name
  • Global git ignore file
  • Default editor (for merging comments)

Set git user name and email

Set the global user name and email using the config --global command:

git config --global user.name "USER_NAME"
git config --global user.email "your_email@example.com"

Set default branch name

Next, let's set the default branch name as main using the init.defaultBranch argument:

git config --global init.defaultBranch main
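To see how the global defaults interact with local overrides, here is a minimal sketch. It assumes Git 2.28 or later (the first release that supports init.defaultBranch) and points HOME at a throwaway folder so your real ~/.gitconfig is left untouched. The values "Global Name" and "Local Name" and the demo-repo folder are placeholders used only for illustration.

```shell
# Point HOME at a throwaway folder so the real ~/.gitconfig is not modified
real_home="$HOME"
export HOME="$(mktemp -d)"

# Global defaults, as in the steps above (placeholder values)
git config --global user.name "Global Name"
git config --global init.defaultBranch main

# A new repository picks up the global default branch name...
repo="$HOME/demo-repo"
git init --quiet "$repo"
branch="$(git -C "$repo" symbolic-ref --short HEAD)"

# ...and a local setting overrides the global one for this repository only
git -C "$repo" config user.name "Local Name"
global_name="$(git config --global user.name)"
effective_name="$(git -C "$repo" config user.name)"

echo "default branch: $branch"
echo "global name:    $global_name"
echo "name in repo:   $effective_name"

# Restore the original HOME
export HOME="$real_home"
```

The global value stays as it was, while the repository reports the local one.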

Set global Git ignore file

The global .gitignore file enables you to set general ignore rules that apply automatically to all repositories on your machine. This is useful when there are files you repeatedly wish to ignore by default. A good example on Mac is the system file .DS_Store, which is auto-generated in each folder, and which you probably do not want to commit. First, let's create the global .gitignore file using the touch command:

touch ~/.gitignore

Next, let's define this file as global:

git config --global core.excludesFile ~/.gitignore

Once the global ignore file is set, we can start adding the files we want git to ignore systematically. For example, let's add the .DS_Store to the global ignore file:

echo .DS_Store >> ~/.gitignore

Note: Be careful about which files you add to the global ignore file. Unless a rule applies to all of your repositories, such as the .DS_Store example, do not add it to the global settings. Define it locally in the repository's own .gitignore instead, to avoid a git disaster.
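One way to confirm that the global ignore file works is the git check-ignore command. The sketch below repeats the steps above inside a throwaway HOME folder, so it does not touch your real settings. The demo-repo folder and the analysis.R file are hypothetical names used only for illustration.

```shell
# Work in a throwaway HOME so the real global settings are not modified
real_home="$HOME"
export HOME="$(mktemp -d)"

# Same steps as above: create the file, register it, add a rule
touch ~/.gitignore
git config --global core.excludesFile ~/.gitignore
echo .DS_Store >> ~/.gitignore

# Create a demo repository with one ignored and one regular file
repo="$HOME/demo-repo"
git init --quiet "$repo"
touch "$repo/.DS_Store" "$repo/analysis.R"

# check-ignore prints the path when a file is ignored
# and exits with a non-zero status when it is not
ignored="$(git -C "$repo" check-ignore .DS_Store)"
if git -C "$repo" check-ignore --quiet analysis.R; then
  regular_file="ignored"
else
  regular_file="not ignored"
fi

echo ".DS_Store:  ${ignored:+ignored}"
echo "analysis.R: $regular_file"

# Restore the original HOME
export HOME="$real_home"
```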

Set default editor

Git enables you to set the default command line editor for creating and editing your commit messages with the core.editor argument. Git supports the main command line editors such as vim, emacs, nano, etc. I use vim:

git config --global core.editor "vim"

Review and modify global config settings

By default, all the global settings are saved to the .gitconfig file under your home folder. You can review the saved settings, modify them, and add new ones manually by editing the file:

vim ~/.gitconfig

Set SSH with Github

Setting up an SSH key is required to sync your local git repositories with the origin over SSH. By default, ssh-keygen offers to save the key files under the .ssh folder in your home directory. If you enter only a file name at the prompt, the files are written to the current working folder instead. It is cleaner to keep the keys under the .ssh folder, so my settings below assume this folder exists.

Let's start by creating the .ssh folder:

mkdir ~/.ssh

To set up an SSH key on your local machine, use the ssh-keygen command, which creates the SSH key files:

ssh-keygen -t ed25519 -C "your_email@example.com"

Note: The -t argument defines the algorithm type for the authentication key, in this case ed25519, and the -C argument adds a comment, in this case the user email for reference.

After running the ssh-keygen command, it will prompt you for a file name and a passphrase (optional). If you enter a file name without a path, the key files are saved in the current working folder, so enter the full path (e.g., ~/.ssh/your_ssh_key) to keep them under the .ssh folder.

Note: this process will generate two files:

  • your_ssh_key is the private key, you should not expose it
  • your_ssh_key.pub is the public key, which will be used to set up SSH on Github
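To see the two files without answering the interactive prompts, here is a minimal sketch that generates a disposable key pair in a temporary folder. It uses two flags that the command above does not: -f sets the output file and -N sets the passphrase. The empty passphrase, the demo_key file name, and the demo@example.com comment are assumptions for illustration only. For a real key, prefer a passphrase.

```shell
# Generate a disposable key pair in a temporary folder (not in ~/.ssh)
key_dir="$(mktemp -d)"
ssh-keygen -q -t ed25519 -C "demo@example.com" -N "" -f "$key_dir/demo_key"

# Two files are created: the private key and the public key (.pub)
ls "$key_dir"

# The public key file has three fields: key type, key body, and comment
pub_type="$(cut -d ' ' -f 1 "$key_dir/demo_key.pub")"
pub_comment="$(cut -d ' ' -f 3 "$key_dir/demo_key.pub")"
echo "public key type: $pub_type"
echo "comment: $pub_comment"

# Print the key fingerprint
ssh-keygen -l -f "$key_dir/demo_key.pub"
```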

The next step is to register the key on your Github account. On your account main page, go to the Settings menu, select SSH and GPG keys on the main menu (purple rectangle 👇🏼), and click New SSH key (yellow rectangle 👇🏼):

Screenshot_ssh1

Next, set the key name in the title text box (purple rectangle 👇🏼), and paste your public key into the key box (turquoise rectangle 👇🏼):

Screenshot_ssh2

Note: I set the machine nickname (e.g., MacBook Pro 2017, Mac Pro, etc.) as the key title to easily identify the relevant key in the future.

The next step is to update the config file in the ~/.ssh folder. You can edit the config file with vim:

vim ~/.ssh/config 

and add the following lines to the file:

Host *
  AddKeysToAgent yes
  UseKeychain yes
  IdentityFile ~/.ssh/your_ssh_key

Where your_ssh_key is the private key file name.

Finally, run the following to load the key into the SSH agent and store its passphrase in the macOS keychain:

ssh-add --apple-use-keychain ~/.ssh/your_ssh_key

Install Command Line Tools

This section covers core command line tools.

Homebrew

Homebrew (or brew) enables you to install command line packages and tools on Mac. To install brew, run the following from the terminal:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

After finishing the installation, you may need to run the following commands (follow the instructions at the end of the installation):

(echo; echo 'eval "$(/opt/homebrew/bin/brew shellenv)"') >> /Users/USER_NAME/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

More info is available at https://brew.sh/

jq

jq is a lightweight and flexible command-line JSON processor. You can install it with brew:

brew install jq
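As a quick check that jq works, here is a minimal sketch that extracts a field and counts the items of an array. The JSON document is a made-up example, not the output of any real tool.

```shell
# Hypothetical JSON document used only for illustration
json='{"machine": "MacBook Pro", "tools": ["git", "docker", "jq"]}'

# .machine selects a field; -r prints the raw string without JSON quotes
machine="$(printf '%s' "$json" | jq -r '.machine')"

# The pipe inside the filter passes the array to the length function
tool_count="$(printf '%s' "$json" | jq '.tools | length')"

echo "machine: $machine"
echo "number of tools: $tool_count"
```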

Install Docker

There are multiple ways to spin up a VM locally to run Docker. I typically use Docker Desktop, and for learning purposes (e.g., Kubernetes) I also install Minikube.

Install Docker Desktop

Go to the Docker website and follow the installation instructions according to your OS.

Install Minikube

Minikube enables you to set up a virtual environment to run Docker. This is mainly relevant if you are using macOS or Windows and want to run Docker via the CLI. To install Minikube, you first need to install kubectl and hyperkit. We will use brew to install all of those components, along with the Docker CLI:

brew install kubectl
brew install hyperkit
brew install docker
brew install minikube

Launch minikube with the start argument, setting the memory and CPU allocation:

> minikube start --memory 4096 --cpus 2 --driver hyperkit
😄  minikube v1.24.0 on Darwin 12.0.1
    ▪ MINIKUBE_ACTIVE_DOCKERD=minikube
✨  Using the hyperkit driver based on user configuration
👍  Starting control plane node minikube in cluster minikube
🔥  Creating hyperkit VM (CPUs=2, Memory=4096MB, Disk=20000MB) ...
🐳  Preparing Kubernetes v1.22.3 on Docker 20.10.8 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

Launch Docker by pointing the docker CLI at the Docker daemon running inside minikube:

eval $(minikube -p minikube docker-env)

Check the Docker status:

> docker info
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 15
  Running: 14
  Paused: 0
  Stopped: 1
 Images: 10
 Server Version: 20.10.8
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e25210fe30a0a703442421b0f60afac609f950a3
 runc version: 4144b63817ebcc5b358fc2c8ef95f7cddd709aa7
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.202
 Operating System: Buildroot 2021.02.4
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 3.847GiB
 Name: minikube
 ID: 2IME:DJBF:L32S:HA4Q:DFCX:2LRI:JBCQ:6ORQ:RHUE:Q4S6:7WYE:PUD7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
  provider=hyperkit
 Experimental: false
 Live Restore Enabled: false
 Product License: Community Engine

Setting Terminal

This section focuses on installing and setting tools for working on the terminal.

Install iTerm2

Terminal is the built-in terminal emulator on Mac. I personally love to work with iTerm2, as it provides additional functionality and customization options. iTerm2 is available only for Mac, and can be installed directly from the iTerm2 website or via homebrew:

> brew install --cask iterm2
.
.
.
==> Installing Cask iterm2
==> Moving App 'iTerm.app' to '/Applications/iTerm.app'
🍺  iterm2 was successfully installed!

Install zsh

The next step is to install Z shell, or zsh. zsh is a Unix shell that extends the Bourne shell and incorporates features from bash, ksh, and tcsh, providing a variety of add-in tools on the terminal. We will use homebrew again to install zsh:

> brew install zsh
.
.
.
==> Installing zsh
==> Pouring zsh--5.8_1.monterey.bottle.tar.gz
🍺  /usr/local/Cellar/zsh/5.8_1: 1,531 files, 14.7MB

Install and Set Oh-My-Zsh

After installing zsh, we will install oh-my-zsh, an open-source framework for managing zsh configuration. We will install it with the curl command:

 sh -c "$(curl -fsSL https://raw.github.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

You will notice that your terminal view has changed (you may need to restart your terminal to see the changes) and the default command line prompt looks like:

~

The default settings of Oh My Zsh are stored in ~/.zshrc, and you can modify the default theme by editing the file:

vim ~/.zshrc

I use the powerlevel10k theme, which can be installed by cloning the Github repository (for oh-my-zsh):

git clone --depth=1 https://github.com/romkatv/powerlevel10k.git ${ZSH_CUSTOM:-$HOME/.oh-my-zsh/custom}/themes/powerlevel10k

Then change the theme setting in ~/.zshrc to ZSH_THEME="powerlevel10k/powerlevel10k". After restarting the terminal, you will see a sequence of questions that enables you to configure the theme:

                            Install Meslo Nerd Font?

(y)  Yes (recommended).

(n)  No. Use the current font.

(q)  Quit and do nothing.

Choice [ynq]:

Note: the Meslo Nerd font is required to display the symbols used by the powerlevel10k theme

You can always modify your selection by using:

 p10k configure
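If you prefer not to edit ~/.zshrc by hand, the theme line can also be replaced from the command line. The sketch below works on a temporary stand-in file, so it is safe to run as-is. It assumes the file contains a line starting with ZSH_THEME=, as the default Oh My Zsh template does (with the robbyrussell theme). To apply it for real, point the zshrc variable at your own ~/.zshrc after backing it up.

```shell
# Temporary stand-in for ~/.zshrc with a typical Oh My Zsh layout
zshrc="$(mktemp)"
printf 'export ZSH="$HOME/.oh-my-zsh"\nZSH_THEME="robbyrussell"\nplugins=(git)\n' > "$zshrc"

# Replace the whole ZSH_THEME line; writing to a new file and moving it
# avoids the differences between the macOS and GNU versions of sed -i
sed 's|^ZSH_THEME=.*|ZSH_THEME="powerlevel10k/powerlevel10k"|' "$zshrc" > "$zshrc.new"
mv "$zshrc.new" "$zshrc"

# Show the updated theme line
theme_line="$(grep '^ZSH_THEME=' "$zshrc")"
echo "$theme_line"
```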

The terminal after adding the powerlevel10k theme looks like:

Install zsh-syntax-highlighting to add syntax highlighting on the terminal:

brew install zsh-syntax-highlighting

After the installation is done, you will need to clone the source code. I set the destination to the home folder and made the target folder hidden:

git clone https://github.com/zsh-users/zsh-syntax-highlighting.git $HOME/.zsh-syntax-highlighting
echo "source $HOME/.zsh-syntax-highlighting/zsh-syntax-highlighting.zsh" >> ${ZDOTDIR:-$HOME}/.zshrc

After you restart your terminal, you should be able to see the syntax highlighting, in green in my case:

Setting VScode

VScode is a general-purpose IDE and my favorite development environment. VScode supports multiple platforms such as Linux, macOS, Windows, and Raspberry Pi.

Installing VScode is straightforward - go to the VScode website https://code.visualstudio.com/ and click on the Download button (purple rectangle 👇🏼):

Download the installation file and follow the instructions. Here are the default extension settings:

"extensions": [
                "quarto.quarto",
                "ms-azuretools.vscode-docker",
                "ms-python.python",
                "rdebugger.r-debugger",
                "ms-vscode-remote.remote-containers",
                "yzhang.markdown-all-in-one",
                "reditorsupport.r",
                "redhat.vscode-yaml",
                "REditorSupport.r-lsp"
            ]
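The extensions can also be installed from the terminal with the code command line tool and its --install-extension flag. The sketch below is a dry run by default: it only prints the commands, so it is safe to run even before VScode is installed. It assumes the code command is available on your PATH, and the DRY_RUN variable is a helper introduced here for illustration. Only a subset of the extensions listed above is included.

```shell
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them
DRY_RUN="${DRY_RUN:-1}"
count=0

# A subset of the extension IDs listed above
for ext in quarto.quarto ms-python.python reditorsupport.r; do
  if [ "$DRY_RUN" = "1" ]; then
    echo "code --install-extension $ext"
  else
    code --install-extension "$ext"
  fi
  count=$((count + 1))
done

echo "extensions processed: $count"
```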

Setting Python

This section focuses on setting up a Python environment.

Installing miniconda

Miniconda is a great tool for setting up local Python environments. Go to the Miniconda installer page and download the installer that matches your operating system and Python version to install the most recent version. Once Miniconda is installed, you can install Python packages with conda:

conda install pandas

Likewise, you can use conda to create an environment:

conda create -n myenv python

Common conda commands

Get a list of environments:

conda info --envs

Create an environment and set the Python version:

conda create --name myenv python=3.9

Get package available versions:

conda search pandas

Activate an environment:

conda activate myenv

Get a list of installed packages in the environment:

conda list

Deactivate the environment:

conda deactivate

Shortcuts

This section covers the installation and setting of additional tools and features such as screen splitting, shortcuts, etc.

Install R and RStudio

To set up R and RStudio on your machine, start by installing R from CRAN. Go to https://cran.r-project.org/, select Download R for macOS, and then select and download the release you wish to install.

Note: For macOS, there are two versions, depending on your machine's CPU type: one for Apple silicon (arm64) and a second for Intel 64-bit.

Once the build you selected has finished downloading, open the pkg file and start the installation:

image

Note: Older releases are available in the CRAN Archive.

Once R is installed, you can install RStudio. Go to the https://posit.co website, select RStudio IDE under the Products tab, then select the version and download it.

After the download finishes, move the application into the Applications folder.

Set RStudio

Next, let's set the Global options -> go to Tools and then select Global options and update the following:

  • General:
    • Workspace - select Never for the Save workspace to .RData on exit option
    • History - untick the first option - Always save history.... This will avoid saving the session on quit
  • Code:
    • Code snippet - under the Editing tab -> Snippet menu -> tick the Enable code snippets option and select the Edit Snippets button to edit your snippets. My default snippets are available here
    • Rainbow parentheses 🌈 - under the Display tab, tick the Rainbow parentheses box
  • Appearance:
    • select the font type and size, and editor theme (Merbivore Soft):

image

RStudio main shortcuts

  • Clear console - Ctrl + L
  • Close current document - Cmd + W
  • Move focus to the Source panel - Cmd + 1
  • Move focus to the Console panel - Cmd + 2
  • Move tab left - Cmd + ]
  • Move tab right - Cmd + [
  • Move tab to first - Cmd + P
  • Move tab to last - Cmd + \
  • New Rmarkdown notebook - Cmd + R

Install XQuartz

XQuartz is an open-source project that provides the X Window System (X11) required by graphic applications on macOS (similar to the X.Org X Window System functionality). To install it, go to https://www.xquartz.org/, then download and install it.

Install Orca

Orca is an application for exporting plotly graphs as images. To install the app on macOS:

  • Go to the project Github page and download the most recent release (i.e., mac-release.zip)
  • Unzip the mac-release.zip file.
  • Double-click on the orca-X.Y.Z.dmg file. This will open an installation window.
  • Drag the orca icon into the Applications folder.
  • Open finder and navigate to the Applications/ folder.
  • Right-click on the orca icon and select Open from the context menu.
  • A password dialog will appear asking for permission to add orca to your system PATH.
  • Enter your password and click OK.
  • This should open an Installation Succeeded window.
  • Open a new terminal and verify that the orca executable is available on your PATH.
> which orca
/usr/local/bin/orca

Installing Julia

To install Julia, go to https://julialang.org/downloads/ and download the current stable version of Julia or an older release. On Mac, the next step, after opening the dmg file and moving the Julia app to the Applications folder, is to add Julia to PATH:

sudo mkdir -p /usr/local/bin
sudo rm -f /usr/local/bin/julia
sudo ln -s /Applications/Julia-1.7.app/Contents/Resources/julia/bin/julia /usr/local/bin/julia

Note: The Julia version in the code above should align with the one installed on your local machine. More info is available here.

Setting Julia with VScode

WIP

Rectangle

Rectangle is a free and open-source tool for moving and resizing windows on Mac with keyboard shortcuts. To install it, go to https://rectangleapp.com and download it. Once installed, you can modify the default settings:

Keyboard Shortcuts

  • Change language - if you are using more than one language, you can add a keyboard shortcut for switching between them. Go to System Preferences... -> Keyboard and select the Shortcuts tab. Under Input Sources, tick the Select the previous input source option:

image

Note: You can modify the keyboard shortcut by clicking the shortcut definition in that row.

Setting Postgres

PostgreSQL supports most of the common operating systems, such as Windows, macOS, Linux, etc.

To download it, go to the Postgres project website, navigate to the Download tab, and select your OS. This will take you to the download page for that OS, where you can follow the instructions:

On Mac, I highly recommend installing PostgreSQL through Postgres.app:

When opening the app, you should have a default server set to port 5432 (make sure that this port is available):

To launch the server click on the start button:

By default, the server will create three databases - postgres, YOUR_USER_NAME, and template1. You can add (or remove) servers by clicking the + or - symbols at the bottom left.

To run Postgres from the terminal, you will have to add the path of the app to your .zshrc file (on Mac) by adding the following line:

export PATH=$PATH:/Applications/Postgres.app/Contents/Versions/14/bin/

Where /Applications/Postgres.app/Contents/Versions/14/bin/ is the local path on my machine.

Alternatively, you can add the path from the terminal by running the following:

echo 'export PATH=$PATH:/Applications/Postgres.app/Contents/Versions/14/bin/' >> ${ZDOTDIR:-$HOME}/.zshrc
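Note that running an echo ... >> command more than once appends the same line again each time. A minimal sketch of one way to avoid that is shown below. The append_once helper is a hypothetical function introduced here, and the sketch writes to a temporary stand-in file rather than to your real .zshrc.

```shell
# Temporary stand-in for ${ZDOTDIR:-$HOME}/.zshrc
zshrc="$(mktemp)"

# Single quotes keep $PATH unexpanded, so it is resolved when the shell starts
line='export PATH=$PATH:/Applications/Postgres.app/Contents/Versions/14/bin/'

# Append a line to a file only if that exact line is not already there
# (-q quiet, -x match the whole line, -F treat the pattern as a fixed string)
append_once() {
  grep -qxF "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Calling it twice still leaves a single copy of the line
append_once "$line" "$zshrc"
append_once "$line" "$zshrc"

count="$(grep -cxF "$line" "$zshrc")"
echo "copies of the line: $count"
```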

Clear port

If the port you set for the Postgres server is in use you should expect to get the following message when trying to start the server:

This means that the port is being used either by another Postgres server or by another application. To check which ports are in use and by which applications, you can use the lsof command on the terminal:

sudo lsof -i :5432
COMMAND  PID     USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
postgres 124 postgres    7u  IPv6 0xc250a5ea155736fb      0t0  TCP *:postgresql (LISTEN)
postgres 124 postgres    8u  IPv4 0xc250a5ea164aa3b3      0t0  TCP *:postgresql (LISTEN)

Where the -i argument enables searching by port number, in the example above 5432. As you can see from the output, the port is used by another Postgres server. You can clear the port by using the pkill command:

sudo pkill -u postgres

Where the -u argument selects the processes to terminate by the USER field, in this case postgres.

Note: Before clearing the port, make sure you do not need the applications running on that port.

Installing Draw.io Desktop

The drawio-desktop is a desktop version of the diagrams app for creating diagrams and workflow charts. The desktop version, per the project repository, is designed to be completely isolated from the Internet, apart from the update process.

Image credit: https://www.diagrams.net/

To install the desktop version go to the project repository and select the version you wish to install under the releases section:

For macOS users, once you have downloaded and opened the dmg file, move the build to the Applications folder.
