rust-mlops-template

A work in progress to build out solutions in Rust for MLOps. This repo is written in a cookbook style; for a gentler step-by-step guide to MLOps with Rust, please see my lecture notes as a Rust MDBook here.


Update: one of the more compelling projects to look at is https://github.com/huggingface/candle

Take the Coursera Course!

You can learn Rust from this Duke Coursera MLOps course:

Take the course here. Direct link: https://www.coursera.org/learn/devops-dataops-mlops-duke

Demo Hitlist (hopefully solving one almost every day or week)

Advanced Aspirational Demos

  • Building a database in Rust
  • Building a search engine in Rust
  • Building a web server in Rust
  • Building a batch processing system in Rust
  • Build a command-line chat system
  • Build a locate clone
  • Build a load-testing tool

Motivation

One of the key goals of this project is to determine workflows that do not involve the #jcpennys (Jupyter, Conda, Pandas, Numpy, Sklearn) stack for #mlops. In particular, I am not a fan of the conda installation tool (it is superfluous, as I demonstrate in the Python MLOps Template, compared with containerized workflows that use the Python Standard Library: Docker + pip + virtualenv), and this is a good excuse to find solutions outside of that stack. For example:

  • Why not also find a more performant Data Frame library?
  • Why not have a compiler?
  • Why not have a simple packaging solution?
  • Why not have very fast computational speed?
  • Why not be able to write both for the Linux kernel and for general-purpose scripting?
  • Why not see if there is a better solution than Python (which is essentially two languages: scientific Python and regular Python)?
  • Python is one of the least green languages in terms of energy efficiency, while Rust is one of the best.

In The Beginning Was the Command-Line

What could #mlops and #datascience look like in 2023 without #jupyternotebook and "God Tools" as the center of the universe? It could be the command line. In the beginning, it was the command line, and it may be the best solution for this domain.

"What would the engineer say after you had explained your problem and enumerated all the dissatisfactions in your life? He would probably tell you that life is a very hard and complicated thing; that no interface can change that; that anyone who believes otherwise is a sucker; and that if you don't like having choices made for you, you should start making your own." -Neal Stephensen

Using Data (i.e. Data Science)

Getting Started

This repository is a GitHub Template, and you can use it to create a new repository that uses GitHub Codespaces. It is pre-configured with Rust, Cargo, and useful extensions like GitHub Copilot.

Install and Setup

There are a few options:

Once you install, you should check that things work:

rustc --version

Another option is to run make rust-version, which checks both the cargo and rust versions. To run everything locally, do make all; this will format/lint/test all projects in this repository.

Rust CLI Tools Ecosystem

There are several tools that help you get things done in Rust:

rust-version:
	@echo "Rust command-line utility versions:"
	rustc --version 			#rust compiler
	cargo --version 			#rust package manager
	rustfmt --version			#rust code formatter
	rustup --version			#rust toolchain manager
	clippy-driver --version		#rust linter

Hello World Setup

This is an intentionally simple, full end-to-end hello world example. I used some excellent ideas from @kyclark, author of the command-line-rust book from O'Reilly, here. You can recreate it on your own by following these steps:

Create a project directory

  • cargo new hello

This creates a structure you can see with tree hello

hello/
├── Cargo.toml
└── src
    └── main.rs
1 directory, 2 files

The Cargo.toml file is where the project is configured, e.g. if you need to add a dependency. The source code lives in main.rs and has the following content. It looks a lot like Python or any other modern language: this function prints a message.

fn main() {
    println!("Hello, world MLOPs!");
}

To run the project, cd into hello and run cargo run, i.e. cd hello && cargo run. The output looks like the following:

@noahgift ➜ /workspaces/rust-mlops-template/hello (main ✗) $ cargo run
   Compiling hello v0.1.0 (/workspaces/rust-mlops-template/hello)
    Finished dev [unoptimized + debuginfo] target(s) in 0.36s
     Running `target/debug/hello`
Hello, world MLOPs!

To run without all of the noise: cargo run --quiet. To run the built binary directly: ./target/debug/hello

Run with GitHub Actions

GitHub Actions uses a Makefile to simplify automation:

name: Rust CI/CD Pipeline
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
env:
  CARGO_TERM_COLOR: always
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v1
    - uses: actions-rs/toolchain@v1
      with:
          toolchain: stable
          profile: minimal
          components: clippy, rustfmt
          override: true
    - name: update linux
      run: sudo apt update 
    - name: update Rust
      run: make install
    - name: Check Rust versions
      run: make rust-version
    - name: Format
      run: make format
    - name: Lint
      run: make lint
    - name: Test
      run: make test
    

To run everything locally do: make all.

Simple Marco-Polo Game

Change into the MarcoPolo directory and run cargo run -- play --name Marco; you should see the following output:

Polo
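
The game logic itself is tiny; here is a minimal sketch of what sits behind the play subcommand (the repo's actual implementation may differ):

//Hedged sketch of the Marco-Polo logic behind `play --name Marco`
pub fn marco_polo(name: &str) -> String {
    if name == "Marco" {
        "Polo".to_string()
    } else {
        "Not Marco".to_string()
    }
}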

First Big Project: Deduplication Command-Line Tool

I have written command-line deduplication tools in many languages, so this is what I chose for a substantial example. The general approach I use is as follows:

  1. Walk the filesystem and create a checksum for each file
  2. If the checksum matches an existing checksum, then mark it as a duplicate file
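
A minimal sketch of that two-step approach, assuming the third-party walkdir and md5 crates (the repo's dedupe tool, and the parallelism section later, implement variants of this):

//Hedged sketch: walk a directory, checksum each file, group by checksum
use std::collections::HashMap;
use walkdir::WalkDir;

pub fn find_duplicates(path: &str) -> HashMap<String, Vec<String>> {
    let mut seen: HashMap<String, Vec<String>> = HashMap::new();
    for entry in WalkDir::new(path).into_iter().filter_map(Result::ok) {
        if entry.file_type().is_file() {
            if let Ok(bytes) = std::fs::read(entry.path()) {
                let digest = format!("{:x}", md5::compute(bytes));
                seen.entry(digest)
                    .or_insert_with(Vec::new)
                    .push(entry.path().display().to_string());
            }
        }
    }
    //keep only checksums shared by more than one file: those are duplicates
    seen.into_iter().filter(|(_, files)| files.len() > 1).collect()
}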

Getting Started

  • Create a new project: cargo new dedupe
  • Check the latest clap version at https://crates.io/crates/clap and put that version in the Cargo.toml. The file should look similar to this:
[package]
name = "dedupe"
version = "0.1.0"
edition = "2021"

[dependencies]
clap = "4.0.32"

[dev-dependencies]
assert_cmd = "2"
  • Next, make a tests directory (mkdir tests) parallel to src and put a cli.rs inside it
  • touch a lib.rs file and use it for the logic, then run cargo run
  • Inside this project I also created a Makefile to easily do everything at once:
format:
	cargo fmt --quiet

lint:
	cargo clippy --quiet

test:
	cargo test --quiet

run:
	cargo run --quiet

all: format lint test run

Now as I build code, I can simply do make all and get a high-quality build.

Next, let's create some test files:

echo "foo" > /tmp/one.txt
echo "foo" > /tmp/two.txt
echo "bar" > /tmp/three.txt

The final version works: cargo run -- --path /tmp

@noahgift ➜ /workspaces/rust-mlops-template/dedupe (main ✗) $ cargo run -- --path /tmp
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/dedupe --path /tmp`
Searching path: "/tmp"
Found 5 files
Found 1 duplicates
Duplicate files: ["/tmp/two.txt", "/tmp/one.txt"]
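
Because Cargo.toml declares assert_cmd as a dev-dependency, an integration test in tests/cli.rs can exercise the compiled binary end to end. A minimal sketch (the real tests may assert more):

//tests/cli.rs - hedged sketch of an end-to-end test with assert_cmd
use assert_cmd::Command;

#[test]
fn runs_against_tmp() {
    let mut cmd = Command::cargo_bin("dedupe").unwrap();
    //run `dedupe --path /tmp` and assert a clean exit
    cmd.arg("--path").arg("/tmp").assert().success();
}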

Next things to complete for dedupe (in another repo):

  • Switch to subcommands and create a search and dedupe subcommand
  • Add better testing with sample test files
  • Figure out how to release packages for multiple OS versions in GitHub

More MLOps project ideas

  • Query Hugging Face dataset cli
  • Summarize News CLI
  • Microservice Web Framework, trying actix to start, that has a calculator API
  • Microservice Web Framework deploys pre-trained model
  • Descriptive Statistics on a well-known dataset using Polars (https://www.pola.rs/) inside a CLI
  • Train a model with PyTorch (probably via bindings to Rust)

Actix Microservice

Create a new project (cargo new calc) and add actix-web to Cargo.toml:
[package]
name = "calc"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
actix-web = "4"
  • create a src/lib.rs and place the following inside:
//calculator functions

//Add two numbers
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

//Subtract two numbers
pub fn subtract(a: i32, b: i32) -> i32 {
    a - b
}

//Multiply two numbers
pub fn multiply(a: i32, b: i32) -> i32 {
    a * b
}

//Divide two numbers
pub fn divide(a: i32, b: i32) -> i32 {
    a / b
}

In main.rs, put the following:

//Calculator Microservice
use actix_web::{get, web, App, HttpResponse, HttpServer, Responder};

#[get("/")]
async fn index() -> impl Responder {
    HttpResponse::Ok().body("This is a calculator microservice")
}

//library add route using lib.rs
#[get("/add/{a}/{b}")]
async fn add(info: web::Path<(i32, i32)>) -> impl Responder {
    let result = calc::add(info.0, info.1);
    HttpResponse::Ok().body(result.to_string())
}

//library subtract route using lib.rs
#[get("/subtract/{a}/{b}")]
async fn subtract(info: web::Path<(i32, i32)>) -> impl Responder {
    let result = calc::subtract(info.0, info.1);
    HttpResponse::Ok().body(result.to_string())
}

//library multiply route using lib.rs
#[get("/multiply/{a}/{b}")]
async fn multiply(info: web::Path<(i32, i32)>) -> impl Responder {
    let result = calc::multiply(info.0, info.1);
    HttpResponse::Ok().body(result.to_string())
}

//library divide route using lib.rs
#[get("/divide/{a}/{b}")]
async fn divide(info: web::Path<(i32, i32)>) -> impl Responder {
    let result = calc::divide(info.0, info.1);
    HttpResponse::Ok().body(result.to_string())
}

//run it
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new()
            .service(index)
            .service(add)
            .service(subtract)
            .service(multiply)
            .service(divide)
    })
    .bind(("127.0.0.1", 8080))?
    .run()
    .await
}

Next, use a Makefile to ensure a simple workflow:

format:
	cargo fmt --quiet

lint:
	cargo clippy --quiet

test:
	cargo test --quiet

run:
	cargo run 

all: format lint test run

Run make all, then test out the route by adding two numbers at /add/2/2.
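
You can also exercise a route in-process with actix-web's test utilities, no running server required. A minimal sketch of a #[cfg(test)] module in main.rs (an assumption; the repo may test differently):

#[cfg(test)]
mod tests {
    use super::*;
    use actix_web::{test, web, App};

    #[actix_web::test]
    async fn add_route_returns_sum() {
        //build the app in memory and call /add/2/2 directly
        let app = test::init_service(App::new().service(add)).await;
        let req = test::TestRequest::get().uri("/add/2/2").to_request();
        let resp = test::call_service(&app, req).await;
        let body = test::read_body(resp).await;
        assert_eq!(body, web::Bytes::from_static(b"4"));
    }
}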


Hugging Face Example


  • Uses the rust-bert crate
  • Create a new project (cargo new hfdemo) and cd into it: cd hfdemo
  • Create a new library file: touch src/lib.rs
  • Add packages to Cargo.toml:
[package]
name = "hfdemo"
version = "0.1.0"
edition = "2021"

[dependencies]
rust-bert = "0.19.0"
clap = {version="4.0.32", features=["derive"]}
wikipedia = "0.3.4"

The library code is in lib.rs and the subcommands from clap live in main.rs. Here is the tool in action:

@noahgift ➜ /workspaces/rust-mlops-template/hfdemo (main ✗) $ cargo run sumwiki --page argentina
    Finished dev [unoptimized + debuginfo] target(s) in 4.59s
     Running `target/debug/hfdemo sumwiki --page argentina`
Argentina is a country in the southern half of South America. It covers an area of 2,780,400 km2 (1,073,500 sq mi), making it the second-largest country in South America after Brazil. It is also the fourth-largest nation in the Americas and the eighth-largest in the world.
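
Under the hood, the rust-bert summarization pipeline needs only a few lines. A hedged sketch of what the lib.rs call can look like (the repo's actual code may differ; note that in rust-bert 0.19 summarize returns Vec<String>, while newer releases return a Result):

//Hedged sketch: summarize text with rust-bert's pretrained pipeline
use rust_bert::pipelines::summarization::SummarizationModel;

pub fn summarize(text: &str) -> anyhow::Result<String> {
    //downloads and caches the pretrained model on first use
    let model = SummarizationModel::new(Default::default())?;
    let output = model.summarize(&[text]);
    Ok(output.join("\n"))
}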

Hugging Face Q/A Example

cd into hfqa and run cargo run

cargo run --quiet -- answer --question "What is the best book from 1880 to read?" --context "The Adventures of Huckleberry Finn was released in 1880"
Answer: The Adventures of Huckleberry Finn


Hugging Face Lyrics Analysis using Zero Shot Classification with SQLite



@noahgift ➜ /workspaces/rust-mlops-template/sqlite-hf (main ✗) $ cargo run --quiet -- classify
Classify lyrics.txt
rock: 0.06948944181203842
pop: 0.27735018730163574
hip hop: 0.034089818596839905
country: 0.7835917472839355
latin: 0.6906086802482605

Print the lyrics:

cargo run --quiet -- lyrics | less | head
Lyrics lyrics.txt
Uh-uh-uh-uh, uh-uh
Ella despidió a su amor
El partió en un barco en el muelle de San Blas
El juró que volvería
Y empapada en llanto, ella juró que esperaría
Miles de lunas pasaron
Y siempre ella estaba en el muelle, esperando
Muchas tardes se anidaron
Se anidaron en su pelo y en sus labios

Hugging Face GPU Translation CLI

Full working example here: https://github.com/nogibjj/rust-pytorch-gpu-template/tree/main/translate


Goal: Translate a Spanish song to English

  • cargo new translate and cd into it; this is a fully working GPU Hugging Face translation CLI in Rust

run it: time cargo run -- translate --path lyrics.txt

/*A library that uses Hugging Face to Translate Text
*/
use rust_bert::pipelines::translation::{Language, TranslationModelBuilder};
use std::fs::File;
use std::io::Read;

//build a function that reads a file and returns a string
pub fn read_file(path: String) -> anyhow::Result<String> {
    let mut file = File::open(path)?;
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    Ok(contents)
}

//build a function that reads a file and returns an array of the lines of the file
pub fn read_file_array(path: String) -> anyhow::Result<Vec<String>> {
    let mut file = File::open(path)?;
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    let array = contents.lines().map(|s| s.to_string()).collect();
    Ok(array)
}


//build a function that reads a file and translates it
pub fn translate_file(path: String) -> anyhow::Result<()> {
    let model = TranslationModelBuilder::new()
        .with_source_languages(vec![Language::Spanish])
        .with_target_languages(vec![Language::English])
        .create_model()?;
    let text = read_file_array(path)?;
    //pass in the text to the model
    let output = model.translate(&text, None, Language::English)?;
    for sentence in output {
        println!("{}", sentence);
    }
    Ok(())
}
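
A matching main.rs can wire translate_file into a clap subcommand. A minimal hedged sketch (flag and subcommand names inferred from the run command above; clap with the derive feature is assumed):

//Hedged sketch of the CLI wrapper around translate_file
use clap::{Parser, Subcommand};

#[derive(Parser)]
struct Cli {
    #[command(subcommand)]
    command: Commands,
}

#[derive(Subcommand)]
enum Commands {
    //translate the file at --path line by line
    Translate {
        #[arg(long)]
        path: String,
    },
}

fn main() -> anyhow::Result<()> {
    let cli = Cli::parse();
    match cli.command {
        Commands::Translate { path } => translate::translate_file(path)?,
    }
    Ok(())
}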

Polars Example

cargo run -- sort --rows 10


This is an example of how Polars can be used to sort a dataframe in a Rust CLI program.
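
A hedged sketch of what such a sort can look like with the Polars lazy API (the Polars API shifts between releases, and the file and column names here are illustrative, not the repo's dataset):

//Hedged sketch: read a CSV lazily, sort by a column, take the top rows
//(assumes the polars "lazy" and "csv" features)
use polars::prelude::*;

pub fn sort_csv(path: &str, rows: u32) -> PolarsResult<DataFrame> {
    LazyCsvReader::new(path)
        .finish()?
        .sort("some_column", SortOptions::default())
        .limit(rows)
        .collect()
}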

Parallelism

One of the outstanding features of Rust is safe, yet easy, parallelism. This project demos parallelism by benchmarking a checksum of several files.

We can see how trivial it is to speed up a program with threads:

Here is the function for the serial version:

// Create a checksum of each file and store it in a HashMap; if the checksum
// already exists, add the file to the vector of files with that checksum
pub fn checksum(files: Vec<String>) -> Result<HashMap<String, Vec<String>>, Box<dyn Error>> {
    let mut checksums = HashMap::new();
    for file in files {
        let checksum = md5::compute(std::fs::read(&file)?);
        let checksum = format!("{:x}", checksum);
        checksums
            .entry(checksum)
            .or_insert_with(Vec::new)
            .push(file);
    }
    Ok(checksums)
}

cargo --quiet run -- serial

➜  parallel git:(main) ✗ time cargo --quiet run -- serial
Serial version of the program
d41d8cd98f00b204e9800998ecf8427e:
        src/data/subdir/not_utils_four-score.m4a
        src/data/not_utils_four-score.m4a
b39d1840d7beacfece35d9b45652eee1:
        src/data/utils_four-score3.m4a
        src/data/utils_four-score2.m4a
        src/data/subdir/utils_four-score3.m4a
        src/data/subdir/utils_four-score2.m4a
        src/data/subdir/utils_four-score5.m4a
        src/data/subdir/utils_four-score4.m4a
        src/data/subdir/utils_four-score.m4a
        src/data/utils_four-score5.m4a
        src/data/utils_four-score4.m4a
        src/data/utils_four-score.m4a
cargo --quiet run -- serial  0.57s user 0.02s system 81% cpu 0.729 total

vs threads

time cargo --quiet run -- parallel
Parallel version of the program
d41d8cd98f00b204e9800998ecf8427e:
        src/data/subdir/not_utils_four-score.m4a
        src/data/not_utils_four-score.m4a
b39d1840d7beacfece35d9b45652eee1:
        src/data/utils_four-score5.m4a
        src/data/subdir/utils_four-score3.m4a
        src/data/utils_four-score3.m4a
        src/data/utils_four-score.m4a
        src/data/subdir/utils_four-score.m4a
        src/data/subdir/utils_four-score2.m4a
        src/data/utils_four-score4.m4a
        src/data/utils_four-score2.m4a
        src/data/subdir/utils_four-score4.m4a
        src/data/subdir/utils_four-score5.m4a
cargo --quiet run -- parallel  0.65s user 0.04s system 262% cpu 0.262 total

Ok, so let's look at the code:

// Parallel version of checksum using rayon with a mutex to ensure
//that the HashMap is not accessed by multiple threads at the same time
pub fn checksum_par(files: Vec<String>) -> Result<HashMap<String, Vec<String>>, Box<dyn Error>> {
    let checksums = std::sync::Mutex::new(HashMap::new());
    files.par_iter().for_each(|file| {
        let checksum = md5::compute(std::fs::read(file).unwrap());
        let checksum = format!("{:x}", checksum);
        checksums
            .lock()
            .unwrap()
            .entry(checksum)
            .or_insert_with(Vec::new)
            .push(file.to_string());
    });
    Ok(checksums.into_inner().unwrap())
}

The main takeaway is that we use a mutex to ensure that the HashMap is not accessed by multiple threads at the same time. This is a very common pattern in Rust.

Logging in Rust Example

cd into clilog and type: cargo run -- --level TRACE


use rand::Rng;

//candidate fruits (example values; the repo defines its own list)
const FRUITS: [&str; 5] = ["apple", "banana", "cherry", "mango", "papaya"];

//function returns a random fruit and logs it to the console
pub fn random_fruit() -> String {
    //randomly select a fruit
    let fruit = FRUITS[rand::thread_rng().gen_range(0..5)];
    //log the fruit
    log::info!("fruit-info: {}", fruit);
    log::trace!("fruit-trace: {}", fruit);
    log::warn!("fruit-warn: {}", fruit);
    fruit.to_string()
}
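
The --level flag implies the binary configures a logger at startup. A hedged sketch of a main that does this with env_logger (an assumption; the repo may use a different logging backend):

//Hedged sketch: parse --level and initialize env_logger before logging
use clap::Parser;

#[derive(Parser)]
struct Cli {
    #[arg(long, default_value = "info")]
    level: String,
}

fn main() {
    let cli = Cli::parse();
    env_logger::Builder::new()
        .parse_filters(&cli.level)
        .init();
    //random_fruit() is the function from the crate's lib.rs shown above
    println!("{}", clilog::random_fruit());
}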

AWS

Rust AWS S3 Bucket Metadata Information

Running an optimized version, I was able to sum all the objects in my AWS account in about 1 second: ./target/release/awsmetas3 account-size
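
The official aws-sdk-s3 crate makes the enumeration part straightforward. A hedged sketch of listing buckets (the real awsmetas3 tool additionally sums object sizes per bucket; aws-config and tokio are assumed dependencies, and accessor return types vary slightly between SDK versions):

//Hedged sketch: list S3 buckets with the official AWS SDK for Rust
use aws_sdk_s3::Client;

#[tokio::main]
async fn main() -> Result<(), aws_sdk_s3::Error> {
    //credentials and region come from the default provider chain
    let config = aws_config::load_from_env().await;
    let client = Client::new(&config);
    let resp = client.list_buckets().send().await?;
    for bucket in resp.buckets().unwrap_or_default() {
        println!("{}", bucket.name().unwrap_or_default());
    }
    Ok(())
}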


Rust AWS Lambda

cd into rust-aws-lambda


To deploy: make deploy, which runs: cargo lambda build --release

  • Test inside of AWS Lambda console
  • Test locally with:
cargo lambda invoke --remote \
  --data-ascii '{"command": "hi"}' \
  --output-format json \
  rust-aws-lambda

Result:

cargo lambda invoke --remote \
                --data-ascii '{"command": "hi"}' \
                --output-format json \
                rust-aws-lambda
{
  "msg": "Command hi executed.",
  "req_id": "1f70aff9-dc65-47be-977b-4b81bf83e7a7"
}
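
The handler behind that payload is small. A hedged sketch with the lambda_runtime crate (field names mirror the request and response above; serde with the derive feature is an assumed dependency):

//Hedged sketch of the Lambda handler matching the invoke example
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct Request {
    command: String,
}

#[derive(Serialize)]
struct Response {
    msg: String,
    req_id: String,
}

async fn handler(event: LambdaEvent<Request>) -> Result<Response, Error> {
    let (req, ctx) = event.into_parts();
    Ok(Response {
        msg: format!("Command {} executed.", req.command),
        req_id: ctx.request_id,
    })
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    run(service_fn(handler)).await
}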

Client-Server Example

Example lives here: https://github.com/noahgift/rust-mlops-template/tree/main/rrgame

Current Status

  • Client server echo working

cargo run -- client --message "hi"
cargo run -- server


A bigger example lives here: https://github.com/noahgift/rust-multiplayer-roulette-game

Containerized Rust Applications

FROM rust:latest as builder
ENV APP containerized_marco_polo_cli
WORKDIR /usr/src/$APP
COPY . .
RUN cargo install --path .
 
FROM debian:buster-slim
RUN apt-get update && rm -rf /var/lib/apt/lists/*
COPY --from=builder /usr/local/cargo/bin/$APP /usr/local/bin/$APP
ENTRYPOINT [ "/usr/local/bin/containerized_marco_polo_cli" ]

Containerized PyTorch Rust

cd into: pytorch-rust-docker

Here is the Dockerfile

FROM rust:latest as builder
ENV APP pytorch-rust-docker
WORKDIR /usr/src/$APP
COPY . .
RUN apt-get update && rm -rf /var/lib/apt/lists/*
RUN cargo install --path .
RUN cargo build -j 6
  • docker build -t pytorch-rust-docker .
  • docker run -it pytorch-rust-docker
  • Next inside the container run: cargo run -- resnet18.ot Walking_tiger_female.jpg


Tensorflow Rust Bindings


/*Rust Tensorflow Hello World */

extern crate tensorflow;
use tensorflow::Tensor;

fn main() {
    let mut x = Tensor::new(&[1]);
    x[0] = 2i32;
    //print the value of x
    println!("{:?}", x[0]);
    //print the shape of x
    println!("{:?}", x.shape());
    //create a multidimensional tensor
    let mut y = Tensor::new(&[2, 2]);
    y[0] = 1i32;
    y[1] = 2i32;
    y[2] = 3i32;
    y[3] = 4i32;
    //print the value of y
    println!("{:?}", y[0]);
    //print the shape of y
    println!("{:?}", y.shape());
}

PyTorch


Pre-trained model: cd into pytorch-rust-example then run: cargo run -- resnet18.ot Walking_tiger_female.jpg

PyTorch Binary with embedded pre-trained model

Using included model in binary. See this issue about including PyTorch with binary

Status: works, but the binary cannot pick up PyTorch on its own, so I am still investigating solutions.

@noahgift ➜ /workspaces/rust-mlops-template/pytorch-binary-cli (main ✗) $ cargo run -- predict --image Walking_tiger_female.jpg 
    Finished dev [unoptimized + debuginfo] target(s) in 0.09s
     Running `target/debug/pytorch-binary-cli predict --image Walking_tiger_female.jpg`
Current working directory: /workspaces/rust-mlops-template/pytorch-binary-cli
Model path: ../model/resnet18.ot
Model size: 46831783
tiger, Panthera tigris                             90.42%
tiger cat                                           9.19%
zebra                                               0.21%
jaguar, panther, Panthera onca, Felis onca          0.07%
tabby, tabby cat                                    0.03%


Web Assembly in Rust

cd into hello-wasm-bindgen and run make install, then make serve

You should see something like this:


/* hello world Rust webassembly*/
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
extern "C" {
    fn alert(s: &str);
}

//export the function to javascript
#[wasm_bindgen]
pub fn marco_polo(s: &str) {
    //if the string is "Marco" return "Polo"
    if s == "Marco" {
        alert("Polo");
    }
    //if the string is anything else return "Not Marco"
    else {
        alert("Not Marco");
    }
}

Kmeans Example

cd into linfa-kmeans and run cargo run -- cluster

Lasso Regression CLI


@noahgift ➜ /workspaces/rust-mlops-template/regression-cli (main ✗) $ cargo run -- train --ratio .9
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/regression-cli train --ratio .9`
Training ratio: 0.9
intercept:  152.1586901763224
params: [0, -0, 503.58067499818077, 167.75801599203626, -0, -0, -121.6828192430516, 0, 427.9593531331433, 6.412796328606638]
z score: Ok([0.0, -0.0, 6.5939908998261245, 2.2719123245079786, -0.0, -0.0, -0.5183690897253823, 0.0, 2.2777581181031765, 0.0858408096568952], shape=[10], strides=[1], layout=CFcf (0xf), const ndim=1)
predicted variance: -0.014761955865436382

Transcription with Whisper in Rust


Rust PyTorch Saturating GPU

Rust PyTorch MNIST Saturating GPU

Rayon Multi-threaded GPU Stress Test CLI

Stress Test CLI for both CPU and GPU PyTorch using Clap

  • cargo new stress and cd into stress
  • To test CPU for PyTorch do: cargo run -- cpu
  • To test GPU for PyTorch do: cargo run -- gpu
  • To monitor CPU/Memory run htop
  • To monitor GPU run nvidia-smi -l 1
  • To use threaded GPU load test use: cargo run -- tgpu
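
A hedged sketch of what such a load loop can look like with the tch crate (PyTorch bindings); sizes and iteration counts are illustrative:

//Hedged sketch: keep a CPU or CUDA device busy with repeated matmuls
use tch::{Device, Kind, Tensor};

pub fn stress(device: Device, iterations: usize) {
    let a = Tensor::randn(&[2048, 2048], (Kind::Float, device));
    let b = Tensor::randn(&[2048, 2048], (Kind::Float, device));
    for _ in 0..iterations {
        //large matrix multiplies saturate the device
        let _c = a.matmul(&b);
    }
}

Call it with Device::Cpu for the cpu subcommand or Device::cuda_if_available() for gpu; the tgpu variant can run the same loop from several rayon threads.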


Full working example here: https://github.com/nogibjj/rust-pytorch-gpu-template/tree/main/stress

Rust Stable Diffusion Demo

See this repo for more info: https://github.com/nogibjj/rust-pytorch-gpu-template#stable-diffusion-demo

After all the weights are downloaded run:

cargo run --example stable-diffusion --features clap -- --prompt "A very rusty robot holding a fire torch to notebooks"

Stable Diffusion 2.1 pegging the GPU

Rusty robot torching notebooks (final output)

Randomly Select Rust Crates To Work On

cd into rust-ideas

cargo run -- --help
cargo run -- popular --number 4
cargo run -- random

@noahgift ➜ /workspaces/rust-mlops-template/rust-ideas (main ✗) $ cargo run -- random
    Finished dev [unoptimized + debuginfo] target(s) in 0.09s
     Running `target/debug/rust-ideas random`
Random crate: "libc"

ONNX Example

cd into OnnxDemo and run make install, then cargo run -- infer, which invokes a squeezenet model.


Sonos ONNX

Verified this works and is able to invoke the runtime in a portable binary: https://github.com/sonos/tract/tree/main/examples/pytorch-resnet

OpenAI

Switching to Rust API Example

Full working example link: https://github.com/nogibjj/assimilate-openai/tree/main/openai

  • install Rust via Rustup: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  • use the Rust API for OpenAI (3rd party): https://github.com/deontologician/openai-api-rust

  • Create new project: cargo new openai and cd into it

  • make format then make lint then cargo run

Working Example:

(.venv) @noahgift ➜ /workspaces/assimilate-openai/openai (main) $ cargo run -- complete -t "The rain in spain"
    Finished dev [unoptimized + debuginfo] target(s) in 0.14s
     Running `target/debug/openai complete -t 'The rain in spain'`
Completing: The rain in spain
Loves gets you nowhere
The rain in spain

lib.rs

/*This uses Open AI to Complete Sentences */

//accepts the prompt and returns the completion
pub async fn complete_prompt(prompt: &str) -> Result<String, Box<dyn std::error::Error>> {
    let api_token = std::env::var("OPENAI_API_KEY")?;
    let client = openai_api::Client::new(&api_token);
    let prompt = String::from(prompt);
    let result = client.complete_prompt(prompt.as_str()).await?;
    Ok(result.choices[0].text.clone())
}

Command-line Data Science with Rust (Action Items)

  1. Go into dscli
  2. Figure out how to make Polars work with linfa
  3. Make a kmeans cluster using Polars

Containerized Actix Continuous Delivery to AWS App Runner

Screenshot 2023-01-31 at 1 47 32 PM

  1. cd into webdocker
  2. build and run container (can do via Makefile) or

docker build -t fruit .
docker run -it --rm -p 8080:8080 fruit

  3. push to ECR
  4. Tell AWS App Runner to autodeploy

Mixing Python and Rust

Using Rust Module from Python

  • Pyo3 Try the getting started guide:
# (replace string_sum with the desired package name)
$ mkdir string_sum
$ cd string_sum
$ python -m venv .env
$ source .env/bin/activate
$ pip install maturin
  • Run maturin init and then run maturin develop or make develop
  • python
  • Run the following python code
import string_sum
string_sum.sum_as_string(5, 20)

The output should look like this: '25'
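
For reference, the Rust side of string_sum is the pyo3 getting-started example; the src/lib.rs looks like this (from the pyo3 guide):

use pyo3::prelude::*;

/// Sums two numbers and returns the result as a string.
#[pyfunction]
fn sum_as_string(a: usize, b: usize) -> PyResult<String> {
    Ok((a + b).to_string())
}

/// A Python module implemented in Rust.
#[pymodule]
fn string_sum(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(sum_as_string, m)?)?;
    Ok(())
}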

Using Python from Rust

Follow guide here: https://pyo3.rs/v0.18.0/

  • install python3-dev: sudo apt-get install python3-dev
  • cargo new pyrust and cd pyrust
  • tweak Cargo.toml and add pyo3
  • add source code to main.rs
  • make run
Hello vscode, I'm Python 3.9.2 (default, Feb 28 2021, 17:03:44) 
[GCC 10.2.1 20210110]
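
That banner comes from the pyo3 guide's embedding example; the main.rs is essentially this (from the guide linked above):

use pyo3::prelude::*;
use pyo3::types::IntoPyDict;

fn main() -> PyResult<()> {
    Python::with_gil(|py| {
        //ask the embedded interpreter for its version and the current user
        let sys = py.import("sys")?;
        let version: String = sys.getattr("version")?.extract()?;
        let locals = [("os", py.import("os")?)].into_py_dict(py);
        let code = "os.getenv('USER') or os.getenv('USERNAME') or 'Unknown'";
        let user: String = py.eval(code, None, Some(&locals))?.extract()?;
        println!("Hello {}, I'm Python {}", user, version);
        Ok(())
    })
}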

Q: Does the target binary have Python included? A: Maybe. It does appear to be able to run Python if you go to the target /workspaces/rust-mlops-template/pyrust/target/debug/pyrust

Follow-up question: can I bring this binary to a "blank" Codespace with no Python and see what happens?

Day 2: Using Rust with Python

Goal: Build a high-performance Rust module and then wrap in a Python command-line tool

Containerized Rust Examples

  • cargo new tyrscontainer and cd into tyrscontainer
  • copy a Makefile and Dockerfile from webdocker

Note that the Rust build-system container, which is ~1GB, is NOT in the final container image, which is only ~95MB.

FROM rust:latest as builder
ENV APP tyrscontainer
WORKDIR /usr/src/$APP
COPY . .
RUN cargo install --path .
 
FROM debian:buster-slim
RUN apt-get update && rm -rf /var/lib/apt/lists/*
COPY --from=builder /usr/local/cargo/bin/$APP /usr/local/bin/$APP
#export this actix web service to port 8080 and 0.0.0.0
EXPOSE 8080
CMD ["tyrscontainer"]

The final container is very small, i.e. 94.8MB:

strings               latest    974d998c9c63   9 seconds ago   94.8MB

The end result is that you can easily test this web service and push to a cloud vendor like AWS and AWS App Runner.

Open AI Raw HTTP Request Example

Code here: https://github.com/nogibjj/assimilate-openai/tree/main/rust-curl-openai

(.venv) @noahgift ➜ /workspaces/assimilate-openai/rust-curl-openai (main) $ cargo run
   Compiling reqwest v0.11.14
   Compiling rust-curl-openai v0.1.0 (/workspaces/assimilate-openai/rust-curl-openai)
    Finished dev [unoptimized + debuginfo] target(s) in 4.78s
     Running `target/debug/rust-curl-openai`
{"id":"cmpl-6rDd8mzOtMx7kKobqV0isiC7TkqU4","object":"text_completion","created":1678141798,"model":"text-davinci-003","choices":[{"text":"\n\nJupiter is the fifth planet from the Sun and the biggest one in our Solar System. It is very bright and can be seen in the night sky. It is named after the Roman god Jupiter. It is usually the third brightest thing you can see in the night sky after the Moon and Venus.","index":0,"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":151,"completion_tokens":62,"total_tokens":213}}

GCP Cloud Run

Jupyter Notebook and Rust

First we need to compile: cargo install evcxr_jupyter. Next, run: evcxr_jupyter --install. tl;dr: it does work! But you must run jupyter notebook --generate-config and then edit the cross-origin setting.

To run the plotting tutorial, do the following:

git clone https://github.com/38/plotters-doc-data

ONNX Series

Working PyTorch + Actix (looking into Distroless as well)


References

Build System

This build system is a bit unique because it recurses into many Rust repos and tests them all!

Language References and Tutorials

End to End Examples

MLOps/ML Engineering and Data Science

Rust MLOps Platforms

Cloud Computing

AWS

Azure

Linux Kernel

Systems Tools

MLOps Inference in Pure Rust

Deep Learning

Search Engines

Web Microservices and Serverless

Data Frames

Authoring Tools

One goal is to reduce the use of notebooks in favor of lightweight markdown tools (i.e. the goal is MLOps vs. interactive notebooks).

Computer Vision

Linux Tools

Python and Rust integration

GUI

NLP

Onnx

Static Web

Pure Rust Machine Learning

Benchmarking

Delta Lake

Testing Tools

Containerized Rust

Embedded Rust

ZSH

Time Series Rust

Linux and GCC

C++ vs Rust

OpenAI

Popularity

Copilots effect on Programming

Rewrite Python to Rust

Training LLMs from Scratch
