  • Stars: 8,665
  • Rank: 4,223 (Top 0.09%)
  • Language: Python
  • License: MIT License
  • Created about 10 years ago
  • Updated about 1 year ago

Repository Details

code for Data Science From Scratch book

Data Science from Scratch

Here's all the code and examples from the second edition of my book Data Science from Scratch. They require at least Python 3.6.

(If you're looking for the code and examples from the first edition, that's in the first-edition folder.)

If you want to use the code, you should be able to clone the repo and just do things like

In [1]: from scratch.linear_algebra import dot

In [2]: dot([1, 2, 3], [4, 5, 6])
Out[2]: 32

and so on and so forth.
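
For readers wondering what a call like the one above computes, here is a minimal sketch of a from-scratch dot product. It is an illustration written for this note and is not necessarily identical to the code in scratch/linear_algebra.py; the `Vector` alias and the length check are assumptions.

```python
from typing import List

# Assumed type alias for illustration: a vector is a plain list of numbers.
Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    """Return v_1 * w_1 + ... + v_n * w_n."""
    # The dot product is only defined for vectors of equal length.
    assert len(v) == len(w), "vectors must be the same length"
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

The result matches the `Out[2]: 32` shown in the session above.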

Two notes:

  1. In order to use the library like this, you need to be in the root directory (that is, the directory that contains the scratch folder). If you are in the scratch directory itself, the imports won't work.

  2. It's possible that it will just work. It's also possible that you may need to add the root directory to your PYTHONPATH. If you are on Linux or OSX, this is as simple as

export PYTHONPATH=/path/to/where/you/cloned/this/repo

(substituting in the real path, of course).

If you are on Windows, it's potentially more complicated.
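
One platform-independent alternative, which the README itself does not mention, is to add the root directory to `sys.path` from inside Python before importing. The sketch below assumes a hypothetical helper named `add_repo_root`; the path in the commented usage is the same placeholder as above and must be replaced with the real location of your clone.

```python
import sys
from pathlib import Path

def add_repo_root(repo_root: str) -> str:
    """Put repo_root at the front of sys.path (at most once) and return the resolved path.

    Hypothetical helper for illustration; it is not part of the repo.
    """
    resolved = str(Path(repo_root).expanduser().resolve())
    if resolved not in sys.path:
        # Inserting at position 0 makes this directory win over other entries.
        sys.path.insert(0, resolved)
    return resolved

# Usage (placeholder path, substitute the real one):
# add_repo_root("/path/to/where/you/cloned/this/repo")
# from scratch.linear_algebra import dot
```

This only affects the running interpreter, so it has to be repeated in each script or session, whereas setting PYTHONPATH applies to every Python process started from that shell.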

Table of Contents

  1. Introduction
  2. A Crash Course in Python
  3. Visualizing Data
  4. Linear Algebra
  5. Statistics
  6. Probability
  7. Hypothesis and Inference
  8. Gradient Descent
  9. Getting Data
  10. Working With Data
  11. Machine Learning
  12. k-Nearest Neighbors
  13. Naive Bayes
  14. Simple Linear Regression
  15. Multiple Regression
  16. Logistic Regression
  17. Decision Trees
  18. Neural Networks
  19. Deep Learning
  20. Clustering
  21. Natural Language Processing
  22. Network Analysis
  23. Recommender Systems
  24. Databases and SQL
  25. MapReduce
  26. Data Ethics
  27. Go Forth And Do Data Science

More Repositories

1

fizz-buzz-tensorflow

fizz buzz in tensorflow
Jupyter Notebook
865
star
2

hackernews

There are way too many stories on Hacker News, and there's no option for "show me only the stories that Joel would like". So I built one. (Maybe "cobbled together" is more appropriate.)
Ruby
382
star
3

joelnet

live coding deep learning library
Python
323
star
4

autograd

coding an autograd from scratch
167
star
5

stupid-itertools-tricks-pydata

code for my "stupid itertools tricks" talk from pydata seattle 2015
Python
147
star
6

learning-my-kid-to-code

trying to get my kid excited about code by writing small programs together
Python
51
star
7

twitter-globe

tweets on a globe
HTML
48
star
8

fizzbuzz

code for the book "Ten Essays on Fizz Buzz"
Python
43
star
9

shirts

T-Shirts, Feminism, Parenting, and Data Science
Python
37
star
10

kaggle-toxic-allennlp

AllenNLP model for the Kaggle toxic comments challenge
Python
32
star
11

advent2020

advent of code 2020
Python
29
star
12

advent2019

advent of code 2019
Python
23
star
13

streamlit-allennlp

allennlp + streamlit demo
Python
21
star
14

science-questions

end-to-end data product for generating random science quizzes
PureScript
19
star
15

advent2018

solutions for advent of code 2018
Python
17
star
16

streamlit-games

streamlit games
Python
15
star
17

polyglot-twitter-bot

code for writing twitter bots in several languages
PureScript
14
star
18

fun-with-trump-tweets

code for Seattle Twitter-Dev Meetup, October 2016
HTML
13
star
19

advent2021

advent of code 2021
Python
13
star
20

unredact-mueller-report-using-BERT

unredact mueller report using BERT
12
star
21

chain-py

Fluent sequence operations in Python
Python
12
star
22

odscnet

repo for my ODSC West 2017 Talk: "Livecoding Madness: Let's Build a Deep Learning Library"
Python
12
star
23

spot-it

generate spot-it cards
Haskell
10
star
24

doing-data-science-in-the-time-of-chatgpt

talk for meetup
Python
10
star
25

oscon-2018

livecoding talk for oscon 2018
Python
10
star
26

lm-explorer

interactive explorer for language models
Python
9
star
27

kexp

scraping where the music matters
Jupyter Notebook
9
star
28

advent2017

advent of code 2017
Python
8
star
29

fire

code for my Ignite Strata talk on Secrets of Fire Truck Society
Python
8
star
30

puppynet

repo for "Livecoding Madness" deep learning library for PuPPy meetup
Python
7
star
31

data

datasets for data science from scratch
HTML
6
star
32

flask-plus-mithril

just a toy example of a mithril front end paired with a flask back end
HTML
5
star
33

rag-from-household-objects

create a RAG from ordinary household objects
Python
4
star
34

pux-it

a "Spot It" clone, sort of, built using purescript-pux
PureScript
4
star
35

advent2023

advent of code 2023
Python
4
star
36

dumb-language-model

just a dumb language model in a streamlit app
Python
4
star
37

advent2022

Python
3
star
38

clickventure

a react-based implementation of a ClickHole-style ClickVenture
JavaScript
3
star
39

project-euler-haskell

project euler in haskell
Haskell
3
star
40

llm-simplicity

talk for sdsc 2023
Python
3
star
41

dsfs-function-index-cycle-js

reimplementation of Data Science from Scratch function index in cycle.js
JavaScript
3
star
42

proof-of-concept-delight

code for my talk at the 2020 NLP summit
Python
3
star
43

did-you-know

Add a random "did you know" to a website
JavaScript
2
star
44

posterization-k-means

Posterization using k means clustering
Python
2
star
45

constructive-mathematics-clojure

like "Constructive Mathematics in F#", except in Clojure
Clojure
2
star
46

codefellows-data-science-week

HTML
2
star
47

this-number-does-not-exist

using ML to generate numbers that don't actually exist
CSS
2
star
48

numbers-game

game for my toddler to help her learn numbers and letters and colors
JavaScript
2
star
49

posterization-pyladies

image posterization using k-means clustering
Python
2
star
50

collaborative-regression

collaborative regression
Python
1
star
51

myrepo

test repo for startup engineering class
1
star
52

presidents

Python
1
star
53

dsfs-function-index

a single-page app that exposes a filterable index of the functions in Data Science from Scratch
PureScript
1
star
54

typeshift

code for my pyconby 2021 talk
Python
1
star
55

todo-cycle-js

a less-than-full-featured, less-than-beautiful todo app in cycle.js
JavaScript
1
star
56

drunken-avenger

the pelican setup for my blog, as well as all the content
JavaScript
1
star
57

rts_and_endorsements

1
star
58

gpt-experiments

gpt experiments
JavaScript
1
star
59

stuff

Python
1
star
60

constructive-mathematics-fsharp

constructive mathematics in F#
F#
1
star
61

thinking-spreadsheet

files for Thinking Spreadsheet
1
star
62

talk-proposals

talk proposals
1
star
63

hello-github-actions

1
star
64

solvington

solvington website
HTML
1
star