  • Stars: 106
  • Rank: 325,871 (Top 7%)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created about 7 years ago
  • Updated almost 7 years ago

Repository Details

This toolkit offers a free collection of data analysis and machine learning code samples suited to data science work. Its purpose is to get you started in a matter of minutes. You can run the collection either in Jupyter Notebook or with plain Python.

Complete-Data-Science-Toolkits

Features

Machine Learning

  • Cross-Validation
  • Evaluating Classification Metrics
  • Evaluating Clustering Metrics
  • Evaluating Regression Metrics
  • Grid Search
  • Preprocessing Encoding Categorical Features
  • Preprocessing Binarization
  • Preprocessing Imputing Missing Values
  • Preprocessing Normalization
  • Preprocessing StandardScaler
  • Randomized Parameter Optimization

Numpy

  • Adding, Removing, and Splitting Arrays
  • Sorting arrays
  • Matrix object
  • Statistics Vector Math
  • Structured Arrays
  • Import, Export, Slicing, Indexing
  • Data to/from string

Pandas

  • Complete pandas
  • Groupby in Pandas
  • Mapping
  • Filtering
  • Applying

Visualization

  • BarPlots
  • Customization Matplotlib
  • Working with Image
  • Working with text

Naming Conventions

  • The naming convention I followed is:
  • [yyyy-mm-dd-in-project-name-library].extension
  • yyyy = stands for year
  • mm = stands for month
  • dd = stands for day
  • in = my initials, for example: Saleban Olow = so
  • library = numpy, pandas, sklearn, matplotlib
  • project-name = each project name
  • extension = .ipynb, .py, .html
  • Example: 2017-25-11-so-cross-validation-sklearn.ipynb
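
The pattern above can be sketched as a small helper. This is only an illustration: `notebook_filename` and its parameters are hypothetical names, not part of the toolkit, and the date is written in the stated yyyy-mm-dd order.

```python
from datetime import date


def notebook_filename(day, initials, project_name, library, extension="ipynb"):
    """Build a name in the yyyy-mm-dd-in-project-name-library.extension pattern."""
    # Lowercase the project name and join its words with hyphens.
    slug = "-".join(project_name.lower().split())
    # Accept the extension with or without a leading dot.
    ext = extension.lstrip(".")
    return f"{day:%Y-%m-%d}-{initials.lower()}-{slug}-{library.lower()}.{ext}"


print(notebook_filename(date(2017, 11, 25), "so", "Cross Validation", "sklearn"))
# prints 2017-11-25-so-cross-validation-sklearn.ipynb
```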

Code Samples:

Cross Validation

from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
#X and y are the feature matrix and the target labels
model = SVC(kernel='linear', C=1)
# let's try it using 5-fold cv
scores = cross_val_score(model, X, y, cv=5)

Grid Search

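#sklearn.grid_search was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
knn = KNeighborsClassifier()
params = {"n_neighbors": np.arange(1,5), "metric": ["euclidean", "cityblock"]}
grid = GridSearchCV(estimator=knn, param_grid=params)
grid.fit(X_train, y_train)
print(grid.best_score_)
print(grid.best_estimator_.n_neighbors)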

Preprocessing Imputing Missing Values

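#Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it
from sklearn.impute import SimpleImputer
#treat 0 as missing and replace it with the column mean
impute = SimpleImputer(missing_values=0, strategy='mean')
impute.fit_transform(X_train)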
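
A small worked example may make the effect of mean imputation easier to see. It is a sketch that assumes a recent scikit-learn where `SimpleImputer` is available; the `X_train` values are made up for illustration, and 0 is treated as the missing marker.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data: 0 marks a missing entry.
X_train = np.array([[1.0, 2.0],
                    [0.0, 4.0],
                    [5.0, 0.0]])

# Each 0 is replaced by the mean of the non-missing values in its column.
impute = SimpleImputer(missing_values=0, strategy="mean")
X_filled = impute.fit_transform(X_train)

print(X_filled)
# column 0 mean of (1, 5) is 3, column 1 mean of (2, 4) is 3
```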

Randomized Parameter Optimization

#sklearn.grid_search was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import RandomizedSearchCV
#knn is a KNeighborsClassifier instance
params = {"n_neighbors" : range(1,5), "weights": ["uniform", "distance"]}
rsearch = RandomizedSearchCV(estimator=knn, param_distributions=params, cv=4, n_iter=8, random_state=5)
rsearch.fit(X_train, y_train)
print(rsearch.best_score_)

Model fitting supervised and unsupervised learning

#supervised learning
from sklearn import neighbors
knn = neighbors.KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
#unsupervised learning
from sklearn.decomposition import PCA
pca = PCA(n_components=0.95)
pca_model = pca.fit_transform(X_train)
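
The snippets in this section assume that `X_train` and `y_train` already exist. One way to get them, sketched here with the iris dataset that ships with scikit-learn (the dataset and the split settings are assumptions chosen for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset and hold out 25% of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Supervised: fit a k-nearest-neighbors classifier and score it on held-out data.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)

# Unsupervised: keep enough principal components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_train)

print(accuracy, X_reduced.shape)
```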

Working with numpy arrays

import numpy as np 
#appends values to end of arr
np.append(arr, values)
#inserts values into arr before index 2
np.insert(arr, 2, values)
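
With concrete arrays the two calls look like this. The values of `arr` and `values` are made up for illustration. Note that both functions return a new array and leave `arr` unchanged.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
values = np.array([10, 20])

# Returns a new array with values added at the end.
appended = np.append(arr, values)
# Returns a new array with values placed before index 2.
inserted = np.insert(arr, 2, values)

print(appended)  # [ 1  2  3  4 10 20]
print(inserted)  # [ 1  2 10 20  3  4]
```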

Indexing and Slicing arrays

import numpy as np 
#1D array: return the element at index 5
arr = np.array([1,2,3,4,5,6,7])
arr[5]
#assign array element on index 1 the value 4
arr[1] = 4
#2D array: return the element at row 2, column 5
arr2d = np.arange(18).reshape(3,6)
arr2d[2,5]
#assign array element on index [1][3] the value 10
arr2d[1,3] = 10

Creating DataFrame

import pandas as pd 
#specify values for each rows and columns
df = pd.DataFrame(
	[[4,7,10],
	 [5,8,11],
	 [6,9,12]],
	 index=[1,2,3],
	 columns=['a','b','c'])

groupby pandas

import pandas as pd 
#return a groupby object, grouped by values in column named 'Cities'
df.groupby(by="Cities")
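
A groupby object does nothing on its own until an aggregation is applied. A minimal sketch, assuming a made-up DataFrame with a `Cities` column and a hypothetical `Sales` column:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Cities": ["Oslo", "Lima", "Oslo", "Lima"],
    "Sales": [10, 20, 30, 40],
})

# Group the rows by city, then sum the Sales values within each group.
totals = df.groupby(by="Cities")["Sales"].sum()

print(totals)
```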

handling missing values

import pandas as pd 
#drop rows with any column having NA/null data.
df.dropna()
#replace all NA/null data with value
df.fillna(value)
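
A short sketch of both calls on a made-up DataFrame. The fill value 0 is an arbitrary choice for illustration. Both methods return a new DataFrame and leave `df` untouched.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in each column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                   "b": [4.0, 5.0, np.nan]})

# Keep only the rows that have no missing values.
dropped = df.dropna()
# Replace every missing value with 0.
filled = df.fillna(0)

print(dropped)
print(filled)
```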

Melt function

import pandas as pd 
#most pandas methods return a DataFrame, so calls can be
#chained, which improves readability of code
#pd.melt names its output columns 'variable' and 'value'
df = (pd.melt(df)
	  .rename(columns={'variable':'var', 'value':'val'})
	  .query('val >= 200')
)
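
To see what the chain does, here is a sketch on a made-up wide DataFrame. The names `wide`, `long`, `column` and `amount` are chosen for illustration only.

```python
import pandas as pd

# Made-up wide table: one column per variable.
wide = pd.DataFrame({"a": [100, 250], "b": [300, 50]})

# melt stacks the columns into 'variable'/'value' pairs; the chain then
# renames those two columns and keeps only rows with amount >= 200.
long = (pd.melt(wide)
          .rename(columns={"variable": "column", "value": "amount"})
          .query("amount >= 200"))

print(long)
```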

Save plot

import matplotlib.pyplot as plt 
#saves plot/figure to image
plt.savefig('pic_name.png')

Marker, lines

import matplotlib.pyplot as plt 
#add * for every data point
plt.plot(x,y, marker='*')
#adds dot for every data point
plt.plot(x,y, marker='.')

Figures, Axis

import matplotlib.pyplot as plt 
#a container that contains all plot elements
fig = plt.figure()
#adds an axes at [left, bottom, width, height], in figure fractions
fig.add_axes([0.1, 0.1, 0.8, 0.8])
#A subplot is an axes on a grid system, rows-cols num
a = fig.add_subplot(222)
#creates a figure with a 3x2 grid of subplots
fig, b = plt.subplots(nrows=3, ncols=2)
#creates a figure with a 2x2 grid of subplots
fig, ax = plt.subplots(2,2)

Working with text plot

import matplotlib.pyplot as plt 
fig, ax = plt.subplots()
#places text at coordinates (1, 1)
plt.text(1,1, 'Example text', style='italic')
#annotate the point with coordinates xy with text 
ax.annotate('some annotation', xy=(10,10))
#put a math formula in the title
plt.title(r'$\delta_i=20$',fontsize=10)
