  • Language
    Jupyter Notebook
  • License
    Other
  • Created about 6 years ago
  • Updated 10 months ago

Repository Details

Some fundamental machine learning and data-analysis techniques are explained through realistic examples.

Machine Learning and Data Analysis

This repo contains introductions to and examples of some of the most important machine learning and data-analysis techniques.

Each entry below is preceded by its date in DDMMYY (or DDMMYYYY) format. For descriptions and more, check the Wiki page.

A dedicated deep learning repository similar to this one is here.


Libraries

Python NumPy Pandas scikit-learn TensorFlow SciPy pymc3


190818: PCA_Muller.py, principal component analysis example with the breast cancer data-set.
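
As a rough illustration of the PCA entry above, here is a minimal sketch that assumes the breast cancer data bundled with scikit-learn (`load_breast_cancer`) and a two-component projection. These choices are assumptions for illustration and may differ from what PCA_Muller.py does.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the feature matrix (569 samples, 30 features) and class labels.
X, y = load_breast_cancer(return_X_y=True)

# PCA is sensitive to feature scale, so standardize every column first.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the two directions of largest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)
print(pca.explained_variance_ratio_)
```

The two columns of `X_pca` can then be scatter-plotted and coloured by `y` to see how well the classes separate in the reduced space.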

270918: RidgeandLin.py, LassoandLin.py: Lasso and Ridge regression examples.
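
The difference between the two penalties can be sketched on synthetic data. The data-set from `make_regression` and the `alpha` values are hypothetical choices for illustration, not taken from RidgeandLin.py or LassoandLin.py.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic regression problem: 20 features, only 5 of them informative.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # L2 penalty shrinks coefficients
lasso = Lasso(alpha=1.0).fit(X, y)    # L1 penalty can zero coefficients out

l2_ols = np.linalg.norm(ols.coef_)
l2_ridge = np.linalg.norm(ridge.coef_)
l1_ols = np.abs(ols.coef_).sum()
l1_lasso = np.abs(lasso.coef_).sum()
n_zero_lasso = int(np.sum(lasso.coef_ == 0))

print(f"L2 norm: OLS {l2_ols:.2f}, Ridge {l2_ridge:.2f}")
print(f"L1 norm: OLS {l1_ols:.2f}, Lasso {l1_lasso:.2f}")
print(f"Lasso coefficients that are exactly zero: {n_zero_lasso}")
```

Ridge shrinks all coefficients towards zero, while Lasso tends to set some of them exactly to zero, which is why it is often used for feature selection.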

081018: bank.csv, data set from a Portuguese company selling products to random customers over phone call(s). A description of the data-set is available here.

161018: gender_purchase.csv, data-set of two columns describing customers buying a product depending on gender.

111118: winequality-red.csv, red wine data set, where the output is the quality column which ranges from 0 to 10.

121118: pipelineWine.py, a simple example of applying Pipeline and GridSearchCV together using the red wine data.

24112018: lagmult.py, this program demonstrates a simple constrained optimization problem using figures.
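
A minimal sketch of combining a pipeline with a grid search is given here. Since winequality-red.csv is not reproduced in this document, the sketch assumes scikit-learn's bundled `load_wine` data as a stand-in, and the `SVC` estimator and parameter grid are hypothetical choices, not necessarily those of pipelineWine.py.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: scikit-learn's bundled wine data-set (not the red wine CSV).
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scaling lives inside the pipeline, so each CV fold is scaled using
# statistics from its own training part only (no leakage).
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# Step parameters are addressed as <step name>__<parameter name>.
param_grid = {"svc__C": [0.1, 1.0, 10.0], "svc__gamma": ["scale", 0.01]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
test_score = search.score(X_test, y_test)

print(search.best_params_)
print(test_score)
```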
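
For a concrete constrained problem, consider minimizing f(x, y) = x^2 + y^2 subject to x + y = 1. The Lagrange condition grad f = lambda * grad g gives 2x = lambda and 2y = lambda, hence x = y = 0.5 and lambda = 1. The sketch below checks this numerically with SciPy. The problem itself is a hypothetical example and not necessarily the one plotted in lagmult.py.

```python
import numpy as np
from scipy.optimize import minimize


def objective(p):
    """f(x, y) = x^2 + y^2."""
    x, y = p
    return x**2 + y**2


# Equality constraint g(x, y) = x + y - 1 = 0.
constraint = {"type": "eq", "fun": lambda p: p[0] + p[1] - 1.0}

result = minimize(objective, x0=np.array([2.0, -1.0]),
                  method="SLSQP", constraints=[constraint])
x_opt, y_opt = result.x

# Stationarity: grad f = lam * grad g with grad g = (1, 1), so lam = 2 * x.
lam = 2.0 * x_opt

print(x_opt, y_opt, lam)
```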

11122018: Consumer_Complaints_short.csv, 3 columns describing the complaints, product_label and category. The complete file can be obtained from Govt.data.

13122018: Text-classification_compain_suvo.py, classifies the consumer complaints data described above.

1912018: SVMdemo.py, this program shows the effect of using an RBF kernel to map from 2D space to 3D space. The animation requires ffmpeg on a Unix system.

05032019: IBM_Python_Web_Scrapping.ipynb, deals with basic web scraping, string handling and image manipulation.
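
The idea of lifting 2D points into 3D can be sketched with an explicit radial feature. The true feature space of an RBF kernel is infinite-dimensional, so the third coordinate used here, z = exp(-gamma * ||x||^2), is only a common illustrative stand-in. The data and the value of `gamma` are assumptions, not taken from SVMdemo.py.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two concentric rings: 50 points at radius 0.5 (class 0), 50 at radius 2.0 (class 1).
angles = rng.uniform(0.0, 2.0 * np.pi, size=100)
labels = (np.arange(100) >= 50).astype(int)
radii = np.where(labels == 0, 0.5, 2.0)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

# Add a radial feature as a third coordinate.
gamma = 1.0
z = np.exp(-gamma * np.sum(X**2, axis=1))
X_3d = np.column_stack([X, z])

# In 3D a horizontal plane z = threshold separates the rings,
# although no straight line separates them in the original 2D plane.
threshold = 0.5 * (z[labels == 0].min() + z[labels == 1].max())
predicted = (z < threshold).astype(int)

print(X_3d.shape, np.mean(predicted == labels))
```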

06042019: datacleaning, folder containing files and images related to data cleaning with pandas.

08062010: DBSCAN_Complete, folder containing files and images related to applying the DBSCAN algorithm to cluster weather stations in Canada.

13072019: SVM_Decision_Boundary, Pipeline and GridSearchCV are used to find the best-fit parameters for an SVM, and the decision function contours of the SVM classifier for binary classification are then plotted.

28122019: DecsTree, folder containing a notebook that uses a decision tree classifier on the Bank Marketing Data-Set.
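
How DBSCAN groups dense regions and flags isolated points as noise can be sketched on toy data. The points, `eps` and `min_samples` here are hypothetical and unrelated to the weather station data in the folder.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(42)

# Two tight groups of points plus one far-away outlier.
cluster_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(30, 2))
cluster_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(30, 2))
outlier = np.array([[10.0, -10.0]])
points = np.vstack([cluster_a, cluster_b, outlier])

# eps: neighbourhood radius; min_samples: neighbours needed for a core point.
model = DBSCAN(eps=0.5, min_samples=5).fit(points)
labels = model.labels_

# DBSCAN marks noise points with the label -1.
n_clusters = len(set(labels.tolist()) - {-1})

print(n_clusters, labels[-1])
```

Unlike k-means, the number of clusters is not specified in advance; it follows from the density parameters.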

07032020: Conjugate Prior, folder containing a notebook where the concept of a conjugate prior is discussed, including an introduction to PyMC3.

29052020: ExMax_Algo, folder containing a notebook that explains the Expectation Maximization algorithm in full.
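
The simplest conjugate pair is a Beta prior with a binomial likelihood: a Beta(a, b) prior combined with k successes in n trials gives a Beta(a + k, b + n - k) posterior, so no sampling is needed. The sketch below uses hypothetical numbers and plain Python rather than PyMC3.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Return the Beta posterior parameters after observing binomial data."""
    return alpha + successes, beta + (trials - successes)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


prior = (2.0, 2.0)  # weak prior belief centred on 0.5
posterior = beta_binomial_update(*prior, successes=7, trials=10)

print(posterior)              # (9.0, 5.0)
print(beta_mean(*prior))      # 0.5
print(beta_mean(*posterior))  # 9/14, roughly 0.643
```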
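
The alternation between the E-step and the M-step can be sketched for a two-component 1D Gaussian mixture. The data, the initialization and the fixed iteration count are assumptions for illustration and are not taken from the notebook.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def em_two_gaussians(x, n_iter=100):
    """Fit a two-component 1D Gaussian mixture with plain EM."""
    # Crude initialization: means at the data extremes, shared variance.
    means = np.array([x.min(), x.max()])
    variances = np.array([x.var(), x.var()])
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point, shape (n, 2).
        dens = weights * gaussian_pdf(x[:, None], means, variances)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from responsibility-weighted data.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances


rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])
weights, means, variances = em_two_gaussians(data)

print(weights, means, variances)
```

A fuller implementation would monitor the log-likelihood and stop once it no longer improves, rather than running a fixed number of iterations.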

11092020: AdaptiveLoss.ipynb, file contains a description and a simple implementation of the robust and adaptive loss function. Original paper by J. Barron. More details on TDS.

31092020: pima_diabetes.ipynb, file contains a description of data preparation and of choosing the best machine learning algorithm for a binary classification task. A few more details are in the Kaggle kernel.
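
The general robust loss from Barron's paper can be written for a residual x, shape parameter alpha and scale c as (|alpha - 2| / alpha) * (((x / c)^2 / |alpha - 2| + 1)^(alpha / 2) - 1), with the limits alpha = 2, alpha = 0 and alpha = -infinity handled separately. The NumPy sketch below evaluates it for a fixed alpha. The function name is hypothetical, and the adaptive variant, where alpha is learned during training, is not implemented here.

```python
import numpy as np


def general_robust_loss(x, alpha, c=1.0):
    """General robust loss for residual x, shape alpha and scale c."""
    z = (np.asarray(x, dtype=float) / c) ** 2
    if alpha == 2.0:        # limit: L2 loss
        return 0.5 * z
    if alpha == 0.0:        # limit: Cauchy (log) loss
        return np.log1p(0.5 * z)
    if alpha == -np.inf:    # limit: Welsch loss
        return 1.0 - np.exp(-0.5 * z)
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)


residuals = np.array([0.0, 1.0, 10.0])
for a in (2.0, 1.0, 0.0, -2.0, -np.inf):
    print(a, general_robust_loss(residuals, a))
```

Lower values of alpha penalize large residuals less, which makes the fit less sensitive to outliers.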

15112020: terrorism_kaggle.ipynb, notebook contains elaborate examples of how to think about problems and interpret large-scale data using the Global Terrorism Database. Apart from the Pandas groupby and crosstab methods, I have also used the Folium and Basemap libraries for visualizing Leaflet maps and 2D data on maps, respectively. More on The Startup.

15022021: FocalLoss_Ex.ipynb, notebook explains in detail how Focal Loss works. Please read the original Focal Loss paper. An example of implementing Focal Loss using TensorFlow is also shown. For more details, check the post on TDS.

19062021: Augly_Try.ipynb, notebook contains examples of image augmentation using Facebook's AugLy library. For more details, check the notebook and the TDS post.

24122021: NB_LogisticReg.ipynb, notebook explains the connection between Gaussian Naive Bayes and Logistic Regression and determines the parameters of Logistic Regression starting from GNB. The notebook is self-explanatory, but you can also check the TDS post.
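
Focal loss scales the cross-entropy of each example by (1 - p_t)^gamma, where p_t is the predicted probability of the true class, so well-classified examples contribute little. The sketch below is a NumPy version for the binary case, not the TensorFlow implementation from the notebook. The function name is hypothetical, and the defaults gamma = 2 and alpha = 0.25 follow the values reported in the Focal Loss paper.

```python
import numpy as np


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Per-example binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    # p_t is the probability assigned to the true class.
    p_t = np.where(y_true == 1, p, 1.0 - p)
    # alpha_t balances the positive and negative classes.
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)


# An easy positive (p = 0.9) versus a hard positive (p = 0.1).
easy, hard = binary_focal_loss([1, 1], [0.9, 0.1])
print(easy, hard)
```

With gamma = 0 the modulating factor disappears and the loss reduces to class-weighted cross-entropy.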
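
The connection can be sketched under the usual assumption that both classes share the same per-feature variances. In that case the GNB posterior is exactly a logistic function of w.x + b, with w_j = (mu1_j - mu0_j) / var_j and b = log(pi1 / pi0) + sum_j (mu0_j^2 - mu1_j^2) / (2 var_j). The function names and numbers below are hypothetical, not taken from the notebook.

```python
import numpy as np


def gnb_to_logistic(mu0, mu1, var, prior1):
    """Logistic-regression weights implied by a shared-variance GNB model."""
    mu0, mu1, var = (np.asarray(a, dtype=float) for a in (mu0, mu1, var))
    w = (mu1 - mu0) / var
    b = np.log(prior1 / (1.0 - prior1)) + np.sum((mu0**2 - mu1**2) / (2.0 * var))
    return w, b


def gnb_posterior(x, mu0, mu1, var, prior1):
    """P(y = 1 | x) computed directly from Bayes' rule."""
    x, mu0, mu1, var = (np.asarray(a, dtype=float) for a in (x, mu0, mu1, var))

    def log_lik(mu):
        return np.sum(-0.5 * np.log(2.0 * np.pi * var) - (x - mu) ** 2 / (2.0 * var))

    log_p1 = np.log(prior1) + log_lik(mu1)
    log_p0 = np.log(1.0 - prior1) + log_lik(mu0)
    return 1.0 / (1.0 + np.exp(log_p0 - log_p1))


mu0, mu1, var, prior1 = [0.0, 1.0], [2.0, -1.0], [1.0, 4.0], 0.3
x = np.array([0.5, 0.2])

w, b = gnb_to_logistic(mu0, mu1, var, prior1)
p_logistic = 1.0 / (1.0 + np.exp(-(w @ x + b)))
p_bayes = gnb_posterior(x, mu0, mu1, var, prior1)

print(p_logistic, p_bayes)
```

Both routes give the same probability; the difference in practice is that logistic regression fits w and b directly, while GNB estimates the means, variances and priors first.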


License

Distributed under the Apache License. Read LICENSE.md for details.


Contacts

Saptashwa.