R, Python and Mathematica Code in Data Science
Welcome to my GitHub repo.
I am a Data Scientist and I code in R, Python and Wolfram Mathematica. Here you will find some Machine Learning, Deep Learning, Natural Language Processing and Artificial Intelligence models I developed.
Outputs of the models can be seen in my portfolio: https://drive.google.com/file/d/0B0RLknmL54khdjRQWVBKeTVxSHM/view?usp=sharing
Mathematica Code
MNIST_HOT.5.FULL: is a solution for the MNIST dataset in Mathematica, based on pixel differences, with 96.51% accuracy.
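One plausible reading of "based on pixel differences" is a nearest-neighbor rule. The Python sketch below assumes that reading: an image gets the label of the training image with the smallest sum of absolute pixel differences. It is not the original Mathematica code, and the function names and toy 2x2 "images" are hypothetical.

```python
# Minimal sketch, ASSUMING a 1-nearest-neighbor rule with L1 (sum of absolute
# pixel differences) distance; not the original Mathematica implementation.

def pixel_difference(a, b):
    """Sum of absolute differences between two equally sized, flattened images."""
    return sum(abs(p - q) for p, q in zip(a, b))

def predict(train_images, train_labels, image):
    """Return the label of the training image closest to `image`."""
    best_index = min(range(len(train_images)),
                     key=lambda i: pixel_difference(train_images[i], image))
    return train_labels[best_index]

# Hypothetical toy data: 2x2 images flattened to 4 pixels (not real MNIST digits).
train_images = [[0, 255, 0, 255], [255, 0, 255, 0]]
train_labels = ["1", "7"]
print(predict(train_images, train_labels, [10, 240, 5, 250]))  # → 1
```

On real MNIST the same rule would compare each test image against all 60,000 training images, which is slow in pure Python but easy to vectorize.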
Mathematica - Artificial Intelligence Simulating Interactions in Social Networks: is a model that simulates human interactions in a social network using cellular automata and agent-based modeling. Each agent has 3 possible choices for interaction and a memory. The code spans 14 pages and includes a large loop written in a single line of code.
Mathematica - Facial Recognition in Movement: is a model that applies facial recognition to a downloaded YouTube video. The output is also a video showing the face recognition results (a YouTube link to the output is included in the code page).
Mathematica - Monte Carlo Simulation: is an animated model of a Markov Chain Monte Carlo simulation for autonomous driving. A video of the dynamic output was also generated, and a link to the YouTube video is included in the code page.
Mathematica - Social Network Surveillance: is a model that tracks individuals in a social network, along with their connections and future interactions.
Python Code
Keras version used in models: keras==1.1.0 | LSTM 0.2
Python - Autoencoder MNIST: is an autoencoder model developed with Keras for image classification on the MNIST dataset, with a ModelCheckpoint callback to save the weights.
Python - Autoencoder for Text Classification: is an autoencoder model for text classification, also built with Keras and a ModelCheckpoint callback.
Python - Deep Learning with Lasagne: is a deep neural network developed with Lasagne that shows the weight values of each layer, including the biases.
Python - Face Recognition: is a model using OpenCV to detect faces.
Python - Image Extraction from Twitter: is a model that extracts pictures and their links from Twitter web pages and plots them with matplotlib.
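The link-extraction step can be sketched with Python's standard-library `html.parser`. This assumes the page HTML has already been downloaded as a string. Live Twitter pages are rendered with JavaScript, so the real scraping step may differ. The class and function names are hypothetical, and the matplotlib plotting is omitted.

```python
# Minimal sketch, ASSUMING the HTML is already available as a string; it only
# collects <img src="..."> links and does not download or plot the images.
from html.parser import HTMLParser

class ImageLinkParser(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.image_links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value).
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.image_links.append(value)

def extract_image_links(html):
    parser = ImageLinkParser()
    parser.feed(html)
    return parser.image_links

sample_html = '<div><img src="https://example.com/a.jpg"><p>text</p><img alt="no source"></div>'
print(extract_image_links(sample_html))  # → ['https://example.com/a.jpg']
```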
Python - Keras Convolutional Neural Network: is a CNN developed to classify the MNIST dataset with an accuracy greater than 99%.
Python - Keras Deep Regressor: is a deep neural network made with Keras to predict a continuous output, with a learning rate scheduled according to the derivative of the error, random initial weights and a loss history.
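A learning rate "scheduled according to the derivative of the error" can be read in several ways. The framework-independent sketch below assumes one simple rule: look at the sign of the latest change in loss, shrink the rate when the error rises, and grow it slightly when it falls. The function name and the `decay`, `growth` and `min_lr` values are hypothetical.

```python
# Minimal sketch of ONE possible rule (an assumption, not the original callback):
# the discrete derivative of the loss decides whether the learning rate shrinks or grows.

def adjust_learning_rate(lr, loss_history, decay=0.5, growth=1.05, min_lr=1e-6):
    """Return a new learning rate from the sign of the last loss change."""
    if len(loss_history) < 2:
        return lr                       # not enough history to estimate a derivative
    derivative = loss_history[-1] - loss_history[-2]
    if derivative > 0:                  # error increased: take smaller steps
        return max(lr * decay, min_lr)
    return lr * growth                  # error decreased or flat: speed up a little

lr = 0.1
lr = adjust_learning_rate(lr, [0.9, 0.7])   # loss fell: lr becomes 0.105
lr = adjust_learning_rate(lr, [0.7, 0.8])   # loss rose: lr becomes 0.0525
print(round(lr, 4))  # → 0.0525
```

In Keras, a rule like this could be called from a custom callback at the end of each epoch, using the loss reported for that epoch.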
Python - Keras LSTM Network: is a Recurrent Neural Network (LSTM) to predict and generate text.
Python - Keras Multi Layer Perceptron: is an MLP (multilayer perceptron) neural network made with Keras for prediction and classification, with a loss history and a learning rate scheduled according to the derivative of the error.
Python - Machine Learning: is a Principal Component Analysis followed by a Linear Regression.
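PCA followed by linear regression (principal component regression) can be sketched in pure Python. This assumes that only the first principal component is kept and that it is found by power iteration on the covariance matrix. The original model may keep more components or use a library. The data are hypothetical.

```python
# Minimal sketch, ASSUMING one retained component found by power iteration;
# the target is then regressed on the component scores with ordinary least squares.
import math

def first_principal_component(rows, iterations=100):
    """Return (column means, unit vector of the top principal component)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):          # power iteration: v <- Cv / ||Cv||
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(row, means, v):
    """Score of one observation on the component."""
    return sum((row[j] - means[j]) * v[j] for j in range(len(v)))

def fit_line(z, y):
    """Ordinary least squares for y = a + b * z."""
    n = len(z)
    z_bar, y_bar = sum(z) / n, sum(y) / n
    b = (sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
         / sum((zi - z_bar) ** 2 for zi in z))
    return y_bar - b * z_bar, b

# Hypothetical data: two perfectly correlated features and a target driven by them.
ts = (-2, -1, 0, 1, 2)
X = [[t, 2 * t] for t in ts]
y = [3 * t + 1 for t in ts]
means, v = first_principal_component(X)
scores = [project(r, means, v) for r in X]
a, b = fit_line(scores, y)
print([round(a + b * s, 6) for s in scores])
```

Keeping fewer components than features is what makes this useful when features are strongly correlated, as they are in the toy data.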
Python - NLP Doc2Vec: is a Natural Language Processing model in which a question is asked of a Wikipedia web page and 4 possible answers are chosen by semantic similarity from the tokenized and vectorized page, using KNN and cosine distance.
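The retrieval step can be sketched in Python: rank pieces of text by cosine similarity to the question and keep the k nearest. This sketch swaps in plain bag-of-words count vectors for the Doc2Vec embeddings and uses hypothetical sentences in place of the Wikipedia page. Cosine distance is 1 minus the similarity, so sorting by highest similarity gives the nearest neighbors.

```python
# Minimal sketch with a SWAPPED-IN technique: bag-of-words counts instead of
# Doc2Vec vectors, then k-nearest-neighbor retrieval by cosine similarity.
import math
import re
from collections import Counter

def vectorize(text):
    """Lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_sentences(question, sentences, k=4):
    """Return the k sentences with the smallest cosine distance to the question."""
    q = vectorize(question)
    ranked = sorted(sentences, key=lambda s: cosine_similarity(q, vectorize(s)),
                    reverse=True)
    return ranked[:k]

sentences = [  # hypothetical stand-in for the tokenized page
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Bananas are rich in potassium.",
    "France borders Spain and Italy.",
    "Python is a programming language.",
]
question = "What is the capital of France?"
print(nearest_sentences(question, sentences, k=2))
# → ['Paris is the capital of France.', 'The Eiffel Tower is in Paris.']
```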
Python - NLP Semantic Analysis: is a Natural Language Processing model that classifies a given sentence according to semantic similarity to other sentences, using cosine distance.
Python - NLP Word2Vec: is a model developed from scratch to measure cosine similarity among words.
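Measuring cosine similarity between words requires a vector for each word. The sketch below is a simpler, count-based stand-in for word2vec and does not train word2vec itself. Each word is represented by counts of the words within a fixed window around it, and these vectors are compared with cosine similarity. The toy corpus and the window size of 2 are assumptions.

```python
# Minimal sketch with a SWAPPED-IN technique: co-occurrence count vectors
# (not trained word2vec embeddings) compared with cosine similarity.
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(tokens, window=2):
    """Map each word to counts of the words within +/- window positions of it."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    dot = sum(count * b[w] for w, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

corpus = ("the cat drinks milk the dog drinks water "
          "the cat eats fish the dog eats meat").split()
vectors = cooccurrence_vectors(corpus)
print(round(cosine(vectors["cat"], vectors["dog"]), 3))   # → 0.949
print(round(cosine(vectors["cat"], vectors["milk"]), 3))  # → 0.5
```

"cat" and "dog" score higher because they appear in similar contexts, which is the same intuition word2vec captures with learned dense vectors.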
Python - Reinforcement Learning: is a model based on simple rules and Game Theory, where agents' attitudes change according to the payoff they achieve. It can be adapted to tit-for-tat, always-cooperate, always-defect and other strategies. Rewards are defined in the payoff matrix.
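The payoff matrix and the named strategies can be illustrated with an iterated prisoner's dilemma. This sketch assumes the classic payoffs (5 for defecting against a cooperator, 3 each for mutual cooperation, 1 each for mutual defection, 0 for cooperating against a defector). The original matrix may differ, and the agents' payoff-driven adaptation is not reproduced here. All names are hypothetical.

```python
# Minimal sketch, ASSUMING the classic prisoner's dilemma payoffs; it shows the
# payoff matrix and fixed strategies, not the original adaptive agents.
PAYOFF = {  # (my move, opponent's move) -> my reward; "C" cooperate, "D" defect
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def always_cooperate(opponent_history):
    return "C"

def always_defect(opponent_history):
    return "D"

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"

def play(strategy_a, strategy_b, rounds=10):
    """Play repeated rounds and return the total reward of each player."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, always_defect))     # → (9, 14)
print(play(tit_for_tat, always_cooperate))  # → (30, 30)
```

An adaptive agent could be added on top of this by switching to whichever strategy earned the higher total in recent rounds.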
Python - Social Networks: is a model that draws the configuration and connections of social networks.
Python - Support Vector Machines: is a Machine Learning model that classifies the Iris dataset with SVM and plots it.
Python - Theano Deep Learning: is a Neural Network with two hidden layers using Theano.
R Code
R - Churn of Customers: is a model that uses logistic regression combined with a threshold to predict which customers are at the greatest risk of being lost.
R - Data Cleaning + Multinomial Regression: is a model that performs data cleaning and a multinomial regression, using the nnet package, to classify customers according to their level of loyalty.
R - Face Recognition: is a code to detect faces and objects in R.
R - Geolocation Brazil: is a file for geospatial localization on a map of Brazil.
R - Geolocation USA: is also a file for geospatial localization, on a map of the USA.
R - Geolocation World: is a file for geospatial localization on a world map, with zoom and customizable icons.
R - Gradient Descent Logistic: is a model that uses gradient descent to define a threshold for the sigmoid function in a logistic regression. Boosting was also implemented, and the ROC curves were compared.
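One common reading of this entry is that gradient descent fits the logistic regression coefficients, and a threshold is then applied to the sigmoid output. The Python sketch below makes these assumptions: a single feature, batch gradient descent on the mean log-loss with a fixed learning rate, hypothetical toy data, and 0.5 as the default threshold. It is not the original R code.

```python
# Minimal sketch, ASSUMING one feature, batch gradient descent on the mean
# log-loss, a fixed learning rate and a 0.5 decision threshold.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, epochs=5000):
    """Return (intercept, slope) that approximately minimize the mean log-loss."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for xi, yi in zip(x, y):
            error = sigmoid(b0 + b1 * xi) - yi   # derivative of log-loss w.r.t. the linear score
            grad0 += error
            grad1 += error * xi
        b0 -= lr * grad0 / n
        b1 -= lr * grad1 / n
    return b0, b1

# Hypothetical, deliberately non-separable data so the coefficients stay finite.
x = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
probabilities = [sigmoid(b0 + b1 * xi) for xi in x]
threshold = 0.5                                   # could instead be tuned, e.g. on the ROC curve
predictions = [1 if p >= threshold else 0 for p in probabilities]
print(predictions)
```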
R - H2O Deep Learning: is a Neural Network model developed to predict recommendations and word-of-mouth advertising.
R - Imbalanced classes: is a model for employee churn in which the features have no correlation with the target variable and the classes are imbalanced in the proportion 1/20. A logistic regression is implemented from scratch, gradient hill climbing is used to find the best threshold for the logistic function, and boosting is then compared in terms of AUC in a ROC plot.
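The threshold search and the AUC comparison can be sketched in Python. This sketch assumes F1 as the objective, a common choice for imbalanced classes that the original does not state. It also uses the sorted distinct scores as candidate thresholds, so hill climbing moves between neighboring candidates. As a local search, it can stop at a local optimum. AUC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, which equals the area under the ROC curve. The scores and labels are hypothetical.

```python
# Minimal sketch, ASSUMING F1 as the hill-climbing objective (not confirmed by
# the original) and AUC computed via the rank (Mann-Whitney) formulation.

def f1_at(threshold, scores, labels):
    """F1 score when every score >= threshold is predicted as the positive class."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def hill_climb_threshold(scores, labels):
    """Move between neighboring candidate thresholds while F1 improves."""
    candidates = sorted(set(scores))
    i = len(candidates) // 2                     # start from the median candidate
    best = f1_at(candidates[i], scores, labels)
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(candidates)]
        if not neighbors:
            return candidates[i], best
        j = max(neighbors, key=lambda k: f1_at(candidates[k], scores, labels))
        value = f1_at(candidates[j], scores, labels)
        if value <= best:                        # no neighbor improves: local optimum
            return candidates[i], best
        i, best = j, value

def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.6, 0.7]   # hypothetical model outputs
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(hill_climb_threshold(scores, labels))  # → (0.35, 0.8888888888888888)
print(auc(scores, labels))                   # → 0.9375
```

Comparing boosting against the plain logistic regression then amounts to calling `auc` on each model's scores for the same labels.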
R - Logistic Regression + Gradient Descent + Boosting: is a model where the features have no correlation with the target variable. Logistic regression with gradient descent was applied, followed by boosting.
R - MNIST: is a solution for the MNIST dataset, developed from scratch.
R - Markov Chains: is a simple visualization of Markov chains and their associated probabilities.
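The probabilities associated with a Markov chain follow from its transition matrix. The state distribution after n steps is the initial row vector multiplied n times by the matrix. The two-state "weather" chain below is a hypothetical example, and the original R visualization is not reproduced.

```python
# Minimal sketch with a HYPOTHETICAL two-state chain: repeated multiplication
# of a row vector by the transition matrix gives the n-step distribution.

def step(distribution, transition):
    """One step: new[j] = sum over i of distribution[i] * transition[i][j]."""
    states = range(len(transition))
    return [sum(distribution[i] * transition[i][j] for i in states) for j in states]

def distribution_after(initial, transition, n):
    current = list(initial)
    for _ in range(n):
        current = step(current, transition)
    return current

# Rows are "from" states, columns are "to" states; each row sums to 1.
P = [[0.9, 0.1],   # sunny -> sunny, sunny -> rainy
     [0.5, 0.5]]   # rainy -> sunny, rainy -> rainy
print(distribution_after([1.0, 0.0], P, 1))                         # → [0.9, 0.1]
print([round(p, 4) for p in distribution_after([1.0, 0.0], P, 50)])  # → [0.8333, 0.1667]
```

After many steps the distribution settles at the stationary distribution, which for this chain is [5/6, 1/6] regardless of the starting state.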
R - NeuralNet: is a Neural Network model developed to predict and classify word-of-mouth advertising.
R - Ridge Regression: is a model with Ridge Regularization made from scratch to prevent overfitting.
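Ridge regression adds a penalty λ‖β‖² to least squares, which gives the closed-form system (XᵀX + λI′)β = Xᵀy. Here I′ is the identity matrix with a 0 in the intercept position. The Python sketch below assumes that the intercept is not penalized and solves the system with a small Gaussian-elimination helper. The original R code may take a different route, for example gradient descent. The data and function names are hypothetical.

```python
# Minimal sketch, ASSUMING the closed-form ridge solution with an unpenalized
# intercept; solve() is a small hypothetical helper, not a library function.

def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ridge(X, y, lam):
    """Return [intercept, coefficients...] minimizing squared error + lam * ||coefficients||^2."""
    rows = [[1.0] + list(r) for r in X]                # prepend intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for i in range(1, p):                              # do not penalize the intercept
        xtx[i][i] += lam
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.0, 6.0, 8.0]
print(ridge(X, y, 0.0))   # ordinary least squares: intercept 0, slope 2
print(ridge(X, y, 10.0))  # slope shrunk toward 0 (about 0.667)
```

Larger λ pulls the coefficients toward zero, trading a little bias for lower variance, which is how the regularization helps against overfitting.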
R - Deep Learning: is a Neural Network model with 2 hidden layers for prediction of a continuous variable.