mrc03/Internship-Assignment

Stars
1
Language
Jupyter Notebook
Created almost 5 years ago
Updated almost 5 years ago

mrc03

IBM-HR-Analytics-Employee-Attrition-Performance

The IBM HR Analytics Employee Attrition & Performance dataset from Kaggle. I first performed exploratory data analysis on the data using libraries such as pandas, seaborn, and matplotlib. I then used feature selection techniques such as RFE to select the features. The data is oversampled with the SMOTE technique to deal with the imbalanced classes, and then scaled for better performance. Lastly, I trained several ML models from the scikit-learn library for predictive modelling and compared their performance using precision, recall, and other metrics.

Jupyter Notebook
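
The attrition description above compares models by precision and recall rather than accuracy alone, which matters when classes are imbalanced. The following is a minimal sketch of how those two metrics are computed from predictions. The function name `precision_recall` and the labels are made up for illustration and are not taken from the notebook.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up, imbalanced labels: 1 = employee left, 0 = employee stayed.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, accuracy)
```

In this toy case accuracy looks high because most employees stayed, while precision and recall on the minority class are noticeably lower.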

Flower-Recognition-Kaggle-CNN-Keras

The dataset is Flower Recognition on Kaggle. It consists of 4232 images, each with different pixel values. Each image belongs to one of 5 classes, such as 'Daisy' and 'Rose'. I trained a convolutional neural network written in Keras to predict the flower type on the validation set. I also used ImageDataGenerator to augment the training set and reduce overfitting.

Jupyter Notebook

Red-Wine-Quality-Accuracy-0.9175-

The Red Wine Quality dataset from Kaggle. The data describes the chemical composition of each wine. I used pandas to manipulate the data and seaborn to visualize it. Finally, I predicted the wine quality using various models from scikit-learn.

Jupyter Notebook

Cats-vs-Dogs-CNN-Keras

The famous Cats-vs-Dogs dataset. I used a self-designed ConvNet to classify each image into one of 2 classes, Dog or Cat. The images are 100*100 pixels each. They are first read with the Python zipfile module and converted to NumPy arrays of pixel values, then divided into training, cross-validation, and test sets containing 20000, 5000, and 12500 images respectively. I also used data augmentation to reduce the chance of overfitting. Finally, I achieved an accuracy of about 88% on the validation set.

Jupyter Notebook

Pokemon-Data-Exploration-Visualization

Pokemon with stats. Data analysis and exploration are performed on the dataset. Visualization is done with the seaborn and matplotlib libraries. Bar plots, box plots, swarm plots, scatter plots, violin plots, heat maps, etc. were used to analyze the data.

Jupyter Notebook

Movie-Reviews-NLTK-Sentiment-Analysis-

The Movie Reviews dataset, imported from the NLTK library. It has 1000 positive and 1000 negative reviews. I first loaded the dataset into a pandas DataFrame, which makes processing easier. The next step is to analyze the positive and negative reviews. I also preprocessed the dataset using lemmatization and other standard NLP techniques. To extract features from the text I used the TF-IDF vectorizer from scikit-learn. Lastly, I trained various scikit-learn models on this data.

Jupyter Notebook
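
The movie-review description above relies on TF-IDF features. As a rough illustration of the idea, the sketch below computes the plain textbook weighting (term frequency times log inverse document frequency) in pure Python. This is an assumption-laden simplification: scikit-learn's vectorizer applies smoothing and normalization by default, so its numbers will differ. The function name `tfidf` and the two sample reviews are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (textbook TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Two made-up reviews, already lower-cased and split into tokens.
docs = [
    "a great and moving film".split(),
    "a dull and boring film".split(),
]
weights = tfidf(docs)
print(weights[0])
```

Terms that occur in every review, such as "film", get a weight of zero, while terms unique to one review, such as "great", carry the signal a classifier can use.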

Housing-Prices-EDA-and-Regression-Models

The famous House Prices: Advanced Regression Techniques competition on Kaggle. The dataset consists of training and test sets, each with about 1.46K rows and 81 features pertaining to a house. I first performed an exhaustive EDA to identify the underlying trends in the data. I also removed outliers to make the regression models more robust, and treated missing values with imputation wherever needed. Lastly, I fitted various regression models such as Lasso and Ridge from scikit-learn and tuned their parameters with GridSearchCV. The final RMSE is a little more than 0.12.

Jupyter Notebook

Topic-Modelling-using-LDA-and-LSA-in-Sklearn

I performed topic modelling on the "A Million News Headlines" dataset on Kaggle. I first pre-processed and cleaned the data, then used the LDA and LSA implementations in the sklearn library. The distribution of words in each topic is also shown.

Jupyter Notebook

Word-Embeddings-in-Gensim-and-Keras

A simple implementation of word embeddings with the Gensim and Keras libraries. I implemented the well-known Word2Vec model in Gensim. As an alternative, I also used the Keras Embedding layer to generate word embeddings.

Jupyter Notebook

Spooky-Author-Identification

A notebook for the famous Kaggle competition Spooky Author Identification. The task is to identify authors from their texts. I first cleaned and pre-processed the text using standard NLP techniques such as tokenization, stemming or lemmatization, and stop-word removal. I also created some meta features (hand-crafted features) based on each author's writing pattern. I then used the traditional bag-of-words approach with the TF-IDF vectorizer and the count vectorizer, and trained ML algorithms such as LogisticRegression and Naive Bayes, which are well suited to text data. So far, TF-IDF on the count vectorizer has given the best results; my submission scored a multi-class log loss of 0.46 on the Kaggle private leaderboard.

Jupyter Notebook
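
The score quoted for Spooky Author Identification is a multi-class log loss, which penalizes confident wrong predictions heavily. Here is a minimal sketch of the metric, assuming each row of `probs` already sums to 1 and that classes are encoded as integer indices. The function name, the three classes, and the probabilities are made up for illustration.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class of each sample."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0) when a model predicts exactly 0 or 1.
        p = min(max(row[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Made-up predictions for three texts; class indices 0, 1, 2 stand in for authors.
y_true = [0, 1, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]
loss = multiclass_log_loss(y_true, probs)
print(round(loss, 4))
```

For reference, a model that always predicts equal probability for three classes scores ln(3), roughly 1.10, so lower values indicate better-calibrated predictions.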

Gender-Recognition-by-Voice-Val.-Acc.-0.9908-

The Gender Recognition by Voice dataset from Kaggle. The dataset consists of 3168 voice samples, each with 20 acoustic properties; the target variable is the 'gender' or 'label'. I did an exhaustive EDA to analyze the data and the underlying trends. Outliers were detected and removed for better performance. I also did feature engineering by adding a couple of new relevant features, and normalized the data. Lastly, I used several classification algorithms from scikit-learn to predict the 'gender' from a voice sample. SVM gives the highest accuracy, a little more than 99.1%.

Jupyter Notebook

Amazon-Fine-Food-Reviews-Analysis

The famous Amazon Fine Food Reviews dataset on Kaggle for text classification. I performed sentiment analysis on the dataset using different techniques. Please see the README for details.

Jupyter Notebook

The-Iris-Species-Dataset

The famous Iris Species dataset from Kaggle. I normalized the features and examined their distributions. I also applied several scikit-learn algorithms to make predictions on the dataset.

Jupyter Notebook

Titanic-Survivor-Prediction

The Titanic: Machine Learning from Disaster competition. Given data on the various passengers travelling on the ship, I used libraries such as numpy and pandas to manipulate, explore, and analyze the data, and matplotlib and seaborn to visualise it. I then used various machine learning models to make predictions on the cleaned and preprocessed data, and used GridSearchCV to optimise the parameters of the models.

Jupyter Notebook

BIKE-SHARING-DEMAND-RMSLE-0.3194-

Bike Sharing Demand.

Jupyter Notebook
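
The Bike Sharing Demand repository name reports an RMSLE score. As a hedged illustration of what that metric measures, the sketch below computes the root mean squared logarithmic error in pure Python. The function name `rmsle` and the sample counts are assumptions, not taken from the notebook.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared error between log(1 + prediction) and log(1 + actual)."""
    squared = [
        (math.log1p(pred) - math.log1p(actual)) ** 2
        for actual, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Made-up hourly rental counts and predictions.
actual = [16, 40, 32, 13]
predicted = [20, 35, 30, 10]
print(rmsle(actual, predicted))
```

Because the error is taken on a log scale, being off by 10 rentals matters more when demand is 10 than when it is 500.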

Calculator

A simple calculator application for Android. I used Java and XML for the layout. The calculator has basic operations and some other utility functions.

MNIST-DIGIT-RECOGNIZER-USING-ConvNet-KERAS-Accuracy-0.9943-

The MNIST Digit Recognizer competition on Kaggle. The training dataset consists of 42000 rows, each with 784 pixel values, representing 42000 images of 28 x 28 pixels showing digits from 0 to 9. I trained convolutional neural networks written in Keras and predicted on the 28000 images of the test dataset, achieving 99.43% accuracy on Kaggle with 20 epochs. I also used ImageDataGenerator to augment the training set and reduce overfitting.

Jupyter Notebook
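
The MNIST description above notes that each row holds 784 pixel values representing a 28 x 28 image. A minimal sketch of that unflattening step follows, assuming the pixels are stored row by row (row-major order). The function name `to_image` and the placeholder values are hypothetical; the notebook itself may reshape with a different tool.

```python
def to_image(row, side=28):
    """Split a flat list of side*side pixel values into `side` rows of `side` pixels."""
    if len(row) != side * side:
        raise ValueError(f"expected {side * side} values, got {len(row)}")
    return [row[r * side:(r + 1) * side] for r in range(side)]


# Placeholder values 0..783 instead of real pixel intensities.
row = list(range(784))
image = to_image(row)
print(len(image), len(image[0]))
```

A convolutional network needs this 2-D layout so that its filters can see neighbouring pixels together.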

SAD_PROJECT

A blood bank mobile application where users can register and log in. A blood donor can register with the application and earn points. A receiver can search for donors and either call a donor or locate them on Google Maps. The application uses Java, XML, the Firebase API as the backend, and the Google Maps API to locate donors.

TicTacToe

Tic Tac Toe is a simple tic-tac-toe game developed for the Android platform. The application was developed in just 2 hours for the International Organisation of Software Developers (IOSD) Hackathon.

Appdichat

Appdichat is a mobile chat application with many features. You can register and log in, send friend requests to your friends, and accept or decline the requests you receive. You can chat in real time with your friends and see a list of the people using the application. A user can also build a profile by setting a display picture and a status. You can also view the profiles of all your friends and see the number of mutual friends. The application uses Java, XML, and the Firebase Realtime Database.

SqlitePractice

Basic-Guide-to-Natural-Language-Processing-with-NLTK-and-Spacy

A basic guide to implementing fundamental NLP techniques such as text normalization and text similarity with the NLTK and spaCy libraries.

Jupyter Notebook

Project

The project is an Android application that displays the levels of various gases in the atmosphere. The gas volumes are stored in an Excel file, and the stored values are updated periodically with data fetched from the sensors. The application reads the contents of the file and displays the results.

CPUSchedulerApp

A small Android application for the Operating Systems lab project. The application implements a short-term scheduler. The user enters the details of the processes and then chooses one of several CPU scheduling algorithms. The application then displays the order in which the processes will be executed, along with other quantities such as the waiting time and turnaround time of each process.
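
The scheduler description does not say which algorithms the app offers, so the sketch below assumes first-come, first-served (FCFS) as one common option and shows how execution order, waiting time, and turnaround time can be derived. It is written in Python for brevity rather than the app's Java, and the function name `fcfs` and the process list are made up.

```python
def fcfs(processes):
    """Schedule (name, arrival_time, burst_time) tuples first-come, first-served."""
    schedule = []
    clock = 0
    # Serve processes in order of arrival; sorted() is stable, so ties keep input order.
    for name, arrival, burst in sorted(processes, key=lambda p: p[1]):
        start = max(clock, arrival)  # the CPU may sit idle until the process arrives
        finish = start + burst
        schedule.append({
            "name": name,
            "waiting": start - arrival,      # time spent in the ready queue
            "turnaround": finish - arrival,  # total time from arrival to completion
        })
        clock = finish
    return schedule


# Hypothetical processes: (name, arrival time, burst time).
result = fcfs([("P1", 0, 5), ("P2", 1, 3), ("P3", 2, 8)])
for entry in result:
    print(entry)
```

Other policies, such as shortest job first or round robin, would change only how the next process is picked; the waiting and turnaround definitions stay the same.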

Object-Recognition-CIFAR-10-CNN-Keras

The famous CIFAR-10 dataset. The dataset consists of images of different objects, such as airplanes, horses, and ships, that need to be classified. The training set contains 50000 images of 32*32 pixels each, and the validation set contains 10000 images of the same size. I used a self-designed ConvNet to classify the images into 10 classes, one per object type. I also applied data augmentation with the ImageDataGenerator class provided by the Keras library to increase the size of the training set and thus reduce the chance of overfitting. Finally, I used the ConvNet to make predictions on the validation set and achieved an accuracy of about 86%.

Jupyter Notebook