Lokesh Palacharla (@lokyGit)
  • Stars 6
  • Global Rank 1,300,168 (Top 45 %)
  • Followers 2
  • Following 2
  • Registered almost 5 years ago
  • Most used languages
    R 75.0 %
    Clojure 25.0 %
  • Location 🇺🇸 United States
  • Country Total Rank 217,931
  • Country Ranking
    Clojure 2,481
    R 3,991

Top repositories

1

ionosphere-signals-prediction

This project analyzes the Ionosphere dataset and measures how accurately several models classify its radar signal returns. The radar data were collected by a system in Goose Bay, Labrador, consisting of a phased array of 16 high-frequency antennas with a total transmitted power on the order of 6.4 kilowatts. The received signals were processed with an autocorrelation function whose arguments are the time of a pulse and the pulse number. There were 17 pulse numbers for the Goose Bay system, and each instance in the database is described by two attributes per pulse number. The dataset describes radar returns from free electrons in the ionosphere and whether each return shows structure or not.

The problem is a binary classification over 351 instances and 35 attributes: 34 continuous values ranging between -1 and 1, plus one binary variable that gives the type of the electromagnetic signal. The objective of the project is to measure how accurately the machine learning models listed below classify 'good' and 'bad' instances, and to report measures that improve the overall performance of the models. Predicting good and bad signals matters because these signals propagate over long distances and contribute to better communication and navigation.

We predict the good and bad signals using three methods (KNN, GLM and a decision tree) and then apply an ensemble technique, stacking, to improve accuracy. The generalized linear model had the best classification rate of the three, and stacking improved the overall performance of the combined models.
Introduction

Source Information:
-- Donor: Vince Sigillito ([email protected])
-- Date: 1989
-- Source: Space Physics Group, Applied Physics Laboratory, Johns Hopkins University, MD 20723

The first 34 columns are continuous numerical data covering the 17 pulse numbers of the received electromagnetic signals, with two attributes per pulse number. In the original dataset description, these two attributes correspond to the complex value returned by the autocorrelation function. The 35th column is the categorical label, "good" or "bad". "Good" returns show evidence of some type of structure in the ionosphere. "Bad" returns do not; their signals pass through the ionosphere.

Implementation of the Project

First, we install the necessary packages, load the required libraries, and read the dataset into R. We convert the label in the last column from character to factor. Next, to identify the important features, we fitted a Boruta model to the data and found that column two (V2) is not important, so we removed V2 and created a reduced dataset containing only the important variables. We then split this dataset into a training set and a test set. With those in place, we used knn() from the class library for the KNN algorithm, glm() for logistic regression, and rpart() for the decision tree. We chose these methods because the outcome is binary and these algorithms are well suited to predicting a categorical target. After fitting the models, we sought to improve their accuracy with an ensemble technique, and chose stacking because it is designed to combine the outputs of different types of models.
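The stacking step described above can be illustrated with a minimal sketch. It is written in Python with the standard library only, not in the R used by the repository, and it does not reproduce the author's code. Everything in it is an assumption made for illustration: the data are two synthetic Gaussian clusters rather than the Ionosphere set, the base learners are a small KNN and a depth-1 decision stump (standing in for knn() and rpart()), and logistic regression is used as the meta-learner, which the source does not specify.

```python
import math
import random

def knn_predict(train_X, train_y, x, k=3):
    """Base learner 1: majority vote among the k nearest training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def fit_stump(X, y):
    """Base learner 2: depth-1 decision tree chosen by training accuracy."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for above in (0, 1):  # label predicted when feature j exceeds t
                hits = sum((above if row[j] > t else 1 - above) == label
                           for row, label in zip(X, y))
                if best is None or hits > best[0]:
                    best = (hits, j, t, above)
    _, j, t, above = best
    return lambda x: above if x[j] > t else 1 - above

def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, label in zip(Z, y):
            score = w[0] + sum(wi * zi for wi, zi in zip(w[1:], z))
            err = 1.0 / (1.0 + math.exp(-score)) - label
            grad[0] += err
            for i, zi in enumerate(z):
                grad[i + 1] += err * zi
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return lambda z: 1 if w[0] + sum(wi * zi for wi, zi in zip(w[1:], z)) > 0 else 0

def make_data(n_per_class, rng):
    """Synthetic stand-in data: two Gaussian clusters, NOT the Ionosphere set."""
    rows = []
    for label, centre in ((0, -0.5), (1, 0.5)):
        for _ in range(n_per_class):
            rows.append(([rng.gauss(centre, 0.15), rng.gauss(centre, 0.15)], label))
    rng.shuffle(rows)
    return [r[0] for r in rows], [r[1] for r in rows]

rng = random.Random(0)
X, y = make_data(60, rng)
# Three-way split: base training, hold-out for the meta-learner, final test.
X_base, y_base = X[:60], y[:60]
X_hold, y_hold = X[60:90], y[60:90]
X_test, y_test = X[90:], y[90:]

stump = fit_stump(X_base, y_base)

def base_outputs(x):
    """Predictions of both base learners, used as features for the meta-learner."""
    return [knn_predict(X_base, y_base, x), stump(x)]

# The meta-learner is trained on base predictions for data the base learners
# did not see, then applied to base predictions on the test set.
meta = fit_logistic([base_outputs(x) for x in X_hold], y_hold)
stacked = [meta(base_outputs(x)) for x in X_test]
accuracy = sum(p == t for p, t in zip(stacked, y_test)) / len(y_test)
print(f"stacked accuracy on synthetic test set: {accuracy:.2f}")
```

Training the meta-learner on a separate hold-out split is one common way to keep it from simply memorizing base-learner fits; cross-validated predictions are another option.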
R
Stars 3
2

metabase-mongobi-connector

This repo contains the procedure for building a BI connector of Metabase which can be used to connect to the MongoDB Atlas cluster.
Clojure
Stars 1
3

neural-networks

This project trains a feedforward neural network to solve a binary-coded decimal (BCD) adder problem. The script must transform an 8-bit input into a 5-bit output that includes the carry bit. The design of the network is open: we experiment with different combinations of hidden layers and neurons, starting simple and adding complexity only as necessary.
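The training data for such an adder can be enumerated directly. The sketch below is in Python rather than the repository's R, and it rests on assumptions the description does not state: the 8-bit input is taken to be two 4-bit BCD digits, the 5-bit output is the carry bit followed by the 4-bit BCD sum digit, and bits are ordered most significant first. The function names are hypothetical.

```python
def to_bits(value, width):
    """Return `value` as a list of bits, most significant bit first."""
    return [(value >> shift) & 1 for shift in reversed(range(width))]

def bcd_add(a, b):
    """Map two decimal digits to an (8-bit input, 5-bit target) pair.

    Assumed layout: input = bits of a followed by bits of b,
    target = carry bit followed by the 4-bit sum digit.
    """
    if not (0 <= a <= 9 and 0 <= b <= 9):
        raise ValueError("BCD digits must be in the range 0..9")
    carry, digit = divmod(a + b, 10)
    return to_bits(a, 4) + to_bits(b, 4), [carry] + to_bits(digit, 4)

def bcd_training_set():
    """All 100 valid digit pairs; the other 156 8-bit patterns are not valid BCD."""
    return [bcd_add(a, b) for a in range(10) for b in range(10)]

inputs, target = bcd_add(7, 5)
print(inputs, target)  # [0, 1, 1, 1, 0, 1, 0, 1] [1, 0, 0, 1, 0]
```

With only 100 valid examples, the whole truth table fits in memory, so a small network can be trained on every case and checked for exact reproduction.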
R
Stars 1
4

naivebayes-decisionboundaries

Naïve Bayes is a popular classification algorithm that is often preferred for datasets with categorical features. It can also be used with continuous variables, in which case it assumes that the data are normally distributed. Naïve Bayes assumes that, within a given class, the presence of one feature is independent of the presence of every other feature. This strong assumption is why the algorithm is called naïve. Because it is so simple, Naïve Bayes can outperform many other classification algorithms on very large datasets.

The equation below is a simple Bayesian inference for the posterior probability of having drawn an Ace given a blackjack win.

P(A) = the probability of getting an Ace card
P(BJ) = the probability of winning blackjack
P(A | BJ) = P(BJ | A) * P(A) / P(BJ)

where P(A | BJ) is the posterior probability, P(BJ | A) is the likelihood, P(A) is the class prior probability, and P(BJ) is the predictor prior probability.
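To make the blackjack formula concrete, here is a small worked calculation in Python (the repository itself uses R). The numbers depend on assumptions that are not in the original description: a single 52-card deck, "A" meaning that the first card dealt is an Ace, and "BJ" meaning a two-card natural, that is, an Ace together with a ten-valued card.

```python
from fractions import Fraction

# Assumptions (not from the source): one 52-card deck, A = first card is an Ace,
# BJ = two-card natural (an Ace plus one of the 16 ten-valued cards).
ACES, TENS, DECK = 4, 16, 52

p_a = Fraction(ACES, DECK)               # prior P(A)
p_bj_given_a = Fraction(TENS, DECK - 1)  # likelihood P(BJ | A)

# Evidence P(BJ): Ace first then ten-valued card, or ten-valued card first then Ace.
p_bj = p_a * p_bj_given_a + Fraction(TENS, DECK) * Fraction(ACES, DECK - 1)

# Bayes' rule: P(A | BJ) = P(BJ | A) * P(A) / P(BJ)
p_a_given_bj = p_bj_given_a * p_a / p_bj
print(p_a_given_bj)  # 1/2
```

Under these assumptions the posterior is exactly one half, which matches intuition: in a natural, the Ace is equally likely to be the first or the second card.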
R
Stars 1