• Stars: 119
• Rank: 297,930 (top 6%)
• Language: Jupyter Notebook
• License: MIT
• Created about 6 years ago
• Updated about 2 years ago



STOCK-RETURN-PREDICTION-USING-KNN-SVM-GUASSIAN-PROCESS-ADABOOST-TREE-REGRESSION-AND-QDA

Forecasts stock prices using classical machine learning techniques: a time series analysis and modeling exercise. It employs predictive modeling to forecast stock returns, an approach used by hedge funds to select tradeable stocks.

Objective:

      Predict stock prices using technical indicators as predictors (features).
      Use a supervised machine learning approach to predict stock prices.
      Employ pipelines and grid search to select the best model.
      Use the final model to predict stock returns.
      Show plots of stock returns.
      Write a deployable script.

Note:

      Every stock behaves differently, so the best-performing algorithm may
      vary from stock to stock. For instance, after extensive testing, the
      Random Forest algorithm predicted Apple stock better than any other
      algorithm, while the Gaussian process classifier performed best at
      predicting IBM stock, and so on.

Indicators/Predictors Used:

    Moving Averages (also called the rolling mean)
    Commodity Channel Index
    Momentum
    Stochastic Oscillator (%D and %K)
    Force Index
    Mass Index

    # You can add as many predictors as desired.
    # Most importantly, if you do so, you may have to
    # consider feature selection, e.g. using XGBoost.
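The indicators above follow standard textbook formulas and can be computed from OHLC data with pandas. Below is a minimal sketch on synthetic prices; the column names and window length are assumptions, and the CCI uses the rolling standard deviation in place of the classic mean absolute deviation for brevity:

```python
import numpy as np
import pandas as pd

def add_indicators(df, n=14):
    """Append a subset of the listed indicators to an OHLC frame."""
    out = df.copy()
    # Moving average (rolling mean) of the close
    out["SMA"] = out["Close"].rolling(n).mean()
    # Momentum: today's close minus the close n days ago
    out["Momentum"] = out["Close"].diff(n)
    # Stochastic oscillator %K and its 3-day smoothing %D
    low_n = out["Low"].rolling(n).min()
    high_n = out["High"].rolling(n).max()
    out["%K"] = 100 * (out["Close"] - low_n) / (high_n - low_n)
    out["%D"] = out["%K"].rolling(3).mean()
    # Commodity Channel Index on the typical price
    # (rolling std used instead of mean deviation -- a simplification)
    tp = (out["High"] + out["Low"] + out["Close"]) / 3
    out["CCI"] = (tp - tp.rolling(n).mean()) / (0.015 * tp.rolling(n).std())
    return out

# Synthetic random-walk prices, just to exercise the function
rng = np.random.default_rng(0)
close = 100 + np.cumsum(rng.normal(0, 1, 250))
df = pd.DataFrame({"Close": close,
                   "High": close + rng.uniform(0, 1, 250),
                   "Low": close - rng.uniform(0, 1, 250)})
feats = add_indicators(df).dropna()
print(feats[["SMA", "Momentum", "%K", "%D", "CCI"]].head())
```

The `dropna()` discards the warm-up rows that the rolling windows cannot fill.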

How to Use

      git clone https://github.com/kennedyCzar/STOCK-RETURN-PREDICTION-USING-KNN-SVM-GUASSIAN-PROCESS-ADABOOST-TREE-REGRESSION-AND-QDA
      Unpack the files into a project folder.

      Add the file path to the environment variables using Spyder's PYTHONPATH manager.

      Click on "Synchronize with environment".

      Restart Spyder.

      Report any issue you encounter.

Output

Plots produced (images in the repository): feature importance, and regression plots of Gold, General Motors, Apple, and Tesla stock returns.

Performing optimization...

      Estimation grid_RandomForestClassifier
      Best params: {'clf__criterion': 'gini', 'clf__max_depth': 8, 
      'clf__min_samples_leaf': 8, 'clf__min_samples_split': 9}
      Best training accuracy: 0.855755894590846
      Test set accuracy score for best params: 0.8546042003231018

      Estimation grid_RandomForestClassifier_PCA
      Best params: {'clf__criterion': 'entropy', 'clf__max_depth': 7, 
      'clf__min_samples_leaf': 6, 'clf__min_samples_split': 3}
      Best training accuracy: 0.7489597780859917
      Test set accuracy score for best params: 0.691437802907916

      Estimation grid_KNN
      Best params: {'clf__n_neighbors': 10}
      Best training accuracy: 0.8037447988904299
      Test set accuracy score for best params: 0.778675282714055

      Estimation grid_KNN_PCA_
      Best params: {'clf__n_neighbors': 9}
      Best training accuracy: 0.7149791955617198
      Test set accuracy score for best params: 0.6882067851373183

      Estimation grid_SVC
      Best params: {'clf__C': 5, 'clf__gamma': 0.0001, 'clf__kernel': 'linear'}
      Best training accuracy: 0.8411927877947295
      Test set accuracy score for best params: 0.851373182552504

      Estimation grid_SVC_PCA
      Best params: {'clf__C': 1, 'clf__gamma': 1, 'clf__kernel': 'rbf'}
      Best training accuracy: 0.7323162274618585
      Test set accuracy score for best params: 0.6865912762520194

      Estimation grid_GaussianProcessClassifier
      Best params: {'clf__kernel': 1**2 * RBF(length_scale=1)}
      Best training accuracy: 0.8585298196948682
      Test set accuracy score for best params: 0.8675282714054927

      Estimation grid_GaussianProcessClassifier_PCA
      Best params: {'clf__kernel': 1**2 * RBF(length_scale=1)}
      Best training accuracy: 0.7295423023578363
      Test set accuracy score for best params: 0.7011308562197092

      Estimation grid_LogisticRegression
      Best params: {'clf__C': 0.1, 'clf__penalty': 'l1', 'clf__solver': 'liblinear'}
      Best training accuracy: 0.8349514563106796
      Test set accuracy score for best params: 0.8432956381260097

      Estimation grid_LogisticRegression_PCA
      Best params: {'clf__C': 0.1, 'clf__penalty': 'l1', 'clf__solver': 'liblinear'}
      Best training accuracy: 0.7267683772538142
      Test set accuracy score for best params: 0.7059773828756059

      Estimation grid_DecisionTreeClassifier
      Best params: {'clf__max_depth': 3}
      Best training accuracy: 0.8280166435506241
      Test set accuracy score for best params: 0.8481421647819063

      Estimation grid_DecisionTreeClassifier_PCA
      Best params: {'clf__max_depth': 6}
      Best training accuracy: 0.7246879334257975
      Test set accuracy score for best params: 0.6978998384491115

      Estimation grid_AdaBoostClassifier
      Best params: {'clf__n_estimators': 8}
      Best training accuracy: 0.8141470180305131
      Test set accuracy score for best params: 0.8222940226171244

      Estimation grid_AdaBoostClassifier_PCA
      Best params: {'clf__n_estimators': 22}
      Best training accuracy: 0.6768377253814147
      Test set accuracy score for best params: 0.6348949919224556

      Estimation grid_GaussianNB
      Best params: {'clf__priors': None}
      Best training accuracy: 0.7441054091539528
      Test set accuracy score for best params: 0.7544426494345718

      Estimation grid_GaussianNB_PCA
      Best params: {'clf__priors': None}
      Best training accuracy: 0.7205270457697642
      Test set accuracy score for best params: 0.7075928917609047

      Estimation grid_QuadraticDiscriminantAnalysis
      Best params: {'clf__priors': None}
      Best training accuracy: 0.7933425797503467
      Test set accuracy score for best params: 0.7883683360258481

      Estimation grid_QuadraticDiscriminantAnalysis_PCA
      Best params: {'clf__priors': None}
      Best training accuracy: 0.7191400832177531
      Test set accuracy score for best params: 0.7075928917609047

       Classifier with best test set accuracy: grid_GaussianProcessClassifier
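A log like the one above can be produced by looping a scikit-learn `Pipeline` through `GridSearchCV` for each candidate classifier and keeping the best test score. This is a minimal sketch on synthetic data, not the repository's actual script; the two grids shown are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the indicator feature matrix and up/down labels
X, y = make_classification(n_samples=600, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One (estimator, parameter grid) pair per candidate model
grids = {
    "grid_RandomForestClassifier": (RandomForestClassifier(random_state=0),
                                    {"clf__max_depth": [3, 5, 8]}),
    "grid_KNN": (KNeighborsClassifier(), {"clf__n_neighbors": [5, 9, 10]}),
}

best_name, best_score = None, -1.0
for name, (clf, params) in grids.items():
    # Scaling and the classifier share one pipeline, so the grid
    # search cross-validates the whole preprocessing chain.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", clf)])
    gs = GridSearchCV(pipe, params, cv=5).fit(X_tr, y_tr)
    score = gs.score(X_te, y_te)
    print(f"Estimation {name}")
    print(f"Best params: {gs.best_params_}")
    print(f"Best training accuracy: {gs.best_score_}")
    print(f"Test set accuracy score for best params: {score}")
    if score > best_score:
        best_name, best_score = name, score
print(f"Classifier with best test set accuracy: {best_name}")
```

Adding another model is just another entry in `grids`; the held-out test set is what decides the winner, mirroring the comparison in the log.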

Conclusion

Note that this is a low-frequency trading strategy, suited to making a steady
income over a period of time. For high-frequency trading, the resulting
returns are considerably higher.

GOLD happens to give the most return under the applied strategy (as shown in
the plots above).
Also worth mentioning: Random Forest Classifier + PCA performed better in
most cases for stock prices with both unsteady and steady rises, followed by
AdaBoost, then the Gradient Boosting classifier.
In any case, the performance of an algorithm depends on the structure of the
underlying prices and their behaviour over the time series. Different
algorithms perform best for different stocks.

contributions welcome

More Repositories

1. FORECASTING-1.0 (Python, 36 stars)
   Predictive algorithm for forecasting the Mexican stock exchange. A machine learning approach to forecasting price and indicator behaviour for MACD, Bollinger, and SuperTrend strategies.

2. URI-URL-CLASSIFICATION-USING-RECURRENT-NEURAL-NETWORK-SVM-AND-RANDOMFOREST (Python, 19 stars)
   URI/URL classification using a recurrent neural network, support vector machine, and random forest. Results include a classification report, confusion matrix, and precision_recall_fscore_support for each validation fold of a 10-fold cross-validation.

3. ALGORITHM-TRADING-AND-STOCK-PREDICTION-USING-MACHINE-LEARNING (Python, 15 stars)
   Algorithmic trading and stock prediction using machine learning.

4. NLP-PROJECT-BOOK-INSIGHTS-WITH-PLOTLY (Python, 11 stars)
   Plotly-Dash NLP project. Document similarity measured using Latent Dirichlet Allocation and principal component analysis, followed by KMeans clustering. Completed with dynamic visual interaction.

5. TRANSFER-LEARNING-AND-OPTIMAL-TRANSPORT (Python, 10 stars)
   Transfer learning and optimal transport. A demonstration of the Subspace Alignment algorithm and entropy-regularized optimal transport (Sinkhorn's algorithm) on the Office/Caltech dataset.

6. ADVANCE-MACHINE-LEARNING-KERNEL-METHOD (Python, 9 stars)
   Advanced machine learning: kernel methods implemented for PCA, KMeans, logistic regression, support vector machine (SVM), and support vector data description (SVDD).

7. Active-learning-and-online-learning-machine-learning-algorithms. (Terra, 7 stars)
   Passive-Aggressive algorithm and active Passive-Aggressive online algorithm; kernel Passive-Aggressive algorithm.

8. HIGH-DIMENSIONAL-DATA-CLUSTERING (Python, 6 stars)
   Implementation of hierarchical clustering on a small n-sample dataset of very high dimension, together with visualization results implemented in R and Python.

9. ADVANCE-ALGORITHMS-TRAVELLING-SALESMAN-PROBLEM (Java, 5 stars)
   An implementation of the travelling salesman problem using brute force, branch and bound, edge removal, MST approximation, nearest neighbour (greedy), dynamic programming, a randomized approach, and genetic programming.

10. blockchain_with_python (Python, 4 stars)
    Building a blockchain step by step in a single Python file: a Blockchain class whose constructor creates an initial empty list to store the chain and another to store transactions.

11. EditDistanceAdvanceAlgoProject (Python, 3 stars)
    Minimum edit distance (advanced algorithms project): implementing dynamic programming, greedy, branch and bound, and k-strip algorithms.

12. POSTGRE-DATABASE-MANAGEMENT-USING-DJANGO-REST-API (HTML, 3 stars)
    Django application for automatically populating a PostgreSQL database and manipulating the application frontend. Users can easily search keywords from the frontend and save query results in different formats.

13. SEMI-SUPERVISED-NAIVE-BAYES-FOR-TEXT-CLASSIFICATION (Python, 2 stars)
    Semi-supervised machine learning for text classification; achieved 99% accuracy on unlabelled data.

14. EIGEN-FREQUENCY-CLUSTERING-USING-KMEANS-DBSCAN-PCA-HDBSCAN (Python, 2 stars)
    Eigenfrequency clustering using KMeans, KMeans + PCA, DBSCAN, and HDBSCAN.

15. n-th_monsien_number (Python, 1 star)
    Find the n-th Monisen number. A number M is a Monisen number if M = 2**P - 1 and both M and P are prime. For example, if P = 5, then M = 2**5 - 1 = 31; 5 and 31 are both prime, so 31 is a Monisen number. Put the 6th Monisen number into a single text file and submit it online.