random (@cacoderquan)
  • Stars 23
  • Global Rank 587,022 (Top 21 %)
  • Followers 4
  • Following 2
  • Registered over 9 years ago
  • Most used languages
    Python
    100.0 %
  • Location 🇭🇰 Hong Kong
  • Country Total Rank 1,364
  • Country Ranking
    Python
    547

Top repositories

1

Predict-financial-recession

The main goal of this project is to predict financial recession from the frequencies of the top 500 word stems in the reports of financial companies. Across the learning models we applied, the bag-of-words prediction of financial recession reaches an accuracy of more than 90%, which indicates that the two are indeed correlated. We also compared different learning models (ensemble methods with Decision Tree, SVM, and KNN) under various parameters, using cross-validation on the training data set to find the model with a relatively high average accuracy and a low variance of accuracy. In addition, we tried several pre-processing methods (tf-idf, feature selection, and centroid-based clustering) to improve the accuracy of the learning models. In the end, the best model is Gradient Boosting with Decision Tree on the tf-idf pre-processed data set.
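The model comparison described above (tf-idf features, then cross-validated mean and variance of accuracy per candidate) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's code: the data comes from a hypothetical `make_toy_counts` helper, a plain KNN classifier stands in for the ensemble models, the smoothed idf formula is one common choice, and the selection rule (highest mean accuracy, ties broken by lowest variance) is an assumption.

```python
import math
import random
import statistics


def tfidf(counts):
    """Convert rows of raw stem counts into tf-idf weights (smoothed idf)."""
    n_docs, n_terms = len(counts), len(counts[0])
    # Document frequency: how many reports contain each stem at least once.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    weighted = []
    for row in counts:
        total = sum(row) or 1  # guard against empty rows
        weighted.append([(c / total) * w for c, w in zip(row, idf)])
    return weighted


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def cross_val_accuracy(x, y, k_neighbors, n_folds=5, seed=0):
    """Return (mean, variance) of accuracy over n_folds held-out folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        tx, ty = [x[i] for i in train], [y[i] for i in train]
        hits = sum(knn_predict(tx, ty, x[i], k_neighbors) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return statistics.mean(scores), statistics.pvariance(scores)


def make_toy_counts(n_docs=60, n_terms=8, seed=1):
    """Hypothetical toy data: two stems are more frequent in class 1."""
    rng = random.Random(seed)
    x, y = [], []
    for d in range(n_docs):
        label = d % 2
        row = [rng.randint(0, 3) for _ in range(n_terms)]
        if label:
            row[0] += 5
            row[1] += 5
        x.append(row)
        y.append(label)
    return x, y


counts, labels = make_toy_counts()
# For brevity idf is fitted on all rows; a stricter setup would fit it
# on the training folds only.
features = tfidf(counts)
results = {k: cross_val_accuracy(features, labels, k) for k in (1, 3, 5)}
# Assumed selection rule: highest mean accuracy, then lowest variance.
best_k = max(results, key=lambda k: (results[k][0], -results[k][1]))
for k, (mean_acc, var_acc) in results.items():
    print(f"k={k}: mean accuracy {mean_acc:.3f}, variance {var_acc:.4f}")
print("selected k:", best_k)
```

The same loop would apply to any candidate model exposing a fit-and-predict step; only `knn_predict` and the parameter grid would change.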
Python
14
star
2

Sentiment-Analysis-on-the-Rotten-Tomatoes-movie-review-dataset

The Rotten Tomatoes movie review corpus is a collection of movie reviews gathered by Pang and Lee [2]. The corpus was analysed in [3], where each sentence is parsed into its tree structure and each node is assigned a fine-grained sentiment label ranging from 1 to 5, where the numbers represent very negative, negative, neutral, positive, and very positive, respectively. In this paper we use this data on ath000 phrases, and all the methods in this paper are assessed by training on a random subset of phrases (and their subphrases) of size approximately 4/5 of the data set and testing on the remaining 1/5. The idea is to use non-associative functions and the parse tree structures to modify the feature vectors.
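One way to realise the 4/5 versus 1/5 split so that a phrase and its subphrases stay on the same side is to split by parent sentence rather than by individual phrase. The sketch below assumes each record is a hypothetical `(sentence_id, phrase, label)` tuple; that layout and the function name `split_by_sentence` are illustrative assumptions, not taken from the paper.

```python
import random


def split_by_sentence(records, train_fraction=0.8, seed=0):
    """Split records so all phrases of one sentence land in the same subset."""
    sentence_ids = sorted({record[0] for record in records})
    random.Random(seed).shuffle(sentence_ids)
    cut = round(train_fraction * len(sentence_ids))
    train_ids = set(sentence_ids[:cut])
    train = [r for r in records if r[0] in train_ids]
    test = [r for r in records if r[0] not in train_ids]
    return train, test


# Hypothetical toy corpus: 10 sentences, each with 3 labelled (sub)phrases.
toy_records = [
    (sid, f"phrase {sid}-{part}", 1 + (sid + part) % 5)
    for sid in range(10)
    for part in range(3)
]
train_set, test_set = split_by_sentence(toy_records)
print(len(train_set), "training phrases,", len(test_set), "test phrases")
```

Because whole sentences are assigned at once, the phrase-level ratio is only approximately 4/5 when sentences differ in length.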
Python
7
star
3

Visualization-of-Latent-Factors-from-Movies

The goal is to visualize and interpret two-dimensional latent features for the movies in the given data set. We are given the categorization of about 1600 movies into 19 genres, together with the ratings that some users gave to specific movies. We applied matrix factorization to the sparse ratings matrix (sparse because not every user rates every movie) to obtain latent factor matrices for the movies and the users. We then used principal component analysis (PCA) on the latent factor matrix of the movies and projected each movie onto the two strongest latent factors. Finally, we visualized and interpreted each category and compared the category averages of the two major latent factors.
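The two steps described above, factorizing the sparse ratings and then projecting the movie factors onto two principal components, can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: the SGD update, the hyperparameters (`rank`, `epochs`, `lr`, `reg`), the power-iteration PCA, and the toy ratings are all choices made for the sketch and are not taken from the project.

```python
import math
import random


def factorize(ratings, n_users, n_movies, rank=3, epochs=200, lr=0.02, reg=0.05, seed=0):
    """SGD matrix factorization fitted on observed (user, movie, rating) triples only."""
    rng = random.Random(seed)
    users = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    movies = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_movies)]
    for _ in range(epochs):
        for u, m, r in ratings:
            pu, qm = users[u], movies[m]
            err = r - sum(a * b for a, b in zip(pu, qm))
            for f in range(rank):
                pu_f, qm_f = pu[f], qm[f]
                pu[f] += lr * (err * qm_f - reg * pu_f)
                qm[f] += lr * (err * pu_f - reg * qm_f)
    return users, movies


def top_components(rows, n_components=2, iters=500):
    """Leading principal directions via power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for k in range(n_components):
        v = [1.0 / (i + 1 + k) for i in range(d)]  # arbitrary non-zero start
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflate: remove the direction just found before searching for the next.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
                 for c in centered]
    return components, projected


# Hypothetical toy ratings: (user index, movie index, rating on a 1-5 scale).
toy_ratings = [
    (0, 0, 5), (0, 1, 4), (0, 3, 1),
    (1, 0, 4), (1, 2, 5), (1, 4, 2),
    (2, 1, 1), (2, 3, 5), (2, 4, 4),
    (3, 0, 2), (3, 2, 1), (3, 3, 4),
]
user_factors, movie_factors = factorize(toy_ratings, n_users=4, n_movies=5)
components, movie_xy = top_components(movie_factors)
for movie_index, (x, y) in enumerate(movie_xy):
    print(f"movie {movie_index}: ({x:.3f}, {y:.3f})")
```

Averaging the rows of `movie_xy` per genre would give the category averages that the description compares.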
Python
1
star