  • Stars 3
  • Rank 3,963,521 (Top 79 %)
  • Language
  • Created about 7 years ago
  • Updated about 7 years ago

Repository Details

The purpose of this study is to look at the distribution of ratings, movies and users over time, the impact of user mood on the average rating score, and the average rating score of each genre over time. The analysis is divided into 4 different 5-year batches so that it can be run on sections of the data. The growth, trend and level were found to be stable after the first 5 periods (i.e. after the year 2000). With the frequency of ratings highly correlated with the new movies and users added, the trend in ratings over time shows the combined effect of growth in the user and movie base. Further, the weekday-weekend analysis shows that most of the ratings (approx. 70%) happen on weekdays. For the average rating score, a notable observation is the shift in the rating pattern in the latest batch (2011-2015). In this batch approximately 50% of the rating scores are average, with 25% each for poor and high rating scores, whereas in the other batches the split was 80-20 between average and high/poor rating scores. In the genre analysis, users rated a genre below 3 in 9.4% of cases, high in 17.5% of cases and average in 70% of cases.
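
The weekday-weekend split described above can be illustrated with a minimal sketch. It is not the repository's code: the function name `weekday_weekend_share` is hypothetical, and it assumes that ratings carry Unix timestamps, that days are taken in UTC, and that the weekend means Saturday and Sunday.

```python
from collections import Counter
from datetime import datetime, timezone


def weekday_weekend_share(timestamps):
    """Return the fraction of ratings made on weekdays and on weekends.

    Assumptions (not from the source): timestamps are Unix seconds,
    days are evaluated in UTC, and the weekend is Saturday and Sunday.
    """
    counts = Counter()
    for ts in timestamps:
        # weekday() returns 0 for Monday through 6 for Sunday
        day = datetime.fromtimestamp(ts, tz=timezone.utc).weekday()
        counts["weekend" if day >= 5 else "weekday"] += 1

    total = sum(counts.values())
    if total == 0:
        # No ratings: report zero shares instead of dividing by zero
        return {"weekday": 0.0, "weekend": 0.0}
    return {key: counts[key] / total for key in ("weekday", "weekend")}


if __name__ == "__main__":
    # Illustrative timestamps only: three weekdays and one Saturday
    sample = [
        datetime(2015, 1, 5, tzinfo=timezone.utc).timestamp(),
        datetime(2015, 1, 6, tzinfo=timezone.utc).timestamp(),
        datetime(2015, 1, 7, tzinfo=timezone.utc).timestamp(),
        datetime(2015, 1, 10, tzinfo=timezone.utc).timestamp(),
    ]
    print(weekday_weekend_share(sample))
```

Note that five of every seven days are weekdays, so a weekday share of roughly 70% is close to what an even spread of ratings across the week would give.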

More Repositories

1

als-recommender-pyspark

A recommender system is an information filtering tool that seeks to predict which products a user will like and, based on that, recommends a few products to the user. For example, Amazon can recommend new shopping items to buy, Netflix can recommend new movies to watch, and Google can recommend news that a user might be interested in. The two widely used approaches for building a recommender system are content-based filtering (CBF) and collaborative filtering (CF).
Jupyter Notebook, 33 stars
2

Named-Entity-Recognition

Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
Jupyter Notebook, 29 stars
3

bow_tfidf

This project uses traditional techniques such as bag of words and tf-idf to represent the words in a corpus in a numeric format for multilabel classification.
Jupyter Notebook, 7 stars
4

Naive-Bayes-Spam-Classifier-on-PySpark

Spam detection is one of the major applications of machine learning on the web today. Most email service providers have spam detection built in to automatically classify such mail as 'Junk Mail'.
Jupyter Notebook, 6 stars
5

collaborate-github

In this article we will walk through the steps involved in collaborating over a version control system (VCS) to version-control and proofread code.
Jupyter Notebook, 2 stars
6

SageMaker-in-5-steps

SageMaker provides tools that make it simpler to build, train, tune, deploy, and manage large-scale machine learning (ML) models. In this article, we will be looking at each of these steps.
Jupyter Notebook, 1 star
7

EBay_Sales_Analysis

Jupyter Notebook, 1 star
8

covid-spread-bokeh

This project aims to visualise the spread of COVID in the UK using Bokeh, a Python visualisation package.
1 star