Repository Details

Predicting Amsterdam house / real estate prices using Ordinary Least Squares-, XGBoost-, KNN-, Lasso-, Ridge-, Polynomial-, Random Forest-, and Neural Network MLP Regression (via scikit-learn)

Approach:

  • load a Pandas DataFrame containing housing data (Dec-17) retrieved by means of a scraper, supplemented with longitude and latitude coordinates mapped to zip code (via GeoPy)
  • do some simple data exploration / visualisation
  • remove non-numeric data, NaNs, and outliers (everything above 3 x the standard deviation of y)
  • define explanatory variables (surface, latitude, and longitude) and dependent variable (price EUR)
  • split the data into train and test sets (+ normalise explanatory variables where required)
  • find the optimal model parameters using scikit-learn's GridSearchCV
  • fit the model using GridSearchCV's optimal parameters
  • evaluate estimator performance by means of 5-fold 'shuffled' nested cross-validation
  • predict cross-validated estimates of y for each data point and plot them on a scatter diagram vs. true y
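
The steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the repository's code: the generated features and prices, the choice of a KNN estimator, the parameter grid, and the reading of the outlier rule as "within 3 standard deviations of the mean" are all assumptions made to keep the example small.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the housing data (assumption, not the real dataset).
n = 300
X = np.column_stack([
    rng.uniform(30, 200, n),       # surface
    rng.uniform(52.30, 52.42, n),  # latitude
    rng.uniform(4.80, 5.00, n),    # longitude
])
y = 4000 * X[:, 0] + rng.normal(0, 40000, n)  # made-up price in EUR

# Outlier removal: keep rows whose price lies within 3 standard deviations
# of the mean (one possible reading of the rule in the list above).
mask = np.abs(y - y.mean()) <= 3 * y.std()
X, y = X[mask], y[mask]

# Scaling sits inside the pipeline so it is refit on every training fold.
model = make_pipeline(StandardScaler(), KNeighborsRegressor())
param_grid = {"kneighborsregressor__n_neighbors": [5, 10, 20]}

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: hyperparameter search. Outer loop: performance estimate.
search = GridSearchCV(model, param_grid, cv=inner_cv, scoring="r2")
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="r2")
print("nested R-squared: %.3f" % nested_scores.mean())

# Cross-validated prediction for every data point, e.g. for a scatter plot vs. true y.
y_pred = cross_val_predict(search, X, y, cv=outer_cv)
```

Passing the `GridSearchCV` object itself to `cross_val_score` is what makes the evaluation nested: each outer test fold is scored by a model whose parameters were tuned without seeing that fold.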

Packages required

Scores (5-fold nested 'shuffled' cross-validation - R-squared)

1. XGBoost Regression

  • Parameters: max_depth: 5, min_child_weight: 6, gamma: 0.01, colsample_bytree: 1, subsample: 0.7
  • Score: 0.887

2. Random Forest Regression

  • Parameters: max_depth: 6, max_feat: None, n_estimators: 10
  • Score: 0.839

3. Polynomial Regression

  • Parameters: degrees: 2
  • Score: 0.731

4. Neural Network MLP Regression

  • Parameters: act: relu, alpha: 0.01, hidden_layer_size: (10,10), learning_rate: invscaling
  • Score: 0.715

5. KNN Regression

  • Parameters: n_neighbours: 10
  • Score: 0.711

6. Ordinary Least-Squares Regression

  • Parameters: None
  • Score: 0.694

7. Ridge Regression

  • Parameters: alpha: 0.01
  • Score: 0.694

8. Lasso Regression

  • Parameters: alpha: 0.01
  • Score: 0.693
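
For reference, the R-squared score used in the ranking above is the coefficient of determination, 1 - SS_res / SS_tot. A minimal pure-Python sketch follows; the function name `r_squared` and the example predictions are illustrative only (scikit-learn ships `r2_score` for the same purpose).

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


# Prices from the sample data; the predictions are made-up numbers.
prices = [420000, 550000, 425000, 349511, 1050000]
predictions = [450000, 500000, 400000, 380000, 900000]
score = r_squared(prices, predictions)
```

A score of 1 means the predictions match the true prices exactly, while a score of 0 means the model does no better than always predicting the mean price.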

Sample data input (Pandas DataFrame)

   surface  rooms_new  zipcode_new  price_new   latitude  longitude
0    138.0        4.0         1060     420000  40.804672 -73.963420
1    130.0        5.0         1087     550000  52.355590   5.000561
2    116.0        5.0         1061     425000  52.373044   4.837568
3     92.0        5.0         1035     349511  52.416895   4.906767
4    127.0        4.0         1013    1050000  52.396789   4.876607
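
Note that the coordinates in row 0 (40.80, -73.96) lie far outside the Amsterdam area, which suggests that the zip-code geocoding can occasionally return a wrong match. A minimal sketch of a sanity filter is shown below, assuming a rough bounding box around Amsterdam; the bounds are illustrative values chosen for this example and are not taken from the repository.

```python
import pandas as pd

# The five sample rows shown above.
df = pd.DataFrame({
    "surface": [138.0, 130.0, 116.0, 92.0, 127.0],
    "rooms_new": [4.0, 5.0, 5.0, 5.0, 4.0],
    "zipcode_new": [1060, 1087, 1061, 1035, 1013],
    "price_new": [420000, 550000, 425000, 349511, 1050000],
    "latitude": [40.804672, 52.355590, 52.373044, 52.416895, 52.396789],
    "longitude": [-73.963420, 5.000561, 4.837568, 4.906767, 4.876607],
})

# Hypothetical bounding box around Amsterdam (assumed values).
LAT_MIN, LAT_MAX = 52.25, 52.45
LON_MIN, LON_MAX = 4.70, 5.10

# Keep only rows whose coordinates fall inside the box.
in_box = (
    df["latitude"].between(LAT_MIN, LAT_MAX)
    & df["longitude"].between(LON_MIN, LON_MAX)
)
clean = df[in_box]
print(clean.index.tolist())  # → [1, 2, 3, 4]
```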

Plots

  • Scatter plot - Surface vs. Asking Price (EUR)
  • XGBoost - Predicted prices vs. True price (EUR)
  • Random Forest - Predicted prices vs. True price (EUR)
  • Polynomial - Predicted prices vs. True price (EUR)
  • Neural Network MLP - Predicted prices vs. True price (EUR)
  • KNN - Predicted prices vs. True price (EUR)
  • OLS - Predicted prices vs. True price (EUR)
  • Lasso - Predicted prices vs. True price (EUR)
  • Ridge - Predicted prices vs. True price (EUR)
