Repository for Medium articles
- jupyter_kafka - How to do sentiment analysis in real time using the Jupyter notebook, Kafka and NLTK
- kafka_nlp - Building a realtime NLP pipeline using Kafka and spaCy
- livy_batch_emr - How to do better deployments of Spark jobs to AWS EMR using Apache Livy
- pandas_validation - How to do column validation with pandas
- porto_seguro_spark - Safe driver prediction using PySpark and Logistic Regression
- pyspark-project-template - How to set up the Python and Spark environment for development, with good software engineering practices
- titanic_spark - Realtime prediction using Spark Structured Streaming, XGBoost and Scala
- titanic_xgboost - PySpark ML and XGBoost full integration tested on the Kaggle Titanic dataset
- scala_notebook_test - How to run Scala and Spark in the Jupyter notebook
- mlflow-automl - How to build an integration between AutoML and MLflow
- realtime_kafka - Building a real-time prediction pipeline using Spark Structured Streaming and Microservices
- realtime_fraud_detection - How to build a real-time fraud detection pipeline using Faust and MLflow
- terraform_eks_spark - How to run a PySpark job in Kubernetes (AWS EKS)
- s3a_spark - How to read parquet data from S3 using the S3A protocol and temporary credentials in PySpark
- pyflink_riverml - Building a Credit Card Fraud Detection Online Training Pipeline with River ML and Apache Flink
- cdktf_azure_sentiment_ml - Building a Serverless Azure ML Service Using Cognitive and CDKTF
- aks_seldon - Building a Health Entity labelling service using Azure Kubernetes Service, Seldon Core and Azure Cognitive
- neo4j_companies_knn - Predicting similar political donors for UK parties using graph data
- xgboost_docker - PySpark ML and XGBoost setup using a Docker image
- xgboost_pyspark - PySpark integration with the native Python package of XGBoost
You can stay up to date with the latest stories I post on Medium here