Introduction
A set of predictive and exploratory machine learning tools built with Spark and Python.
Philosophy
- Simple to use
- Input and output in CSV format
- Metadata defined in a simple JSON file
- Highly configurable, with many configuration parameters
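The CSV-plus-JSON-metadata design above can be illustrated with a short Python sketch. It is hypothetical, not avenir's actual format: the metadata keys (fields, name, ordinal, dataType, classAttribute) and the sample customer records are assumptions made up for this example. The real schema is documented in the configuration wiki and the tutorials in the resource directory. The sketch shows the general idea: a JSON file describes each CSV column's position, type and role, and the tool uses it to parse typed records and write them back as CSV.

```python
import csv
import io
import json

# HYPOTHETICAL metadata schema, for illustration only: these keys are
# assumptions and do not reproduce avenir's real metadata format.
METADATA_JSON = """
{
  "fields": [
    {"name": "custId", "ordinal": 0, "dataType": "string"},
    {"name": "age", "ordinal": 1, "dataType": "int"},
    {"name": "income", "ordinal": 2, "dataType": "double"},
    {"name": "churned", "ordinal": 3, "dataType": "categorical", "classAttribute": true}
  ]
}
"""

# Map each declared data type to a Python converter.
CONVERTERS = {"string": str, "categorical": str, "int": int, "double": float}


def load_schema(text):
    """Parse the JSON metadata and return field definitions sorted by column position."""
    fields = json.loads(text)["fields"]
    return sorted(fields, key=lambda f: f["ordinal"])


def parse_rows(csv_text, schema):
    """Convert each CSV record into a dict keyed by field name, typed per the schema."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        row = {}
        for field in schema:
            raw = record[field["ordinal"]].strip()
            row[field["name"]] = CONVERTERS[field["dataType"]](raw)
        rows.append(row)
    return rows


def write_rows(rows, schema):
    """Serialize typed rows back to CSV text, in schema column order."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([row[f["name"]] for f in schema])
    return out.getvalue()


def class_field(schema):
    """Return the name of the field flagged as the class (target) attribute."""
    return next(f["name"] for f in schema if f.get("classAttribute"))


if __name__ == "__main__":
    schema = load_schema(METADATA_JSON)
    data = "C001,34,52000.0,N\nC002,51,87000.5,Y\n"
    rows = parse_rows(data, schema)
    print(class_field(schema))  # churned
    print(rows[1]["income"])    # 87000.5
    print(write_rows(rows, schema) == data)  # True
```

Keeping column semantics in a separate metadata file lets the same CSV parsing code serve different data sets; only the JSON description changes.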
Solution
- Exploratory Analytics
- KNN Cluster
- Naive Bayes
- Discriminant analysis
- Nearest Neighbor
- Decision Tree and Random Forest
- SVM
- Association Mining
- Reinforcement learning
- Multi-Armed Bandit
- Stochastic Optimization
- Feedforward Network
- LSTM
- Autoencoder
- Deep Reinforcement Learning
- NLP and Neural Language Model
- Graph Convolutional Network
- MLOps
Blogs
The following blog posts of mine are a good source of details on avenir. They are the only source of detailed documentation.
- http://pkghosh.wordpress.com/2014/03/12/using-mutual-information-to-find-critical-factors-in-hospital-readmission/
- http://pkghosh.wordpress.com/2014/01/09/boost-lead-generation-with-online-reinforcement-learning/
- http://pkghosh.wordpress.com/2013/11/06/retarget-campaign-for-abandoned-shopping-carts-with-decision-tree/
- http://pkghosh.wordpress.com/2013/10/06/predicting-customer-loyalty-trajectory/
- http://pkghosh.wordpress.com/2013/08/25/bandits-know-the-best-product-price/
- http://pkghosh.wordpress.com/2013/06/29/learning-but-greedy-gambler/
- http://pkghosh.wordpress.com/2013/04/15/smarter-email-marketing-with-markov-model/
- http://pkghosh.wordpress.com/2013/03/18/analytic-is-your-doctors-friend/
- http://pkghosh.wordpress.com/2013/02/19/stop-the-customer-separation-pain-bayesian-classifier/
- http://pkghosh.wordpress.com/2013/01/31/explore-with-cramer-index/
- https://pkghosh.wordpress.com/2015/07/06/customer-conversion-prediction-with-markov-chain-classifier/
- https://pkghosh.wordpress.com/2015/05/11/is-bigger-data-better-for-machine-learning/
- https://pkghosh.wordpress.com/2015/12/13/association-mining-with-improved-apriori-algorithm/
- https://pkghosh.wordpress.com/2016/03/14/is-neural-network-better-off-with-big-data/
- https://pkghosh.wordpress.com/2016/04/13/customer-churn-prediction-with-svm-using-scikit-learn/
- https://pkghosh.wordpress.com/2016/06/14/inventory-forecasting-with-markov-chain-monte-carlo/
- https://pkghosh.wordpress.com/2016/07/30/customer-segmentation-based-on-online-behavior-using-scikitlearn/
- https://pkghosh.wordpress.com/2016/10/27/supplier-fulfillment-forecasting-with-continuous-time-markov-chain-using-spark/
- https://pkghosh.wordpress.com/2017/04/30/predicting-call-hangup-in-customer-service-calls-with-decision-tree-and-random-forest/
- https://pkghosh.wordpress.com/2017/06/26/project-assignment-optimization-with-simulated-annealing-on-spark/
- https://pkghosh.wordpress.com/2017/09/18/handling-rare-events-and-class-imbalance-in-predictive-modeling-for-machine-failure/
- https://pkghosh.wordpress.com/2017/10/09/combating-high-cardinality-features-in-supervised-machine-learning/
- https://pkghosh.wordpress.com/2018/02/21/optimizing-discount-price-for-perishable-products-with-thompson-sampling-using-spark/
- https://pkghosh.wordpress.com/2018/03/19/handling-categorical-feature-variables-in-machine-learning-using-spark/
- https://pkghosh.wordpress.com/2018/04/18/predicting-crm-lead-conversion-with-gradient-boosting-using-scikitlearn/
- https://pkghosh.wordpress.com/2018/05/14/auto-training-and-parameter-tuning-for-a-scikitlearn-based-model-for-leads-conversion-prediction/
- https://pkghosh.wordpress.com/2018/06/18/leave-one-out-encoding-for-categorical-feature-variables-on-spark/
- https://pkghosh.wordpress.com/2018/07/18/improving-elastic-search-query-result-with-query-expansion-using-topic-modeling/
- https://pkghosh.wordpress.com/2019/02/10/supervised-machine-learning-parameter-search-and-tuning-with-simulated-annealing/
- https://pkghosh.wordpress.com/2019/05/07/synthetic-training-data-generation-for-machine-learning-classification-problems-using-ancestral-sampling/
- https://pkghosh.wordpress.com/2019/06/27/six-unsupervised-extractive-text-summarization-techniques-side-by-side/
- https://pkghosh.wordpress.com/2019/08/07/encoding-high-cardinality-categorical-variables-with-feature-hashing-on-spark/
- https://pkghosh.wordpress.com/2019/08/26/missing-value-imputation-with-restricted-boltzmann-machine-neural-network/
- https://pkghosh.wordpress.com/2019/10/23/automated-machine-learning-with-hyperopt-and-scikitlearn-without-writing-python-code/
- https://pkghosh.wordpress.com/2019/11/22/machine-learning-model-interpretation-and-prescriptive-analytic-with-lime/
- https://pkghosh.wordpress.com/2020/01/21/evaluation-of-time-series-predictability-with-kaboudan-metric-using-prophet/
- https://pkghosh.wordpress.com/2020/02/24/model-drift-detection-with-kolmogorov-smirnov-statistic-on-spark/
- https://pkghosh.wordpress.com/2020/03/26/building-scikitlearn-random-forest-model-and-tuning-parameters-without-writing-python-code/
- https://pkghosh.wordpress.com/2020/05/11/monte-carlo-simulation-library-in-python-with-project-cost-estimation-as-an-example/
- https://pkghosh.wordpress.com/2020/06/08/deep-reinforcement-learning-with-rllib-and-tensorflow-for-price-optimization/
- https://pkghosh.wordpress.com/2020/07/13/learn-about-your-data-with-about-seventy-data-exploration-functions-all-in-one-python-class/
- https://pkghosh.wordpress.com/2020/07/28/semantic-search-with-pre-trained-neural-transformer-model-using-document-sentence-and-token-level-embedding/
- https://pkghosh.wordpress.com/2020/08/18/predicting-individual-viral-infection-using-contact-data-with-lstm-neural-network/
- https://pkghosh.wordpress.com/2020/10/28/causal-inference-with-deep-learning-using-manufacturing-supply-chain-optimization-as-an-example/
- https://pkghosh.wordpress.com/2020/11/26/meeting-schedule-optimization-with-genetic-algorithm-in-python/
- https://pkghosh.wordpress.com/2021/02/26/detecting-and-measuring-human-bias-in-machine-learning-models/
- https://pkghosh.wordpress.com/2021/03/25/robustness-measurement-of-machine-learning-models-with-examples-in-python/
- https://pkghosh.wordpress.com/2021/05/25/data-driven-causal-relationship-discovery-with-python-example-code/
- https://pkghosh.wordpress.com/2021/07/21/duplicate-data-detection-with-neural-network-and-contrastive-learning/
- https://pkghosh.wordpress.com/2021/10/16/class-separation-based-machine-learning-model-performance-metric/
- https://pkghosh.wordpress.com/2021/11/30/machine-learning-model-performance-robustness-based-on-local-neighborhood-performance/
- https://pkghosh.wordpress.com/2021/12/30/conformal-prediction-for-a-neural-regression-model/
- https://pkghosh.wordpress.com/2022/01/26/remedial-action-recommendation-with-machine-learning-and-genetic-algorithm/
- https://pkghosh.wordpress.com/2022/02/25/out-of-distribution-data-detection-in-deployed-machine-learning-models/
- https://pkghosh.wordpress.com/2022/03/28/gig-economy-workforce-scheduling-with-reinforcement-learning/
Getting started
The project's resource directory contains tutorial documents for the use cases described in the blog posts.
Configuration
All configuration parameters are described in the wiki page https://github.com/pranab/avenir/wiki/Configuration
Build
Please refer to resource/dependency.txt for build-time and run-time dependencies.
For Hadoop 1
- mvn clean install
For Hadoop 2 (non-YARN)
- git checkout nuovo
- mvn clean install
For Hadoop 2 (YARN)
- git checkout nuovo
- mvn clean install -P yarn
Help
Please feel free to email me at [email protected]
Contribution
Contributors are welcome. Please email me at [email protected]