• This repository has been archived on 20/Jul/2020
  • Stars
    star
    171
  • Rank 222,266 (Top 5 %)
  • Language
    Python
  • License
    BSD 3-Clause "New...
  • Created over 7 years ago
  • Updated about 5 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

autosklearn-zeroconf is a fully automated binary classifier. It is based on the AutoML challenge winner auto-sklearn. Give it a dataset with known outcomes (labels) and it returns a list of predicted outcomes for your new data. It even estimates the precision for you! The engine is tuning massively parallel ensemble of machine learning pipelines for best precision/recall.

What is autosklearn-zeroconf

The autosklearn-zeroconf file takes a dataframe of any size and trains auto-sklearn binary classifier ensemble. No configuration is needed as the name suggests. Auto-sklearn is the recent AutoML Challenge winner more @microsoft.com.

As a result of using automl-zeroconf running auto-sklearn becomes a "fire and forget" type of operation. It greatly increases the utility and decreases turnaround time for experiments.

The main value proposition is that a data analyst or a data savvy business user can quickly run the iterations on the data (actual sources and feature design) side and on the ML side not a bit has to be changed. So it's a great tool for people not doing hardcore data science full time. Up to 90% of (marketing) data analysts may fall into this target group currently.

How Does It Work

To keep the training time reasonable autosklearn-zeroconf samples the data and tests all the models from autosklearn library on it once. The results of the test (duration) is used to calculate the per_run_time_limit, time_left_for_this_task and number of seeds parameters for autosklearn. The code also converts the pandas dataframe into a form that autosklearn can handle (categorical and float datatypes).

Algoritms included

bernoulli_nb, extra_trees, gaussian_nb, adaboost, gradient_boosting, k_nearest_neighbors, lda, liblinear_svc, multinomial_nb, passive_aggressive, random_forest, sgd

plus samplers, scalers, imputers (14 feature processing methods, and 3 data preprocessing methods, giving rise to a structured hypothesis space with 100+ hyperparameters)

Running autosklearn-zeroconf

To run autosklearn-zeroconf start

python bin/zeroconf.py -d your_dataframe.h5
from command line. The script was tested on Ubuntu and RedHat. It won't work on any WindowsOS because auto-sklearn doesn't support Windows.

Data Format

The code uses a pandas dataframe format to manage the data. It is stored in the HDF5 .h5 file for convenience. (Python module "tables")

Example

As an example you can run autosklearn-zeroconf on a "Census Income" dataset https://archive.ics.uci.edu/ml/datasets/Adult.

python ./bin/zeroconf.py -d ./data/Adult.h5

And then to evaluate the prediction stored in zerconf-result.csv against the test dataset file adult.test.withid

python ./bin/evaluate-dataset-Adult.py

Installation

The script itself needs no installation, just copy it with the rest of the files in your working directory. Alternatively you could use git clone

sudo apt-get update && sudo apt-get install git && git clone https://github.com/paypal/autosklearn-zeroconf.git

Happy path installation on Ubuntu 18.04LTS

sudo apt-get update && sudo apt-get install git gcc build-essential swig python-pip virtualenv python3-dev
git clone https://github.com/paypal/autosklearn-zeroconf.git
pip install virtualenv
virtualenv zeroconf -p /usr/bin/python3.6
source zeroconf/bin/activate
curl https://raw.githubusercontent.com/paypal/autosklearn-zeroconf/master/requirements.txt | xargs -n 1 -L 1 pip install
git clone https://github.com/paypal/autosklearn-zeroconf.git
cd autosklearn-zeroconf/ && python ./bin/zeroconf.py -d ./data/Adult.h5 2>/dev/null

License

autosklearn-zeroconf is licensed under the BSD 3-Clause License (Revised)

Example of the output

python zeroconf.py -d ./data/Adult.h5 2>/dev/null | grep [ZEROCONF]

2017-10-11 10:52:15,893 - [ZEROCONF] - zeroconf.py - INFO - Program Call Parameter (Arguments and Parameter File Values):
2017-10-11 10:52:15,893 - [ZEROCONF] - zeroconf.py - INFO -    basedir: /home/ulrich/PycharmProjects/autosklearn-zeroconf
2017-10-11 10:52:15,893 - [ZEROCONF] - zeroconf.py - INFO -    data_file: /home/ulrich/PycharmProjects/autosklearn-zeroconf/data/Adult.h5
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    id_field: cust_id
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    max_classifier_time_budget: 1200
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    max_sample_size: 100000
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    memory_limit: 15000
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    parameter_file: /home/ulrich/PycharmProjects/autosklearn-zeroconf/parameter/default.yml
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    proc: zeroconf.py
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    resultfile: /home/ulrich/PycharmProjects/autosklearn-zeroconf/data/zeroconf-result.csv
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    runid: 20171011105215
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    runtype: Fresh Run Start
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    target_field: category
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    workdir: /home/ulrich/PycharmProjects/autosklearn-zeroconf/work/20171011105215
2017-10-11 10:52:15,944 - [ZEROCONF] - zeroconf.py - INFO - Read dataset from the store
2017-10-11 10:52:15,945 - [ZEROCONF] - zeroconf.py - INFO - Values of y [  0.   1.  nan]
2017-10-11 10:52:15,945 - [ZEROCONF] - zeroconf.py - INFO - We need to protect NAs in y from the prediction dataset so we convert them to -1
2017-10-11 10:52:15,946 - [ZEROCONF] - zeroconf.py - INFO - New values of y [ 0.  1. -1.]
2017-10-11 10:52:15,946 - [ZEROCONF] - zeroconf.py - INFO - Filling missing values in X with the most frequent values
2017-10-11 10:52:16,043 - [ZEROCONF] - zeroconf.py - INFO - Factorizing the X
2017-10-11 10:52:16,176 - [ZEROCONF] - x_y_dataframe_split - INFO - Dataframe split into X and y
2017-10-11 10:52:16,178 - [ZEROCONF] - zeroconf.py - INFO - Preparing a sample to measure approx classifier run time and select features
2017-10-11 10:52:16,191 - [ZEROCONF] - zeroconf.py - INFO - train size:21815
2017-10-11 10:52:16,191 - [ZEROCONF] - zeroconf.py - INFO - test size:10746
2017-10-11 10:52:16,192 - [ZEROCONF] - zeroconf.py - INFO - Reserved 33% of the training dataset for validation (upto 33k rows)
2017-10-11 10:52:16,209 - [ZEROCONF] - max_estimators_fit_duration - INFO - Constructing preprocessor pipeline and transforming sample data
2017-10-11 10:52:18,712 - [ZEROCONF] - max_estimators_fit_duration - INFO - Running estimators on the sample
2017-10-11 10:52:18,729 - [ZEROCONF] - zeroconf.py - INFO - adaboost starting
2017-10-11 10:52:18,734 - [ZEROCONF] - zeroconf.py - INFO - bernoulli_nb starting
2017-10-11 10:52:18,761 - [ZEROCONF] - zeroconf.py - INFO - extra_trees starting
2017-10-11 10:52:18,769 - [ZEROCONF] - zeroconf.py - INFO - decision_tree starting
2017-10-11 10:52:18,780 - [ZEROCONF] - zeroconf.py - INFO - gaussian_nb starting
2017-10-11 10:52:18,800 - [ZEROCONF] - zeroconf.py - INFO - bernoulli_nb training time: 0.06455278396606445
2017-10-11 10:52:18,802 - [ZEROCONF] - zeroconf.py - INFO - gradient_boosting starting
2017-10-11 10:52:18,808 - [ZEROCONF] - zeroconf.py - INFO - k_nearest_neighbors starting
2017-10-11 10:52:18,809 - [ZEROCONF] - zeroconf.py - INFO - decision_tree training time: 0.03273773193359375
2017-10-11 10:52:18,826 - [ZEROCONF] - zeroconf.py - INFO - lda starting
2017-10-11 10:52:18,845 - [ZEROCONF] - zeroconf.py - INFO - liblinear_svc starting
2017-10-11 10:52:18,867 - [ZEROCONF] - zeroconf.py - INFO - gaussian_nb training time: 0.08569979667663574
2017-10-11 10:52:18,882 - [ZEROCONF] - zeroconf.py - INFO - multinomial_nb starting
2017-10-11 10:52:18,905 - [ZEROCONF] - zeroconf.py - INFO - passive_aggressive starting
2017-10-11 10:52:18,943 - [ZEROCONF] - zeroconf.py - INFO - random_forest starting
2017-10-11 10:52:18,971 - [ZEROCONF] - zeroconf.py - INFO - sgd starting
2017-10-11 10:52:19,012 - [ZEROCONF] - zeroconf.py - INFO - lda training time: 0.17656564712524414
2017-10-11 10:52:19,023 - [ZEROCONF] - zeroconf.py - INFO - multinomial_nb training time: 0.13777780532836914
2017-10-11 10:52:19,124 - [ZEROCONF] - zeroconf.py - INFO - liblinear_svc training time: 0.27405595779418945
2017-10-11 10:52:19,416 - [ZEROCONF] - zeroconf.py - INFO - passive_aggressive training time: 0.508676290512085
2017-10-11 10:52:19,473 - [ZEROCONF] - zeroconf.py - INFO - sgd training time: 0.49777913093566895
2017-10-11 10:52:20,471 - [ZEROCONF] - zeroconf.py - INFO - adaboost training time: 1.7392246723175049
2017-10-11 10:52:20,625 - [ZEROCONF] - zeroconf.py - INFO - k_nearest_neighbors training time: 1.8141863346099854
2017-10-11 10:52:22,258 - [ZEROCONF] - zeroconf.py - INFO - extra_trees training time: 3.4934401512145996
2017-10-11 10:52:22,696 - [ZEROCONF] - zeroconf.py - INFO - random_forest training time: 3.7496204376220703
2017-10-11 10:52:24,215 - [ZEROCONF] - zeroconf.py - INFO - gradient_boosting training time: 5.41023063659668
2017-10-11 10:52:24,230 - [ZEROCONF] - max_estimators_fit_duration - INFO - Test classifier fit completed
2017-10-11 10:52:24,239 - [ZEROCONF] - zeroconf.py - INFO - per_run_time_limit=5
2017-10-11 10:52:24,239 - [ZEROCONF] - zeroconf.py - INFO - Process pool size=2
2017-10-11 10:52:24,240 - [ZEROCONF] - zeroconf.py - INFO - Starting autosklearn classifiers fiting on a 67% sample up to 67k rows
2017-10-11 10:52:24,252 - [ZEROCONF] - train_multicore - INFO - Max time allowance for a model 1 minute(s)
2017-10-11 10:52:24,252 - [ZEROCONF] - train_multicore - INFO - Overal run time is about 10 minute(s)
2017-10-11 10:52:24,255 - [ZEROCONF] - train_multicore - INFO - Multicore process 2 started
2017-10-11 10:52:24,258 - [ZEROCONF] - train_multicore - INFO - Multicore process 3 started
2017-10-11 10:52:24,276 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - Start AutoSklearnClassifier seed=2
2017-10-11 10:52:24,278 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - Start AutoSklearnClassifier seed=3
2017-10-11 10:52:24,295 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - Done AutoSklearnClassifier seed=3
2017-10-11 10:52:24,297 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - Done AutoSklearnClassifier seed=2
2017-10-11 10:52:26,299 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - Starting seed=2
2017-10-11 10:52:27,298 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - Starting seed=3
2017-10-11 10:56:30,949 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - ####### Finished seed=2
2017-10-11 10:56:31,600 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - ####### Finished seed=3
2017-10-11 10:56:31,614 - [ZEROCONF] - train_multicore - INFO - Multicore fit completed
2017-10-11 10:56:31,626 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - Building ensemble
2017-10-11 10:56:31,626 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - Done AutoSklearnClassifier - seed:1
2017-10-11 10:56:54,017 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - Ensemble built - seed:1
2017-10-11 10:56:54,017 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - Show models - seed:1
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - [(0.400000, SimpleClassificationPipeline({'classifier:__choice__': 'adaboost', 'one_hot_encoding:use_minimum_fraction': 'True', 'preprocessor:select_percentile_classification:percentile': 85.5410729966473, 'classifier:adaboost:n_estimators': 88, 'one_hot_encoding:minimum_fraction': 0.01805038589303469, 'rescaling:__choice__': 'minmax', 'balancing:strategy': 'weighting', 'preprocessor:__choice__': 'select_percentile_classification', 'classifier:adaboost:max_depth': 1, 'classifier:adaboost:learning_rate': 0.10898092508755285, 'preprocessor:select_percentile_classification:score_func': 'chi2', 'imputation:strategy': 'most_frequent', 'classifier:adaboost:algorithm': 'SAMME.R'},
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - dataset_properties={
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'task': 1,
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'signed': False,
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'sparse': False,
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multiclass': False,
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'target_type': 'classification',
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multilabel': False})),
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - (0.300000, SimpleClassificationPipeline({'classifier:__choice__': 'random_forest', 'classifier:random_forest:min_weight_fraction_leaf': 0.0, 'one_hot_encoding:use_minimum_fraction': 'True', 'classifier:random_forest:criterion': 'gini', 'classifier:random_forest:min_samples_leaf': 4, 'classifier:random_forest:max_depth': 'None', 'classifier:random_forest:min_samples_split': 16, 'classifier:random_forest:bootstrap': 'False', 'one_hot_encoding:minimum_fraction': 0.1453954841364777, 'rescaling:__choice__': 'none', 'balancing:strategy': 'none', 'preprocessor:__choice__': 'select_percentile_classification', 'preprocessor:select_percentile_classification:percentile': 96.35414862145892, 'preprocessor:select_percentile_classification:score_func': 'chi2', 'imputation:strategy': 'mean', 'classifier:random_forest:max_leaf_nodes': 'None', 'classifier:random_forest:max_features': 3.342759426984195, 'classifier:random_forest:n_estimators': 100},
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - dataset_properties={
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'task': 1,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'signed': False,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'sparse': False,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multiclass': False,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'target_type': 'classification',
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multilabel': False})),
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - (0.200000, SimpleClassificationPipeline({'classifier:extra_trees:min_weight_fraction_leaf': 0.0, 'classifier:__choice__': 'extra_trees', 'classifier:extra_trees:n_estimators': 100, 'classifier:extra_trees:bootstrap': 'True', 'preprocessor:extra_trees_preproc_for_classification:min_samples_split': 5, 'classifier:extra_trees:min_samples_leaf': 10, 'rescaling:__choice__': 'minmax', 'classifier:extra_trees:max_depth': 'None', 'preprocessor:extra_trees_preproc_for_classification:bootstrap': 'True', 'preprocessor:extra_trees_preproc_for_classification:criterion': 'gini', 'classifier:extra_trees:max_features': 4.413198608615693, 'classifier:extra_trees:criterion': 'gini', 'preprocessor:extra_trees_preproc_for_classification:n_estimators': 100, 'classifier:extra_trees:min_samples_split': 16, 'one_hot_encoding:use_minimum_fraction': 'False', 'balancing:strategy': 'weighting', 'preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 1, 'preprocessor:extra_trees_preproc_for_classification:max_features': 1.4824479003506632, 'imputation:strategy': 'median', 'preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'preprocessor:extra_trees_preproc_for_classification:max_depth': 'None'},
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - dataset_properties={
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'task': 1,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'signed': False,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'sparse': False,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multiclass': False,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'target_type': 'classification',
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multilabel': False})),
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - (0.100000, SimpleClassificationPipeline({'classifier:extra_trees:min_weight_fraction_leaf': 0.0, 'classifier:__choice__': 'extra_trees', 'classifier:extra_trees:n_estimators': 100, 'classifier:extra_trees:bootstrap': 'True', 'preprocessor:extra_trees_preproc_for_classification:min_samples_split': 16, 'classifier:extra_trees:min_samples_leaf': 10, 'rescaling:__choice__': 'minmax', 'classifier:extra_trees:max_depth': 'None', 'preprocessor:extra_trees_preproc_for_classification:bootstrap': 'True', 'preprocessor:extra_trees_preproc_for_classification:criterion': 'gini', 'classifier:extra_trees:max_features': 4.16852017424403, 'classifier:extra_trees:criterion': 'gini', 'preprocessor:extra_trees_preproc_for_classification:n_estimators': 100, 'classifier:extra_trees:min_samples_split': 16, 'one_hot_encoding:use_minimum_fraction': 'False', 'balancing:strategy': 'weighting', 'preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 1, 'preprocessor:extra_trees_preproc_for_classification:max_features': 1.5781770540350555, 'imputation:strategy': 'median', 'preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'preprocessor:extra_trees_preproc_for_classification:max_depth': 'None'},
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - dataset_properties={
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'task': 1,
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'signed': False,
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'sparse': False,
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multiclass': False,
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'target_type': 'classification',
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multilabel': False})),
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - ]
2017-10-11 10:56:54,613 - [ZEROCONF] - zeroconf.py - INFO - Validating
2017-10-11 10:56:54,613 - [ZEROCONF] - zeroconf.py - INFO - Predicting on validation set
2017-10-11 10:56:57,373 - [ZEROCONF] - zeroconf.py - INFO - ########################################################################
2017-10-11 10:56:57,374 - [ZEROCONF] - zeroconf.py - INFO - Accuracy score 84%
2017-10-11 10:56:57,374 - [ZEROCONF] - zeroconf.py - INFO - The below scores are calculated for predicting '1' category value
2017-10-11 10:56:57,379 - [ZEROCONF] - zeroconf.py - INFO - Precision: 64%, Recall: 77%, F1: 0.70
2017-10-11 10:56:57,379 - [ZEROCONF] - zeroconf.py - INFO - Confusion Matrix: https://en.wikipedia.org/wiki/Precision_and_recall
2017-10-11 10:56:57,386 - [ZEROCONF] - zeroconf.py - INFO - [7058 1100]
2017-10-11 10:56:57,386 - [ZEROCONF] - zeroconf.py - INFO - [ 603 1985]
2017-10-11 10:56:57,392 - [ZEROCONF] - zeroconf.py - INFO - Baseline 2588 positives from 10746 overall = 24.1%
2017-10-11 10:56:57,392 - [ZEROCONF] - zeroconf.py - INFO - ########################################################################
2017-10-11 10:56:57,404 - [ZEROCONF] - x_y_dataframe_split - INFO - Dataframe split into X and y
2017-10-11 10:56:57,405 - [ZEROCONF] - zeroconf.py - INFO - Re-fitting the model ensemble on full known dataset to prepare for prediciton. This can take a long time.
2017-10-11 10:58:39,836 - [ZEROCONF] - zeroconf.py - INFO - Predicting. This can take a long time for a large prediction set.
2017-10-11 10:58:45,221 - [ZEROCONF] - zeroconf.py - INFO - Prediction done
2017-10-11 10:58:45,223 - [ZEROCONF] - zeroconf.py - INFO - Exporting the data
2017-10-11 10:58:45,267 - [ZEROCONF] - zeroconf.py - INFO - ##### Zeroconf Script Completed! #####
2017-10-11 10:58:45,268 - [ZEROCONF] - zeroconf.py - INFO - Clean up / Delete work directory: /home/ulrich/PycharmProjects/autosklearn-zeroconf/work/20171011105215

Process finished with exit code 0
python evaluate-dataset-Adult.py 
[ZEROCONF]  # 00:37:43 #
[ZEROCONF] ######################################################################## # 00:37:43 #
[ZEROCONF] Accuracy score 85% # 00:37:43 #
[ZEROCONF] The below scores are calculated for predicting '1' category value # 00:37:43 #
[ZEROCONF] Precision: 65%, Recall: 78%, F1: 0.71 # 00:37:43 #
[ZEROCONF] Confusion Matrix: https://en.wikipedia.org/wiki/Precision_and_recall # 00:37:43 #
[ZEROCONF] [[10835  1600] # 00:37:43 #
[ZEROCONF]  [  860  2986]] # 00:37:43 #
[ZEROCONF] Baseline 3846 positives from 16281 overall = 23.6% # 00:37:43 #
[ZEROCONF] ######################################################################## # 00:37:43 #
[ZEROCONF]  # 00:37:43 #

Workarounds

these are not related to the autosklearn-zeroconf or auto-sklearn but rather general issues depending on your python and OS installation

xgboost issues

complains about ELF header

pip uninstall xgboost; pip install --no-cache-dir -v xgboost==0.4a30

can not find libraries

conda install libgcc # for xgboost

alternatively search for them with

sudo find / -name libgomp.so.1
/usr/lib/x86_64-linux-gnu/libgomp.so.1

and explicitly add them to the libraries path

export LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libstdc++.so.6":"/usr/lib/x86_64-linux-gnu/libgomp.so.1"; python zeroconf.py Titanic.h5 2>/dev/null|grep ZEROCONF

Also see automl/auto-sklearn#247

Install auto-sklearn

# A compiler (gcc) is needed to compile a few things the from auto-sklearn requirements.txt
# Chose just the line for your Linux flavor below

# On Ubuntu
sudo apt-get install gcc build-essential swig

# On CentOS 7-1611 http://www.osboxes.org/centos/ https://drive.google.com/file/d/0B_HAFnYs6Ur-bl8wUWZfcHVpMm8/view?usp=sharing
sudo yum -y update 
sudo reboot
sudo yum install epel-release python34 python34-devel python34-setuptools
sudo yum -y groupinstall 'Development Tools'

# auto-sklearn requires swig 3.0 
wget downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz -O swig-3.0.12.tar.gz
tar xf swig-3.0.12.tar.gz 
cd swig-3.0.12 
./configure --without-pcre
make
sudo make install
cd ..

sudo easy_install-3.4 pip
# if you want to use virtual environments
sudo pip3 install virtualenv
virtualenv zeroconf -p /usr/bin/python3.4
source zeroconf/bin/activate

curl https://raw.githubusercontent.com/paypal/autosklearn-zeroconf/master/requirements.txt | xargs -n 1 -L 1 pip install

Contributors

Egor Kobylkin, Ulrich Arndt

More Repositories

1

glamorous

DEPRECATED: πŸ’„ Maintainable CSS with React
JavaScript
3,640
star
2

junodb

JunoDB is PayPal's home-grown secure, consistent and highly available key-value store providing low, single digit millisecond, latency at any scale.
Go
2,565
star
3

accessible-html5-video-player

Accessible HTML5 Video Player
JavaScript
2,451
star
4

react-engine

a composite render engine for universal (isomorphic) express apps to render both plain react views and react-router views
JavaScript
1,449
star
5

squbs

Akka Streams & Akka HTTP for Large-Scale Production Deployments
Scala
1,433
star
6

PayPal-node-SDK

node.js SDK for PayPal RESTful APIs
JavaScript
1,279
star
7

paypal-checkout-components

please submit Issues about the PayPal JS SDK here: https://github.com/paypal/paypal-js/issues
JavaScript
1,270
star
8

gatt

Gatt is a Go package for building Bluetooth Low Energy peripherals
Go
1,135
star
9

PayPal-iOS-SDK

Accept credit cards and PayPal in your iOS app
Objective-C
974
star
10

gnomon

Utility to annotate console logging statements with timestamps and find slow processes
JavaScript
932
star
11

PayPal-Android-SDK

Accept PayPal and credit cards in your Android app
Java
824
star
12

bootstrap-accessibility-plugin

Accessibility Plugin for Bootstrap 3 and Bootstrap 3 as SubModule
HTML
789
star
13

PayPal-Python-SDK

Python SDK for PayPal RESTful APIs
Python
702
star
14

AATT

Automated Accessibility Testing Tool
JavaScript
601
star
15

PayPal-Ruby-SDK

Ruby SDK for PayPal RESTful APIs
Ruby
593
star
16

ipn-code-samples

PHP
561
star
17

seifnode

C++
545
star
18

PayPal-NET-SDK

.NET SDK for PayPal's RESTful APIs
C#
535
star
19

PayPal-Java-SDK

Java SDK for PayPal RESTful APIs
Java
535
star
20

data-contract-template

Template for a data contract used in a data mesh.
460
star
21

Checkout-PHP-SDK

PHP SDK for Checkout RESTful APIs
PHP
418
star
22

hera

High Efficiency Reliable Access to data stores
Go
289
star
23

SeLion

Enabling Test Automation in Java
Java
281
star
24

nemo-core

Selenium-webdriver based automation in node.js
JavaScript
261
star
25

support

An evented server framework designed for building scalable and introspectable services, built at PayPal.
Python
261
star
26

PayPal-Cordova-Plugin

PayPal SDK Cordova/Phonegap Plugin
Objective-C
248
star
27

gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage
Scala
245
star
28

scala-style-guide

Style Guidelines for PayPal Scala Applications
240
star
29

merchant-sdk-php

PHP SDK for integrating with PayPal's Express Checkout / MassPay / Web Payments Pro APIs
PHP
230
star
30

paypal-js

Loading wrapper and TypeScript types for the PayPal JS SDK
TypeScript
229
star
31

paypal-rest-api-specifications

This repository contains the specification files for PayPal REST APIs.
192
star
32

resteasy-spring-boot

RESTEasy Spring Boot Starter
Java
188
star
33

Checkout-Java-SDK

PayPal Checkout Java SDK
Java
182
star
34

skipto

SkipTo is a replacement for your old classic "Skipnav" link. Once installed on a site, the script dynamically determines the most important places on the page and presents them to the user in a drop-down menu.
HTML
152
star
35

TLS-update

Documentation & tools for the upcoming TLSv1.2 required update
Java
148
star
36

Checkout-NET-SDK

.NET SDK for Checkout RESTful APIs
C#
139
star
37

cascade

Common Libraries & Patterns for Scala Apps @ PayPal
Scala
129
star
38

merchant-sdk-ruby

Ruby
110
star
39

heap-dump-tool

Tool to sanitize data from Java heap dumps.
Java
110
star
40

NNAnalytics

NameNodeAnalytics is a self-help utility for scouting and maintaining the namespace of an HDFS instance.
Java
110
star
41

paypal-smart-payment-buttons

Smart Payment Buttons
JavaScript
108
star
42

yurita

Anomaly detection framework @ PayPal
Scala
106
star
43

InnerSourceCommons

DEPRECATED - old repo for InnerSourceCommons website. Moved to https://github.com/InnerSourceCommons/innersourcecommons.org
JavaScript
105
star
44

adaptivepayments-sdk-php

PHP SDK for integrating with PayPal's AdaptivePayments API
PHP
101
star
45

fullstack-phone

A dual-module phone number system with dynamic regional metadata ☎️
JavaScript
90
star
46

sdk-core-php

for classic PHP SDKs.
PHP
87
star
47

paypal-here-sdk-android-distribution

Add credit card (swipe & key-in) capabilities to your Android app
Java
84
star
48

merchant-sdk-dotnet

C#
83
star
49

paypal-here-sdk-ios-distribution

Add credit card (tap, insert, swipe & key-in) capabilities to your iOS app
Objective-C
82
star
50

payflow-gateway

Repository to store the Payflow Gateway and PayPal Payments Pro SDKs.
C#
80
star
51

sdk-packages

Binary packages for deprecated SDKs.
77
star
52

android-checkout-sdk

Kotlin
77
star
53

Iguanas

Iguanas is a fast, flexible and modular Python package for generating a Rules-Based System (RBS) for binary classification use cases.
Jupyter Notebook
74
star
54

paypal-android

One merchant integration point for all of PayPal's services
Kotlin
72
star
55

legalize.js

JavaScript object validation for browsers + node
JavaScript
70
star
56

paypalcheckout-ios

Need to add Native Checkout to your iOS Application? We can help!
Ruby
70
star
57

paypal-sdk-client

Shared config for PayPal/Braintree client SDKs
JavaScript
65
star
58

load-watcher

Load watcher is a cluster-wide aggregator of metrics, developed for Trimaran: Real Load Aware Scheduler in Kubernetes.
Go
63
star
59

dce-go

Docker Compose Executor to launch pod of docker containers in Apache Mesos.
Go
63
star
60

merchant-sdk-java

Java SDK for integrating with PayPal's Express Checkout / MassPay / Web Payments Pro APIs
Java
62
star
61

sdk-core-java

for classic Java SDKs.
Java
61
star
62

paypal-ios

One merchant integration point for all of PayPal's services
Swift
59
star
63

gorealis

Version 1 of a Go library for interacting with the Aurora Scheduler
Go
58
star
64

scorebot

CSS
57
star
65

PPExtensions

Set of iPython and Jupyter extensions to improve user experience
Python
50
star
66

paypal-checkout-demo

Demo app for paypal-checkout
JavaScript
49
star
67

dione

Dione - a Spark and HDFS indexing library
Scala
49
star
68

Payouts-PHP-SDK

PHP SDK for Payouts RESTful APIs
PHP
49
star
69

pdt-code-samples

Visual Basic
48
star
70

butterfly

Application transformation tool
Java
47
star
71

Payouts-NodeJS-SDK

NodeJS SDK for Payouts RESTful APIs
JavaScript
47
star
72

digraph-parser

Java parser for digraph DSL (Graphviz DOT language)
Java
44
star
73

paypalhttp_php

PHP
43
star
74

tech-talks

Place for all PayPalX presentations, tech talks, and tutorials, and the sample code and apps used in those.
ColdFusion
38
star
75

Illuminator

iOS Automator
Swift
38
star
76

paypal-sdk-release

Unified SDK wrapper module for tests, shared build config, and deploy
JavaScript
37
star
77

PayPal-REST-API-issues

Issue tracking for REST API bugs, features, and documentation requests.
37
star
78

paypal-messaging-components

PayPal JavaScript SDK - messaging components
JavaScript
37
star
79

ionet

ionet is a bridge between the Go stdlib's net and io packages
Go
37
star
80

paypal-access

Examples and code for PayPal Access
Python
36
star
81

horizon

An SBT plugin to help with building, testing, analyzing and releasing Scala
Scala
35
star
82

Payouts-Java-SDK

Java SDK for Payouts RESTful APIs
Java
35
star
83

genio

Genio is an extensible tool that can generate code to consume APIs in multiple programming languages based on different API specification formats.
Ruby
35
star
84

mirakl-hyperwallet-connector

The Hyperwallet Mirakl Connector (HMC) is a self-hosted solution that mediates between a Mirakl marketplace solution and the Hyperwallet (PayPal) payout platform.
Java
34
star
85

openapilint

Node.js linter for OpenAPI specs
JavaScript
31
star
86

paypal-sdk-constants

JavaScript
27
star
87

sdk-core-ruby

Core Library for PayPal Ruby SDKs
Ruby
27
star
88

go.crypto

Go crypto packages
Go
26
star
89

PayPal-PHP-SDK

PHP SDK for PayPal RESTful APIs
PHP
26
star
90

nemo-view

View interface for the Nemo automation framework
JavaScript
26
star
91

Gibberish-Detector-Java

A small program to detect gibberish using a Markov Chain
Java
26
star
92

nemo-accessibility

Automate Accessibility testing within your environment (Localhost)
JavaScript
25
star
93

Payouts-Python-SDK

Python SDK for Payouts RESTful APIs
Python
25
star
94

here-sideloader-api-samples

Sideloader API samples that enable to integrate PayPal Here into other apps
Objective-C
24
star
95

couchbasekafka

Couchbase Kafka Adapter
Java
24
star
96

baler

Bundle assets into iOS static libraries
Python
22
star
97

invoice-sdk-php

PHP SDK for integrating with PayPal's Invoicing API
PHP
21
star
98

Payouts-DotNet-SDK

DotNet SDK for Payouts RESTful APIs
C#
20
star
99

paypal-funding-components

PayPal JavaScript SDK Funding Components
JavaScript
20
star
100

squbs-scala-seed.g8

Scala giter8 Template for Squbs
Scala
20
star