• Stars: 171
  • Rank: 222,266 (Top 5 %)
  • Language: Python
  • License: MIT License
  • Created almost 6 years ago
  • Updated 6 months ago


My toolbox for data analysis. :)

Kuma-san's Toolkit 2020

　 　 　┼╂┼
　 　 ∩＿┃＿∩
    |ノ      ヽ
   /   ●    ● |
  |     (_●_) ミ        < There is absolutely no warranty. >
 彑`     |∪|  ｀｀＼
/ ＿＿   ヽノ /´>  )
(＿＿＿）    / (_/

Usage

Copy all the files to your working directory via:

git clone https://github.com/analokmaus/kuma_utils.git

See tutorial notebooks below.

WIP

  1. Multi-node DDP

Environment

Create a new environment and install the requirements:

pip install -r requirements.txt

Optional requirements

xfeat

pip install -q https://github.com/pfnet-research/xfeat/archive/master.zip

Category Encoders

pip install category_encoders

PyTorch

For mixed precision training, PyTorch >= 1.6.0 is required. Follow the official installation instructions.

PyTorch/XLA

Follow the official installation instructions.

japanize-matplotlib

pip install japanize-matplotlib

Directory

┣ visualization
┃   ┣ explore_data              - Simple exploratory data analysis.
┃
┣ preprocessing
┃   ┣ xfeat                     - xfeat modifications.
┃   ┃   ┣ TargetEncoder
┃   ┃   ┣ Pipeline
┃   ┣ DistTransformer           - Distribution transformer for numerical features. 
┃   ┣ LGBMImputer               - Regression imputer for missing values using LightGBM.
┃
┣ training
┃   ┣ Trainer                   - Amazing wrapper for scikit-learn API models.
┃   ┣ CrossValidator            - Amazing cross validation wrapper.
┃   ┣ LGBMLogger                - Logger callback for LightGBM/XGBoost/Optuna.
┃   ┣ StratifiedGroupKFold      - Stratified group k-fold split.
┃   ┣ optuna                    - optuna modifications.
┃       ┣ lightgbm              - Optuna LightGBM integration with modifiable n_trials.
┃
┣ metrics                       - Universal metrics
┃   ┣ SeAtFixedSp               - Sensitivity at fixed specificity.
┃   ┣ RMSE
┃   ┣ AUC
┃   ┣ Accuracy
┃   ┣ QWK
┃
┣ torch
    ┣ lr_scheduler
    ┃   ┣ ManualScheduler
    ┃   ┣ CyclicCosAnnealingLR
    ┃   ┣ CyclicLinearLR
    ┃   
    ┣ optimizer
    ┃   ┣ SAM
    ┃ 
    ┣ modules
    ┃   ┃ (activation)
    ┃   ┣ Mish
    ┃   ┃ (pooling)
    ┃   ┣ AdaptiveConcatPool2d/3d
    ┃   ┣ GeM
    ┃   ┃ (attention)
    ┃   ┣ CBAM2d
    ┃   ┃ (normalization)
    ┃   ┣ GroupNorm1d/2d/3d
    ┃   ┣ convert_groupnorm     - Convert all BatchNorm to GroupNorm.
    ┃   ┣ etc...
    ┃ 
    ┣ TorchTrainer              - Training wrapper for PyTorch models.
    ┣ EarlyStopping             - Early stopping callback for TorchTrainer.
    ┣ SaveEveryEpoch            - Save snapshot at the end of every epoch.
    ┣ SaveSnapshot              - Checkpoint callback.
    ┣ SaveAverageSnapshot       - Moving average snapshot callback.
    ┣ TorchLogger               - Logger
    ┣ TensorBoardLogger         - TensorBoard Logger
    ┣ SimpleHook                - Simple train hook for almost any task (see tutorial).
    ┣ TemperatureScaler         - Probability calibration for PyTorch models.
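StratifiedGroupKFold is listed only by name. The idea is that no group (e.g. one patient) may appear in two folds, while each fold keeps roughly the same label ratio. Below is a minimal greedy NumPy sketch of that technique, assuming discrete labels; the function `stratified_group_kfold` is hypothetical and is not the kuma_utils API. (scikit-learn >= 1.0 also ships `sklearn.model_selection.StratifiedGroupKFold`.)

```python
import numpy as np


def stratified_group_kfold(y, groups, n_splits=5):
    """Hypothetical greedy splitter: groups never span two folds and
    per-label shares are kept as even as possible across folds."""
    y = np.asarray(y)
    groups = np.asarray(groups)
    labels = np.unique(y)
    label_totals = np.array([np.sum(y == c) for c in labels], dtype=float)
    # Label histogram of every group.
    group_hist = {
        g: np.array([np.sum((groups == g) & (y == c)) for c in labels], dtype=float)
        for g in np.unique(groups)
    }
    fold_hist = np.zeros((n_splits, len(labels)))
    fold_of_group = {}

    def imbalance(hist):
        # Mean over labels of the std (over folds) of each fold's share of that label.
        return np.std(hist / label_totals, axis=0).mean()

    # Place large groups first so small groups can even out the remainder.
    for g, h in sorted(group_hist.items(), key=lambda kv: -kv[1].sum()):
        costs = []
        for f in range(n_splits):
            trial = fold_hist.copy()
            trial[f] += h
            # Tie-break on the resulting fold size, then on the fold index.
            costs.append((imbalance(trial), trial[f].sum(), f))
        best = min(costs)[2]
        fold_hist[best] += h
        fold_of_group[g] = best

    fold_ids = np.array([fold_of_group[g] for g in groups])
    for f in range(n_splits):
        yield np.where(fold_ids != f)[0], np.where(fold_ids == f)[0]
```

Like scikit-learn splitters, it yields `(train_idx, valid_idx)` pairs, e.g. `for train_idx, valid_idx in stratified_group_kfold(y, groups, n_splits=5): ...`.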
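SeAtFixedSp is described only as "sensitivity at fixed specificity". A minimal NumPy sketch of that metric follows, assuming binary labels, predictions counted as positive when `score > threshold`, and a threshold chosen on the same data; `sensitivity_at_fixed_specificity` is a hypothetical name, not the kuma_utils signature.

```python
import numpy as np


def sensitivity_at_fixed_specificity(y_true, y_score, specificity=0.9):
    """Hypothetical helper: best sensitivity while specificity >= the target.

    Returns (sensitivity, threshold); a sample is predicted positive
    when its score is strictly greater than the threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    neg = np.sort(y_score[~y_true])
    # At least k negatives must score <= threshold (small epsilon guards float error).
    k = int(np.ceil(specificity * len(neg) - 1e-9))
    # The lowest threshold that satisfies the constraint maximizes sensitivity.
    threshold = neg[k - 1] if k > 0 else -np.inf
    sensitivity = np.mean(y_score[y_true] > threshold)
    return float(sensitivity), float(threshold)
```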
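QWK (quadratic weighted kappa) compares an observed confusion matrix with the one expected from the marginals alone, penalizing disagreements by the squared distance between classes. The NumPy sketch below shows the standard formula, assuming integer labels in `0..n_classes-1` and `n_classes >= 2`; the function name is hypothetical and not the kuma_utils API.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Hypothetical helper: 1 - sum(W * O) / sum(W * E)."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    # Observed matrix O[i, j] = number of samples with true i and predicted j.
    observed = np.zeros((n_classes, n_classes))
    np.add.at(observed, (y_true, y_pred), 1)
    # Quadratic disagreement weights W[i, j] = (i - j)^2 / (n - 1)^2.
    idx = np.arange(n_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # Expected matrix E: outer product of the marginals, scaled to the same total.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```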
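CyclicCosAnnealingLR is listed without parameters. A common cyclic cosine schedule restarts at `lr_max` every `cycle_len` steps and decays to `lr_min` along half a cosine; the pure-Python sketch below shows that shape under the assumption that the repo's scheduler behaves similarly. The function and its arguments are hypothetical; PyTorch's built-in relative is `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`.

```python
import math


def cyclic_cosine_lr(step, cycle_len, lr_max, lr_min=0.0):
    """Hypothetical schedule: cosine decay from lr_max to lr_min, restarting every cycle_len steps."""
    # Position within the current cycle, in [0, 1).
    pos = (step % cycle_len) / cycle_len
    return lr_min + (lr_max - lr_min) * (1 + math.cos(math.pi * pos)) / 2
```

For example, `[cyclic_cosine_lr(s, 10, 1e-3) for s in range(30)]` traces three identical cosine cycles.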
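GeM (generalized mean) pooling replaces average pooling with a power mean over the spatial axes: p = 1 gives average pooling, and large p approaches max pooling. The NumPy sketch below works on `(N, C, H, W)` arrays with a fixed `p`; in PyTorch modules `p` is usually a learnable parameter, and the function name here is hypothetical.

```python
import numpy as np


def gem_pool2d(x, p=3.0, eps=1e-6):
    """Hypothetical GeM pooling: (mean(x ** p over H, W)) ** (1 / p) -> shape (N, C)."""
    # Clamp to a small positive value so fractional powers stay defined.
    x = np.clip(x, eps, None)
    return np.mean(x ** p, axis=(-2, -1)) ** (1.0 / p)
```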
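EarlyStopping is described only as a callback for TorchTrainer. The usual logic is a patience counter: stop once the validation score has failed to improve for `patience` consecutive epochs. The class below is a framework-free sketch of that logic; `EarlyStoppingSketch` and its arguments are hypothetical and do not reflect the kuma_utils callback's interface.

```python
class EarlyStoppingSketch:
    """Hypothetical patience-based early-stopping tracker."""

    def __init__(self, patience=3, maximize=False):
        self.patience = patience
        self.maximize = maximize  # True for scores such as AUC, False for losses
        self.best = None
        self.bad_epochs = 0

    def step(self, score):
        """Record one epoch's validation score; return True when training should stop."""
        if self.best is None:
            improved = True
        elif self.maximize:
            improved = score > self.best
        else:
            improved = score < self.best
        if improved:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, call `if stopper.step(valid_loss): break` once per epoch.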
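TemperatureScaler is described only as probability calibration. Temperature scaling divides a model's logits by a single scalar T fitted on held-out data to minimize negative log-likelihood; because every logit is divided by the same T, the argmax prediction does not change. This NumPy sketch fits T by a simple grid search; it is an assumption, not necessarily how kuma_utils fits T (PyTorch implementations often use gradient descent), and both function names are hypothetical.

```python
import numpy as np


def temperature_nll(logits, labels, temperature):
    """Mean negative log-likelihood of softmax(logits / temperature)."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def fit_temperature(logits, labels, grid=np.logspace(-1, 1, 201)):
    """Hypothetical fitter: pick the T in grid (0.1 to 10) with the lowest validation NLL."""
    losses = [temperature_nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])
```

An overconfident model yields T > 1, which softens its probabilities; an underconfident one yields T < 1.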

Tutorial

License

The source code in this repository is released under the MIT license.