  • Stars: 252
  • Rank: 161,312 (Top 4 %)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created almost 5 years ago
  • Updated about 1 year ago

Repository Details

pca: A Python Package for Principal Component Analysis.

pca is a Python package for Principal Component Analysis. The core of PCA is built on sklearn functionality to maximize compatibility when combining it with other packages. But this package can do a lot more. Besides regular PCA, it can also perform SparsePCA and TruncatedSVD. Depending on your input data, the best approach will be chosen.

Other functionalities of PCA are:

  • Biplot to plot the loadings
  • Determine the explained variance
  • Extract the best performing features
  • Scatter plot with the loadings
  • Outlier detection using Hotelling T2 and/or SPE/Dmodx

⭐️ Star this repo if you like it ⭐️


Read the Medium blog for more details.

1. What are PCA loadings and how to effectively use Biplots?

2. Outlier Detection Using Principal Component Analysis and Hotelling’s T2 and SPE/DmodX Methods

3. Quantitative comparisons between t-SNE, UMAP, PCA, and Other Mappings.


Documentation pages

On the documentation pages you can find detailed information about how pca works, with many examples.


Installation

pip install pca

Import pca package

from pca import pca

Quick start

  • Make biplot
  • Plot explained variance
  • 3D plots

Normalize out the first component (or more) from the data. This is useful if the data is separated in its first component(s) by unwanted or biased variance, such as sex or experiment location.
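As a rough illustration of what normalizing out a component means, the sketch below projects the first principal axis out of a data matrix using only NumPy. The helper `remove_top_components`, the `batch` grouping, and the simulated data are hypothetical and made up for this example; this is not necessarily how the package implements its own normalization.

```python
import numpy as np

def remove_top_components(X, n_remove=1):
    """Return X with its first `n_remove` principal components projected out."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = Vt[:n_remove]  # shape: (n_remove, n_features)
    # Subtract the part of the centered data that lies along the top axes.
    return Xc - Xc @ top.T @ top + mean

rng = np.random.default_rng(0)
batch = np.repeat([0.0, 1.0], 50)  # hypothetical unwanted grouping, e.g. location
signal = rng.normal(size=(100, 4))
# A strong shift on two features for the second group dominates the first PC.
X = signal + np.outer(batch, [10.0, 10.0, 0.0, 0.0])

X_norm = remove_top_components(X, n_remove=1)

# Compare the group gap on feature 0 and the leading variance before and after.
gap_before = abs(X[batch == 1, 0].mean() - X[batch == 0, 0].mean())
gap_after = abs(X_norm[batch == 1, 0].mean() - X_norm[batch == 0, 0].mean())
var_before = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)[0] ** 2
var_after = np.linalg.svd(X_norm - X_norm.mean(axis=0), compute_uv=False)[0] ** 2
print(gap_before, gap_after)
print(var_before, var_after)
```

In this simulated case the group shift lies almost entirely along the first principal axis, so removing that axis takes most of the unwanted separation with it while leaving the remaining directions untouched.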

Make the biplot. The feature with the most variance (f1) is almost horizontal in the plot, whereas the feature with the second-most variance (f2) is almost vertical. This is expected because most of the variance is in f1, followed by f2, and so on.
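The relation between feature variance and loading direction can be checked numerically. The following is a minimal sketch with NumPy only, assuming three independent, simulated features whose standard deviations (10, 5, 1) are chosen purely for illustration; it does not use the package's own plotting or fitting functions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 500
# Hypothetical features with decreasing variance: f1 > f2 > f3.
stds = np.array([10.0, 5.0, 1.0])
X = rng.normal(size=(n_samples, 3)) * stds

# PCA via SVD of the centered data.
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of the total variance captured by each principal component.
explained_var_ratio = s**2 / np.sum(s**2)

# Row i of `loadings` holds the weights of f1..f3 on PC(i+1).
loadings = Vt

# Index of the feature with the largest absolute loading on each PC.
top_feature_per_pc = np.abs(loadings).argmax(axis=1)

print(explained_var_ratio.round(3))
print(top_feature_per_pc)
```

Because f1 carries most of the variance, its loading on PC1 is close to ±1 and near 0 on PC2, which is why its arrow lies almost along the horizontal axis of a PC1 versus PC2 biplot. The same reasoning puts f2 along the vertical axis and f3 along the third axis in 3D.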

Explained variance

Biplot in 2d and 3d. Here we see the nice addition of the expected f3 in the plot in the z-direction.

To detect outliers across the multi-dimensional space of PCA, the Hotelling's T2 test is incorporated. This means that chi-square tests are computed across the top n_components (default is PC1 to PC5). The highest variance (and thus the outliers) is expected in the first few components because of the nature of PCA, so going deeper into PC space may not be required, but the depth is configurable. This approach results in a P-value matrix (samples x PCs), for which the P-values per sample are then combined using Fisher's method. This makes it possible to determine the outliers and to rank them (strongest to weakest). The alpha parameter sets the significance threshold for detecting outliers (default: 0.05).
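The procedure described above can be sketched as follows. This is an illustrative approximation, not the package's implementation: the function names are hypothetical, each PC is standardized separately (an assumption), one degree of freedom is used per chi-square test, and the chi-square tail probabilities are written in closed form so that only NumPy and the standard library are needed.

```python
import math
import numpy as np

def chi2_sf_df1(x):
    """Tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def fisher_combine(pvalues):
    """Combine p-values with Fisher's method.

    The statistic -2 * sum(log p) follows a chi-square distribution with 2k
    degrees of freedom, whose tail probability has a closed form for even dof.
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

def detect_outliers(scores, n_components=5, alpha=0.05):
    """Return the P-value matrix, combined P-values, outlier flags, and ranking."""
    scores = np.asarray(scores, dtype=float)
    n_components = min(n_components, scores.shape[1])
    Z = scores[:, :n_components]
    # Squared standardized score per PC (assumption: standardize each PC separately).
    stat = (Z - Z.mean(axis=0)) ** 2 / Z.var(axis=0)
    # Clip at a tiny positive value so log(p) stays finite.
    pmat = np.array([[max(chi2_sf_df1(v), 1e-300) for v in row] for row in stat])
    p_combined = np.array([fisher_combine(row) for row in pmat])
    is_outlier = p_combined < alpha
    ranking = np.argsort(p_combined)  # strongest outlier first
    return pmat, p_combined, is_outlier, ranking

# Simulated data with one planted outlier in row 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
X[0] += 8.0

# PC scores via SVD of the centered data.
Xc = X - X.mean(axis=0)
U, s, _ = np.linalg.svd(Xc, full_matrices=False)
scores = U * s

pmat, p_combined, is_outlier, ranking = detect_outliers(scores, n_components=5, alpha=0.05)
print(pmat.shape)
print(ranking[:3])
```

Note that with alpha = 0.05 and no multiple-testing correction, roughly 5% of ordinary samples are also flagged, so the ranking is often more informative than the binary flag.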


Citation

Please cite this package in your publications if it is useful for your research (see citation).

Maintainers


Support

Your ❤️ is important to keep maintaining this package. You can support it in various ways; have a look at the sponsor page. Report bugs and issues, and request features, on the GitHub page.

More Repositories

1. bnlearn (Jupyter Notebook, 410 stars): Python library for learning the graphical structure of Bayesian networks, parameter learning, inference and sampling methods.

2. distfit (Jupyter Notebook, 321 stars): distfit is a Python library for probability density fitting.

3. findpeaks (Python, 179 stars): The detection of peaks and valleys in a 1d-vector or 2d-array (image).

4. d3graph (Jupyter Notebook, 149 stars): Creation of interactive networks using d3 Javascript.

5. clustimage (Jupyter Notebook, 74 stars): clustimage is a Python package for unsupervised clustering of images.

6. hgboost (Python, 51 stars): hgboost is a Python package for hyper-parameter optimization for xgboost, catboost or lightboost using cross-validation, and evaluating the results on an independent validation set. hgboost can be applied for classification and regression tasks.

7. clusteval (Jupyter Notebook, 46 stars): Clusteval provides methods for unsupervised cluster validation.

8. benfordslaw (Python, 39 stars): benfordslaw is about the frequency distribution of leading digits.

9. undouble (Python, 38 stars): Python package undouble is to detect (near-)identical images.

10. kaplanmeier (Python, 26 stars): kaplanmeier is a Python library to create survival curves using Kaplan-Meier, and compute the log-rank test.

11. googletrends (Python, 22 stars): Google trends is to examine trending Google searches on geographical location and across time for input keywords.

12. hnet (Python, 21 stars): Association rule based networks using graphical Hypergeometric Networks.

13. caerus (Python, 19 stars): Detection of favorable moments in time series data.

14. treeplot (Python, 11 stars): Plot tree based machine learning models.

15. d3heatmap (HTML, 9 stars): d3heatmap is a Python package to create interactive heatmaps based on d3js.

16. flameplot (Python, 8 stars): flameplot is a Python package for the quantification of local similarity across two maps or embeddings.

17. worldmap (Python, 7 stars): This Python package makes it possible to color different countries in the world or the regions per country.

18. ismember (Python, 7 stars)

19. scatterd (Python, 7 stars): Scatterd is a Python package for easy and fast creation of beautiful scatter plots.

20. classeval (Python, 5 stars): Evaluation of supervised predictions for two-class and multi-class classifiers.

21. imagesc (Python, 4 stars): Make quick and beautiful heatmaps.

22. df2onehot (Python, 3 stars): Convert an unstructured array into a structured dataframe.

23. colourmap (Python, 3 stars): Colourmap generates a unique list of RGB and HEX colors for the specified input list.

24. datazets (Python, 3 stars): Datazets is a Python package to retrieve example data sets.

25. pypickle (Python, 2 stars): pypickle is for saving and loading files in pickle format.

26. irelease (Python, 2 stars): Library that automates releasing your Github Python package at Pypi.

27. thompson (Python, 2 stars): Thompson is a Python package to evaluate the multi-armed bandit problem. In addition to Thompson, the Upper Confidence Bound (UCB) algorithm and randomized results are also implemented.

28. dicter (Python, 2 stars): Python package with advanced dictionary functions. Traverse through nested dicts. Set and get multiple keys. Flattens dicts. Store and load in json and more!

29. relevantpackage (Python, 1 star): Example of a Python Package.

30. bnclassify (Python, 1 star): bnlearn

31. d3plus (Python, 1 star)