• Stars: 1,245
• Rank: 37,740 (Top 0.8%)
• Language: Python
• License: MIT License
• Created: about 8 years ago
• Updated: 6 months ago

Repository Details

👑 Multivariate exploratory data analysis in Python: PCA, CA, MCA, MFA, FAMD, GPA

Prince is a Python library for multivariate exploratory data analysis. It includes a variety of methods for summarizing tabular data, including principal component analysis (PCA) and correspondence analysis (CA). Prince provides efficient implementations with a scikit-learn API.

Example usage

>>> import prince

>>> dataset = prince.datasets.load_decathlon()
>>> decastar = dataset.query('competition == "Decastar"')

>>> pca = prince.PCA(n_components=5)
>>> pca = pca.fit(decastar, supplementary_columns=['rank', 'points'])
>>> pca.eigenvalues_summary
          eigenvalue % of variance % of variance (cumulative)
component
0              3.114        31.14%                     31.14%
1              2.027        20.27%                     51.41%
2              1.390        13.90%                     65.31%
3              1.321        13.21%                     78.52%
4              0.861         8.61%                     87.13%

>>> pca.transform(dataset).tail()
component                       0         1         2         3         4
competition athlete
OlympicG    Lorenzo      2.070933  1.545461 -1.272104 -0.215067 -0.515746
            Karlivans    1.321239  1.318348  0.138303 -0.175566 -1.484658
            Korkizoglou -0.756226 -1.975769  0.701975 -0.642077 -2.621566
            Uldal        1.905276 -0.062984 -0.370408 -0.007944 -2.040579
            Casarsa      2.282575 -2.150282  2.601953  1.196523 -3.571794
>>> chart = pca.plot(dataset)

This chart is interactive, but the interactivity does not render on GitHub. The green points are the column loadings.
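To give a rough idea of what the eigenvalue summary above reports, here is a minimal sketch of a standardised PCA written with plain NumPy on synthetic data. It does not call Prince, and the `eigenvalue_summary` helper and the random data are assumptions made for illustration only. When every column is scaled to unit variance, the eigenvalues sum to the number of columns, which is consistent with the table above, where an eigenvalue of 3.114 goes with 31.14% of the variance.

```python
import numpy as np


def eigenvalue_summary(X):
    """Eigenvalues of the correlation matrix and their share of total variance."""
    X = np.asarray(X, dtype=float)
    # Standardise each column to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Correlation matrix of the columns.
    corr = (Z.T @ Z) / len(Z)
    # eigvalsh returns eigenvalues in ascending order, so reverse them.
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]
    share = eigenvalues / eigenvalues.sum()
    return eigenvalues, share, np.cumsum(share)


# Synthetic data (hypothetical): 50 rows, 4 columns, two of them correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] += 0.8 * X[:, 0]

eigenvalues, share, cumulative = eigenvalue_summary(X)
for i, (ev, s, c) in enumerate(zip(eigenvalues, share, cumulative)):
    print(f"{i}  {ev:.3f}  {s:.2%}  {c:.2%}")
```

The three printed columns mirror the eigenvalue, % of variance, and cumulative % of variance columns of the summary.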

Installation

pip install prince

🎨 Prince uses Altair for making charts.

Methods

flowchart TD
    cat?(Categorical data?) --> |"✅"| num_too?(Numerical data too?)
    num_too? --> |"✅"| FAMD
    num_too? --> |"❌"| multiple_cat?(More than two columns?)
    multiple_cat? --> |"✅"| MCA
    multiple_cat? --> |"❌"| CA
    cat? --> |"❌"| groups?(Groups of columns?)
    groups? --> |"✅"| MFA
    groups? --> |"❌"| shapes?(Analysing shapes?)
    shapes? --> |"✅"| GPA
    shapes? --> |"❌"| PCA
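
The flowchart above can also be read as a small decision function. The sketch below is only an illustration of that logic: `choose_method` and its parameter names are hypothetical and are not part of Prince's API.

```python
def choose_method(categorical, numerical=False, more_than_two_columns=False,
                  groups=False, shapes=False):
    """Return the method name the flowchart leads to (hypothetical helper)."""
    if categorical:
        # Categorical data, possibly mixed with numerical data.
        if numerical:
            return "FAMD"
        return "MCA" if more_than_two_columns else "CA"
    # Purely numerical data from here on.
    if groups:
        return "MFA"
    return "GPA" if shapes else "PCA"


print(choose_method(categorical=True, numerical=True))   # FAMD
print(choose_method(categorical=False, groups=True))     # MFA
print(choose_method(categorical=False))                  # PCA
```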

Principal component analysis (PCA)

Correspondence analysis (CA)

Multiple correspondence analysis (MCA)

Multiple factor analysis (MFA)

Factor analysis of mixed data (FAMD)

Generalized Procrustes analysis (GPA)

Correctness

Prince is tested against scikit-learn and FactoMineR. For the latter, rpy2 is used to run code in R and convert the results to Python, which makes automated testing possible. See more in the tests directory.
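
As a hedged illustration of what comparing two implementations involves, the sketch below checks an eigendecomposition-based PCA against an SVD-based one using only NumPy. It is not taken from Prince's test suite, and every function name in it is hypothetical. One detail such comparisons generally have to handle is that principal components are only defined up to sign, so each column is accepted if it matches either directly or after flipping.

```python
import numpy as np


def pca_via_eigh(Z, n_components):
    """Row coordinates from the eigenvectors of the covariance matrix."""
    cov = Z.T @ Z / len(Z)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues.
    order = np.argsort(eigenvalues)[::-1][:n_components]
    return Z @ eigenvectors[:, order]


def pca_via_svd(Z, n_components):
    """Row coordinates from the singular value decomposition of Z."""
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def assert_same_up_to_sign(a, b, atol=1e-8):
    """Fail unless each column of a equals the matching column of b or its negation."""
    for j in range(a.shape[1]):
        same = np.allclose(a[:, j], b[:, j], atol=atol)
        flipped = np.allclose(a[:, j], -b[:, j], atol=atol)
        assert same or flipped, f"component {j} differs"


# Synthetic, centred data (hypothetical).
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 5))
Z = X - X.mean(axis=0)

assert_same_up_to_sign(pca_via_eigh(Z, 3), pca_via_svd(Z, 3))
```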

Citation

Please use this citation if you use this software as part of a scientific publication.

@software{Halford_Prince,
    author = {Halford, Max},
    license = {MIT},
    title = {{Prince}},
    url = {https://github.com/MaxHalford/prince}
}

Support

I made Prince when I was at university, back in 2016. I've had very little time over the years to maintain this package. I spent a significant amount of time in 2022 revamping the entire package. Prince has now been downloaded over 1 million times. I would be grateful to anyone willing to sponsor me. Sponsorships allow me to spend more time working on open source software, including Prince.

License

The MIT License (MIT). Please see the license file for more information.

More Repositories

1. eaopt (Go, 881 stars): 🍀 Evolutionary optimization library for Go (genetic algorithm, particle swarm optimization, differential evolution)
2. xam (Python, 362 stars): 🎯 Personal data science and machine learning toolbox
3. flask-boilerplate (Python, 354 stars): 🚀 Fully fledged Flask boilerplate code
4. chime (Python, 294 stars): 🎵 Python sound notifications made easy
5. sorobn (Python, 234 stars): 🧮 Bayesian networks in Python
6. kaggle-recruit-restaurant (Jupyter Notebook, 106 stars): 🏆 Kaggle 8th place solution
7. procedural-art (HTML, 96 stars): 🌌 Procedural art with vanilla JavaScript
8. pytorch-resample (Python, 89 stars): 🎲 Iterable dataset resampling in PyTorch
9. xgp (Go, 61 stars): 🔮 Symbolic regression library
10. flask-sse-no-deps (Python, 59 stars): An example of server-sent events in Flask without extra dependencies
11. clavier (Python, 58 stars): 🔤 Measure edit distance based on keyboard layout
12. halfgone (Go, 47 stars): 🔳 Black and white digital halftoning
13. taxi-demo-rp-mz-rv-rd-st (Python, 44 stars): 🚕 Self-contained demo using Redpanda, Materialize, River, Redis, and Streamlit to predict taxi trip durations
14. pointu (Go, 37 stars): Pointillisme tool based on Weighted Voronoi Stippling
15. carre (Go, 33 stars): 👌 Image simplifier
16. kaggle-vsb-power (Jupyter Notebook, 31 stars): ⚡ 13th place solution
17. arcgonaut (Go, 29 stars): 🌀 Golang arc diagrams
18. eaopt-examples (Go, 28 stars): 🍀 eaopt examples
19. starboost (Python, 26 stars): ⭐🚀 Gradient boosting on steroids
20. naked (Python, 23 stars): The simplest way to deploy a machine learning model
21. idao-2020-qualifier (Jupyter Notebook, 18 stars): Solution of team "Data O Plomo" to the qualification phase of the 2020 edition of the International Data Analysis Olympiad (IDAO)
22. genetic-curve-fitting (Python, 17 stars): 📈
23. orc (Jupyter Notebook, 17 stars): 🧌 Parsing structured information from OCR outputs
24. myriade (Python, 16 stars): ✨🌲 Hierarchical extreme multiclass and multi-label classification.
25. bike-sharing-history (Python, 16 stars): 🚲 Git scraping for bike sharing APIs
26. data-science-tutorials (Jupyter Notebook, 15 stars)
27. maxhalford.github.io (HTML, 13 stars): 🏡 Personal website
28. tuna (Go, 13 stars): 🐟 A streaming ETL for fish
29. bbc-weather-honolulu (Python, 12 stars): ☀️ Measuring the accuracy of BBC weather forecasts in Honolulu, USA
30. yamp (Python, 11 stars): Yet Another MkDocs Parser
31. tartine (Python, 11 stars): 🍞 Manipulate dynamic spreadsheets with arbitrary layouts using Python
32. spotgeo-challenge (Python, 10 stars): 🛰️ My solution to the Kelvins spotGEO challenge
33. openbikes (Python, 9 stars): 🚲 Collecting and publishing bike sharing data stored at https://github.com/MaxHalford/openbikes-data
34. gago (Go, 9 stars): Old version of eaopt, will eventually be removed
35. project-euler-python (Python, 9 stars): 🐍
36. ikea-store-locations (Python, 9 stars): 🇸🇪 Retrieval and analysis of IKEA store locations
37. directory-architecture (Python, 8 stars): 📁 Mimicking the tree command
38. xgp-python (Python, 8 stars): XGP Python package with a scikit-learn interface
39. idao-2020-final (Jupyter Notebook, 7 stars): Solution of team "Data O Plomo" to the final phase of the 2020 edition of the International Data Analysis Olympiad (IDAO)
40. svg2stl (Python, 6 stars): 🛹 Turn an SVG into an STL for stencil creation purposes
41. inverted-index-search-engine (Python, 5 stars)
42. streaming-cdf-benchmark (Python, 5 stars): A benchmark to compare algorithms for estimating cumulative density functions (CDF) on streaming data
43. vose (Cython, 5 stars): Cython implementation of Vose's Alias method
44. jan (Python, 5 stars): 💤 Just Another Neural network
45. bitcoin-analysis-m1sid (Python, 5 stars): 💰 Master 1 project
46. kaggle-march-madness-2019 (Jupyter Notebook, 4 stars): 🏀 Men and women solutions for the 2019 edition of the Kaggle March Madness competition
47. kaggle-DSG18-qualifier (Jupyter Notebook, 3 stars)
48. kaggle-DSG17-qualifier (Python, 3 stars)
49. kaggle-plasticc-astro-classification (Jupyter Notebook, 3 stars)
50. andor-faq-llm (Python, 2 stars): 🎲 Answering tabletop game questions using an LLM
51. kaggle-answer-correctness (Python, 2 stars): 🤔 Solution to the Riiid! Answer Correctness Prediction competition on Kaggle
52. postgres-job-docker (Shell, 2 stars): 🐳 Docker setup for PostgreSQL + Join Order Benchmark (JOB)
53. dotfiles (Shell, 2 stars): 🧘 Because it's the healthy thing to do
54. ziboinboin.com (HTML, 1 star): Old Ziboinboin website
55. openbikes-data (1 star): 🚲 Git storage for https://github.com/MaxHalford/openbikes
56. where-to-live (Jupyter Notebook, 1 star)
57. cochleas-L3SID (Python, 1 star)
58. chrome-infinite-scrolling-robot (JavaScript, 1 star)
59. kaggle-avito-demand (Python, 1 star)
60. tldks-2020 (Jupyter Notebook, 1 star)