  • Stars: 227
  • Rank: 174,885 (Top 4 %)
  • Language: Python
  • License: GNU General Publi...
  • Created almost 7 years ago
  • Updated 8 months ago

Repository Details

A Python module to perform exploratory & confirmatory factor analyses.

FactorAnalyzer

This is a Python module to perform exploratory factor analysis (EFA), with several optional rotations. It also includes a class to perform confirmatory factor analysis (CFA), with certain pre-defined constraints. In exploratory factor analysis, factor extraction can be performed using a variety of estimation techniques. The factor_analyzer package allows users to perform EFA using either (1) a minimum residual (MINRES) solution, (2) a maximum likelihood (ML) solution, or (3) a principal factor solution. CFA, however, can only be performed using an ML solution.

Both the EFA and CFA classes within this package are fully compatible with scikit-learn. Portions of this code are ported from the excellent R library psych, and the sem package provided inspiration for the CFA class.

Please see the official documentation for additional details.

Description

Exploratory factor analysis (EFA) is a statistical technique used to identify latent relationships among sets of observed variables in a dataset. In particular, EFA seeks to model a large set of observed variables as linear combinations of some smaller set of unobserved, latent factors. The matrix of weights, or factor loadings, generated from an EFA model describes the underlying relationships between each variable and the latent factors.
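
The linear model described above can be illustrated numerically. The following is a minimal sketch using NumPy only, not the package's own code; the loading matrix, sample size, and random seed are made-up values chosen for the illustration. It generates observed variables as linear combinations of two latent factors plus unique noise, then compares the sample correlation matrix with the one the model implies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loading matrix: 6 observed variables, 2 latent factors.
loadings = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.9],
    [0.0, 0.7],
    [0.0, 0.5],
])

# Uniquenesses chosen so that every observed variable has unit variance.
uniquenesses = 1.0 - (loadings ** 2).sum(axis=1)

# Generate data: observed = factors @ loadings.T + unique noise.
n_samples = 5000
factors = rng.standard_normal((n_samples, 2))
noise = rng.standard_normal((n_samples, 6)) * np.sqrt(uniquenesses)
observed = factors @ loadings.T + noise

# With uncorrelated, unit-variance factors, the model implies
# corr(observed) = loadings @ loadings.T + diag(uniquenesses).
implied_corr = loadings @ loadings.T + np.diag(uniquenesses)
sample_corr = np.corrcoef(observed, rowvar=False)
max_abs_diff = np.abs(sample_corr - implied_corr).max()
print(max_abs_diff)
```

The printed difference should be small and shrink as the sample size grows. Fitting an EFA model goes the other way: it starts from the observed correlations and estimates the loadings.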

Confirmatory factor analysis (CFA), a closely associated technique, is used to test an a priori hypothesis about latent relationships among sets of observed variables. In CFA, the researcher specifies the expected pattern of factor loadings (and possibly other constraints), and fits a model according to this specification.

Typically, the number of factors (K) in an EFA or CFA model is selected such that it is substantially smaller than the number of variables. The factor analysis model can be estimated using a variety of standard estimation methods, including but not limited to MINRES and ML.

Factor loadings are similar to standardized regression coefficients, and variables with higher loadings on a particular factor can be interpreted as explaining a larger proportion of the variation in that factor. In the case of EFA, factor loading matrices are usually rotated after the factor analysis model is estimated in order to produce a simpler, more interpretable structure to identify which variables are loading on a particular factor.

Two common types of rotations are:

  1. The varimax rotation, which rotates the factor loading matrix so as to maximize the sum of the variance of squared loadings, while preserving the orthogonality of the loading matrix.
  2. The promax rotation, a method for oblique rotation, which builds upon the varimax rotation, but ultimately allows factors to become correlated.
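
To make the varimax criterion concrete, here is a minimal NumPy sketch of a commonly used SVD-based varimax iteration. It is an illustration under stated assumptions, not the implementation used by this package, which may differ in details such as row normalization and convergence settings. The function names `varimax` and `varimax_criterion` and the example loading matrix are hypothetical.

```python
import numpy as np


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return np.var(loadings ** 2, axis=0).sum()


def varimax(loadings, max_iter=100, tol=1e-8):
    """Return (rotated_loadings, rotation_matrix) for an orthogonal rotation."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        gradient = loadings.T @ (
            rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / n_vars
        )
        # Project the gradient back onto the set of orthogonal matrices.
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        current = s.sum()
        if previous != 0.0 and current < previous * (1.0 + tol):
            break
        previous = current
    return loadings @ rotation, rotation


# Hypothetical example: a simple structure deliberately mixed by 30 degrees.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)
print(varimax_criterion(unrotated), varimax_criterion(rotated))
```

Because the rotation matrix is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged; only the distribution of the loadings across factors changes.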

This package includes a factor_analyzer module with a stand-alone FactorAnalyzer class. The class includes fit() and transform() methods that enable users to perform factor analysis and score new data using the fitted factor model. Users can also perform optional rotations on a factor loading matrix using the Rotator class.

The following rotation options are available in both FactorAnalyzer and Rotator:

  1. varimax (orthogonal rotation)
  2. promax (oblique rotation)
  3. oblimin (oblique rotation)
  4. oblimax (orthogonal rotation)
  5. quartimin (oblique rotation)
  6. quartimax (orthogonal rotation)
  7. equamax (orthogonal rotation)
  8. geomin_obl (oblique rotation)
  9. geomin_ort (orthogonal rotation)

In addition, the package includes a confirmatory_factor_analyzer module with a stand-alone ConfirmatoryFactorAnalyzer class. The class includes fit() and transform() methods that enable users to perform confirmatory factor analysis and score new data using the fitted model. Performing CFA requires users to provide, in advance, a model specification with the expected factor loading relationships. This can be done using the ModelSpecificationParser class.

Note that the ConfirmatoryFactorAnalyzer class is very experimental at this point, so use it with caution, especially if your data are highly non-normal.
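
Conceptually, a model specification states which loadings are free to be estimated and which are fixed at zero. The sketch below shows one way to express that as a 0/1 pattern matrix with NumPy. The helper `loading_pattern` is hypothetical and is not part of this package; it is not a description of how ModelSpecificationParser works internally. The dictionary matches the one used in the confirmatory example in this document.

```python
import numpy as np


def loading_pattern(model_dict, variable_names):
    """Build an (n_variables, n_factors) 0/1 matrix marking free loadings."""
    factor_names = list(model_dict)
    pattern = np.zeros((len(variable_names), len(factor_names)), dtype=int)
    for col, factor in enumerate(factor_names):
        for variable in model_dict[factor]:
            pattern[variable_names.index(variable), col] = 1
    return pattern


variable_names = ["V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8"]
model_dict = {"F1": ["V1", "V2", "V3", "V4"],
              "F2": ["V5", "V6", "V7", "V8"]}

pattern = loading_pattern(model_dict, variable_names)
print(pattern)
```

Each row is an observed variable and each column a factor; a 1 marks a loading to be estimated, and a 0 marks a loading constrained to zero.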

Examples

Exploratory factor analysis example.

In [1]: import pandas as pd
   ...: from factor_analyzer import FactorAnalyzer

In [2]: df_features = pd.read_csv('tests/data/test02.csv')

In [3]: fa = FactorAnalyzer(rotation=None)

In [4]: fa.fit(df_features)
Out[4]:
FactorAnalyzer(bounds=(0.005, 1), impute='median', is_corr_matrix=False,
               method='minres', n_factors=3, rotation=None, rotation_kwargs={},
               use_smc=True)

In [5]: fa.loadings_
Out[5]:
array([[-0.12991218,  0.16398151,  0.73823491],
       [ 0.03899558,  0.04658425,  0.01150343],
       [ 0.34874135,  0.61452341, -0.07255666],
       [ 0.45318006,  0.7192668 , -0.0754647 ],
       [ 0.36688794,  0.44377343, -0.01737066],
       [ 0.74141382, -0.15008235,  0.29977513],
       [ 0.741675  , -0.16123009, -0.20744497],
       [ 0.82910167, -0.20519428,  0.04930817],
       [ 0.76041819, -0.23768727, -0.12068582],
       [ 0.81533404, -0.12494695,  0.17639684]])

In [6]: fa.get_communalities()
Out[6]:
array([0.5887579 , 0.00382308, 0.50452402, 0.72841182, 0.33184336,
       0.66208429, 0.61911037, 0.73194557, 0.64929612, 0.71149718])
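
The communalities shown in Out[6] can be checked by hand: each one is the sum of squared loadings in the corresponding row of Out[5]. The following NumPy-only sketch copies the loadings printed above and recomputes them. The uniqueness calculation assumes standardized variables.

```python
import numpy as np

# Unrotated loadings copied from Out[5] above.
loadings = np.array([
    [-0.12991218,  0.16398151,  0.73823491],
    [ 0.03899558,  0.04658425,  0.01150343],
    [ 0.34874135,  0.61452341, -0.07255666],
    [ 0.45318006,  0.7192668 , -0.0754647 ],
    [ 0.36688794,  0.44377343, -0.01737066],
    [ 0.74141382, -0.15008235,  0.29977513],
    [ 0.741675  , -0.16123009, -0.20744497],
    [ 0.82910167, -0.20519428,  0.04930817],
    [ 0.76041819, -0.23768727, -0.12068582],
    [ 0.81533404, -0.12494695,  0.17639684],
])

# Communality: share of each variable's variance explained by the factors.
communalities = (loadings ** 2).sum(axis=1)

# Uniqueness: the remaining share (assumes standardized variables).
uniquenesses = 1.0 - communalities

print(np.round(communalities, 4))
```

The second variable has a communality close to zero, which means the three factors explain almost none of its variance.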

Confirmatory factor analysis example.

In [1]: import pandas as pd

In [2]: from factor_analyzer import (ConfirmatoryFactorAnalyzer,
   ...:                              ModelSpecificationParser)

In [3]: df_features = pd.read_csv('tests/data/test11.csv')

In [4]: model_dict = {"F1": ["V1", "V2", "V3", "V4"],
   ...:               "F2": ["V5", "V6", "V7", "V8"]}

In [5]: model_spec = ModelSpecificationParser.parse_model_specification_from_dict(df_features,
   ...:                                                                           model_dict)

In [6]: cfa = ConfirmatoryFactorAnalyzer(model_spec, disp=False)

In [7]: cfa.fit(df_features.values)

In [8]: cfa.loadings_
Out[8]:
array([[0.99131285, 0.        ],
       [0.46074919, 0.        ],
       [0.3502267 , 0.        ],
       [0.58331488, 0.        ],
       [0.        , 0.98621042],
       [0.        , 0.73389239],
       [0.        , 0.37602988],
       [0.        , 0.50049507]])

In [9]: cfa.factor_varcovs_
Out[9]:
array([[1.        , 0.17385704],
       [0.17385704, 1.        ]])

In [10]: cfa.transform(df_features.values)
Out[10]:
array([[-0.46852166, -1.08708035],
       [ 2.59025301,  1.20227783],
       [-0.47215977,  2.65697245],
       ...,
       [-1.5930886 , -0.91804114],
       [ 0.19430887,  0.88174818],
       [-0.27863554, -0.7695101 ]])
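
To see how the loadings and factor covariances fit together, recall that in the standard CFA model the covariance among observed variables that is explained by the factors is the loading matrix times the factor covariance matrix times the transposed loading matrix. The NumPy-only sketch below applies that identity to the values printed in Out[8] and Out[9]. The error variances are not shown in this example, so the diagonal of the full model-implied covariance cannot be reproduced here.

```python
import numpy as np

# Loadings copied from Out[8] above.
loadings = np.array([
    [0.99131285, 0.        ],
    [0.46074919, 0.        ],
    [0.3502267 , 0.        ],
    [0.58331488, 0.        ],
    [0.        , 0.98621042],
    [0.        , 0.73389239],
    [0.        , 0.37602988],
    [0.        , 0.50049507],
])

# Factor variances and covariances copied from Out[9] above.
factor_varcovs = np.array([
    [1.0, 0.17385704],
    [0.17385704, 1.0],
])

# Covariance among observed variables that is explained by the factors.
common_cov = loadings @ factor_varcovs @ loadings.T

# V1 and V5 load on different factors, so they are linked only
# through the covariance between F1 and F2.
v1_v5 = common_cov[0, 4]
print(v1_v5)
```

Variables that load on the same factor share covariance equal to the product of their loadings, while variables on different factors are additionally scaled by the factor covariance.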

Requirements

  • Python 3.8 or higher
  • numpy
  • pandas
  • scipy
  • scikit-learn
  • pre-commit

Contributing

Contributions to factor_analyzer are very welcome. Please file an issue in the repository if you would like to contribute.

Installation

You can install this package via pip with:

$ pip install factor_analyzer

Alternatively, you can install via conda with:

$ conda install -c ets factor_analyzer

License

GNU General Public License (>= 2)
