This is a simple text classification example using Latent Semantic Analysis (LSA), written in Python and using the scikit-learn library.
This code goes along with an LSA tutorial blog post I wrote here.
Steps:
- [Optional]: Run getReutersTextArticles.py to download the Reuters dataset and extract the raw text. This step has already been performed for you, and the dataset is stored in the 'data' folder.
- Run runClassification_LSA.py to apply LSA to the dataset and then test classification accuracy.
- Run inspect_LSA.py to gain some insight into what LSA is doing.
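
For orientation, here is a minimal sketch of what an LSA classification pipeline can look like in scikit-learn. It is not the code in runClassification_LSA.py. The toy corpus, the labels, the choice of 2 components, and the k-nearest-neighbors classifier with cosine distance are all assumptions made for illustration; the real scripts work on the Reuters data in the 'data' folder and may differ in every one of these choices.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical toy corpus standing in for the Reuters articles.
train_docs = [
    "wheat corn harvest grain exports rise",
    "grain shipments wheat prices farmers",
    "corn crop harvest farmers grain",
    "bank interest rates loans credit",
    "central bank raises interest rates",
    "loans credit bank lending rates",
]
train_labels = ["grain", "grain", "grain", "finance", "finance", "finance"]
test_docs = [
    "farmers expect a large wheat harvest",
    "bank cuts interest rates on loans",
]

# LSA = tf-idf term weighting followed by a truncated SVD.
# The SVD output is re-normalized to unit length so that
# cosine comparisons between documents behave well.
lsa = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2, random_state=0),  # 2 is an illustrative choice
    Normalizer(copy=False),
)
X_train = lsa.fit_transform(train_docs)  # fit vocabulary and SVD on training text only
X_test = lsa.transform(test_docs)        # project unseen text into the same space

# Classify each test document by its nearest training documents in LSA space.
knn = KNeighborsClassifier(n_neighbors=3, algorithm="brute", metric="cosine")
knn.fit(X_train, train_labels)
predicted = knn.predict(X_test)

for doc, label in zip(test_docs, predicted):
    print(f"{label}: {doc}")
```

On a real dataset the number of SVD components would typically be much larger than 2, and it is worth comparing accuracy against the plain tf-idf vectors to see whether the LSA projection helps.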