chrisjmccormick/LSA_Classification

Stars: 104 (rank 330,604, top 7%)
Language: Python
Created over 8 years ago; updated over 6 years ago

Text classification example in Python using Latent Semantic Analysis (LSA)

This is a simple text classification example using Latent Semantic Analysis (LSA), written in Python and using the scikit-learn library.

This code goes along with an LSA tutorial blog post I wrote here.

Steps:

1. [Optional] Run getReutersTextArticles.py to download the Reuters dataset and extract the raw text. This step has already been performed for you, and the dataset is stored in the 'data' folder.
2. Run runClassification_LSA.py to apply LSA to the dataset and then test classification accuracy.
3. Run inspect_LSA.py to gain some insight into what LSA is doing.

word2vec_commented

Commented (but unaltered) version of original word2vec C implementation.

inspect_word2vec

Python code for checking out Google's pre-trained, 3M word Word2Vec model

MinHash

Example Python code for comparing documents using MinHash

dbscan

A simple implementation of DBSCAN in Python

wiki-sim-search

Similarity search on Wikipedia using gensim in Python.

hog_matlab

Matlab implementation of the HOG descriptor for pedestrian detection

simsearch

Python tools for performing similarity searches on text documents.

word2vec_matlab

mlpack-examples

Some ready-to-run C++ examples for mlpack

kfold_cv

k-Fold Cross-Validation in Matlab

summarize-long-pdfs

Long Document Summarization with ChatGPT

Jupyter Notebook

rbfn_matlab

Radial Basis Function Network implementation in Matlab

brute_knn_benchmarks

Performance measurements on brute-force k-NN implementations on GPU and CPU

llm-tuning-examples

A collection of fine-tuning examples from others, with my comments added.

Jupyter Notebook

personal-site

My personal site covering all my hobbies and interests outside of work