Repository Details

A course catalog with extremely fast full-text search

classes.wtf

I just want to take a class about [X] but searching the online catalog is so slow, and my results are largely irrelevant. WTF?

Harvard has many course search websites, but none of them are good. This project is an attempt to take the problem more seriously: write high-performance software and set great defaults so that people can get better, more useful suggestions, 100x faster.

How does it work?

Classes.wtf is a custom, distributed search engine written in Go that focuses on speed and quality of results. It's built on an in-memory Redis database that runs as a subprocess of the application. This index supports full-text fuzzy and prefix search on all fields, along with a rich query syntax.
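To make the fuzzy and prefix matching concrete, here is a minimal sketch of how text typed into a search box could be translated into a full-text query. It assumes the standard RediSearch query operators (`term*` for prefix matches, `%term%` for fuzzy matches within edit distance 1, `a|b` for union, and `*` for "match all"). The `buildQuery` helper and its length thresholds are hypothetical and are not taken from this project's source.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// buildQuery is a hypothetical helper (not from the classes.wtf source) that
// turns raw search-box input into a RediSearch query string.
func buildQuery(input string) string {
	// Split on anything that is not a letter or digit, so punctuation in the
	// input can never be interpreted as a query operator.
	words := strings.FieldsFunc(strings.ToLower(input), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	if len(words) == 0 {
		return "*" // RediSearch wildcard: match every document
	}

	parts := make([]string, 0, len(words))
	for i, w := range words {
		isLast := i == len(words)-1
		n := len([]rune(w))
		switch {
		case isLast && n >= 2:
			// The user is probably still typing the last word: prefix match.
			parts = append(parts, w+"*")
		case !isLast && n >= 4:
			// Longer completed words: accept the exact word or a near miss.
			parts = append(parts, "("+w+"|%"+w+"%)")
		default:
			// Short words are matched exactly to avoid noisy results.
			parts = append(parts, w)
		}
	}
	// Space-separated terms are intersected (AND) by RediSearch.
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(buildQuery("intro to algor")) // (intro|%intro%) to algor*
	fmt.Println(buildQuery("CS 50"))          // cs 50*
}
```

Treating only the last word as a prefix reflects the search-as-you-type behavior described in this README; whether the real engine makes the same choice is not something this sketch claims.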

The frontend is a static website built with Svelte, and it processes search queries immediately after every keystroke. The goal is for the entire {request, computation, response, and render} pipeline to take under 30 milliseconds.

"Now hang on just a second," I hear you saying. "The speed of light is not fast enough for data to travel around the world at this latency!" But don't worry, this is fine. We run multiple replicas in geographically distributed locations using Fly.io and route requests to the nearest one. Each replica runs its own full-text query engine, so they are completely independent.

(The nearest server replica to Cambridge, MA lives in Secaucus, NJ, only 200 miles away.)
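A rough back-of-the-envelope check of the speed-of-light argument is sketched below. It assumes that signals in optical fiber travel at about 200,000 km/s (roughly two thirds of the speed of light in a vacuum) and ignores routing, queuing, and processing delays, so the results are lower bounds only. The 12,450-mile figure, about half of Earth's circumference, is an illustrative assumption for "the other side of the world".

```go
package main

import "fmt"

const (
	kmPerMile = 1.609344
	// Assumed signal speed in optical fiber: ~200,000 km/s = 200 km per ms.
	fiberKmPerMs = 200.0
)

// roundTripMs returns a lower bound, in milliseconds, on the network round
// trip to a server that is the given one-way distance (in miles) away.
func roundTripMs(miles float64) float64 {
	return 2 * miles * kmPerMile / fiberKmPerMs
}

func main() {
	// Cambridge, MA to a replica 200 miles away.
	fmt.Printf("200 miles:    %.1f ms\n", roundTripMs(200)) // 3.2 ms
	// A single server roughly half of Earth's circumference away.
	fmt.Printf("12450 miles:  %.1f ms\n", roundTripMs(12450)) // 200.4 ms
}
```

Under these assumptions a nearby replica leaves most of the 30 millisecond budget for computation and rendering, while a single far-away server would exceed it on propagation delay alone.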

FAQ

Why did you make this? I was frustrated by how annoying it was to search for classes. And I'm a systems software engineer, which pretty much makes it my mandate to make things faster.

Why is it written in Go? Because I wrote this in a weekend and needed a fast systems language that was quick to iterate in while still delivering low latency. Go's simplicity and fast compile times helped with this. I might rewrite it in Rust if I decide to spend a couple more weeks on it.

Why are you using Redis? It's really fast, it stores data in memory, the API is simple and robust, and it has a best-in-class full-text search module. For this size of dataset, embedding Redis gives you unmatched performance with a fraction of the cost and effort of alternatives.

Can you make this for my school? The code is all open-source, and you're welcome to take a look or port it! If you do, please consider reaching out on Twitter @ekzhang1 or by email, since I'd love to hear about your work.

Where is the data sourced? The course catalog was indexed from publicly available course titles and descriptions online. See the code in the datasource/ folder.

Development

You need Go 1.20 and Docker to work on the backend and Node.js v18 for the frontend.

Downloading the dataset

The download command loads data from Curricle for academic terms up to and including Spring 2022 (AY 2022) and from My.Harvard starting in Fall 2022 (AY 2023). You can customize the data loading script if you'd like to index a different set of courses.

go run . download -year 2019  # -> data/courses-2019.json
go run . download -year 2020  # -> data/courses-2020.json
# ... and so on
go run . download -year 2024  # -> data/courses-2024.json

Unfortunately, My.Harvard does not allow you to view courses from previous academic years, so years between 2023 and the current one will probably not return any data. For those, you can download the appropriate preloaded datasets from our public S3 bucket.

Historical notes about preloaded data:

  • We don't have data on divisional distributions for AY 2023, from Fall 2022 to Spring 2023.
  • Some of the courses in Spring 2023 are missing because they were added after we initially indexed them from the public data source, and we have no way of backfilling.

Combining data

Once you have the year-by-year course data, you can combine the files into a single courses.json containing all of the courses, which the webapp can then search.

go run . combine

This looks for all files named data/courses-{year}.json and merges them.
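The merge step can be sketched as follows. This is a simplified illustration, not the project's actual combine command: it assumes every data/courses-{year}.json file holds a top-level JSON array, and `mergeJSONArrays` is a hypothetical helper name. Elements are kept as raw JSON, so the sketch makes no assumption about the course schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"sort"
)

// mergeJSONArrays concatenates several JSON arrays into a single JSON array,
// preserving the order of the inputs and of the elements within each input.
func mergeJSONArrays(files [][]byte) ([]byte, error) {
	merged := []json.RawMessage{} // non-nil so an empty result encodes as []
	for i, data := range files {
		var items []json.RawMessage
		if err := json.Unmarshal(data, &items); err != nil {
			return nil, fmt.Errorf("input %d: %w", i, err)
		}
		merged = append(merged, items...)
	}
	return json.Marshal(merged)
}

func main() {
	// Assumed file layout, following the naming described in this README.
	paths, err := filepath.Glob("data/courses-*.json")
	if err != nil {
		log.Fatal(err)
	}
	sort.Strings(paths) // merge in ascending year order

	var files [][]byte
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			log.Fatal(err)
		}
		files = append(files, data)
	}

	out, err := mergeJSONArrays(files)
	if err != nil {
		log.Fatal(err)
	}
	// Report instead of writing, so the sketch never overwrites real data.
	fmt.Printf("merged %d files into %d bytes of JSON\n", len(paths), len(out))
}
```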

You can also do the inverse, splitting a single data/courses.json into multiple data/courses-{year}.json.

go run . split
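The inverse operation might look like the sketch below. It groups records by a year field and encodes one JSON array per year. The `academicYear` key and the `splitByYear` helper are assumptions made for illustration; the real dataset may store the year under a different name or shape.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// yearOnly decodes just the year of a course record. The "academicYear" key
// is a hypothetical field name, not confirmed by the project's data format.
type yearOnly struct {
	Year int `json:"academicYear"`
}

// splitByYear groups the elements of a JSON array by year and returns one
// encoded JSON array per year. Element order within a year is preserved.
func splitByYear(data []byte) (map[int][]byte, error) {
	var items []json.RawMessage
	if err := json.Unmarshal(data, &items); err != nil {
		return nil, err
	}

	groups := make(map[int][]json.RawMessage)
	for i, item := range items {
		var y yearOnly
		if err := json.Unmarshal(item, &y); err != nil {
			return nil, fmt.Errorf("course %d: %w", i, err)
		}
		groups[y.Year] = append(groups[y.Year], item)
	}

	out := make(map[int][]byte, len(groups))
	for year, group := range groups {
		encoded, err := json.Marshal(group)
		if err != nil {
			return nil, err
		}
		out[year] = encoded
	}
	return out, nil
}

func main() {
	input := []byte(`[{"academicYear":2021,"id":1},{"academicYear":2022,"id":2},{"academicYear":2021,"id":3}]`)
	files, err := splitByYear(input)
	if err != nil {
		log.Fatal(err)
	}
	// Map iteration order is unspecified, so the lines may print in any order.
	for year, data := range files {
		fmt.Printf("data/courses-%d.json: %s\n", year, data)
	}
}
```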

Running the server

The server listens for web requests on port 7500. (It also spawns a Redis instance, using Docker, on port 7501.)

go run . server -local -data data/courses.json

You can also run it with other data files. For example, if you pass data/courses-2021.json, you'll only get search results for the academic year from Fall 2020 to Spring 2021.
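For readers curious how a Go server might spawn the Redis instance mentioned above, here is a minimal sketch using os/exec. It is not the project's implementation: the image name `redis/redis-stack-server:latest` and the `redisDockerArgs` helper are assumptions, and the only details taken from this README are the use of Docker and host port 7501. The container-side port 6379 is Redis's default.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// redisImage is an assumed image that bundles Redis with its search module;
// the project may use a different image or tag.
const redisImage = "redis/redis-stack-server:latest"

// redisDockerArgs builds the arguments for `docker run` so that Redis inside
// the container (default port 6379) is reachable on the given host port.
func redisDockerArgs(hostPort int) []string {
	return []string{
		"run", "--rm",
		"-p", strconv.Itoa(hostPort) + ":6379",
		redisImage,
	}
}

func main() {
	cmd := exec.Command("docker", redisDockerArgs(7501)...)
	// Print the command instead of starting it, so the sketch has no side
	// effects. A real server would call cmd.Start() and stop the process
	// on shutdown.
	fmt.Println(cmd.String())
}
```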

Now you can develop on the frontend, which automatically proxies API requests to the server port.

npm install
npm run dev

Visit localhost:5173 to see the website.

Building a container

docker build -t classes.wtf .
docker run -it --rm -p 7500:7500 classes.wtf

Deployment

aws s3 cp data/courses-$YEAR.json s3://classes.wtf
aws s3 cp data/courses.json s3://classes.wtf
fly deploy

Acknowledgements

See the contributors page. Current maintainers can be reached by email at [email protected]. Licensed under the MIT license.

Thanks to numerous students who helped advertise the site in college communities.
