lecture-summarizer
This project utilizes the BERT model to perform extractive text summarization on lecture transcripts. The contents of
this project include a RESTful API to serve these summaries, and a command line interface for easier interaction. You can
find more about the specs of this service and CLI in our Documentation
directory.z
Paper: https://arxiv.org/abs/1906.04165
Running the service locally
First, docker is required to run the service locally. To start the service, run the command:
make docker-build-run
On the first run of a service, this may take quite some time to complete.
Installation of CLI
The CLI tool can be downloaded using pip with the following command:
pip install git+https://github.com/dmmiller612/lecture-summarizer.git
To test the tool, try getting the current lectures in the service with the command:
lecture-summarizer get-lectures
Note, that this tool automatically uses our cloud based service by default. You can use your local service by supplying
the -base_path
option, such as -base_path localhost:5000
. As an example, to get lectures locally, you could run:
lecture-summarizer get-lectures -base-path localhost:5000
How to use CLI tool
After installing the CLI, the service should be ready to use. The lecture-summarizer uses the API service as it's backend. This backend defaults to the currently hosted one on AWS. The user can supply a specific URL if the service is hosted elsewhere. Below, briefly discusses how to use the CLI tool.
Creating a Lecture
Before one can do anything with summarizations, there needs to be at least one lecture in the system. Taking an Udacity
lecture, using the raw_lecture.txt
file at the parent of the lecture-summarizer directory as an example, one can upload
the content issuing the following command:
lecture-summarizer create-lecture -path ./raw_lecture.txt -name example_first_lecture -course IHI
Currently, the lecture-summarizer can parse sdp file formats, which are common for Udacity-based lectures. Notice that one
needs to supply a name
and a course
as metadata.
Retrieving Lectures
One can retrieve lectures with a couple of options. Those options can be found in the Documentation/CLI_Documentation.md file in the base of the repo. Some example commands are shown below:
Get a Single Lecture
lecture-summarizer get-lectures -lecture-id 1
Get All Lectures
lecture-summarizer get-lectures
Get Lectures by Name
lecture-summarizer get-lectures -name example_first_lecture
Get Lectures by Course
lecture-summarizer get-lectures -course ihi
Creating a Summary
Just like creating a lecture, creating a summary is a painless process. Below is an example of creating a summary from a specified lecture.
lecture-summarizer create-summary -lecture-id 1 -name 'my summary name' -ratio 0.2
The ratio
specifies approximately how much of the lecture that you want to summarize.
Retrieving Summaries
Just like with retrieving lectures, one can also list summaries. Below are a couple of examples:
Get a Single Summary
lecture-summarizer get-summaries -lecture-id 1 -summary-id 1
Get All Summaries
lecture-summarizer get-summaries -lecture-id 1
Get All Summaries by Name
lecture-summarizer get-summaries -lecture-id 1 -name 'my summary name'
Delete a Summary
lecture-summarizer delete-summary -lecture-id 1 -summary-id 1
RESTful API Docs
POST /lectures
This endpoint creates a lecture.
{
"course": "course identifier",
"content": "Lecture String Content",
"name": "Lecture name"
}
GET /lectures
This endpoint is used to retrieve lectures. The user can supply two query params shown below.
/lectures?course=unique_identifier
/lectures?name=course_name
GET /lectures/{id}
This endpoint is used to retrieve a single lecture
/lectures/{id}
POST /lectures/{id}/summaries
This endpoint is used to create a summarization from a lecture
{
"name": "Summarization name",
"ratio": "Ratio of sentences to select"
}
GET /lectures/{id}/summaries
/lectures/{id}/summaries?name=course_name
/lectures/{id}/summaries
GET/DELETE /lectures/{id}/summaries/{summarization_id}
This endpoint allows you to get or delete a summarization.
/lectures/{id}/summaries/{summarization_id}