  • Stars: 148
  • Rank: 249,983 (Top 5 %)
  • Language: Python
  • Created: over 5 years ago
  • Updated: about 2 years ago

Repository Details

Lecture summarization with BERT

lecture-summarizer

This project uses the BERT model to perform extractive text summarization on lecture transcripts. It includes a RESTful API that serves these summaries and a command line interface for easier interaction. You can find more about the specs of the service and CLI in our Documentation directory.

Paper: https://arxiv.org/abs/1906.04165

Running the service locally

Docker is required to run the service locally. To start the service, run the command:

make docker-build-run

On the first run, this may take quite some time to complete.

Installation of CLI

The CLI tool can be installed using pip with the following command:

pip install git+https://github.com/dmmiller612/lecture-summarizer.git

To test the tool, try getting the current lectures in the service with the command:

lecture-summarizer get-lectures

Note that this tool uses our cloud-based service by default. You can use your local service by supplying the -base-path option, such as -base-path localhost:5000. As an example, to get lectures locally, you could run:

lecture-summarizer get-lectures -base-path localhost:5000

How to use CLI tool

After installing the CLI, the service should be ready to use. The lecture-summarizer uses the API service as its backend. This backend defaults to the one currently hosted on AWS. The user can supply a specific URL if the service is hosted elsewhere. The sections below briefly discuss how to use the CLI tool.

Creating a Lecture

Before one can do anything with summarizations, there needs to be at least one lecture in the system. Taking a Udacity lecture as an example (the raw_lecture.txt file in the parent of the lecture-summarizer directory), one can upload the content by issuing the following command:

lecture-summarizer create-lecture -path ./raw_lecture.txt -name example_first_lecture -course IHI

Currently, the lecture-summarizer can parse sdp file formats, which are common for Udacity-based lectures. Notice that one needs to supply a name and a course as metadata.

Retrieving Lectures

One can retrieve lectures with a couple of options. Those options can be found in the Documentation/CLI_Documentation.md file in the base of the repo. Some example commands are shown below:

Get a Single Lecture

lecture-summarizer get-lectures -lecture-id 1

Get All Lectures

lecture-summarizer get-lectures

Get Lectures by Name

lecture-summarizer get-lectures -name example_first_lecture

Get Lectures by Course

lecture-summarizer get-lectures -course ihi

Creating a Summary

Just like creating a lecture, creating a summary is a painless process. Below is an example of creating a summary from a specified lecture.

lecture-summarizer create-summary -lecture-id 1 -name 'my summary name' -ratio 0.2

The ratio specifies approximately what fraction of the lecture's sentences the summary should keep. In the example above, 0.2 selects roughly 20% of the sentences.
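As a rough illustration of what the ratio means, the sketch below estimates how many sentences a summary would keep. The helper name, the rounding rule, and the minimum of one sentence are assumptions made for illustration; the service may compute the count differently.

```python
def estimated_summary_length(num_sentences: int, ratio: float) -> int:
    """Estimate how many sentences a summary keeps for a given ratio.

    Hypothetical helper: the rounding and the floor of one sentence are
    assumptions, not the documented behavior of the service.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in the interval (0, 1]")
    return max(1, round(num_sentences * ratio))


if __name__ == "__main__":
    # A 50-sentence lecture summarized with -ratio 0.2.
    print(estimated_summary_length(50, 0.2))
```

Under these assumptions, a 50-sentence lecture with a ratio of 0.2 yields a summary of about 10 sentences.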

Retrieving Summaries

Just like with retrieving lectures, one can also list summaries. Below are a couple of examples:

Get a Single Summary

lecture-summarizer get-summaries -lecture-id 1 -summary-id 1

Get All Summaries

lecture-summarizer get-summaries -lecture-id 1

Get All Summaries by Name

lecture-summarizer get-summaries -lecture-id 1 -name 'my summary name'

Delete a Summary

lecture-summarizer delete-summary -lecture-id 1 -summary-id 1

RESTful API Docs

POST /lectures

This endpoint creates a lecture.

{
  "course": "course identifier",
  "content": "Lecture String Content",
  "name": "Lecture name"
}
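
For readers calling the API directly, here is a minimal sketch that builds this request with Python's standard library. The base URL (a local service on port 5000 over plain HTTP), the JSON content type header, and the function name are assumptions; only the path and the three body fields come from the description above.

```python
import json
import urllib.request

# Assumption: a local service started with `make docker-build-run`
# and reachable over plain HTTP at this address.
BASE_URL = "http://localhost:5000"


def build_create_lecture_request(course: str, name: str, content: str,
                                 base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /lectures request."""
    payload = {"course": course, "content": content, "name": name}
    return urllib.request.Request(
        url=f"{base_url}/lectures",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_lecture_request(
        course="IHI", name="example_first_lecture", content="Lecture text..."
    )
    print(request.get_method(), request.full_url)
    # Sending the request requires a running service:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect before anything reaches the service.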

GET /lectures

This endpoint retrieves lectures. The user can supply the two query parameters shown below.

/lectures?course=unique_identifier
/lectures?name=course_name
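
The query strings above can be assembled safely with `urllib.parse.urlencode`, which also escapes spaces and other special characters. This is a sketch only: the function name and base URL are hypothetical, and whether the service accepts both parameters in one request is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode


def lectures_url(base_url: str, course: Optional[str] = None,
                 name: Optional[str] = None) -> str:
    """Return the GET /lectures URL with optional query parameters."""
    params = {}
    if course is not None:
        params["course"] = course
    if name is not None:
        params["name"] = name
    query = urlencode(params)  # percent-encodes values as needed
    return f"{base_url}/lectures" + (f"?{query}" if query else "")


if __name__ == "__main__":
    # Assumption: a local service on port 5000.
    print(lectures_url("http://localhost:5000", course="ihi"))
    print(lectures_url("http://localhost:5000", name="example_first_lecture"))
```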

GET /lectures/{id}

This endpoint is used to retrieve a single lecture.

/lectures/{id}

POST /lectures/{id}/summaries

This endpoint is used to create a summarization from a lecture.

{
  "name": "Summarization name",
  "ratio": "Ratio of sentences to select"
}
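
A summarization request could be built along the following lines. The function name, the base URL passed by the caller, the range check, and sending the ratio as a JSON number (as in the CLI's -ratio 0.2) are assumptions rather than documented behavior.

```python
import json
import urllib.request


def build_create_summary_request(base_url: str, lecture_id: int,
                                 name: str, ratio: float) -> urllib.request.Request:
    """Build (but do not send) a POST /lectures/{id}/summaries request."""
    # Assumption: the ratio is a JSON number in the interval (0, 1].
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in the interval (0, 1]")
    body = json.dumps({"name": name, "ratio": ratio}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/lectures/{lecture_id}/summaries",
        data=body,
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_summary_request(
        "http://localhost:5000", 1, "my summary name", 0.2
    )
    print(request.get_method(), request.full_url)
```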

GET /lectures/{id}/summaries

This endpoint is used to retrieve the summaries of a lecture, optionally filtered by name.

/lectures/{id}/summaries
/lectures/{id}/summaries?name=course_name

GET/DELETE /lectures/{id}/summaries/{summarization_id}

This endpoint allows you to get or delete a summarization.

/lectures/{id}/summaries/{summarization_id}
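
Because both verbs share one path, a client only needs to switch the HTTP method. The sketch below shows this with `urllib.request.Request`; the function name and the base URL are hypothetical.

```python
import urllib.request


def build_summary_request(base_url: str, lecture_id: int, summary_id: int,
                          delete: bool = False) -> urllib.request.Request:
    """Build (but do not send) a GET or DELETE request for one summarization."""
    url = f"{base_url}/lectures/{lecture_id}/summaries/{summary_id}"
    return urllib.request.Request(url, method="DELETE" if delete else "GET")


if __name__ == "__main__":
    # Assumption: a local service on port 5000.
    fetch = build_summary_request("http://localhost:5000", 1, 1)
    remove = build_summary_request("http://localhost:5000", 1, 1, delete=True)
    print(fetch.get_method(), fetch.full_url)
    print(remove.get_method(), remove.full_url)
```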