
autotimecode

Video to aligned timecode(SRT), transcription and translation in 4 clicks.

Minimally intrusive to your current workflow. Granular API exposure. Modularized.

Run it!

Make sure you have Docker Compose installed: refer to https://docs.docker.com/compose/install/ for instructions. Of course you also need Docker itself - refer to https://docs.docker.com/install/ for instructions.

Configure CELERY_BROKER_URL and MONGO_URL as environment variables, then run

docker-compose build && docker-compose up

Wait until "DeepSegment is loaded!" and "DeepCorrect is loaded!" show up in the logs. This may take longer on CPU-only machines.
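When scripting around this startup wait, the two log messages can be checked programmatically. A minimal sketch - the marker strings come from this README; the helper itself is illustrative and not part of the project:

```python
# Log lines that signal both models have finished loading (as printed at startup).
READY_MARKERS = ("DeepSegment is loaded!", "DeepCorrect is loaded!")

def models_ready(log_text: str) -> bool:
    """Return True once both model-loaded messages appear in the captured logs."""
    return all(marker in log_text for marker in READY_MARKERS)
```

For example, feed it the output of `docker-compose logs` in a polling loop and proceed once it returns True.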

API documentation is located at https://github.com/cnbeining/autotimecode/blob/master/autotimecode_api/README.MD .

Recommended Subtitle Workflow

Note that this workflow is based on ACICFG's recommendation: adjust it to fit your needs.

  1. Get a rough version of the timecode from the video: covered in this project by the /vad/ endpoint.
  2. Transcribe the video (with the help of STT). Roughly edit the SRT to include any time ranges missing from the first step. Model building is NOT a target of this project - check the /stt/ endpoint for a voice recognition helper.
  3. From the transcribed SRT, generate an SRT with accurate timecode: the /fa/ endpoint.
  4. Continue with translation (perhaps with the help of machine translation): check the /nmt/ endpoint.
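The four steps above map onto four HTTP endpoints. A minimal client sketch, assuming the API is served at localhost:8000 and accepts JSON POSTs - the host, port, and payload field names are assumptions; consult the API documentation for the real contract:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed host/port; adjust to your deployment

def endpoint_url(step: str) -> str:
    """Build the URL of one of the four pipeline endpoints: vad, stt, fa, nmt."""
    if step not in ("vad", "stt", "fa", "nmt"):
        raise ValueError(f"unknown pipeline step: {step}")
    return f"{BASE_URL}/{step}/"

def submit(step: str, payload: dict) -> bytes:
    """POST a JSON payload to a pipeline endpoint (field names are hypothetical)."""
    req = urllib.request.Request(
        endpoint_url(step),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```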

Background

This project solves 4 problems:

  1. Given a video, generate timecode marking where human speech exists;
  2. Given a video and timecode, transcribe the video automatically;
  3. Given rough timecode, generate accurate timecode aligned with the video;
  4. Given a transcription, generate a translation.
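Every step above exchanges SRT files, whose timecodes use the HH:MM:SS,mmm format. A small formatter for that format - an illustration, not part of this project's API:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timecode, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)        # work in integer milliseconds
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

# e.g. srt_timestamp(3661.5) -> "01:01:01,500"
```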

FAQ

Where is Speech to Text (STT)?

An STT helper is available at the /stt/ endpoint.

STT model training is out of the scope of this project, which focuses on timecode generation and alignment.

Why include Kaldi and ffmpeg twice in different images?

  1. Every segment of this project is meant to be reusable on its own;
  2. The two Kaldi builds are not the same version. This is also why PyKaldi was passed over.

Docker Compose is taking a minute to come up!

TensorFlow Serving does not mix well with custom Keras layers, so the models are loaded in-process at startup.

How can I finetune your models?

Stay tuned.

Where is Japanese/Chinese/xxxese/xxxlish support?

The authors are working hard to make it happen. Again, stay tuned!

TODO

  • Multiple language support
  • Add Google Drive support
  • Add ASS download support

Authors

The authors are members of, and acknowledge the help from, ACICFG.

License

GPL 3.0. Please contact the authors if you need alternative licensing.

Please retrieve copies of the licenses from the respective repository links.

Gentle is written by @lowerquality, MIT license, https://github.com/lowerquality/gentle .

Kaldi is located at https://kaldi-asr.org/ , Apache 2.0 license.

ffsend.py was originally written by Robert Xiao ([email protected]), https://github.com/nneonneo/ffsend , and is licensed under the Mozilla Public License 2.0. If you have concerns, remove this file and disable Firefox Send.

The ffsend binary is provided by Tim Visée, https://github.com/timvisee/ffsend , GPL 3.0. If you have concerns, please remove this file.

VAD engine is based on work of Hebbar, R., Somandepalli, K., & Narayanan, S. (2019). Robust Speech Activity Detection in Movie Audio: Data Resources and Experimental Evaluation. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). doi: 10.1109/icassp.2019.8682532 . Original code can be retrieved at https://github.com/usc-sail/mica-speech-activity-detection .

txt2txt, deepcorrect and deepsegment were written by Bedapudi Praneeth, https://github.com/bedapudi6788 , GPL 3.0.

STT and NMT technologies are provided by Google.

Some STT code originally comes from https://github.com/agermanidis/autosub , MIT license.
