SOVA Dataset

SOVA Dataset is a free, public STT/ASR (speech-to-text / automatic speech recognition) dataset.

Key facts:

  • Languages: Russian, English, and Chinese
  • ~32,328 hours of audio
  • ~3.21 TB in WAV format

Dataset composition

Name                   Lang  Hours     Size                   Source            Equipment         Annotation        Speech type  Augmentation  Quality
EngAudiobooksOriginal  EN    7,130     743 GB                 audiobooks        professional      forced alignment  reading      none          95%
EngAudiobooksNoisy     EN    3,873     310 GB                 audiobooks        professional      forced alignment  reading      phone calls   95%
RuAudiobooksDevices    RU    298       30.24 GB               audiobooks        non-professional  manual            reading      none          99%
RuDevices              RU    101       10.42 GB               audio recordings  non-professional  manual            live speech  none          98%
RuYoutube              RU    17,451    1,873 GB               audio recordings  non-professional  ASR               live speech  none          95%
ZhYoutube              CN    3,475.1   321 GB                 audio recordings  non-professional  ASR               live speech  none          97.83%
TOTAL                  -     32,328.1  3,287.66 GB (3.21 TB)  -                 -                 -                 -            -             -
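The TOTAL row is the sum of the six subset rows, and the 3.21 TB figure is consistent with 3,287.66 GB only if 1 TB is taken as 1024 GB (binary units). That unit convention is an inference from the numbers, not something stated by the dataset authors. A minimal Python check, with the figures copied by hand from the table:

```python
# Per-subset figures copied from the table: (name, hours, size in GB).
SUBSETS = [
    ("EngAudiobooksOriginal", 7130.0, 743.0),
    ("EngAudiobooksNoisy", 3873.0, 310.0),
    ("RuAudiobooksDevices", 298.0, 30.24),
    ("RuDevices", 101.0, 10.42),
    ("RuYoutube", 17451.0, 1873.0),
    ("ZhYoutube", 3475.1, 321.0),
]


def totals(subsets):
    """Return (total hours, total GB, total TB), assuming 1 TB = 1024 GB."""
    hours = sum(h for _, h, _ in subsets)
    size_gb = sum(gb for _, _, gb in subsets)
    # Round to the precision used in the table to hide float noise.
    return round(hours, 1), round(size_gb, 2), round(size_gb / 1024, 2)


print(totals(SUBSETS))  # (32328.1, 3287.66, 3.21)
```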

Audio characteristics

  • Bit rate mode: constant
  • Bit rate: 256 kbps
  • Channels: 1 (mono)
  • Sample rate: 16.0 kHz
  • Bit depth: 16 bit
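The 256 kbps bit rate follows directly from the other three parameters: 16,000 samples/s × 16 bits × 1 channel = 256,000 bit/s. The sketch below uses only Python's standard-library `wave` module to compute that figure and to check whether a downloaded file matches these parameters. The helper names (`pcm_bitrate_bps`, `bytes_per_hour`, `matches_spec`) are hypothetical and not part of any SOVA tooling; the sketch also assumes the files are plain PCM WAV, which is what `wave` can read (it raises `wave.Error` otherwise).

```python
import wave

# Audio parameters listed for the dataset.
SAMPLE_RATE_HZ = 16_000
BIT_DEPTH = 16
CHANNELS = 1


def pcm_bitrate_bps(sample_rate, bit_depth, channels):
    """Bit rate of uncompressed PCM: samples/s * bits/sample * channels."""
    return sample_rate * bit_depth * channels


def bytes_per_hour(bitrate_bps):
    """Raw audio payload for one hour, ignoring the small WAV header."""
    return bitrate_bps // 8 * 3600


def matches_spec(path):
    """Return True if a .wav file is 16 kHz, 16-bit, mono PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == SAMPLE_RATE_HZ
            and wav.getsampwidth() * 8 == BIT_DEPTH  # sampwidth is in bytes
            and wav.getnchannels() == CHANNELS
        )


rate = pcm_bitrate_bps(SAMPLE_RATE_HZ, BIT_DEPTH, CHANNELS)
print(rate)                  # 256000
print(bytes_per_hour(rate))  # 115200000
```

As a rough estimate, 17,451 hours × 115,200,000 bytes/hour comes to about 1,872 GiB, close to the 1,873 GB listed for RuYoutube; other rows agree less closely, so treat this only as an approximation.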

Contacts

For any questions, please feel free to contact us at [email protected]

License

SOVA Dataset is licensed by Virtual Assistant, LLC under the Creative Commons Attribution 4.0 (CC BY 4.0) license.
