  • Stars: 324
  • Rank 128,974 (Top 3 %)
  • Language: Jupyter Notebook
  • Created almost 10 years ago
  • Updated almost 6 years ago

Repository Details

Speech Recognition with BVLC caffe

Speech Recognition with the caffe deep learning framework

UPDATE: We are migrating to tensorflow

This project is quite fresh, and only the first of three milestones is accomplished. Even now it might be useful if you just want to train a handful of commands/options (1, 2, 3, ..., yes/no/cancel, ...).

  1. training spoken numbers:
  • get spectrogram training images from http://pannous.net/spoken_numbers.tar (470 MB)
  • start ./train.sh
  • test with ipython notebook test-speech-recognition.ipynb or caffe test ... or <caffe-root>/python/classify.py
  • 99% accuracy, nice!
  • online recognition and learning with ./recognition-server.py and ./record.py scripts

Sample spectrogram, That's what she said, too laid?

Sample spectrogram, Karen uttering 'zero' with 160 words per minute.

  2. training words:
  • 4GB of training data
  • net topology: work in progress ...
  • todo: use upcoming new caffe LSTM layers etc
  • UPDATE: LSTMs get rolling, still not merged
  • UPDATE: since the caffe project leaders have a hindering merge policy and this pull request was shifted many times without ever being merged, we are migrating to tensorflow
  • todo: add extra categories for a) silence, b) common noises like typing or sneezing ("achoo"), c) ALL other noises
  3. training speech:

Theoretical background: papers

A. Graves and N. Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In ICML, 2014

O. Vinyals, S. V. Ravuri, and D. Povey. Revisiting recurrent neural networks for robust ASR. In ICASSP, 2012

Andrew Ng et al / Baidu

Hinton et al / Toronto

good old Hinton

Schmidhuber et al using new 'ClockWork-RNNs'

The book: Dong Yu and Li Deng, Automatic Speech Recognition: A Deep Learning Approach (Signals and Communication Technology), hardcover, November 11, 2014

Related work

Also see the Kaldi project, which seems a bit messy but already uses deep learning with LSTM.

Another experimental LSTM network, which works out of the box: Currennt

More Repositories

  1. tensorflow-speech-recognition (Python, 2,161 stars): 🎙 Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks
  2. tensorflow-ocr (Python, 647 stars): 🖺 OCR using tensorflow with attention
  3. caffe-ocr (Shell, 215 stars): OCR with caffe deep learning framework -> Migrated to tensorflow
  4. english-script (Ruby, 161 stars): 🖊 Ｅｎｇｌｉｓｈ as a programming language
  5. angle (Python, 128 stars): ⦠ Angle: new speakable syntax for python 💡
  6. wasp (C++, 109 stars): 🐝 Wasp : Wasm programming language
  7. tensorpeers (Python, 63 stars): p2p peer-to-peer training of tensorflow models
  8. xipher (C, 62 stars): 🔒 Simple perfect xor encryption cipher 🔒
  9. jini-plugin (Java, 37 stars): Copilot X like features for Jetbrains IDEs using ChatGpt and GPT-4
  10. hieros (HTML, 23 stars): Egyptian hieroglyps and Eurasian languages
  11. Diffie-Hellman (Java, 21 stars): Standalone Java reference implementation of Diffie Hellman
  12. jeannie-webclient (JavaScript, 21 stars): The famous Jeannie assistant now living inside of your browser, including emails, calls etc! #siri
  13. Voice-Actions-API (Java, 10 stars): Public API hook for Voice Actions Plus / Jeannie
  14. wasm (8 stars): You are looking for webassembly
  15. swadesh (Roff, 7 stars): Collection of swadesh lists in CSV table format with possible connections to Indo European
  16. netbase (C++, 7 stars): 🌐 Netbase : Semantic Graph Database & Wikidata Server
  17. tensor-caffe (Protocol Buffer, 7 stars): TensorFlow graph importer from caffe protobuf / prototxt
  18. karpathy_neuralnets_python (Python, 6 stars): Python code accompanying Andrej Karpathy's [great] Hacker's guide to Neural Networks
  19. layer (Python, 6 stars): tensorflow custom comfort wrapper
  20. Voice2Web (JavaScript, 4 stars)
  21. node-netbase (JavaScript, 4 stars): node.js module for netbase: a semantic Graph Database with wordnet, wikidata, freebase, csv, xml, ... importer
  22. kast (Python, 4 stars): Canonical AST, the only Abstract Syntax Tree you need, with importers+exporters to all languages
  23. english-script-samples (E, 2 stars): English script samples for the angle programming language
  24. blueprints-netbase (Java, 2 stars): blueprints driver for the netbase graph database
  25. speech-data (2 stars)
  26. shapenet (Python, 2 stars): Deep Learning network learning geometric shapes
  27. angle.js (JavaScript, 2 stars): javascript version of the angle programming language
  28. concentration-camps (1 star): Close Korean concentration camps
  29. angle.ts (TypeScript, 1 star): Angle programming language - Typescript implementation and bindings
  30. lyri (Swift, 1 star): Lyri: a Siri-like assistant for your Terminal command line (Linux and Mac OS/X)
  31. test-lld-wasm (Shell, 1 star): Minimal example of how to merge/concat/link/combine two wasm files.
  32. netbase-ruby (Ruby, 1 star): ruby adapter for netbase