Speech
Audio-Feature-Extraction
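For Audio-Feature-Extraction, MFCC computation can be sketched in a few NumPy functions. The pipeline frames the signal, applies a Hamming window, and takes the power spectrum. It then pools the spectrum through triangular filters spaced evenly on the mel scale, takes the log, and applies a discrete cosine transform. This is a minimal sketch, not the project's code. The function names are hypothetical, and the frame length (400 samples), hop (160), FFT size (512), 26 filters and 13 coefficients are assumed typical values for 16 kHz audio.

```python
import numpy as np

def hz_to_mel(f):
    # Widely used mel formula: 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / (right - center)
    return fb

def dct_ii_ortho(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(signal, sr, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    """Frame -> Hamming window -> power spectrum -> mel filterbank -> log -> DCT."""
    n_frames = 1 + (len(signal) - frame_len) // hop  # assumes len(signal) >= frame_len
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)  # small floor avoids log(0)
    return dct_ii_ortho(log_mel, n_ceps)

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
coeffs = mfcc(tone, sr)
print(coeffs.shape)  # (98, 13)
```

The result is one row of 13 coefficients per 10 ms frame. The DCT here is the "cosine transform of a log power spectrum on a nonlinear mel scale" step from the definition.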
In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC.
Speech-Synthesis-System
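For Speech-Synthesis-System, the talking-calculator idea (speaking a result with the correct place value) can be sketched in two steps. First, split the number into place-value units. Second, concatenate a prerecorded waveform for each unit. This is a minimal sketch under several assumptions:

- It follows the Indian numbering system (hundred, thousand, lakh, crore), which Marathi number names use.
- It assumes one recording per number from 1 to 99 plus one per place-value word.
- It uses hypothetical English keys such as "lakh"; a real Marathi unit inventory would likely need more units than these.
- The short linear crossfade at each join is one simple way to soften the discontinuities at unit boundaries, not the method used in the published app.

```python
import numpy as np

# Hypothetical place-value keys below one crore, largest first
PLACE_VALUES = (("lakh", 100_000), ("thousand", 1_000), ("hundred", 100))

def place_value_units(n: int) -> list:
    """Split a non-negative integer into hypothetical unit keys (Indian numbering),
    e.g. 123456789 -> 12 crore 34 lakh 56 thousand 7 hundred 89."""
    if n == 0:
        return ["0"]
    units = []
    crore, n = divmod(n, 10_000_000)
    if crore:
        # The count of crores is itself spoken with place values
        units += place_value_units(crore) + ["crore"]
    for name, size in PLACE_VALUES:
        count, n = divmod(n, size)
        if count:
            units += [str(count), name]
    if n:
        units.append(str(n))  # assumption: one recording per number 1..99
    return units

def concatenate_units(waves, sr=16000, fade_ms=10.0):
    """Join unit waveforms, overlapping each join with a short linear crossfade."""
    fade = int(sr * fade_ms / 1000)
    out = np.asarray(waves[0], dtype=float)
    for w in waves[1:]:
        w = np.asarray(w, dtype=float)
        n = min(fade, len(out), len(w))
        ramp = np.linspace(0.0, 1.0, n)
        joined = out[len(out) - n:] * (1.0 - ramp) + w[:n] * ramp
        out = np.concatenate([out[:len(out) - n], joined, w[n:]])
    return out

# Placeholder recordings (silence); a real app would load audio files per key
keys = place_value_units(123456789)
recordings = {key: np.zeros(1600) for key in keys}
speech = concatenate_units([recordings[k] for k in keys])
print(keys)
# ['12', 'crore', '34', 'lakh', '56', 'thousand', '7', 'hundred', '89']
```

Each join removes `fade` samples of total length, so nine 1600-sample units with a 160-sample crossfade give 13120 samples.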
Language is the structured form in which humans share thoughts and emotions. This research is motivated by the goal of advancing human-computer interaction. The overall aim of my PhD research program is to design concatenation-based and Hidden Markov Model (HMM)-based speech synthesis for the Marathi language. This will make it possible to interact with the system and to extend the technology to assistive devices based on the Marathi language. The main advantage of the HMM system is that the voice can be altered without a large database. To study synthesis techniques in detail, I have also implemented a system using the Unit Selection Synthesis (USS) method. The Marathi Talking Calculator, built with the concatenation technique, is published on the Play Store. The calculator performs basic arithmetic operations and also speaks each numeral in Marathi as its key is pressed. The result box synthesizes speech and reads out the result in Marathi with the correct place value of each digit. The weaknesses of USS are that it requires a large database and that quality degrades at the joins between units. To overcome these issues, the study builds a system with a phonetic approach for the Marathi language using concatenation and HMM.
Stroke-Prediction
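For Stroke-Prediction, logistic regression models the probability of stroke as p = σ(w₀ + w·x), where σ is the logistic sigmoid. A minimal NumPy sketch fits the weights by batch gradient descent on the mean binary cross-entropy loss. Several choices are assumptions, and the project's own dataset and training code are not reproduced here:

- the learning rate and epoch count;
- the two synthetic features, which are hypothetical stand-ins for standardized risk factors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    # Prepend a column of ones so w[0] acts as the intercept
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean binary cross-entropy loss."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean cross-entropy
    return w

def predict_proba(w, X):
    return sigmoid(add_bias(X) @ w)

# Synthetic, hypothetical data: two standardized risk factors
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
true_logit = -1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(500) < sigmoid(true_logit)).astype(float)

w = fit_logistic_regression(X, y)
accuracy = np.mean((predict_proba(w, X) >= 0.5) == y)
print("weights:", w, "odds ratios:", np.exp(w[1:]), "accuracy:", accuracy)
```

With standardized features, the sign and size of each fitted weight show how strongly a factor raises or lowers the predicted risk. `exp(w)` gives the odds ratio for a one-unit increase, which is one way logistic regression can help pinpoint relevant risk factors.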
Machine learning is one of the fastest-growing techniques in many fields, and the healthcare industry is no exception. Machine learning algorithms play an essential role in predicting the presence or absence of heart disease, tumors, and more. Such information, if predicted well in advance, can give doctors important insights, allowing them to adapt their diagnosis and treat the patient accordingly. The World Health Organization has estimated that 12 million deaths occur worldwide every year due to heart disease. Half the deaths in the United States and other developed countries are due to cardiovascular diseases. An early prognosis of stroke can help high-risk patients decide on lifestyle changes and, in turn, reduce complications. If the relationships and the factors affecting the disease can be identified, it can be treated in advance. This research aims to pinpoint the most relevant risk factors of heart disease and to predict the overall risk using logistic regression. In this report, I discuss the prediction of stroke using machine learning algorithms. The algorithm I have implemented is logistic regression on the Health
Gammatone-like-spectrograms
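For Gammatone-like-spectrograms, recall that a 4th-order gammatone filter has impulse response g(t) = t³ exp(-2πbt) cos(2πf_c t). Its bandwidth is b = 1.019 × ERB(f_c), where ERB(f) = 24.7(4.37f/1000 + 1) (Glasberg and Moore). The sketch does three things:

1. It builds a bank of such filters with center frequencies spaced on the ERB-rate scale.
2. It filters the signal with each one.
3. It takes RMS energy per frame to form a time-frequency surface.

`fft_weights` illustrates the faster alternative of weighting conventional FFT bins by approximate gammatone magnitude responses. This is a minimal sketch, not the project's routine. The channel count, frequency range, frame sizes and 50 ms impulse-response length are assumptions.

```python
import numpy as np

def erb(f):
    # Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)
    return 24.7 * (4.37 * np.asarray(f, dtype=float) / 1000.0 + 1.0)

def erb_space(fmin, fmax, n):
    """n center frequencies equally spaced on the ERB-rate scale."""
    def to_rate(f):
        return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    def from_rate(r):
        return (10.0 ** (r / 21.4) - 1.0) * 1000.0 / 4.37
    return from_rate(np.linspace(to_rate(fmin), to_rate(fmax), n))

def gammatone_ir(fc, sr, order=4, duration=0.05):
    """Sampled gammatone impulse response t^(n-1) exp(-2*pi*b*t) cos(2*pi*fc*t), unit energy."""
    t = np.arange(int(duration * sr)) / sr
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))

def gammatone_spectrogram(x, sr, n_channels=32, fmin=50.0, win=400, hop=160):
    """Filter with each channel, then take per-frame RMS: shape (n_channels, n_frames)."""
    cfs = erb_space(fmin, 0.45 * sr, n_channels)
    n_frames = 1 + (len(x) - win) // hop
    surface = np.zeros((n_channels, n_frames))
    for c, fc in enumerate(cfs):
        y = np.convolve(x, gammatone_ir(fc, sr), mode="same")
        for j in range(n_frames):
            surface[c, j] = np.sqrt(np.mean(y[j * hop:j * hop + win] ** 2))
    return cfs, surface

def fft_weights(n_fft, sr, cfs, order=4):
    """Fast approximation: magnitude responses [1 + ((f - fc) / b)^2]^(-order/2)
    sampled at the FFT bin frequencies, shape (len(cfs), n_fft // 2 + 1)."""
    freqs = np.arange(n_fft // 2 + 1) * sr / n_fft
    b = 1.019 * erb(cfs)[:, None]
    return (1.0 + ((freqs[None, :] - cfs[:, None]) / b) ** 2) ** (-order / 2)

sr = 16000
x = np.sin(2 * np.pi * 1000.0 * np.arange(sr // 4) / sr)  # 0.25 s, 1 kHz tone
cfs, surface = gammatone_spectrogram(x, sr)
best = cfs[np.argmax(surface.mean(axis=1))]  # channel nearest 1 kHz responds most
W = fft_weights(512, sr, cfs)
```

For the fast path, multiply `W` by a magnitude spectrogram from `np.fft.rfft` (shape bins × frames) to get a gammatone-weighted surface without per-channel filtering.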
Gammatone filters are a popular linear approximation to the filtering performed by the ear. This routine provides a simple wrapper for generating time-frequency surfaces based on a gammatone analysis, which can be used as a replacement for a conventional spectrogram. It also provides a fast approximation to this surface based on weighting the output of a conventional FFT.
Image-Caption-using-CNNs-and-RNNs-
Image Caption Generator using CNNs and RNNs
HTK-features-in-Python
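For HTK-features-in-Python, HTK's MFCCs differ from other implementations in a few details. This minimal sketch covers four of them, following the formulas in the HTK Book and assuming HTK's default settings:

- the mel scale Mel(f) = 1127 ln(1 + f/700);
- per-frame pre-emphasis with coefficient 0.97 (the PREEMCOEF default), with the first sample scaled by (1 - k) as assumed HTK convention;
- the DCT of the log filterbank outputs, keeping 12 coefficients (the NUMCEPS default);
- cepstral liftering c'ₙ = (1 + (L/2) sin(πn/L)) cₙ with L = 22 (the CEPLIFTER default).

The helper names are hypothetical and are not the project's code.

```python
import numpy as np

def htk_mel(f):
    # HTK Book mel scale: Mel(f) = 1127 * ln(1 + f / 700)
    return 1127.0 * np.log(1.0 + np.asarray(f, dtype=float) / 700.0)

def htk_preemphasis(frame, k=0.97):
    # Per-frame pre-emphasis s'[n] = s[n] - k * s[n-1]; first sample scaled by (1 - k)
    frame = np.asarray(frame, dtype=float)
    out = np.empty(len(frame))
    out[1:] = frame[1:] - k * frame[:-1]
    out[0] = frame[0] * (1.0 - k)
    return out

def htk_dct(log_fbank, n_ceps=12):
    # c_i = sqrt(2/N) * sum_j m_j * cos(pi * i / N * (j - 0.5)), i = 1..n_ceps, j = 1..N
    n = len(log_fbank)
    j = np.arange(1, n + 1)
    i = np.arange(1, n_ceps + 1)[:, None]
    return np.sqrt(2.0 / n) * (np.cos(np.pi * i / n * (j - 0.5)) @ log_fbank)

def htk_lifter(ceps, L=22):
    # Cepstral liftering: c'_n = (1 + L/2 * sin(pi * n / L)) * c_n, n = 1..len(ceps)
    n = np.arange(1, len(ceps) + 1)
    return (1.0 + L / 2.0 * np.sin(np.pi * n / L)) * ceps

# Example: 26 log filterbank energies for one frame -> 12 liftered cepstra
log_fbank = np.log(np.linspace(1.0, 2.0, 26))
ceps = htk_lifter(htk_dct(log_fbank))
```

Liftering rescales the higher-order cepstra, whose raw magnitudes are small, so that all coefficients have a more comparable range.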
HTK features in Python
This project contains a Python implementation of the MFCC features as computed by HTK.
End-to-End-Neural-Diarization
Matlab-Voice-Record-and-plot-FFT-Real-Time
Speech-Processing-Basic-Concepts
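For Speech-Processing-Basic-Concepts, Dynamic Time Warping (listed under time alignment and normalization) aligns two utterances spoken at different rates. It finds the minimum-cost path through a grid of frame-to-frame distances. This is a minimal sketch under two assumptions: a Euclidean local distance, and the symmetric three-step pattern (i-1, j), (i, j-1), (i-1, j-1), which is one of several step constraints used in practice. In speech work, `a` and `b` would typically be sequences of MFCC or LPC frames.

```python
import numpy as np

def dtw_distance(a, b):
    """Accumulated DTW cost between two feature sequences (frames x dims)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a 1-D sequence as one feature per frame
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # local[i, j] = Euclidean distance between frame i of a and frame j of b
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

The repeated frame in the second sequence is absorbed by a horizontal step, so the warped distance is zero even though the sequences differ in length.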
Basic Concepts: Articulatory Phonetics – the production and classification of speech sounds; Acoustic Phonetics – the acoustics of speech production; review of Digital Signal Processing concepts; Short-Time Fourier Transform, Filter-Bank, and LPC methods.
Techniques for Speech Analysis: features, feature extraction, and pattern comparison; Log Spectral Distance, Cepstral Distances, Weighted Cepstral Distances and Filtering, Likelihood Distortions, and Spectral Distortion using a Warped Frequency Scale; LPC, PLP, and MFCC coefficients; statistical and perceptual speech distortion measures; Multiple Time-Alignment Paths, Dynamic Time Warping, and remarks on time alignment and normalization.
TextPrediction
Recent work at Google and Facebook has focused on the behind-the-scenes mechanisms of text prediction. In addition to Recurrent Neural Networks and Long Short-Term Memory networks, which motivate the approach, two word2vec models for generating word embeddings are also discussed.
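word2vec is usually described in terms of two architectures, continuous bag-of-words (CBOW) and skip-gram. Assuming those are the two models meant here, the difference is easiest to see in the training examples each one builds from a sentence:

- Skip-gram predicts each context word from the center word.
- CBOW predicts the center word from its surrounding context.

This minimal sketch only generates those examples; it does not train embeddings. The window size and function names are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: each word paired with every word at most `window` positions away."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """(context words, center) examples: predict the center word from its neighbours."""
    examples = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        examples.append((context, center))
    return examples

sentence = ["the", "cat", "sat"]
print(skipgram_pairs(sentence, window=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
print(cbow_examples(sentence, window=1))
# [(['cat'], 'the'), (['the', 'sat'], 'cat'), (['cat'], 'sat')]
```

Skip-gram produces several examples per word, which tends to help with rare words. CBOW averages the context into one example per position, which makes it cheaper to train.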