Deep Learning, Natural Language Processing
Speech Emotion Recognition (SER) Using CNN and LSTMs
Emotions that are expressed through speech carry extra insights into human actions and reasoning.

Emotions are a basic part of human psychology and translate directly into human actions, and the human voice is a remarkable instrument for reflecting many of them. Emotions expressed through speech carry extra insights into human actions and reasoning, and studying these relationships in depth can help us better understand people's motives. Emotion recognition therefore plays an important role in human-computer interaction.
My interest in this subject led me to build a model that can help classify basic human emotions. In this article, I will share how I did that.
The model was trained on an English-language dataset, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Based on recent studies, Mel spectrograms help extract important features from audio data, and those features were fed into a CNN+LSTM model.
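As a rough illustration of the kind of CNN+LSTM stack this refers to, here is a minimal Keras sketch. It assumes Mel-spectrogram inputs with 200 Mel bands, a fixed number of time frames after padding, and the eight RAVDESS emotion classes; the layer sizes are placeholders rather than the exact architecture from the repository.
from tensorflow import keras
from tensorflow.keras import layers

n_mels = 200       # Mel bands, matching the feature-extraction settings used later
max_frames = 300   # assumed fixed number of time frames after padding/truncating
num_classes = 8    # RAVDESS labels eight emotions

model = keras.Sequential([
    keras.Input(shape=(n_mels, max_frames, 1)),
    # CNN front-end learns local time-frequency patterns from the Mel spectrogram
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # Re-arrange so time comes first, then flatten frequency x channels per frame
    layers.Permute((2, 1, 3)),
    layers.Reshape((max_frames // 4, (n_mels // 4) * 64)),
    # LSTM models how the learned features evolve over time
    layers.LSTM(128),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
The intuition behind the combination is that the convolutional layers pick out local time-frequency patterns, while the LSTM then reads those patterns as a sequence over time.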
I have saved all of my code on GitHub: https://github.com/msaleem18/Speech_Emotion_Recognition
Dataset
For my model, I used the RAVDESS dataset mentioned above.
To read and process the audio data in Python, I used the Librosa library; the final data is stored as a NumPy array.
import numpy as np
import pandas as pd
import librosa as lib
import librosa.display

path = "/Users/saad/Saad/Education/Ryerson/MRP/Dataset/Audio_Speech_Actors_01-24/ALL"

# READ ENGLISH FILES
# Lists to hold the metadata encoded in each RAVDESS file name
# (modality, vocal channel, emotion, intensity, statement, repetition, actor)
# plus the gender, duration, sample rate and audio features for each recording
files = []
modality = []
vocal = []
emotion = []
intensity = []
statement = []
repetition = []
actor = []
gender = []
time = []
audio_data = []
sr = []
# Trackers for the largest and smallest feature-matrix dimensions seen
max_row = 0
max_col = 0
min_row = 1000
min_col = 1000

# Mel-spectrogram parameters
n_fft = 2048
hop_length = 512
n_mels = 200
for file_name in…
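The loop itself is where each file name is parsed and the Mel spectrogram is computed. Below is a rough sketch of what such a loop body can look like, assuming the RAVDESS file-name convention (modality-vocal channel-emotion-intensity-statement-repetition-actor, with even-numbered actors being female) and the Librosa calls introduced above; the details are illustrative, and the full loop is in the GitHub repository linked earlier.
import os

for file_name in os.listdir(path):
    if not file_name.endswith(".wav"):
        continue

    # RAVDESS file names look like "03-01-05-01-02-01-12.wav":
    # modality-vocal channel-emotion-intensity-statement-repetition-actor
    parts = file_name.replace(".wav", "").split("-")
    files.append(file_name)
    modality.append(int(parts[0]))
    vocal.append(int(parts[1]))
    emotion.append(int(parts[2]))
    intensity.append(int(parts[3]))
    statement.append(int(parts[4]))
    repetition.append(int(parts[5]))
    actor.append(int(parts[6]))
    gender.append("female" if int(parts[6]) % 2 == 0 else "male")

    # Load the waveform and turn it into a dB-scaled Mel spectrogram
    y, sample_rate = lib.load(os.path.join(path, file_name), sr=None)
    mel = lib.feature.melspectrogram(y=y, sr=sample_rate, n_fft=n_fft,
                                     hop_length=hop_length, n_mels=n_mels)
    mel_db = lib.power_to_db(mel, ref=np.max)

    time.append(len(y) / sample_rate)
    sr.append(sample_rate)
    audio_data.append(mel_db)

    # Track the largest and smallest spectrogram shapes seen so far
    max_row, max_col = max(max_row, mel_db.shape[0]), max(max_col, mel_db.shape[1])
    min_row, min_col = min(min_row, mel_db.shape[0]), min(min_col, mel_db.shape[1])
Tracking the smallest and largest shapes makes it straightforward to pad or trim every spectrogram to a common size before it goes into the network.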