DeepSpeech is a speech-to-text (STT), or automatic speech recognition (ASR), engine developed by Mozilla. It recognizes speech and converts spoken words into text. DeepSpeech is an open-source, deep-learning-based engine implemented with TensorFlow.
This tutorial shows how to use DeepSpeech to convert speech to text from a WAV audio file.
Using the pip package manager, install deepspeech from the command line.
pip install deepspeech
DeepSpeech offers pre-trained models for American English. Download the model (deepspeech-X.Y.Z-models.pbmm) from the releases page of the mozilla/DeepSpeech repository, where X.Y.Z stands for the version. The model performs best when recordings are made in low-noise environments.
In addition, to improve accuracy, we can use an external scorer that relies on a vocabulary. The scorer (deepspeech-X.Y.Z-models.scorer) can be downloaded from the same releases page.
DeepSpeech also offers a few sample audio files in WAV format. Download the archive (audio-X.Y.Z.tar.gz) and extract the files.
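As a rough sketch, the three file names above can be derived from a version string. The URL layout below is an assumption based on how GitHub usually serves release assets, and the helper name release_asset_urls is hypothetical, so the resulting links should be checked against the releases page before use.

```python
# Assumed GitHub release-asset layout; verify against the releases page.
BASE_URL = "https://github.com/mozilla/DeepSpeech/releases/download"


def release_asset_urls(version):
    """Return the expected download URLs for one release (hypothetical helper)."""
    names = {
        "model": f"deepspeech-{version}-models.pbmm",
        "scorer": f"deepspeech-{version}-models.scorer",
        "audio": f"audio-{version}.tar.gz",
    }
    # Assumption: release tags are named "v" + version, e.g. v0.8.2.
    return {kind: f"{BASE_URL}/v{version}/{name}" for kind, name in names.items()}


if __name__ == "__main__":
    for kind, url in release_asset_urls("0.8.2").items():
        print(kind, url)
```

The helper only builds strings; the files can then be downloaded with a browser or any download tool.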
We create a DeepSpeech model and enable the external scorer. The wave module is used to read the WAV audio file. We convert speech to text by using the stt method.
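from deepspeech import Model
import wave
import numpy as np
# Paths to the downloaded model, scorer, and a sample audio file
modelPath = 'deepspeech-0.8.2-models.pbmm'
scorerPath = 'deepspeech-0.8.2-models.scorer'
audioPath = 'audio/2830-3980-0043.wav'
# Load the model and enable the external scorer
ds = Model(modelPath)
ds.enableExternalScorer(scorerPath)
# Read all frames and close the file when done
with wave.open(audioPath, 'rb') as fin:
    frames = fin.readframes(fin.getnframes())
# Interpret the raw bytes as 16-bit signed samples
audio = np.frombuffer(frames, np.int16)
# Run inference and print the transcript
text = ds.stt(audio)
print(text)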
We can also use our own WAV audio files. The voice must be recorded with parameters that match the audio the model was trained on:
- Sample rate: 16 kHz
- Channels: 1 (mono)
- Bit rate: 256 kb/s (16-bit samples)
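The parameters above can be checked before a file is passed to the model. The following is a minimal sketch that uses only the standard wave module; the function name wav_format_problems and the constants are hypothetical, and the 2-byte sample width is inferred from the bit rate (16 kHz x 16 bits x 1 channel = 256 kb/s).

```python
import wave

EXPECTED_RATE = 16000   # 16 kHz
EXPECTED_CHANNELS = 1   # mono
EXPECTED_WIDTH = 2      # bytes per sample, i.e. 16-bit audio


def wav_format_problems(path):
    """Return a list of mismatches between a WAV file and the expected format."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()

    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    if width != EXPECTED_WIDTH:
        problems.append(f"{8 * width}-bit samples, expected {8 * EXPECTED_WIDTH}-bit")
    return problems
```

An empty list means the file matches all three parameters; otherwise each entry names one parameter that would need to be fixed by re-recording or converting the file.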
A voice can be recorded with the SoX (Sound eXchange) command-line tool.
- On Ubuntu or Debian, run the following command to install SoX:
sudo apt install sox
- On Windows, download SoX from SourceForge.
After installing SoX, we can record a voice to a file named test.wav with one of the following commands.
- On Ubuntu or Debian:
rec -r 16k -c 1 test.wav
- On Windows:
sox -t waveaudio -r 16k -c 1 -d test.wav
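If the recording should be started from a Python script, the two commands above can be wrapped as follows. This is only a sketch: the function names build_record_command and record are hypothetical, the SoX arguments are copied unchanged from the commands above, and SoX is assumed to be installed and on the PATH.

```python
import subprocess
import sys


def build_record_command(out_path, platform=sys.platform):
    """Return the SoX recording command from the steps above as an argument list."""
    if platform.startswith("win"):
        # Windows: record from the default waveaudio device.
        return ["sox", "-t", "waveaudio", "-r", "16k", "-c", "1", "-d", out_path]
    # Ubuntu or Debian: the rec command is installed together with SoX.
    return ["rec", "-r", "16k", "-c", "1", out_path]


def record(out_path):
    """Run the recording command until it exits or is interrupted with Ctrl+C."""
    try:
        subprocess.run(build_record_command(out_path), check=False)
    except KeyboardInterrupt:
        pass  # the user stopped the recording


if __name__ == "__main__":
    print(" ".join(build_record_command("test.wav")))
```

Calling record('test.wav') starts the recording; pressing Ctrl+C stops it.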