(Research) STT (1) – Converting Speech to Text (Guide)

Guide to Converting Speech to Text Using Python


If you have some programming experience, using an AI speaker or voice assistant may have made you curious about building a voice-based service yourself.

Even if you have little development experience or no background in natural language processing or speech recognition, there are several easy-to-use packages, and I would like to introduce them here.

Choose whichever of these tools fits your situation.



Speech_recognition

Speech Recognition Using Python: A Guide

Several packages and APIs can convert speech to text, and many of them support languages other than English. Some popular options are listed below.

Popular Speech Recognition Libraries/Packages

  • apiai
  • assemblyai
  • google-cloud-speech
  • pocketsphinx
  • SpeechRecognition
  • watson-developer-cloud
  • wit

Speech Recognition with Language Support (Other than English)

Many libraries and APIs accept a language keyword argument for recognizing speech in languages other than English. With SpeechRecognition, for example, it is passed to a method of a Recognizer instance:

python
recognizer.recognize_google(audio, language='fr-FR')  # French
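
As a rough illustration of the language argument, the sketch below maps a few language names to the tags passed to recognize_google(). The LANGUAGE_TAGS table and the transcribe() helper are hypothetical names introduced for this example and are not part of the library; only recognize_google() and its language keyword come from SpeechRecognition.

```python
# Hypothetical lookup table: human-readable names -> language tags.
LANGUAGE_TAGS = {
    "english": "en-US",
    "french": "fr-FR",
    "korean": "ko-KR",
}


def transcribe(recognizer, audio, language_name="english"):
    """Transcribe `audio` with recognize_google() in the named language.

    `recognizer` is expected to be a speech_recognition.Recognizer and
    `audio` an AudioData object; this helper itself is an assumption.
    """
    key = language_name.strip().lower()
    if key not in LANGUAGE_TAGS:
        raise ValueError(f"No language tag registered for {language_name!r}")
    return recognizer.recognize_google(audio, language=LANGUAGE_TAGS[key])
```

Keeping the tags in one table means a misspelled language name fails early with a clear error instead of being sent to the API.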

Using SpeechRecognition Library

In this post, I use the SpeechRecognition library (imported as speech_recognition) and its recognize_google() method to convert Korean speech to text.


Environment Setup for Speech Recognition

  1. Install SpeechRecognition Library
    Use pip to install the SpeechRecognition library:

    bash
    pip install SpeechRecognition
  2. Install PyAudio for Microphone Access
    To capture audio from a microphone, SpeechRecognition needs PyAudio (it is not required if you only read audio files):

    On Windows:

    bash
    pip install pyaudio

    On Debian Linux (Python 3; the older python-pyaudio package targets Python 2):

    bash
    sudo apt-get install python3-pyaudio
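
To confirm that both packages installed correctly, a quick check like the following may help. It uses only the standard library; the check_speech_setup() name is made up for this sketch, while speech_recognition and pyaudio are the import names of the two packages.

```python
import importlib.util


def check_speech_setup():
    """Report which of the two packages can be imported in this environment."""
    modules = ("speech_recognition", "pyaudio")
    return {name: importlib.util.find_spec(name) is not None for name in modules}


for name, installed in check_speech_setup().items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```

If pyaudio shows as missing, file-based recognition should still work; only microphone capture depends on it.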

Supported Audio File Formats

SpeechRecognition's AudioFile class can read the following file formats:

  • WAV (PCM / LPCM format)
  • AIFF
  • AIFF-C
  • FLAC (Standard FLAC format, OGG-FLAC is not supported)
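
Before handing a file to the recognizer, it can be useful to verify that it is in one of these formats. The sketch below is one possible pre-check using only the standard library: it filters by file extension and, for WAV files, relies on the wave module, which raises wave.Error when it cannot parse the file. The is_probably_supported() helper and the extension list are assumptions for illustration; the library performs its own checks when the file is opened.

```python
import wave
from pathlib import Path

# Assumed extension list, derived from the formats named above.
SUPPORTED_EXTENSIONS = {".wav", ".aiff", ".aif", ".aifc", ".flac"}


def is_probably_supported(path):
    """Return True if `path` looks like a format AudioFile can read."""
    path = Path(path)
    if not path.is_file():
        return False
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        return False
    if suffix == ".wav":
        # wave.open() fails on files that are not valid WAV containers.
        try:
            with wave.open(str(path), "rb"):
                return True
        except (wave.Error, EOFError):
            return False
    # AIFF and FLAC files are accepted on extension alone in this sketch.
    return True
```

Files that fail this check (for example MP3) would need to be converted to WAV or FLAC with an external tool first.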

Example Code: Converting Speech to Text (Korean)

Here’s a simple example of how to use the SpeechRecognition library to convert speech into text:


import speech_recognition as sr

# Create one recognizer and reuse it for recording and recognition
r = sr.Recognizer()

# Load the WAV file
harvard_audio = sr.AudioFile("./TEST.wav")

# Use the audio file as the audio source
with harvard_audio as source:
    audio = r.record(source)  # read the entire audio file

# Recognize speech using the Google Web Speech API (limited to 50 requests per day)
try:
    print("Google Speech Recognition thinks you said: " + r.recognize_google(audio, language="ko-KR"))
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

# Other recognizer methods offered by the library:
#   recognize_bing(): Microsoft Bing Speech
#   recognize_google(): Google Web Speech API
#   recognize_google_cloud(): Google Cloud Speech - requires installation of the google-cloud-speech package
#   recognize_houndify(): Houndify by SoundHound
#   recognize_ibm(): IBM Speech to Text
#   recognize_sphinx(): CMU Sphinx - requires installing PocketSphinx
#   recognize_wit(): Wit.ai
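
Since PyAudio was installed for microphone access, a live-capture variant of the example might look like the sketch below. It is an outline under assumptions rather than the original author's code: the transcribe_from_microphone() name and its default values are made up here, and it needs a working microphone and network access. The import sits inside the function only so that the definition can be loaded on machines without the library.

```python
def transcribe_from_microphone(language="ko-KR", timeout=5, phrase_time_limit=10):
    """Record one phrase from the default microphone and return its text.

    Returns None when nothing was heard or the speech was unintelligible.
    """
    import speech_recognition as sr  # requires SpeechRecognition and PyAudio

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise briefly so the energy threshold fits the room.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        try:
            audio = recognizer.listen(
                source, timeout=timeout, phrase_time_limit=phrase_time_limit
            )
        except sr.WaitTimeoutError:
            return None  # no speech started within `timeout` seconds

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # audio was captured but could not be understood
```

Calling transcribe_from_microphone() and printing the result is the simplest way to try it. sr.RequestError is deliberately left to propagate so that network or quota problems stay visible.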
