In this article, I’ll show you how to give your chatbot a voice, so you can ask a question out loud and get the response back as audio.
To make this entire flow, we need to integrate two services:
- Azure Speech Service
- Azure OpenAI Service
This means you need an active Azure subscription before getting started.
This is how the high-level workflow looks:
First, we will capture the user’s voice from the microphone and transcribe it to text using the Azure Speech SDK. Next, we will pass that text to the OpenAI API and get a response back. As a final step, we will pass this response to the Azure Speech SDK, which will convert it back to voice.
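The workflow above can be sketched as a single turn of a conversation loop. The three helper names below (`transcribe`, `ask_llm`, `synthesize`) are placeholders for the Azure Speech and Azure OpenAI calls implemented later in this article, not real SDK functions:

```python
# A minimal sketch of one voice-chat turn, assuming the three helpers
# are implemented with the services described above.
def voice_chat_turn(transcribe, ask_llm, synthesize):
    question = transcribe()       # microphone audio -> text (Azure Speech SDK)
    answer = ask_llm(question)    # question text -> chat response (Azure OpenAI)
    synthesize(answer)            # response text -> spoken audio (Azure Speech SDK)
    return question, answer

# Example with stubbed services, just to show the data flow:
q, a = voice_chat_turn(
    transcribe=lambda: "What is Azure?",
    ask_llm=lambda text: f"You asked: {text}",
    synthesize=lambda text: None,
)
```

Structuring the flow this way keeps each service behind its own function, so you can swap or test any stage independently.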
To proceed, you must fulfill the following requirements:
- Azure subscription — An active Azure subscription is required. If you do not have one, you can create a free account here.
- An instance of Azure Speech Service
- An instance of Azure OpenAI Service
Let’s get started with each of the steps mentioned in the workflow.
Generate Text From Voice
To transcribe voice into text, we need an instance of Azure Speech Service, which can be created in the Azure portal. Once the instance is deployed, we need to grab its key and region. If you are not sure how to perform this step, watch my video here.
import azure.cognitiveservices.speech as speechsdk

AZURE_SPEECH_KEY = "ENTER_YOUR_KEY_HERE"
AZURE_SPEECH_REGION = "ENTER_YOUR_REGION_HERE"

# Configure the Speech service with your key, region, and recognition language
speech_config = speechsdk.SpeechConfig(subscription=AZURE_SPEECH_KEY, region=AZURE_SPEECH_REGION)
speech_config.speech_recognition_language = "en-US"

# Capture audio from the default microphone
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

print("You can speak now. I'm listening...")

# Listen for a single utterance and block until recognition completes
speech_recognition_result = speech_recognizer.recognize_once_async().get()
output = speech_recognition_result.text
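Note that `recognize_once_async().get()` returns a result even when nothing was recognized, in which case `result.text` is simply empty. In real code you would compare `result.reason` against the SDK enum values such as `speechsdk.ResultReason.RecognizedSpeech` and `speechsdk.ResultReason.NoMatch`; the helper below is a dependency-free sketch of that check that compares enum names as strings so it can be shown without the SDK installed:

```python
def transcript_or_error(result):
    """Return the recognized text, or raise with a helpful message.

    `result` is the object returned by recognize_once_async().get().
    In production, compare result.reason against speechsdk.ResultReason
    directly; string names are used here only to keep the sketch standalone.
    """
    reason = getattr(result.reason, "name", str(result.reason))
    if reason == "RecognizedSpeech":
        return result.text
    if reason == "NoMatch":
        raise ValueError("No speech could be recognized - check your microphone.")
    # Anything else (e.g. Canceled) usually indicates a key/region problem
    raise RuntimeError(f"Speech recognition failed: {reason}")
```

Wrapping the check in a small function like this makes the failure modes explicit before the text is handed off to the OpenAI call in the next step.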