Passing An Audio File To An LLM

Shweta Lodha
4 min read · Nov 8, 2023

In this article, I’ll explain how we can pass an audio file to an LLM, using OpenAI as our LLM.

Many people, podcast lovers included, prefer audio and video tutorials over reading, because listening works better for them than reading a book, an e-book, or an article. It is also quite common to forget parts of a tutorial after a while. To recover those insights, re-watching or re-listening is the only option, and that can be very time-consuming.

So, a better solution is a small AI-based application, written in just a few lines of code, that analyzes the audio and answers the questions the user asks about it.

Generative AI is a good fit here, but there is a catch: the LLM is text-based, so we can’t pass audio to it directly. Let’s dive in to understand how we can make this work, step by step.

High-level steps

To execute the solution end to end, we need to work with the components/libraries below:

Audio to Text Generator

  • For transcript generation, we will be using AssemblyAI

Embedding Generator

  • For generating the embeddings, we will be using OpenAIEmbeddings

Vector Database

  • Chroma will be used as an in-memory database for storing the vectors

Large Language Model

  • OpenAI as LLM

All of these are wrapped by a library called LangChain, so we will rely on it heavily too.
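To make the flow between these four components concrete, here is a minimal sketch in plain Python. It is not the LangChain code: the bag-of-words `embed` function is a toy stand-in for OpenAIEmbeddings, `InMemoryVectorStore` is a toy stand-in for Chroma, and the transcript is assumed to have already been produced by the audio-to-text step. All names in it are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=40):
    """Split a transcript into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Toy in-memory vector store (a stand-in for a real vector database)."""

    def __init__(self):
        self._items = []  # list of (chunk_text, vector) pairs

    def add(self, chunks):
        for chunk in chunks:
            self._items.append((chunk, embed(chunk)))

    def search(self, query, k=1):
        """Return the k chunks most similar to the query."""
        query_vector = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(query_vector, item[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Combine the retrieved transcript chunks and the question into one prompt."""
    context = "\n".join(context_chunks)
    return f"Answer using only this transcript excerpt:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Hypothetical transcript text; in the real flow this comes from the audio file.
    transcript = (
        "Chroma stores the vectors in memory. "
        "AssemblyAI converts the audio file into a transcript."
    )
    store = InMemoryVectorStore()
    store.add(split_into_chunks(transcript, max_words=6))
    question = "Which service converts the audio?"
    print(build_prompt(question, store.search(question, k=1)))
```

The point of the sketch is the order of operations: the audio becomes text first, the text is chunked and embedded, and only the chunks most relevant to the question are sent to the LLM as part of the prompt.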

First of all, we need to grab the keys as shown below:

Get An OpenAI API Key

To get the OpenAI key, go to https://openai.com/, log in, and then grab the key from your account's API keys page.

Get An AssemblyAI API Key

To get the AssemblyAI key, go to your AssemblyAI account page (AssemblyAI | Account), log in, and then grab the key from there.
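Once you have both keys, it is safer to keep them out of the source code. The snippet below is a small sketch of one way to do that, assuming the keys are exported as the environment variables `OPENAI_API_KEY` and `ASSEMBLYAI_API_KEY`. Those variable names and the `load_api_keys` helper are assumptions made for this example, not something the steps above prescribe.

```python
import os

# Assumed environment variable names for the two keys.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ASSEMBLYAI_API_KEY")


def load_api_keys(env=None):
    """Return both API keys from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}


if __name__ == "__main__":
    keys = load_api_keys()
    # Never print the keys themselves; just confirm they were found.
    print("Loaded keys:", ", ".join(sorted(keys)))
```

Failing early with the names of the missing variables makes a misconfigured setup much easier to spot than an authentication error raised later by one of the services.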
