Passing An Audio File To LLM

4 min read · Nov 8, 2023

In this article, I’ll explain how we can pass an audio file to an LLM, taking OpenAI as our LLM.

Many people, podcast lovers included, prefer audio and video tutorials over reading, because listening works better for them than reading a book, an e-book or an article. It is also quite common to forget some portions of a tutorial after a certain period of time. To get those insights back, re-watching or re-listening is the only option, which can be very time-consuming.

So, the best solution is to build a small AI-based application, with just a few lines of code, that can analyze the audio and answer the questions the user asks about it.

Here, utilizing generative AI could be the best option, but there is a problem: we can’t pass the audio directly, because the LLM is text-based. Let’s dive in to understand how we can make this work, step by step.
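To make the idea concrete, here is a minimal sketch of the flow: turn the audio into text first, then hand only text to the LLM. The function name `answer_from_audio` and the two callables `transcribe` and `ask_llm` are hypothetical placeholders, not part of any library. The sketch also puts the whole transcript into the prompt, which is a simplification that only suits short recordings and leaves out the embedding step.

```python
from typing import Callable


def answer_from_audio(
    audio_path: str,
    question: str,
    transcribe: Callable[[str], str],  # hypothetical: audio file -> transcript text
    ask_llm: Callable[[str], str],     # hypothetical: text prompt -> text answer
) -> str:
    """Answer a question about an audio file using a text-only LLM."""
    # Step 1: the LLM cannot read audio, so convert the audio to text first.
    transcript = transcribe(audio_path)

    # Step 2: build a plain-text prompt that carries the transcript and the question.
    prompt = (
        "Answer the question using only the transcript below.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Question: {question}"
    )

    # Step 3: send text, and only text, to the LLM.
    return ask_llm(prompt)
```

Passing the two steps in as functions keeps the sketch independent of any particular transcription service or LLM client, so each piece can be swapped or stubbed out separately.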

High-level steps

To execute the solution end to end, we need to work with the components/libraries below:

Audio to Text Generator

For transcript generation, we will be using AssemblyAI.
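As a rough illustration, the transcription step could look like the sketch below, assuming the `assemblyai` Python SDK is installed (`pip install assemblyai`) and an API key is available. The helper name `transcribe_audio` and the `ASSEMBLYAI_API_KEY` environment variable are my own choices for this example, not something the SDK requires.

```python
import os
from typing import Optional


def transcribe_audio(audio_source: str, api_key: Optional[str] = None) -> str:
    """Return the transcript text for a local audio file path or a URL."""
    # Imported here so the module loads even when the SDK is not installed.
    import assemblyai as aai

    # Assumption: the key is passed in or stored in ASSEMBLYAI_API_KEY.
    aai.settings.api_key = api_key or os.environ["ASSEMBLYAI_API_KEY"]

    # transcribe() uploads the audio if needed and waits for the result.
    transcript = aai.Transcriber().transcribe(audio_source)

    # Surface a failed transcription instead of returning an empty text.
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(f"Transcription failed: {transcript.error}")

    return transcript.text
```

Calling `transcribe_audio("my_podcast.mp3")` (a made-up file name) would then give back plain text that the rest of the pipeline can work with.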

Embedding Generator

Written by Shweta Lodha