Building a Voiceover Generator with ElevenLabs and Cline

Welcome! In this post, I’ll walk you through building a web application that generates multiple voiceovers using the ElevenLabs API and Streamlit. If you prefer to follow along in video form, you can watch the complete tutorial here: Voiceover Generator Tutorial on YouTube. The video goes into greater detail on some steps, especially those involving real-time coding decisions and troubleshooting. Whether you choose the video or this written guide, you should be well-equipped to create and deploy your own voiceover generator.

What You’ll Need

Before we dive into the technical details, you will need to set up a few things:

  • OpenRouter API Key (requires a small credit balance)
  • GitHub Account (free)
  • ElevenLabs API Key (free)
  • Streamlit Account (free)

Each of these can be created easily, and I’ve provided additional guides in the video for those who might need help with setup.

Tools We Use

For this project, we’ll use the following tools:

  • GitHub Codespaces: A cloud development environment that makes it easy to code from anywhere.
  • Cline VS Code Extension: This acts as our AI assistant to help with coding tasks, providing suggestions and even writing parts of the code.
  • OpenRouter: Gives Cline access to the underlying language models through a single API key.
  • Streamlit: A powerful yet beginner-friendly framework that allows for rapid prototyping of web applications with Python.

Building the App Step-by-Step

Step 1: Setting Up the GitHub Repository

First, create a new GitHub repository and initialize it with a README, .gitignore, and license. This repository will be the base for all of our code. Open this repository in GitHub Codespaces to begin coding.

Step 2: Installing Cline as Our AI Assistant

Once we have our repository open in GitHub Codespaces, we need to install the Cline VS Code extension. Cline will act as our development copilot, generating boilerplate code and making suggestions for building our Streamlit app. After installing, you’ll need to provide an OpenRouter API key to connect Cline to OpenRouter’s services.

Step 3: Creating the Project Structure with Cline

Next, we ask Cline to create a basic project structure for our Streamlit app. Our goal is an app that processes script files written in an XML-like format and generates multiple audio files using the ElevenLabs API.

To begin, we prompt Cline to create essential files, including env.template, elevenlabs_api.py, script_parser.py, and app.py. Each file serves a specific purpose:

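  • env.template: A template for the environment variables the app expects, namely the ElevenLabs API key.
  • elevenlabs_api.py: Handles all API interactions with ElevenLabs.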
  • script_parser.py: Parses the XML-like formatted files that describe which character says which lines.
  • app.py: The main Streamlit file that runs the web application.
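To make the role of script_parser.py more concrete, here is a minimal sketch of what such a parser could look like. The exact script format is covered in the video rather than in this post, so the tag layout below (each spoken line wrapped in a tag named after the character) is an assumption for illustration, and the names parse_script and characters_in are hypothetical, not the code Cline generated.

```python
import re
from typing import List, Tuple

# Assumed format (hypothetical): each spoken line is wrapped in a tag named
# after the character, e.g. <NARRATOR>Once upon a time.</NARRATOR>
LINE_PATTERN = re.compile(
    r"<(?P<character>\w+)>(?P<text>.*?)</(?P=character)>", re.DOTALL
)


def parse_script(script: str) -> List[Tuple[str, str]]:
    """Return (character, text) pairs in the order they appear."""
    return [
        # Collapse line breaks and repeated spaces inside a spoken line.
        (match.group("character"), " ".join(match.group("text").split()))
        for match in LINE_PATTERN.finditer(script)
    ]


def characters_in(script: str) -> List[str]:
    """Return the unique character names in order of first appearance."""
    return list(dict.fromkeys(name for name, _ in parse_script(script)))
```

A regular expression is used here instead of a strict XML parser because "XML-like" files often lack a single root element; if your scripts are well-formed XML, xml.etree.ElementTree would be the more robust choice.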

Step 4: Setting Up API Keys

Once the project structure is ready, it’s time to configure the environment. Copy the env.template file and rename it as .env, then provide your ElevenLabs API Key. The .env file keeps your credentials out of the source code, allowing the app to connect to the ElevenLabs API; make sure .env is listed in your .gitignore so the key is never committed.
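As a rough sketch of how the app might pick up the key from that file, the snippet below reads simple KEY=value lines using only the standard library. The variable name ELEVENLABS_API_KEY and the helper names are assumptions for illustration; the generated project may well use a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str = ".env", name: str = "ELEVENLABS_API_KEY") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get(name) or load_env_file(path).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to {path}")
    return key
```

Checking the process environment first means the same function keeps working in a hosted environment where no .env file exists.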

Step 5: Running the Application Locally

With everything set up, we’re ready to run the application for the first time. In GitHub Codespaces, use the following command to run the Streamlit app locally:

streamlit run app.py

Once executed, Streamlit will spin up a local server, and you can view your application in your browser. The initial UI created by Cline will allow you to upload scripts and generate voiceovers.

Step 6: Testing and Tweaking the Voiceover Generation

We then test the voiceover generation functionality by assigning different voices to various characters in the script. Using the ElevenLabs API, the app generates separate MP3 files for each character’s voice. It’s amazing how effective Cline was in helping us achieve this functionality in one go, although we still had to verify and tweak parts of the code.
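For readers curious what a per-character request might look like under the hood, here is a minimal sketch using only the standard library. It is not the code from the repository: the function names are hypothetical, and the endpoint path, the xi-api-key header, and the model_id value reflect my understanding of the ElevenLabs REST API, so check them against the current ElevenLabs documentation before relying on them.

```python
import json
import urllib.request
from pathlib import Path

# Assumed text-to-speech endpoint; verify against the ElevenLabs API docs.
API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(api_key: str, voice_id: str, text: str,
                      model_id: str = "eleven_multilingual_v2"):
    """Build (but do not send) a POST request for one spoken line."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(voice_id=voice_id),
        data=body,
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def output_filename(index: int, character: str) -> str:
    """Create a predictable, filesystem-safe MP3 name for each line."""
    safe = "".join(c if c.isalnum() else "_" for c in character).lower()
    return f"{index:03d}_{safe}.mp3"


def synthesize_to_file(api_key: str, voice_id: str, text: str,
                       out_path: str) -> Path:
    """Send the request and write the returned audio bytes to disk."""
    request = build_tts_request(api_key, voice_id, text)
    with urllib.request.urlopen(request, timeout=60) as response:
        Path(out_path).write_bytes(response.read())
    return Path(out_path)
```

Keeping request construction separate from the network call makes it easy to inspect what would be sent without spending API credits.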

Step 7: Deploying to Streamlit Cloud

Once we are satisfied with the locally tested version, it’s time to deploy the app to Streamlit Cloud so that others can access it. With your GitHub account linked to your Streamlit account, deploying is straightforward:

  1. Click on Create App in the Streamlit dashboard.
  2. Select the voiceover-generator repository from GitHub.
  3. Set the main file path to app.py.
  4. Click Deploy.

Once deployed, your application will be available online for public use. Make sure to keep your API keys secure and limit the number of requests if sharing the app publicly.
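One simple way to limit requests is a sliding-window counter that refuses new generations once a quota is used up. The class below is a minimal sketch under that assumption; RateLimiter and its parameters are hypothetical names, not part of the tutorial's repository.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_requests calls within any per_seconds window."""

    def __init__(self, max_requests: int, per_seconds: float,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.per_seconds = per_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._stamps = deque()

    def allow(self) -> bool:
        """Return True and record the call if the quota permits it."""
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.per_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_requests:
            return False
        self._stamps.append(now)
        return True
```

In a Streamlit app, one option would be to store a limiter in st.session_state and call allow() before each generation, showing a warning when it returns False. That only limits each browser session separately, so a shared quota would need a limiter that lives outside the session.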

Challenges Encountered

One of the challenges we faced during deployment was handling the environment variables for the ElevenLabs API key. Streamlit Cloud’s secrets manager expects TOML, so when copying the .env contents over we had to make sure each entry was properly formatted, for example with the value wrapped in quotes.
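To illustrate the formatting difference, here is a small sketch that converts .env-style lines into the quoted KEY = "value" form that TOML expects. The helper name env_to_toml and the variable name ELEVENLABS_API_KEY are assumptions for illustration, and the conversion is only meant for ordinary single-line values such as API keys.

```python
import json


def env_to_toml(env_text: str) -> str:
    """Convert simple KEY=value lines into TOML string assignments."""
    lines = []
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip().strip('"').strip("'")
        # json.dumps produces a double-quoted, escaped string, which is
        # also a valid TOML basic string for ordinary key values.
        quoted = json.dumps(value, ensure_ascii=False)
        lines.append(f"{key.strip()} = {quoted}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder value only; never print a real key.
    print(env_to_toml("ELEVENLABS_API_KEY=your-key-here"))
```

The resulting text can be pasted into the app's secrets settings, after which the value should be readable in the app through st.secrets["ELEVENLABS_API_KEY"].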

We also ran into a situation where ElevenLabs detected unusual activity, likely due to the number of API requests during testing. To resolve this, we generated a new API key and rebooted the application.

Recap and Conclusion

By following this tutorial, you now have a fully functional multi-voice generator using the ElevenLabs API and Streamlit. This app can generate voiceovers for different characters in a script, making it ideal for content creators, educators, or anyone who needs custom audio quickly.

If you want to see the entire development process in action, including the challenges we encountered and how we solved them, check out the full video here: Voiceover Generator Tutorial.

Feel free to explore the GitHub repository and experiment with the code: Voiceover Generator GitHub Repo. If you have any questions or suggestions, leave them in the comments, and I’ll be happy to help.

Happy coding, and all the best with your voiceover projects!
