So, I recently cooked up this side project called pdf-to-podcast.com. It's pretty simple: you toss in a PDF, and it spits out an audio podcast you can listen to while doing the dishes or whatever. Behind the scenes, though, there's some cool tech at work: an LLM writes the script and a text-to-speech model voices it.

The Tech Stack

Why Gemini and OpenAI?

Now, you might be wondering why I went with Google Gemini for the LLM and OpenAI for TTS. Here's the deal:

The Secret Sauce (aka How It Works)

  1. Upload Your PDF: It's as easy as drag-and-drop.
  2. Gemini Does Its Thing: The LLM reads your PDF and crafts a podcast-style conversation.
  3. Multi-Voice Magic: Each part of the conversation gets assigned a different speaker, and OpenAI TTS gives them their own voice.
  4. Audio Mixing: All the voices are blended into one audio file.
  5. Podcast Time: You get your brand-new podcast, ready to listen to!
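
If you want a feel for steps 2 through 4, here's a rough Python sketch. It's not the site's actual code: it assumes the LLM hands back a plain-text script with one "Speaker: text" line per turn, and `synthesize` is a hypothetical stand-in for whatever wraps the real TTS call. The voice names are just placeholders, and joining raw bytes is a naive substitute for proper audio mixing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Turn:
    speaker: str
    text: str


def parse_script(script: str) -> List[Turn]:
    """Split 'Speaker: text' lines into turns; lines without that shape are skipped."""
    turns = []
    for line in script.splitlines():
        speaker, sep, text = line.partition(":")
        if sep and speaker.strip() and text.strip():
            turns.append(Turn(speaker.strip(), text.strip()))
    return turns


def assign_voices(turns: List[Turn], voices: List[str]) -> Dict[str, str]:
    """Give each distinct speaker a voice, cycling if speakers outnumber voices."""
    mapping: Dict[str, str] = {}
    for turn in turns:
        if turn.speaker not in mapping:
            mapping[turn.speaker] = voices[len(mapping) % len(voices)]
    return mapping


def build_podcast(
    script: str,
    voices: List[str],
    synthesize: Callable[[str, str], bytes],
) -> bytes:
    """Voice every turn with its speaker's voice and join the clips in order."""
    turns = parse_script(script)
    mapping = assign_voices(turns, voices)
    clips = [synthesize(turn.text, mapping[turn.speaker]) for turn in turns]
    # Assumption: plain concatenation stands in for real audio mixing.
    return b"".join(clips)


def fake_tts(text: str, voice: str) -> bytes:
    """Hypothetical TTS stub so the sketch runs offline; returns tagged text, not audio."""
    return f"[{voice}] {text}\n".encode("utf-8")


demo_script = (
    "Host: Welcome to the show.\n"
    "Guest: Thanks for having me.\n"
    "Host: Let's dive in."
)
demo_audio = build_podcast(demo_script, ["alloy", "onyx"], fake_tts)
print(demo_audio.decode("utf-8"))
```

To get real audio out of this, you'd swap `fake_tts` for a function that calls a TTS API, and probably replace the byte join with a proper audio library.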

Prompt Generation Pro-tip

If you're writing prompts for something like this, try Anthropic’s prompt generation tool. You may need to create an account to use it; once you’re logged in, you’ll find it in the dashboard.