Building Listenbooth

January 2026

I read a lot online. Too much, probably. Dozens of open tabs, all things I genuinely want to get through.

When I’m home, I’m behind my computer. When I’m outside, I try to avoid screens altogether. Partly because it feels better, partly because Andrew Huberman tells me to.

The problem is that everything I want to read still lives on my screen.

So I built Listenbooth.

It takes any URL with readable content and converts it into a “voice note” you can listen to. Articles, blogs, documentation pages, announcements - if it’s text on a page, it usually works.

Simple concept. The implementation, as usual, was more interesting than I expected.

Stack

Bun (v1.3+) as the runtime
React 19 for the frontend
Google Gemini for text processing and TTS
Railway Storage Buckets for audio storage
Firecrawl for content extraction

I specifically wanted to try Railway’s new storage buckets. They’re S3-compatible, so you can use any S3 client library, but they’re tightly integrated with Railway’s platform.

Pipeline

Input → URL
Output → MP3

Scrape – Extract readable content
Optimize – Format for speech
Generate – Convert to audio
Store – Upload and serve
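Chained together, the four stages make one async function. Below is a minimal sketch that assumes each stage is passed in as a function, with a callback a progress UI can subscribe to. It is not Listenbooth's actual code. The Stages shape, the convert name, and the step names "generating" and "storing" are hypothetical. Only "scraping" and "optimizing" appear in the app's progress stream.

```typescript
// Hypothetical stage signatures; Firecrawl, Gemini, ffmpeg and the bucket sit behind them
type Stages = {
  scrape: (url: string) => Promise<{ title: string; markdown: string }>
  optimize: (markdown: string) => Promise<string>
  generate: (script: string, voice: string) => Promise<Uint8Array>
  store: (mp3: Uint8Array) => Promise<string>
}

type Step = 'scraping' | 'optimizing' | 'generating' | 'storing'
type Progress = { step: Step; status: 'in_progress' | 'complete'; title?: string }

async function convert(
  url: string,
  voice: string,
  stages: Stages,
  onProgress: (p: Progress) => void = () => {}
): Promise<{ title: string; audioUrl: string }> {
  // Run one stage, reporting when it starts and when it finishes
  async function step<T>(
    name: Step,
    run: () => Promise<T>,
    extra?: (result: T) => Partial<Progress>
  ): Promise<T> {
    onProgress({ step: name, status: 'in_progress' })
    const result = await run()
    onProgress({ step: name, status: 'complete', ...(extra ? extra(result) : {}) })
    return result
  }

  const page = await step('scraping', () => stages.scrape(url), (p) => ({ title: p.title }))
  const script = await step('optimizing', () => stages.optimize(page.markdown))
  const mp3 = await step('generating', () => stages.generate(script, voice))
  const audioUrl = await step('storing', () => stages.store(mp3))
  return { title: page.title, audioUrl }
}
```

Keeping the progress callback separate from the stages means the same function can drive a streaming endpoint or a plain request/response one.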

Scrape

Firecrawl takes a URL and returns clean markdown. I tried building my own scraper initially, but modern websites are hostile to extraction - JavaScript rendering, lazy loading, cookie modals. Sometimes the right move is to pay for a solved problem.
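For reference, here is roughly what a direct call to Firecrawl's hosted scrape endpoint looks like. This sketch assumes the v1 REST API: POST https://api.firecrawl.dev/v1/scrape with a bearer key, returning data.markdown and data.metadata.title. Check Firecrawl's current docs before relying on those shapes, since the app may use their SDK instead. The fetchImpl parameter exists only so the function can be tested without network access.

```typescript
// Minimal fetch-like signature so a fake can be injected in tests
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

// Assumed Firecrawl v1 response shape (only the fields used here)
type FirecrawlResponse = {
  success: boolean
  data?: { markdown?: string; metadata?: { title?: string } }
  error?: string
}

async function scrape(
  url: string,
  apiKey: string,
  fetchImpl: FetchLike = fetch
): Promise<{ title: string; markdown: string }> {
  const res = await fetchImpl('https://api.firecrawl.dev/v1/scrape', {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    // Ask only for markdown; rendering and cleanup happen on Firecrawl's side
    body: JSON.stringify({ url, formats: ['markdown'] }),
  })
  const json = (await res.json()) as FirecrawlResponse
  const markdown = json.data?.markdown
  if (!res.ok || !json.success || !markdown) {
    throw new Error(`Scrape failed for ${url}: ${json.error ?? res.status}`)
  }
  // Fall back to the URL when the page has no title metadata
  return { title: json.data?.metadata?.title ?? url, markdown }
}
```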

Optimize

You can’t feed raw markdown to TTS and expect good results.

“The 11 engineers met on 01/15/2024 to discuss the v2.0 release.”

A naive TTS says:

“The eleven engineers met on zero one slash fifteen slash twenty twenty-four…”

Before synthesis, I run text through gemini-2.5-flash-lite to:

Convert numbers to spoken form
Remove code blocks (nobody wants to hear console.log read aloud)
Strip markdown formatting
Expand abbreviations
Remove URLs, footnotes, and references
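The purely mechanical rules in that list (code blocks, markdown syntax, URLs) could also be applied with regexes before the text reaches the model. That makes them deterministic and shrinks the prompt. The sketch below is hypothetical and is not what Listenbooth runs. Number and abbreviation expansion depend on context, so they still belong to the LLM.

```typescript
// Hypothetical deterministic pre-pass; context-dependent rewrites stay with the LLM
function stripForSpeech(markdown: string): string {
  return markdown
    .replace(/`{3}[\s\S]*?`{3}/g, '')                     // fenced code blocks: drop entirely
    .replace(/!\[[^\]]*\]\([^)]*\)/g, '')                  // images: drop
    .replace(/\[([^\]]+)\]\([^)]*\)/g, '$1')               // links: keep only the link text
    .replace(/https?:\/\/\S+/g, '')                        // bare URLs: drop
    .replace(/`([^`\n]+)`/g, '$1')                         // inline code: keep text, lose backticks
    .replace(/^[ \t]{0,3}(?:#{1,6}|>|[-*+])[ \t]+/gm, '')  // heading, quote and bullet markers
    .replace(/(\*\*|__|\*)(\S(?:[^\n]*?\S)?)\1/g, '$2')    // bold and italic markers
    .replace(/[ \t]{2,}/g, ' ')                            // double spaces left by removals
    .replace(/\n{3,}/g, '\n\n')                            // runs of blank lines
    .trim()
}
```

Order matters here. Code blocks go first so their contents can't be mistaken for markup, and links are unwrapped before bare URLs are removed.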

This took output from “technically correct” to “pleasant to listen to.”

Generate

Gemini’s gemini-2.5-flash-preview-tts model is different from traditional TTS. It understands context - you can tell it to speak cheerfully, or in a whisper, or with a specific accent.

There are 30 voices, with names drawn from mythology and astronomy: Zephyr, Kore, Fenrir, Puck. Some are bright and energetic, others calm and measured. I exposed all 30 because I couldn’t pick favorites.
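The voice and any style instruction travel in the request itself. Below is a sketch of building that request and decoding the reply for the Gemini REST generateContent endpoint. The field names follow Google's speech-generation docs as I understand them: generationConfig.responseModalities, speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName, and base64 audio in inlineData.data. Treat them as assumptions to check against the current API reference. buildTtsRequest and extractPcm are hypothetical helpers.

```typescript
// Assumed REST request/response shapes for Gemini TTS (verify against current docs)
type TtsRequest = {
  contents: { parts: { text: string }[] }[]
  generationConfig: {
    responseModalities: ['AUDIO']
    speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: string } } }
  }
}

type TtsResponse = {
  candidates?: { content?: { parts?: { inlineData?: { mimeType?: string; data?: string } }[] } }[]
}

// Style is plain natural language prepended to the text, e.g. "Say cheerfully"
function buildTtsRequest(text: string, voiceName: string, style?: string): TtsRequest {
  return {
    contents: [{ parts: [{ text: style ? `${style}: ${text}` : text }] }],
    generationConfig: {
      responseModalities: ['AUDIO'],
      speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName } } },
    },
  }
}

// Pull the base64 audio out of the first candidate and decode it to raw PCM bytes
function extractPcm(response: TtsResponse): Uint8Array {
  const data = response.candidates?.[0]?.content?.parts?.[0]?.inlineData?.data
  if (!data) throw new Error('No audio in TTS response')
  const binary = atob(data)
  const pcm = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) pcm[i] = binary.charCodeAt(i)
  return pcm
}
```

The body would be POSTed to https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-tts:generateContent with the API key in an x-goog-api-key header. The decoded bytes can then go straight into ffmpeg.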

The model returns raw PCM (24kHz, 16-bit, mono). I spawn ffmpeg to convert to MP3:

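const ffmpeg = Bun.spawn([
  'ffmpeg',
  '-f', 's16le',   // input: raw signed 16-bit little-endian PCM
  '-ar', '24000',  // 24 kHz sample rate
  '-ac', '1',      // mono
  '-i', 'pipe:0',  // read from stdin
  '-q:a', '2',     // LAME VBR quality (0 = best, 9 = smallest)
  '-f', 'mp3',
  'pipe:1'         // write to stdout
], {
  stdin: 'pipe',
  stdout: 'pipe',
  stderr: 'pipe'
})

// Drain stdout and stderr while writing, so a full pipe can't stall ffmpeg
const mp3Promise = new Response(ffmpeg.stdout).arrayBuffer()
const logPromise = new Response(ffmpeg.stderr).text()

ffmpeg.stdin.write(pcmBuffer) // pcmBuffer: raw PCM bytes from the TTS response
ffmpeg.stdin.end()

const mp3Buffer = Buffer.from(await mp3Promise)
if ((await ffmpeg.exited) !== 0) throw new Error(await logPromise)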

Pipe PCM in, get MP3 out. No temporary files.

Store

Railway’s storage buckets: click “New” → “Storage Bucket.” Done. Credentials are injected automatically.

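import { s3 } from 'bun'

// id: the conversion's identifier; mp3Buffer: the encoded MP3 from ffmpeg
await s3.write(`audio/${id}.mp3`, mp3Buffer, {
  bucket: process.env.S3_BUCKET,
  accessKeyId: process.env.S3_ACCESS_KEY_ID,
  secretAccessKey: process.env.S3_SECRET_ACCESS_KEY,
  endpoint: process.env.S3_ENDPOINT,
  type: 'audio/mpeg' // stored as the object's Content-Type
})

// The default s3 client reads the same S3_* variables on its own, so presign only needs the expiry
const url = s3.presign(`audio/${id}.mp3`, { expiresIn: 3600 }) // 1 hour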

Railway buckets are private by default - you can’t link directly to a file. Instead, you generate a presigned URL that expires (I use 1 hour). When a user wants to play audio, my server generates the URL and redirects. The audio streams directly from the bucket to the browser.
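Below is a minimal sketch of that redirect endpoint. It assumes a hypothetical /audio/<id> route and takes the presign function as a parameter, so it can be tested without a bucket. In the app, that argument would be something like (key) => s3.presign(key, { expiresIn: 3600 }), mirroring the upload code.

```typescript
type Presign = (key: string) => string

// Hypothetical route: GET /audio/<id> -> 302 redirect to a short-lived presigned bucket URL
function audioRedirect(req: Request, presign: Presign): Response {
  const { pathname } = new URL(req.url)
  // Only accept simple ids so the request can't address other keys in the bucket
  const match = pathname.match(/^\/audio\/([A-Za-z0-9_-]+)$/)
  if (!match) return new Response('Not found', { status: 404 })
  // The browser follows the redirect and streams the MP3 straight from the bucket
  return Response.redirect(presign(`audio/${match[1]}.mp3`), 302)
}
```

Because the URL is minted per request, an expired link only costs the listener a page refresh. No long-lived public URL ever exists.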

Bucket egress is free on Railway, so I don’t pay for that bandwidth. For a project serving large audio files, this matters.

Streaming Progress

Conversion takes 10–30 seconds. Without feedback, users stare at a spinner. With Server-Sent Events (SSE), I push an update at each stage:

data: {"step":"scraping","status":"in_progress"}
data: {"step":"scraping","status":"complete","title":"Product Launch Announcement"}
data: {"step":"optimizing","status":"in_progress"}
...

Each step gets a checkmark when complete. Small thing, but it makes the wait feel productive rather than anxious.
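On the server, an SSE endpoint is a response with Content-Type: text/event-stream whose body is a stream of "data: <json>" messages, each followed by a blank line. Below is a minimal sketch using the standard ReadableStream, which Bun.serve() can return directly. The sseResponse name, the async-iterable input, and the "error" status are assumptions for illustration.

```typescript
type StepEvent = { step: string; status: 'in_progress' | 'complete' | 'error'; [key: string]: unknown }

// Turn an async sequence of progress events into a text/event-stream Response
function sseResponse(events: AsyncIterable<StepEvent>): Response {
  const encoder = new TextEncoder()
  const send = (controller: ReadableStreamDefaultController<Uint8Array>, event: unknown) =>
    // Each SSE message is "data: <payload>" terminated by a blank line
    controller.enqueue(encoder.encode(`data: ${JSON.stringify(event)}\n\n`))

  const body = new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const event of events) send(controller, event)
      } catch (err) {
        // Hypothetical error message so the UI can stop waiting
        send(controller, { step: 'pipeline', status: 'error', message: String(err) })
      } finally {
        controller.close()
      }
    },
  })

  return new Response(body, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' },
  })
}
```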

Things That Didn’t Work

Audio Duration

The browser’s <audio> element has a duration property. Except 30% of the time I got Infinity or NaN.

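MP3 duration detection is complicated. For a variable-bitrate file (which -q:a 2 produces), the browser needs either a Xing/LAME header recording the total frame count or to scan the whole file. ffmpeg writes that header by seeking back to the start of the output once encoding finishes, which it can’t do when writing to a pipe, so piped output like mine likely lacks it. And for streaming audio, the whole file isn’t available to scan right away.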

My fix: listen to multiple events - loadedmetadata, durationchange, canplaythrough - and only update when I get a finite, positive number.

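const handleDurationChange = () => {
  const d = audio.duration // NaN until metadata loads, Infinity when the length is unknown
  if (Number.isFinite(d) && d > 0) setDuration(d)
}

for (const event of ['loadedmetadata', 'durationchange', 'canplaythrough']) {
  audio.addEventListener(event, handleDurationChange)
}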

Not elegant, but it works. Sometimes that’s enough.
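One complementary option the fixed PCM format allows: the server already holds the raw audio at 24,000 samples per second × 2 bytes per sample × 1 channel = 48,000 bytes per second. It can therefore compute the duration before encoding and send it along with the final progress event, leaving the browser's value as a fallback. This is a sketch; pcmDurationSeconds is a hypothetical helper.

```typescript
// Raw PCM from the TTS model: 24 kHz, 16-bit (2 bytes per sample), mono
const SAMPLE_RATE = 24_000
const BYTES_PER_SAMPLE = 2
const CHANNELS = 1

// Playback length in seconds, known before the MP3 encode
function pcmDurationSeconds(byteLength: number): number {
  const bytesPerSecond = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS // 48,000
  return byteLength / bytesPerSecond
}
```

For example, 1,440,000 bytes of PCM is exactly 30 seconds. The encoded MP3 can run very slightly longer because of encoder padding, which doesn't matter for a progress bar.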

Environment Variables

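Bun’s S3 driver reads specific variable names from the environment (S3_BUCKET, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, S3_ENDPOINT, or their AWS_ equivalents). Railway’s template provides them. I initially tried custom names and nothing worked. Read the docs.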
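A cheap guard against that class of mistake is to check for the variables at startup and fail loudly. This sketch assumes the S3_* names used in the upload code. missingEnv is a hypothetical helper.

```typescript
// The variable names the upload code reads; adjust if your setup differs
const REQUIRED_S3_VARS = ['S3_BUCKET', 'S3_ACCESS_KEY_ID', 'S3_SECRET_ACCESS_KEY', 'S3_ENDPOINT'] as const

// Return the names that are unset or empty, so startup can fail with a clear message
function missingEnv(env: Record<string, string | undefined>, names: readonly string[]): string[] {
  return names.filter((name) => !env[name])
}
```

At startup, something like const missing = missingEnv(process.env, REQUIRED_S3_VARS) followed by throwing when missing.length > 0 turns a silent upload failure into an error that names the missing variables.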

Architecture Decisions

No database. History disappears on refresh. For a demo, this is fine. Adding a database means more complexity, more things to break. The audio files persist in the bucket - that’s the durable state that matters.

Single process. Bun.serve() handles API and static files. No nginx, no separate processes.
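In practice, the single-process setup comes down to one fetch handler that dispatches on the path. The sketch below uses hypothetical routes (/api/ and /audio/) and takes the handlers as parameters so it runs outside Bun. In the app, the result would be the fetch option passed to Bun.serve(), with the static handler serving files via Bun.file.

```typescript
type Handler = (req: Request) => Response | Promise<Response>

// One entry point: API routes first, audio redirects next, static assets as the fallback
function makeRouter(routes: { api: Handler; audio: Handler; static: Handler }): Handler {
  return (req) => {
    const { pathname } = new URL(req.url)
    if (pathname.startsWith('/api/')) return routes.api(req)
    if (pathname.startsWith('/audio/')) return routes.audio(req)
    return routes.static(req)
  }
}
```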

Plain CSS. No Tailwind. I know this is controversial, but I find it easier to reason about styling when it’s in one place rather than scattered across component files.

Base UI. Unstyled primitives for dropdowns, sliders, dialogs. Accessibility without fighting someone else’s design opinions.

Bigger Picture

This took a few hours. An “it works and I can show it to people” weekend project.

The code I wrote is mostly glue. Important glue - SSE streaming, progress UI, audio player - but glue. The hard problems are solved by other people’s infrastructure.

The tradeoff is that I’m dependent on these services. If Firecrawl changes their API, I update. If Gemini’s TTS gets deprecated, I find an alternative. If Railway’s pricing changes, I reconsider my architecture.

This is software as composition rather than construction. Build faster, ship sooner, depend on others more. For a weekend project, it’s the obvious choice. For something more serious, you might want more control.

Try It

Paste a URL, pick a voice, generate. It’s free. Then go for a walk.

listenbooth.up.railway.app