Rike Pool

Reverse engineering Seedream

I spent a weekend building an app with the Seedream 4.0 API from ByteDance, the company behind TikTok. What started as a simple “throw together a quick UI” project turned into one of those classic API exploration rabbit holes where you discover the documentation tells you what’s supported but not how things actually behave. I learned more from poking at the system than from any official spec. This post walks through the technical decisions and some of the more interesting discoveries.

For context: Seedream 4.0 is ByteDance’s image model that unifies generation and editing in a single architecture. It handles text-to-image, image-to-image with reference inputs, and natural language editing - all through one API. BytePlus hosts it via their ModelArk platform.


Building with Bun and vanilla JavaScript

The application is intentionally minimal: a Bun server running TypeScript on the backend, vanilla JavaScript on the frontend, zero framework dependencies. I chose this setup because the interesting complexity here lives in understanding the API itself, not in managing React state or wrestling with bundler configurations.

The server acts as a proxy. Your API key stays server-side, all requests flow through it, and the frontend can hit /api/images/generations without exposing credentials. The server validates everything before forwarding to BytePlus.

Here’s the entire architecture:

  • server.ts: Validates requests, injects API key, proxies to BytePlus
  • app.js: Calculates dimensions, handles uploads, makes API calls
  • index.html: Form controls and image gallery
  • styles.css: Dark theme and responsive layout

No build step, no transpilation. Bun serves everything directly and hot-reloads during development via the --hot flag. This simplicity turned out to be crucial when debugging API behavior.
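The forwarding step at the heart of the proxy is small. Here is a sketch of what it might look like as a pure, testable function; the upstream URL is a placeholder and the helper name is mine, not taken from the actual server code:

```javascript
// Hypothetical sketch of the proxy's forwarding step: validate the client
// payload, inject the server-side key, and build the upstream request.
// The upstream URL here is a placeholder, not the real ModelArk endpoint.
function buildUpstreamRequest(clientBody, apiKey) {
  if (!clientBody || typeof clientBody.prompt !== 'string' || !clientBody.prompt.trim()) {
    throw new Error('prompt is required');
  }
  return {
    url: 'https://example-modelark-host/api/v3/images/generations', // placeholder
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}` // the key never reaches the browser
      },
      body: JSON.stringify(clientBody)
    }
  };
}
```

Keeping this as a pure function (payload in, request description out) makes the proxy trivial to unit-test without spinning up a server.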


Finding the hidden pixel budget

BytePlus’s documentation says the API supports resolutions up to 4K. I initially interpreted this as “4096 pixels on the long edge,” which seemed straightforward. But when I started testing different aspect ratios, something didn’t add up.

I tried generating a 9:16 portrait at what should have been 5K resolution (5120×2880), and it worked. Then I tried 6K (6144×3456) and got this error:

image size must be at most 16777216 pixels

That number is suspiciously specific. It equals exactly 4096 × 4096. The API doesn’t have a simple “4K max” constraint. Instead, it has a pixel budget of 16,777,216 total pixels.

This means you can exceed 4096 on one dimension as long as the total stays under the budget:

  • 4096×4096 = 16,777,216 pixels (100% of budget)
  • 5944×2547 = 15,139,368 pixels (90% of budget, valid)
  • 6144×3456 = 21,233,664 pixels (exceeds budget, rejected)

The API also has a minimum: 921,600 pixels. Go below that and you get an error. This floor exists for quality reasons (presumably to prevent tiny, malformed outputs). The ModelArk docs list the output pixel range as [1280x720, 4096x4096], which confirms the minimum is exactly 1280×720 = 921,600 pixels.
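The whole constraint boils down to one check on the pixel count. A minimal validator based on the observed limits (this is a sketch of the rule, not an official check):

```javascript
// Resolution rule as observed: total pixels must be within
// [921,600, 16,777,216]; neither dimension is individually capped.
const MIN_PIXELS = 1280 * 720;   // 921,600
const MAX_PIXELS = 4096 * 4096;  // 16,777,216

function isValidSize(width, height) {
  const pixels = width * height;
  return pixels >= MIN_PIXELS && pixels <= MAX_PIXELS;
}
```

Running the sizes from above through it: 5120×2880 passes, 6144×3456 fails, and anything below 1280×720's pixel count fails.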


Automating optimal image dimensions

Once I understood the pixel budget, the obvious next step was calculating optimal dimensions automatically. Instead of forcing users to pick “4K” and hope for the best, the interface now has two special modes.

Min Quality calculates the smallest valid dimensions for your aspect ratio:

  • 16:9 becomes 1280×720 (exactly 921,600 pixels)
  • 1:1 becomes 960×960 (exactly 921,600 pixels)
  • 21:9 becomes 1468×629 (923,372 pixels, just above the minimum)

Max Quality calculates the largest valid dimensions:

  • 1:1 becomes 4096×4096 (16,777,216 pixels - 100% of budget)
  • 21:9 becomes 5944×2547 (15,139,368 pixels - 90% of budget)
  • 16:9 becomes 4864×2736 (13,307,904 pixels - 79% of budget)

The math uses the aspect ratio to solve for dimensions that maximize (or minimize) the pixel count while respecting the constraint. For a landscape ratio like 16:9:

const aspectValue = 16 / 9;
const height = Math.floor(Math.sqrt(16777216 / aspectValue));
const width = Math.floor(height * aspectValue);
// Result: 5461 × 3072 (16,776,192 pixels, just under budget)

This gives users three options: use a preset like 2K for predictable output, use Min for speed, or use Max for quality. The pricing is the same regardless of resolution ($0.03 per image), so there’s no financial penalty for choosing Max.
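Both modes can be expressed as one pair of functions. This is a sketch of the calculation, written with the ratio as integer terms (rw:rh) to avoid floating-point drift near the exact boundaries:

```javascript
const MIN_PIXELS = 921600;    // 1280 × 720
const MAX_PIXELS = 16777216;  // 4096 × 4096

// Largest frame for a rw:rh ratio that stays under the budget:
// floor both dimensions so the product cannot exceed the ceiling.
function maxDims(rw, rh) {
  const height = Math.floor(Math.sqrt((MAX_PIXELS * rh) / rw));
  return { width: Math.floor((height * rw) / rh), height };
}

// Smallest frame for a rw:rh ratio that stays above the floor:
// ceil both dimensions so the product cannot drop below the minimum.
function minDims(rw, rh) {
  const height = Math.ceil(Math.sqrt((MIN_PIXELS * rh) / rw));
  return { width: Math.ceil((height * rw) / rh), height };
}
```

For 21:9 this yields 1468×629 as the minimum. Note the theoretical maxima (e.g. 6255×2681 for 21:9) come out somewhat larger than the Max Quality values the app reports, which evidently keep a little headroom below the budget.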


Why batch generation ignored my settings

Seedream 4.0 has a feature for batch generation (sometimes called sequential image generation). When enabled, you can request multiple images in a single API call. The documentation suggests this maintains consistency across outputs, like generating the same character in different poses.

I implemented it the obvious way: set the appropriate mode and specify max_images: 4. Send one request, get four images back. Simple.

Except it wasn’t. Sometimes it returned one image, sometimes two, sometimes the requested four. The API was ignoring the count parameter.

After testing various prompts, the pattern became clear. The API’s “auto” mode looks at your prompt and decides how many images to generate. If your prompt says “a portrait,” it generates one. If your prompt says “four variations of a portrait,” it generates four.

The API is text-driven. The max_images parameter acts as a ceiling, not a target.

This created a UX problem. Users select “4 images” from a dropdown and expect four images. The solution was prompt enhancement. Before sending the request, the frontend prepends explicit instructions:

let enhancedPrompt = prompt;
if (maxImages > 1) {
  enhancedPrompt = `Generate exactly ${maxImages} separate individual images,
    not a collage or grid. Each image should be completely independent and
    standalone. Do not combine multiple variations into a single image. ${prompt}`;
}

This verbose instruction overrides the auto-detection. The API now reliably generates the requested count. The tradeoff is prompt tokens, but the clarity is worth it. WaveSpeedAI’s documentation confirms this behavior - their guidance explicitly states to set the max_image count first, then specify the desired count in the prompt itself.


Speed vs consistency

Sequential generation has another interesting property: it’s slow. Really slow. The API generates images one at a time, where each image can use the previous one as context. This is great for consistency but terrible for speed. Generating 15 images sequentially might take 60+ seconds.

The pricing model doesn’t care about sequential vs parallel. Each image costs $0.03 whether you generate 15 in one request or 15 in separate requests. This created an obvious optimization opportunity.

I added a “Parallel Generation” toggle. When enabled, the frontend makes N simultaneous API calls instead of one sequential call:

if (parallelMode && maxImages > 1) {
  const promises = Array.from({ length: maxImages }, () => {
    return fetch('/api/images/generations', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        prompt,
        size,
        watermark,
        sequential_image_generation: 'disabled'
      })
    }).then(r => r.json());
  });

  const results = await Promise.all(promises);
  const combinedData = results.flatMap(result => result.data || []);
  return { model, data: combinedData };
}

All requests fire at once. They complete whenever they complete. The frontend waits for all of them, then combines the results into a single response object. The UI doesn’t know the difference.

The speedup is dramatic. Generating 15 images in parallel might take 8 seconds instead of 60. The tradeoff is consistency. Parallel images are independent variations, not a coherent set. Each call gets a different random seed.

Users now have a choice:

  • Sequential mode: slow, consistent characters and styles
  • Parallel mode: fast, independent variations

The generation timer shows the difference. After completion, the result panel displays “Generation Time: 6.8s” so users can compare approaches.
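The timer itself is trivial; a sketch of the kind of wrapper involved (the helper name is mine, not from the app):

```javascript
// Hypothetical helper: time an async generation call and format the
// elapsed seconds with one decimal, like the "Generation Time: 6.8s" readout.
async function timed(fn) {
  const start = Date.now();
  const result = await fn();
  const seconds = ((Date.now() - start) / 1000).toFixed(1);
  return { result, label: `Generation Time: ${seconds}s` };
}
```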


Generating 15 images in under 8 seconds

Combining Min Quality with Parallel Mode gives you the absolute fastest generation possible:

  • Min Quality: 720×1280 pixels (smallest valid size)
  • Parallel Mode: 15 simultaneous requests
  • Result: 15 draft-quality images in about 7 seconds
  • Cost: Still $0.45 (15 × $0.03)

Compare this to the slowest:

  • Max Quality: up to 4096×4096 pixels (largest valid size)
  • Sequential Mode: 15 images one at a time
  • Result: 15 high-quality consistent images in about 65 seconds
  • Cost: Still $0.45 (15 × $0.03)

Same price, 9× speed difference. The choice depends entirely on your use case.


Compressing images in the browser

Reference images can be uploaded for image-to-image generation. The API accepts up to 10 reference images and has a 10MB size limit per image. But sending 10MB images over the network is slow, especially on mobile connections.

The solution is client-side optimization. Before uploading, the browser checks if the image exceeds 4096px on either dimension or if the file size exceeds 2MB. If either is true, it creates a canvas, resamples the image, and compresses to JPEG at 85% quality:

// Downscale so neither side exceeds 4096px, then re-encode as JPEG
const scale = Math.min(1, 4096 / Math.max(img.width, img.height));
const width = Math.round(img.width * scale);
const height = Math.round(img.height * scale);
const canvas = document.createElement('canvas');
canvas.width = width;
canvas.height = height;
const ctx = canvas.getContext('2d');
ctx.drawImage(img, 0, 0, width, height);
const optimizedDataUrl = canvas.toDataURL('image/jpeg', 0.85);

A 10MB photo might become a 500KB JPEG. The API receives smaller payloads, requests complete faster, and users on slow connections aren’t penalized. The quality loss at 85% is imperceptible for reference images.
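The decision of whether to re-encode at all is a pure function of the image's dimensions and byte size, which makes it easy to test without a browser. A sketch using the thresholds from above (4096px per side, 2MB):

```javascript
// Pre-upload check: re-encode only when the image is oversized in
// dimensions or in bytes. Thresholds taken from the text above.
const MAX_SIDE = 4096;
const MAX_BYTES = 2 * 1024 * 1024;

function needsOptimization(width, height, byteSize) {
  return width > MAX_SIDE || height > MAX_SIDE || byteSize > MAX_BYTES;
}
```

Images that are already small pass through untouched, so there is no double-compression of files that don't need it.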


Preventing the model from making collages

The sequential generation prompt enhancement worked, but it created an unexpected side effect. When users requested multiple images, the API sometimes interpreted “generate 4 separate images” as “create a 2×2 grid collage.”

Instead of four independent images, you’d get one image containing four panels. This defeated the entire purpose.

The fix required more explicit language. The current prompt enhancement reads:

Generate exactly 4 separate individual images, not a collage or grid.
Each image should be completely independent and standalone. Do not
combine multiple variations into a single image.

This verbose instruction is necessary. The model needs explicit negative examples (“not a collage or grid”) and positive reinforcement (“completely independent and standalone”). Without both, it occasionally falls back to collage mode.

The enhancement only runs if the user’s prompt doesn’t already mention image counts. A regex checks for keywords like “separate,” “individual,” “variations,” etc. If found, the prompt passes through unmodified.
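A sketch of that pass-through check; the exact keyword list is an assumption based on the examples mentioned, not the app's actual regex:

```javascript
// Skip prompt enhancement when the user already specifies how the
// images should be split up. Keyword list is illustrative, not exhaustive.
const COUNT_KEYWORDS = /\b(separate|individual|variations?|collage|grid)\b/i;

function shouldEnhance(prompt, maxImages) {
  return maxImages > 1 && !COUNT_KEYWORDS.test(prompt);
}
```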


Final thoughts on API design

Building this interface taught me more about the Seedream API than the documentation ever could. The pixel budget constraint, the text-driven image count detection, the speed tradeoffs, the collage problem - all of these emerged through experimentation.

The most useful API knowledge often comes from poking at the system. Official docs tell you what’s supported. Testing tells you how it actually behaves. The gap between the two is where you find the interesting problems.

I think there’s a broader lesson here about API design. BytePlus built a flexible system (the pixel budget approach is actually quite elegant), but the documentation leads you toward simple mental models that break down at the edges. The “4K support” framing is technically accurate but practically misleading. Perhaps the better framing would have been “16.7 megapixel budget,” which immediately suggests the tradeoffs available.

The final application is about 500 lines of JavaScript, 300 lines of TypeScript, and 150 lines of HTML. No frameworks, no bundlers, no complexity. Just a thin layer over an API that does the hard work.

You can generate one image at 720×1280 in parallel mode for draft exploration. You can generate 15 images at 4096×4096 in sequential mode for final production. Or anything in between. The interface stays out of your way and lets the API do what it’s good at.

My sense is that this pattern - minimal frontend, smart proxy, aggressive experimentation with the underlying API - works well for a lot of AI tooling. The models are doing the heavy lifting. Your job is to understand their quirks and expose sensible controls. Everything else is just getting in the way.