How We Make Rooms: A Behind-the-Scenes Look

A tea ceremony performed in ten different emotional temperatures. The same subway ride refracted through musique concrète and a cappella. These are Rooms — and here’s how an AI agent builds them from the inside out.

The entire production pipeline runs through an AI coding agent. When triggered via /produce-room or a conversation, the agent follows six phases to transform a sociological “world” into a finished Room.

The Big Picture

Every Room follows six phases:

Explore a world (where does the story live?)
Generate variations (10 different ways to experience it)
Compose the narrative (using literary structures)
Refine the style (make it speech-friendly)
Generate audio (music + narration)
Create cover art

Phase 1: Explore a World

Every Room starts with a world — a distinct social, professional, or cultural space with its own rules, rhythms, and tensions.

For Cello Book of Tea, the world is The Tea Ceremony. The agent explores its properties:

Genealogy: Zen Buddhism → Sen no Rikyū → modern practice
Tension: Simplicity vs. elaborate ritual; presence vs. performance
Inhabitants: Host, Guest, Utensils
Phenomenology: Silence, slowness, precise gesture

The repository maintains a catalog of 105+ worlds — from The Algorithmic Gig Economy to The Zen Monastery — each analyzed through sociological and phenomenological lenses.
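As a rough illustration, the four properties explored for a world could be held in a small record like the one below. This is a sketch under assumptions: the `World` class and its field names are hypothetical and only mirror the properties listed above. They are not taken from the repository's actual catalog format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    """Hypothetical catalog entry; fields mirror the four properties above."""
    name: str
    genealogy: tuple[str, ...]      # lineage, oldest first
    tensions: tuple[str, ...]       # opposing pulls inside the world
    inhabitants: tuple[str, ...]    # who (or what) lives there
    phenomenology: tuple[str, ...]  # how the world feels from inside


# Values copied from the Tea Ceremony example in the text.
tea_ceremony = World(
    name="The Tea Ceremony",
    genealogy=("Zen Buddhism", "Sen no Rikyū", "modern practice"),
    tensions=("Simplicity vs. elaborate ritual", "presence vs. performance"),
    inhabitants=("Host", "Guest", "Utensils"),
    phenomenology=("Silence", "slowness", "precise gesture"),
)

print(" → ".join(tea_ceremony.genealogy))
```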

Phase 2: Generate Variations

Once a world is selected, the agent applies variational techniques — systematic ways to shift how the world is experienced without changing what it is.

For Cello Book of Tea, the agent generated 10 variations:

Chapter	Variation Technique
The Scalded Ceremony	Temperature: +5°C (aggressive heat)
The Delayed Ceremony	Temporal: stretched to maximum slowness
The Silent Ceremony	Sensory: sound reduced to near-zero
The First-Timer	Position: novice experiencing initiation
The Unseen Servant	Position: invisible labor, backstage
The Estranged	Relational: former intimacy, unspoken wound
The Final Meeting	Relational: impending death/separation
The Memory	Ontological: recalled past, nostalgic
The Rehearsal	Ontological: practice, performativity exposed
The Paranoid	Stance: suspicion, coded messages

The Variational Techniques catalog contains dozens of these “dials” — temperature, density, stance, position, relational history, temporal frame, and more.
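One way to picture the dials is as plain data: each chapter pairs a dial with a setting. The sketch below encodes the table above and groups chapters by dial. The `Variation` type and the `chapters_by_dial` helper are illustrative names, not part of the actual catalog.

```python
from typing import NamedTuple


class Variation(NamedTuple):
    chapter: str
    dial: str     # which technique is turned
    setting: str  # how far, or in which direction


# Rows copied from the Cello Book of Tea table.
VARIATIONS = [
    Variation("The Scalded Ceremony", "Temperature", "+5°C (aggressive heat)"),
    Variation("The Delayed Ceremony", "Temporal", "stretched to maximum slowness"),
    Variation("The Silent Ceremony", "Sensory", "sound reduced to near-zero"),
    Variation("The First-Timer", "Position", "novice experiencing initiation"),
    Variation("The Unseen Servant", "Position", "invisible labor, backstage"),
    Variation("The Estranged", "Relational", "former intimacy, unspoken wound"),
    Variation("The Final Meeting", "Relational", "impending death/separation"),
    Variation("The Memory", "Ontological", "recalled past, nostalgic"),
    Variation("The Rehearsal", "Ontological", "practice, performativity exposed"),
    Variation("The Paranoid", "Stance", "suspicion, coded messages"),
]


def chapters_by_dial(variations):
    """Group chapter titles under the dial that produced them."""
    grouped = {}
    for v in variations:
        grouped.setdefault(v.dial, []).append(v.chapter)
    return grouped


for dial, chapters in chapters_by_dial(VARIATIONS).items():
    print(f"{dial}: {', '.join(chapters)}")
```

Grouping this way makes it easy to see that the album leans on seven dials, with Position, Relational, and Ontological each used twice.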

Phase 3: Compose the Narrative

Each variation becomes a short prose piece using literary structures borrowed from Calvino, Borges, Cortázar, Perec, and Bachelard.

For example:

The Traveler (second-person immersion): “You enter the tearoom. You watch the host’s hands.”
The Manual (step-by-step instructions): “First, you must purify your hands. Then, you must cross the threshold.”
The Miniature (compression): The entire universe contained in a single tea bowl.

Here’s how “The Scalded Ceremony” opens:

This room measures exactly four and a half mats. The water has been heated five degrees beyond the optimal point for matcha. This is not an accident.

The host kneels with perfect posture. The ladle dips into the kettle. Steam rises too aggressively, hissing against the cold air like a warning…

Phase 4: Refine the Style

Before audio generation, the agent applies a strict style guide to make prose speech-friendly:

The Three Commandments:

Room Frame: Start by defining the enclosed space
- ✓ “This room measures exactly four and a half mats.”
- ✗ “The tea ceremony is a traditional Japanese ritual.”
Direct Assertion: No negation (“It is not X, but Y”)
- ✓ “The Tagelmust wraps ten times around the skull.”
- ✗ “The Tagelmust is not a scarf; it is a wall.”
No Meta-Commentary: Never explain the logic
- ✓ “Time here travels in a circle.”
- ✗ “Because this is about repetition, time is circular.”

Also avoid: onomatopoeia (it sounds absurd when an AI voice reads “whish, whish”), passive voice, and academic framing.
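Some of these rules can be approximated mechanically. The sketch below is a crude, assumed heuristic rather than the agent's actual style pass: the `CHECKS` patterns and the `lint` function are hypothetical, and they only catch the "not X; it is Y" shape, one meta-commentary phrasing, and immediately repeated words. Room Frame, passive voice, and academic framing are not checked.

```python
import re

# Illustrative regex heuristics; real prose needs far more careful handling.
CHECKS = {
    # "is not a scarf; it is a wall" / "not X, but Y"
    "negation": re.compile(r"\bnot\b[^.;]*[;,]\s*(?:but|it is)\b", re.IGNORECASE),
    # Explaining the logic instead of asserting it.
    "meta-commentary": re.compile(r"\bbecause this is about\b", re.IGNORECASE),
    # A word repeated after a comma, e.g. "whish, whish".
    "onomatopoeia": re.compile(r"\b(\w+),\s+\1\b", re.IGNORECASE),
}


def lint(sentence: str) -> list[str]:
    """Return the names of the checks a sentence trips, in CHECKS order."""
    return [name for name, pattern in CHECKS.items() if pattern.search(sentence)]


samples = [
    "The Tagelmust wraps ten times around the skull.",
    "The Tagelmust is not a scarf; it is a wall.",
    "Because this is about repetition, time is circular.",
    "The whisk goes whish, whish.",
]
for s in samples:
    print(lint(s) or "ok", "|", s)
```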

Phase 5: Generate Audio

Now the agent creates the actual audio — music and narration.

Music

For Cello Book of Tea, each chapter opens with solo cello. The agent generates a music prompt in plain English:

Solo Cello in D minor. Aggressive, sul ponticello bowing. Structure: Intro (0:00-0:20, tense harmonics establishing unease) → A Section (main melody, harsh metallic timbre) → Coda (sudden exhaustion, melody collapses into silence). Mood: Suppressed rage.

The AI (ElevenLabs Music) generates a 2-minute piece.
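The prompt above has a regular shape: instrument and key, playing technique, a structure of named sections, and a mood. A minimal sketch of assembling that string follows. The `build_music_prompt` function and its parameters are hypothetical; it only builds the text and makes no call to any music service.

```python
def build_music_prompt(instrument, key, technique, sections, mood):
    """Assemble a plain-English music prompt from its parts.

    `sections` is a list of (name, description) pairs, in playing order.
    """
    structure = " → ".join(f"{name} ({detail})" for name, detail in sections)
    return f"{instrument} in {key}. {technique}. Structure: {structure}. Mood: {mood}."


# Values copied from the Scalded Ceremony prompt in the text.
prompt = build_music_prompt(
    instrument="Solo Cello",
    key="D minor",
    technique="Aggressive, sul ponticello bowing",
    sections=[
        ("Intro", "0:00-0:20, tense harmonics establishing unease"),
        ("A Section", "main melody, harsh metallic timbre"),
        ("Coda", "sudden exhaustion, melody collapses into silence"),
    ],
    mood="Suppressed rage",
)
print(prompt)
```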

Narration

The agent feeds the written story to a text-to-speech AI. Narration for one chapter takes about 3-5 minutes.

Merging

The agent uses ffmpeg to stitch: [Music Intro] → [Narration] → [3 seconds silence]
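The text only says that ffmpeg stitches the three segments, so the following is one plausible way to do it, not necessarily the command the agent runs. It uses ffmpeg's `concat` audio filter and the `anullsrc` source to generate the trailing silence. The file names, the stereo 44.1 kHz silence, and the `build_merge_command` helper are assumptions made for illustration.

```python
import subprocess


def build_merge_command(music, narration, output, tail_silence=3):
    """Build an ffmpeg argument list: music, then narration, then silence."""
    return [
        "ffmpeg", "-y",
        "-i", music,        # input 0: music intro
        "-i", narration,    # input 1: narration
        # input 2: generated silence, limited to `tail_silence` seconds
        "-f", "lavfi", "-t", str(tail_silence),
        "-i", "anullsrc=channel_layout=stereo:sample_rate=44100",
        # join the three audio streams end to end into one stream
        "-filter_complex", "[0:a][1:a][2:a]concat=n=3:v=0:a=1[out]",
        "-map", "[out]",
        output,
    ]


cmd = build_merge_command("intro.mp3", "narration.mp3", "chapter_01.mp3")
print(" ".join(cmd))

# To actually run it (requires ffmpeg on PATH and the input files to exist):
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems when chapter titles contain spaces.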

Phase 6: Cover Art

The agent generates cover art with:

No text in the image
Minimal, single striking element
At least 1400×1400 pixels

The Result

Start to finish, producing one album (10 chapters) takes about 3-4 hours of active work. The final output:

10 MP3 files, each 4-7 minutes long
One cover image
Ready for Bandcamp

🎧 Listen to Cello Book of Tea on Bandcamp

AI Apologia

I believe that human agency must be explicitly designed for in AI-heavy projects so that subjective intention is not lost. In this project, I have allowed three openings for agency.

One lies in choosing the world to explore — selecting The Tea Ceremony rather than The Oil Rig or The Emergency Room, or bringing an entirely new world of your own. That choice cuts the path. After that, the agent routes itself like water finding its way downhill.

The second lies in the guides in the framework (literary structures, variational techniques, style guide, etc.). Those are the terrain I designed, and you can reshape it while you navigate. They shape where the water can go, but they don’t dictate the exact route. And it’s interesting to watch where that fails: where the agent encounters resistance, where the expected path doesn’t work, and new expressions emerge that I did not foresee.

The third lies in the participatory and negative spaces opened up towards the end, in blog announcements such as this one.

The complete production workflow is available as /produce-room in the GitHub repository.