LIVING WORLDS

AI-DRIVEN STORYTELLING

Role: Director & technical architect
Pipeline: Claude Opus · ComfyUI · ElevenLabs · Gemini Flash TTS · Python
Context: MA Applied Literary Arts Thesis · Memorial University
The breakdown
Generated still from Living Worlds: a figure in a royal court of conspiracy and poison, rendered by directing generative AI

Making something interactive, a world that answers back, is one of the most striking things you can do with a language model, and also one of the least obvious. I wanted to try it on a story that had real stakes, so I set it in a royal court full of conspiracy, where poison settles arguments and affection usually hides a motive. To find the right form for it, I built the court twice. First as a 2D role-playing game, then as a real-time 3D world running a live AI cast. Both versions worked, and both proved the technology. Neither of them told the story as well as the plainest option I had, which was to direct the same models to generate the scene as video and let the edit carry it. This page is the record of that search, and of the live system I built along the way.

AI can bring a world to life, but players don't actually want realism. They want the feeling of it. The most ambitious version of this court was a live one: a hybrid drama manager that pairs heavily scripted, authored story with a generative AI layer, releasing beats dynamically and holding emotional consistency across long play sessions. It works. The authored narrative keeps its quality while the system improvises around it. But the more convincingly that machinery performed the court in real time, the clearer a distinction became. Performance is not the same as story. That is what eventually pointed back to the simplest medium of all. The system serves the story, or it has no reason to exist.
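To make the hybrid idea concrete, here is a minimal sketch of how an authored layer and a generative layer might share one turn loop. It is an illustration under assumptions, not the project's actual code: the names `Beat`, `DramaManager`, and `improvise`, and the idea of gating beats on a plain dictionary of world state, are all hypothetical. In a real build, `improvise` would wrap a call to the language model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Beat:
    """One authored story beat (hypothetical structure)."""
    name: str
    ready: Callable[[dict], bool]  # authored precondition on world state
    script: str                    # authored text, delivered verbatim


class DramaManager:
    """Releases authored beats when their conditions hold, else improvises."""

    def __init__(self, beats: list, improvise: Callable[[dict, str], str]):
        self.pending = list(beats)   # beats not yet released, in authored order
        self.improvise = improvise   # stand-in for the generative layer

    def next_line(self, state: dict, player_input: str) -> str:
        # Authored material takes priority: release the first beat that is ready.
        for beat in self.pending:
            if beat.ready(state):
                self.pending.remove(beat)
                return beat.script
        # No beat is ready, so the generative layer fills the gap.
        return self.improvise(state, player_input)
```

The design choice the sketch tries to show is the ordering: the authored beat always wins when it is ready, so the generative layer only ever works in the space the writer has left open.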

If the drama manager is what makes the story move, the memory system is what makes it cohere. Every exchange is written to a layered store: a short-term buffer of recent turns, a rolling mid-term summary, and a long-term vector index of past scene embeddings. Each NPC reads from a memory scoped to what they actually witnessed, and when two characters meet, their retrievals cross-check before the next line is generated. The result is a cast that can hold a forty-turn conversation, recall what happened three scenes ago, and stay consistent with each other and with themselves.
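The layered store described above can be sketched in a few dozen lines. This is a simplified illustration, not the system itself, and every name in it (`NPCMemory`, `record`, `shared_recall`) is hypothetical. Two loud assumptions: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the rolling summary simply appends evicted lines where a real system would presumably ask a model to summarize them.

```python
import math
from collections import Counter, deque
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class NPCMemory:
    """Three memory layers for one character (hypothetical structure)."""
    name: str
    short_term: deque = field(default_factory=lambda: deque(maxlen=4))
    summary: str = ""                              # rolling mid-term layer
    long_term: list = field(default_factory=list)  # (scene text, embedding)

    def witness(self, line: str) -> None:
        # When the buffer is full, fold the oldest turn into the summary
        # before it is evicted. A real system would summarize, not append.
        if len(self.short_term) == self.short_term.maxlen:
            self.summary = (self.summary + " " + self.short_term[0]).strip()
        self.short_term.append(line)

    def close_scene(self, scene_text: str) -> None:
        # Index the finished scene for later similarity search.
        self.long_term.append((scene_text, embed(scene_text)))

    def recall(self, query: str, k: int = 1) -> list:
        # Return the k stored scenes most similar to the query.
        q = embed(query)
        ranked = sorted(self.long_term, key=lambda m: cosine(q, m[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def record(line: str, witnesses: list) -> None:
    """Write a turn only to the characters who were present for it."""
    for npc in witnesses:
        npc.witness(line)


def shared_recall(a: NPCMemory, b: NPCMemory, query: str) -> list:
    """Cross-check: keep only the memories both characters retrieve."""
    theirs = b.recall(query, k=3)
    return [m for m in a.recall(query, k=3) if m in theirs]
```

Scoping happens at write time in this sketch: a character who was not passed to `record` never stores the line, so there is nothing to filter out later when the two characters' retrievals are compared.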

The interesting finding was not technical. The pipeline works: long-context memory, image, voice, and music orchestrated in real time on a single machine. The harder lesson was about realism itself. As the AI becomes more capable of producing it, that realism can begin to conflict with the experience the player actually came for, which loops back to where the project started: story before system. A character only feels consistent if the author has decided, in advance, what the character is consistent about. The model can hold memory, but it cannot decide what is worth remembering, and that decision stays with the writer. And the same was true of the medium. Once I stopped asking a system to perform the court in real time and instead directed the models to render it as video, the drama held. The simplest pipeline turned out to be the one that disappeared behind the story.

Final still from Living Worlds: the royal-court drama realized as AI-directed video.