2026-05-14
Turning recorded D&D sessions into comics
I am playing in a Birthright homebrew RPG campaign in Icelandic, with two other players and a DM. A couple of weeks ago I began turning each session into a comic in English. The setup is simple enough that one post can describe all of it.
Why a separate recorder
I bought a Zoom H1 Essential and simply keep it on the table, attached to a small tripod. A phone works too, but a phone does many things at once: the moment you pick it up to check a rule or answer a message, you knock the microphone around and degrade the recording. A dedicated device does one thing, uninterrupted. I press record when the session starts and leave it running until the end.
Icelandic speech-to-text that actually works
I then run the audio through the ElevenLabs speech-to-text tool. I did try generating summaries straight from the audio with Google Gemini, and it is fine for a quick two-minute recap, but over a full 3–4 hour recording it gets lost somewhere in the middle. It works better to use a transcript as leverage.
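With diarization enabled, ElevenLabs returns word-level output with speaker labels. A minimal sketch of collapsing that into a readable transcript — the field names (`text`, `speaker_id`) follow the documented response shape, but treat them as an assumption and check against the actual API output:

```python
def words_to_transcript(words):
    """Collapse word-level STT output into one line per speaker turn.

    `words` is assumed to look like the ElevenLabs speech-to-text
    response: dicts with "text" and "speaker_id" keys (field names
    are an assumption, not verified against the live API).
    """
    lines, current_speaker, buffer = [], None, []
    for w in words:
        if w["speaker_id"] != current_speaker:
            if buffer:
                lines.append(f"{current_speaker}: {' '.join(buffer)}")
            current_speaker, buffer = w["speaker_id"], []
        buffer.append(w["text"])
    if buffer:
        lines.append(f"{current_speaker}: {' '.join(buffer)}")
    return "\n".join(lines)
```

Grouping by speaker turn rather than by word keeps the transcript short enough to hand to a language model in one piece.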
The transcript ElevenLabs generates is a hint, not ground truth. I was present at the event, after all, and my memory is the final authority here. Still, at the current error rate the transcription is a reliable record of everything that happens in a session, especially the general flow of actions.
Proper nouns do get mangled: Birthright race names in Icelandic like rjúfi or anúri can come out as nonsense, and our PCs’ names rotate through three or four spellings, but the structure of every scene stays intact.
Transcript → Comic Script
This is where Claude Code does its thing. The workflow here looks like:
- Read through the entire transcript. A huge amount of it is out-of-character chatter (politics, the gym, who’s making coffee). Strip it all out. What’s left is the story as told in character.
- Create a page outline. Numbered list of pages, one line per page. This is approved before any pages are written.
- Write each page one at a time. Every page contains a panel breakdown with dialogue, captions, sound effects, and optionally a chibi of the DM in the margin. I review and fix each page before the next one is written. It is usually not much work, and it prevents back-and-forth later.
- Create the art. A single Python function reads the comic script, attaches the appropriate reference images, and calls `gpt-image-2` on every page.
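To make that last step concrete, here is a minimal sketch of pulling the per-page cast declarations out of a comic script in the `=== PAGE N ===` format used later in this post. The parsing details are my own illustration, not the actual generate_image.py:

```python
import re

def parse_pages(script_text):
    """Split a comic script into per-page dicts with their declared cast.

    Assumes the page format described in this post: '=== PAGE N ==='
    headers followed by CHARACTERS_ON_PAGE / NPCS_ON_PAGE lines.
    """
    # re.split with a capture group yields [preamble, "1", body1, "2", body2, ...]
    parts = re.split(r"^=== PAGE (\d+) ===\n", script_text, flags=re.M)
    pages = []
    for num, body in zip(parts[1::2], parts[2::2]):
        cast = {"characters": [], "npcs": []}
        for line in body.splitlines():
            if line.startswith("CHARACTERS_ON_PAGE:"):
                cast["characters"] = [s.strip() for s in line.split(":", 1)[1].split(",")]
            elif line.startswith("NPCS_ON_PAGE:"):
                cast["npcs"] = [s.strip() for s in line.split(":", 1)[1].split(",")]
        pages.append({"page": int(num), **cast, "body": body})
    return pages
```

Each returned dict carries enough to look up the right reference PNGs on disk before the image call.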
I need the per-page cadence because the transcript lacks attribution. When we play, we rarely announce actions in full (“Bjarki casts fireball”); more often someone just yells “fireball!”, and ElevenLabs speaker attribution may not correctly identify who said it. Only I remember that Bjarki cast the fireball1. Reviewing each page lets me catch this kind of mistake early, before it produces an image that has to be scrapped.
Reference images keep the cast consistent
The gpt-image-2 model produces nice panels, but it has no sense of history: it remembers nothing from its previous call. Consistency comes from the reference images supplied with each generation request. Every PC, along with every NPC who appears repeatedly in the story, gets their own character design sheet.
_party ref: a labelled lineup of the three PCs plus Pú the squirrel. The model accepts up to sixteen reference images per call so three individual refs would fit, but a single composed lineup keeps proportions, palette, and silhouettes tighter than three separate refs negotiating composition and saves a bit on input tokens.
NPCs get the same treatment. Below is Trausti van Gilt, a second-year mirror-magic dilettante who shows up later in the session.
The page generator decides this from the comic script: each page declares its cast (CHARACTERS_ON_PAGE: _party, NPCS_ON_PAGE: trausti), and the script uses those declarations to pick the right PNGs from disk, so his face stays consistent across panels. If a character picks up a visible object or suffers a visible injury that persists across several pages, the system generates an alternate ref anchored on the main ref (e.g., bjarki_wounded) and switches to the new slug until that condition passes.
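A minimal sketch of that ref lookup, under the `characters/<slug>/refs/` layout shown later in this post; how active conditions are tracked in the real pipeline is my own assumption:

```python
from pathlib import Path

def ref_path(slug, active_conditions, root=Path("characters")):
    """Pick a character's reference image, preferring an alternate ref
    (e.g. 'wounded') while a visible condition is active.

    `active_conditions` maps slug -> condition name; this dict is an
    illustrative assumption, not the pipeline's actual state store.
    """
    refs = root / slug / "refs"
    condition = active_conditions.get(slug)
    if condition:
        alt = refs / f"ref_{condition}.png"
        if alt.exists():
            return alt  # fall through to main ref if no alternate exists
    return refs / "ref_main.png"
```

Falling back to `ref_main.png` when no alternate exists means a missing condition ref degrades gracefully instead of failing the page.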
Appearance of the pages
The first page from session four is a silent establishing shot. The group is resting just inside the entrance of a magic dungeon that seals itself six days after being opened. Pú the squirrel is perched on Bjarki’s shoulders. The yellow text sets the scene, the parchment box with the d20 icon carries the DM’s voice, and the little chibi drawing between panels captures the DM’s commentary during the session.
A louder page from the same session. Skarphéðinn throws an alchemical fire-flask into a chamber full of giant spiders, the DM whispers “There were two. There is one.” in the margin, and the panel layout gets to breathe.
A puzzle page near the end. The party reaches a chamber full of mirror-objects and works out that the dungeon’s parting gift is whichever ordinary-looking thing pairs with one of the silver replicas.
Where it breaks, and what I do about it
Some actual failure points:
- Moderation flags on warlock-class terms. “Pact,” “patron,” and even “Hex” can get a page blocked, and “Hex” combined with an image of a living, breathing human means near-certain moderation and a locked page. I replace “Pact” with “bargain,” “patron” with “mentor,” and “Hex” with “magical curse.” Anatomical pierce-through imagery (a sword through the head) gets blocked too, but a clean across-body strike with no gore or blood is allowed.
- PCs with similar appearances swap places. Two bearded, cloaked characters kept trading identities from page to page until I forced every panel to explicitly list which characters are in that frame, via an additional per-panel field, `CHARACTERS IN FRAME`.
- The DM is never included within a panel. The DM is a storyteller who exists only in marginalia and narration boxes; no exceptions, and he never interacts with the scene. The chibi ref image is attached automatically whenever a prompt includes a margin doodle.
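The word substitutions can be applied mechanically before the first generation attempt. A small illustrative sketch — the mapping mirrors the moderation table in the skill below, but the function itself is not the pipeline's actual code:

```python
import re

# Moderation-prone terms and their replacements, mirroring the
# substitution table later in this post.
SUBSTITUTIONS = {
    "pact weapon": "summoned blade",
    "pact": "bargain",
    "patron": "mentor",
    "hex": "magical curse",
    "molotov": "alchemical fire-flask",
    "firebomb": "alchemical fire-flask",
}

def sanitize_prompt(prompt):
    """Replace flagged terms case-insensitively, longest phrases first,
    so 'pact weapon' is rewritten before the bare 'pact' rule fires."""
    for term in sorted(SUBSTITUTIONS, key=len, reverse=True):
        prompt = re.sub(rf"\b{re.escape(term)}\b", SUBSTITUTIONS[term],
                        prompt, flags=re.I)
    return prompt
```

The word boundaries matter: without `\b`, innocuous words like “hexagon” would get rewritten too.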
Where I’m going with this
Right now I treat the pipeline as a session summariser: it gives me a 16-page recap of what took place, posted to our group chat before the next session so we all stay on track for the upcoming game. Given more time it could become an actual comic book and get published, but that is not what I am striving for here. The aim is a memorable record of the sessions, one that is more enjoyable to read than the transcript.
The Skill, should you choose to follow this path
Here is the condensed version of my project’s CLAUDE.md. Rename it to SKILL.md, put it in a new folder with a structure like the one below, and voilà, you’re ready to go!
```
---
name: session-to-comic
description: Turn a recorded TTRPG session into a graphic-novel comic. Use when a new YYYY_MM_DD/ folder appears with a script.txt transcript. Reconciles cast, drafts a page-by-page script with per-page approval, and generates final pages with gpt-image-2.
---
```
# Session-to-comic pipeline
## Project layout
```
project/
├── characters/<slug>/profile.md         # PCs and DM, with REF_LABELS section
├── characters/<slug>/refs/ref_main.png
├── characters/_party/refs/ref_main.png  # composed group lineup
├── npcs/<slug>/profile.md               # in-world recurring NPCs
├── npcs/<slug>/refs/ref_main.png
├── world/style_guide.md                 # canonical visual style + lettering spec
├── generate_image.py                    # CLI: character | npc | chibi | page
└── YYYY_MM_DD/
    ├── script.txt                       # raw STT (the only required input)
    ├── comic_script.txt                 # produced by this skill
    └── pages/page_NN.png                # produced by this skill
```
## Working principles, read first
The user was actually present at the session. The transcript is noisy and loses attribution and location. The user's memory is authoritative; the transcript is a hint.
- **Ask, don't invent.** When the transcript is ambiguous about who, where, what, or how, ask one short question before resolving the ambiguity.
- **Be explicit about what you are leaving out.** In every page outline, list in-character beats you are deliberately omitting (out-of-character chitchat doesn't count).
- **Get per-page approval before drafting the next page** by default. Offer the user the option to switch to "draft all the way through" once page 1 is approved.
- **Surface uncertainty as a question, not a guess.**
## Workflow
1. **Read the entire transcript.** Identify only the in-character story beats.
2. **Reconcile cast state.** Update each PC's `profile.md` for new items, visible wounds, session changelog. Regenerate any ref whose appearance changed, then regenerate the `_party` group ref.
3. **Identify and create new NPCs.** Any named NPC with more than a passing mention gets `npcs/<slug>/profile.md` with a REF_LABELS section, plus a generated ref.
4. **Write `comic_script.txt`.**
   - **Gate 4a:** propose a page outline (numbered list, one line per page, plus a list of omitted beats). Wait for approval.
   - **Gate 4b:** ask whether to draft page-by-page or all the way through. Default to page-by-page for page 1, then offer the choice.
   - Each panel declares `CHARACTERS IN FRAME` with a one-line visual reminder per character. Without this, similar-looking PCs swap identities between panels.
5. **Final review.** Present the full `comic_script.txt` and any new refs. Wait for an explicit "go ahead, generate"; per-page drafting approval does not imply image-generation approval.
6. **Generate every page** in script order (not in parallel), so each page can chain the previous one or two as continuity references. Run in the background, monitor for errors, retry moderation failures with the substitutions below.
## Page format
```
=== PAGE N ===
CHARACTERS_ON_PAGE: _party
NPCS_ON_PAGE: <slug>, <slug>

PROMPT: <one-sentence framing for the whole page>

PANEL N (size, position):
  CHARACTERS IN FRAME: <slug (visual reminder)>, <slug (visual reminder)>
  SCENE: <action and composition>
  CAPTION: "<terse scene-setting, yellow box>"
  DM NARRATION: "<world-lore narrator voice, parchment box>"
  CHARACTER_NAME: "<dialogue, speech bubble>"
  SFX: "<sound effect, integrated into art>"

DM MARGIN DOODLE (between panels X and Y): "<chibi DM + handwriting>"
```
## Moderation substitutions (apply before the first attempt)
| Don't write | Use instead |
|---|---|
| Molotov, firebomb, rag-wick | "alchemical fire-flask," "alchemical fuse" |
| Patron, pact, Hex, Invoke Doom | "old mentor," "old bargain," "three new spells" |
| Pact weapon | "summoned blade" |
| Anatomical pierce-through (sword through head, etc.) | Clean stylized across-body strike; add "no gore, no blood, no anatomical detail of the strike itself" |
When a real photo is attached as character-ref input, frame the prompt as "an artist's character design sheet for a tabletop role-playing game" and treat the photo as "loose inspiration." Never combine a real photo with warlock-class language in the same prompt.
## Commands
```bash
python3 generate_image.py character --name <slug> [--force]
python3 generate_image.py npc --name <slug> [--force]
python3 generate_image.py chibi --name <slug> [--force]
python3 generate_image.py character --name _party --force
python3 generate_image.py page --session YYYY_MM_DD --page N [--force]
```

The generate_image.py script itself is short, about 600 lines of Python that prepends the style guide, picks ref images off disk based on the page’s declared cast, and posts to the OpenAI Images API. It’s easy to reproduce from OpenAI’s API reference.
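For orientation, the core of such a script can be sketched as below. This assumes the OpenAI Python SDK’s images edit endpoint, which accepts a list of input images; the model name is the one used in this post, and the client injection and prompt assembly are illustrative, not the actual generate_image.py:

```python
import base64
from pathlib import Path

def build_page_prompt(style_guide, page_prompt):
    """Prepend the shared style guide to a single page's prompt."""
    return f"{style_guide}\n\n{page_prompt}"

def generate_page(client, page_prompt, ref_paths, out_path, style_guide=""):
    """Post one page plus its reference images to the Images API.

    `client` is assumed to be an `openai.OpenAI()` instance; the call
    shape follows the SDK's images.edit endpoint, which accepts a list
    of input images. Model name is taken from this post.
    """
    result = client.images.edit(
        model="gpt-image-2",
        image=[open(p, "rb") for p in ref_paths],  # character/NPC refs
        prompt=build_page_prompt(style_guide, page_prompt),
    )
    # The API returns the generated page as base64-encoded image data.
    Path(out_path).write_bytes(base64.b64decode(result.data[0].b64_json))
```

Keeping the style guide in every prompt is what holds lettering and palette steady across pages that are otherwise generated independently.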
How to cite this
If this writeup helps you set up something similar and you want to acknowledge it somewhere:
```
@misc{einarsson2026dndcomics,
  author       = {Einarsson, Hafsteinn},
  title        = {Turning recorded {D\&D} sessions into comics},
  year         = {2026},
  month        = may,
  howpublished = {Blog post},
  url          = {https://haffi112.github.io/2026/05/14/dnd-comics/}
}
```

Footnotes
1. I might change my playstyle in the future and announce actions in the third person more often, but that would feel strange and take away from the immersion of the game. ↩