ESSAY

What "AI-narrated" actually means.

The phrase "AI-narrated audio guide" gets used to describe four genuinely different products. They are not the same thing, and the differences matter for what a museum is actually buying. A short, opinionated taxonomy.

ERIC DUFFY·FOUNDER·JUN 6, 2026·7 MIN READ


Every vendor I'm aware of in the museum audio-guide space has called their product "AI-narrated" at some point in the last two years. I have done it myself. The phrase has become the shorthand for everything that isn't a 2010-era handset, and it covers a category that has, in fact, four distinct products underneath it. The four products do different things, fail in different ways, and are appropriate for different museums. Treating them as one thing — which is what the marketing language does — is making procurement decisions worse.

So here is the taxonomy, written for a curator or director who is about to sit through four vendor pitches and needs a vocabulary to keep them straight.

The four things "AI-narrated" can mean

The first product is AI-voiced human writing. A curator or contract writer drafts the entire tour script in text. Neural text-to-speech produces the audio from the approved script. The "AI" in "AI-narrated" refers exclusively to the voicing step. Nothing about the content was generated by a model; the model only read what the human wrote. This is the conservative version of the category, and it is genuinely useful — it eliminates the studio booking, voice talent contracts, and multilingual recording costs, while keeping the writing entirely in the institution's voice.

The second product is AI-assisted drafting plus AI voicing. The curator uploads reference materials — catalog notes, wall text, exhibition essays. A language model produces a first-pass draft of each stop, drawing on the uploaded materials. The curator reviews, edits, and approves. The approved draft is then voiced. The "AI" here is doing two jobs — drafting and voicing — and the curator is the editor on both. This is, in my opinion, the most defensible version of the category for most institutions: the model accelerates the writing step but doesn't replace the curator's editorial judgment.

The third product is AI-generated tour-on-demand. The visitor opens the tour and the platform generates the script in real time from a knowledge base, sometimes including general training-data knowledge alongside whatever the institution provided. There is no curator review step at the script level. The institution sets the source materials, and the model produces the narration the visitor hears, often differently for each visitor. The "AI" here is doing the curatorial work, and the museum is paying for the privilege of supervising that work only indirectly.

The fourth product is AI-as-chatbot, where the audio-guide pretense is mostly decorative. The visitor scans a QR code and is dropped into a chat interface backed by a model — sometimes a model with retrieval over the museum's content, sometimes a generic model with the museum's branding on top. There may or may not be pre-produced audio narration at all; the product is really a chat layer, and the "AI-narrated" framing is being applied retroactively to describe the conversational element.

These are not, in my view, four versions of the same product with different feature sets. They are four genuinely different products with different failure modes, different appropriate use cases, and different procurement implications. The marketing language obscures this, and most vendors are vague about which one they actually ship.

Why the differences matter for what you're buying

The four products fail differently, and the failure mode matters more than the success mode when you're picking a vendor.

The first product (AI-voiced human writing) fails the same way a traditional studio production fails — if the script is bad, the audio is bad, but at least you know exactly what's wrong and you can fix it by rewriting the script. There is no surprise.

The second product (AI-assisted drafting with curator review) fails when the curator review step is inadequate. The model produces a draft; the curator skims; something subtly wrong gets through. This is a genuine risk, and it is the one any institution moving from traditional production to AI-assisted production needs to manage. But the risk is bounded: every line a visitor hears was approved by a human who knew the material, even if that human was tired the day they approved it. You can audit. You can correct.

The third product (AI-generated tour-on-demand) fails in a way that is structurally harder to fix. The visitor receives narration the institution never saw. When it goes wrong — a confident wrong attribution, a hallucinated provenance, a misstated fact — the institution finds out from a complaint, not from a review process. The cost of correction is higher because you have to figure out what the model said, why it said it, and whether the fix is to change the source materials or to constrain the model further. This is not, in my view, a product that most museums should be buying.

The fourth product (AI-as-chatbot) fails when the chatbot answers questions outside its competence. If the chat layer has retrieval over the institution's materials and refuses to answer when it can't ground, this can be a useful complement to a real audio tour. If the chat layer is a generic model with museum branding, it is producing the same content the visitor could have gotten from ChatGPT on the same phone, and the museum is paying for the wrapper.

If you do not know which of these four a vendor is actually selling, you do not know what you are buying. The "AI-narrated" framing makes them all sound the same. They are not the same.

How to figure out which one a vendor ships

A short set of questions, in priority order.

Who writes the script the visitor hears? If the answer is the curator (with or without AI assistance), you are looking at product one or two. If the answer is the model, with or without retrieval, you are looking at product three. If there isn't really a script and the experience is mostly chat, you are looking at product four.

Does a human approve each stop before a visitor hears it? This is the single most important question and the one that separates products one and two from product three. A "yes" puts the vendor in a defensible category. A "no" puts them in one that I think most institutions, on reflection, do not want to be buying.

What does the platform do when the model doesn't know? If the answer involves the model producing a "best guess" or "general background," you are in product three or four with a vendor who hasn't thought hard about grounding. If the answer is a graceful refusal pointing the visitor to a docent or the wall card, you are in a category that takes the failure mode seriously.

What's the difference between the visitor experience and what a curator would write? If the visitor experience could have been written by the curator with the same source materials, you are in product one or two. If the visitor experience is something the curator could not have written — because it changes for each visitor, or because it goes off on tangents the curator wouldn't have pursued — you are in product three.

These four questions, asked plainly, separate the four products faster than any feature list. They are the questions I would ask if I were on the buying side, and I am noting them publicly here because I think every museum evaluating this category should have them on a notecard for vendor calls.

Where I think the category settles

I don't believe all four of these products will survive the next five years. The fourth one is mostly a positioning trick: once buyers learn to ask the questions above, the chatbot-with-museum-branding versions get exposed, and the vendors selling them either build the actual audio tour underneath or fade. The third one (tour-on-demand) is the most fragile. The moment a visitor gets a confidently wrong answer at a high-profile institution, the press cycle around it is going to scare the category, and the vendors who built around the model-as-curator approach are going to spend a year retrofitting curator-approval gates.

What I think survives is the second product (AI-assisted drafting with curator review and AI voicing), because it captures the cost and speed advantages of the category while preserving the editorial accountability that museums actually need to defend interpretation to a board. This is what we build. I would be glad to be wrong about it being the right answer, because that would mean the category is more flexible than I think; I do not, in fact, think I'm wrong.

If you are evaluating "AI-narrated audio guides" right now, the most useful thing you can do is figure out which of the four each vendor is actually shipping. The marketing language won't tell you. The four questions above will.

About the author

Eric Duffy is the founder of Convo, a platform that helps museums and cultural institutions publish multilingual audio tours their visitors can have a conversation with. He writes about how to evaluate the AI audio-guide category from inside it. Reach him at eric@convo.app or on LinkedIn.