MULTILINGUAL

Speak to every visitor in their first language.

Multilingual is the single dimension where AI changes the math most. Legacy studio production multiplies cost and time linearly per language. Convo adds the next nine from one approved English source — same script, regenerated and re-voiced in about a minute. This page is for the director or visitor-experience lead who needs to make the case that the museum can serve non-English-speaking visitors at scale without re-recording.

WHAT CONVO SHIPS

Ten languages, from one approved source.

Convo ships ten languages today: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, and Arabic. The institution writes and approves the tour once, in English. The other nine come from that source — same scenes, same stops, same edits, re-voiced in roughly a minute whenever the English changes.

That sentence is doing a lot of work, so it’s worth being concrete about what it means in production. A curator finishes a round of edits on the English script at 10:14am. They press regenerate. By 10:15 the same stop exists in ten languages, voiced in the institution’s chosen voice for each. No studio is booked. No talent is contracted. No translator is emailed a copy of the new file and asked to turn it around by Thursday. The change propagates because the source did.

For the institution, the consequence is structural: multilingual stops being a budget line item to be defended at the annual planning meeting and starts being the default. You no longer choose which exhibits get translated. They all do, because translating one and translating ten cost roughly the same.
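To make that arithmetic concrete, here is a minimal sketch in Python. Every number in it is an invented placeholder, not Convo's pricing or any studio's quote, and the class names are hypothetical. The only thing it is meant to show is the shape of the two curves: one grows with the language count, the other does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudioQuote:
    """Hypothetical per-language studio pricing (placeholder numbers)."""
    rate_per_finished_minute: float  # cost per language, per minute of audio
    tour_minutes: float

    def cost(self, languages: int) -> float:
        # Linear: each added language repeats the full production cost.
        return self.rate_per_finished_minute * self.tour_minutes * languages


@dataclass(frozen=True)
class PlatformQuote:
    """Hypothetical flat platform pricing, independent of language count."""
    monthly_fee: float
    months: int

    def cost(self, languages: int) -> float:
        # Flat: the language count never enters the formula.
        return self.monthly_fee * self.months


# Placeholder inputs chosen only for illustration.
studio = StudioQuote(rate_per_finished_minute=100.0, tour_minutes=60.0)
platform = PlatformQuote(monthly_fee=1000.0, months=12)

for n in (1, 2, 10):
    print(n, studio.cost(n), platform.cost(n))
```

Swap in a real studio quote and a real tier price and the loop gives the comparison for your own collection. The crossover point depends entirely on those inputs.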

EN  English
ES  Spanish
FR  French
DE  German
IT  Italian
PT  Portuguese
ZH  Chinese
JA  Japanese
KO  Korean
AR  Arabic

The authoring workflow that produces this is covered in more depth on the authoring page. The shorthand: edits to the English source are the unit of work. Everything downstream — the other nine languages, the audio, the visitor app — refreshes from there.

THE LANGUAGE LIST

Why ten, not fifty.

The underlying speech models support far more than ten languages. We stopped at ten on purpose. The constraint isn’t technical. It’s editorial.

The ten we ship cover the languages that actually appear in American and European museum visitor data with any regularity: the dominant Romance languages, German, the three East Asian languages that drive most inbound tourism, and Arabic. Past that, the per-language review burden starts to outrun the per-language audience. A museum that adds Tagalog gains very few Tagalog-speaking visitors who don’t also have English; a museum that adds Swahili gains almost none. The languages we shipped are the ones where the audience exists at meaningful scale.

The other constraint is review. Every language that goes live needs someone who reads it well enough to catch the things the model gets subtly wrong — honorifics, religious terminology, the institution’s house style for a given artist. Ten languages is already at the edge of what most museums can vet internally; thirty is fantasy. We’d rather ship ten that the institution actually reviewed than thirty it didn’t.

The longer version of this argument, with the demographic data, sits at “how many languages a museum audio guide actually needs.” It’s the question we get asked most often by directors evaluating us against a vendor that brags about forty.

WHERE THE MODEL ENDS

Translation versus localization.

This is the honest section. AI now handles a lot of multilingual work well. Some of it, it still doesn’t, and the line between the two matters more for a museum than for almost any other kind of customer.

What current models are very good at: literal accuracy, register, the parallel structure of an audio script. A walk-through of a Greek vase, a wall card about an artist’s early period, a transition between two paintings in a room — the translation carries those without difficulty. The text is not stilted. The voicing matches the original cadence. If the English reads like an institutional voice, the other nine do too.

What still needs human review: sacred and religious terminology, where the right word depends on tradition and audience; named-entity transliteration, where conventions differ between regions and academic disciplines and house style; region-specific cultural framing, where the same factual sentence reads neutral in one country and pointed in another. A model can produce a defensible default for all three. None of those defaults should ship without an institution’s explicit approval.

The workflow Convo encourages: write and approve the English source carefully, then have a reviewer who reads each target language go through that language’s output and edit. When they edit, the change persists on regeneration — the next round of edits to the English doesn’t blow their corrections away. The platform makes review cheap; it doesn’t pretend the review is unnecessary.
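One way corrections could survive regeneration is to store them separately from the machine output and re-apply them after every re-translation. The sketch below assumes that design. It is an illustration, not a description of how Convo is built: `ReviewOverrides`, `regenerate`, and the canned `fake_translate` lookup are all hypothetical, and the Spanish strings are stand-ins rather than real model output.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class ReviewOverrides:
    """Reviewer-approved replacements, stored per target language (hypothetical)."""
    by_language: dict[str, dict[str, str]] = field(default_factory=dict)

    def record(self, lang: str, machine_term: str, approved_term: str) -> None:
        self.by_language.setdefault(lang, {})[machine_term] = approved_term

    def apply(self, lang: str, text: str) -> str:
        terms = self.by_language.get(lang, {})
        # Longest terms first, so a longer term is matched before any
        # shorter term it happens to contain.
        for machine_term in sorted(terms, key=len, reverse=True):
            text = text.replace(machine_term, terms[machine_term])
        return text


def regenerate(
    source_en: str,
    lang: str,
    translate: Callable[[str, str], str],
    overrides: ReviewOverrides,
) -> str:
    """Re-translate the English source, then re-apply reviewer corrections."""
    return overrides.apply(lang, translate(source_en, lang))


# Stand-in for the translation step: a canned lookup, not a real model call.
CANNED = {
    ("This print is by Katsushika Hokusai.", "es"):
        "Esta estampa es de Hokusai Katsushika.",
    ("This woodblock print is by Katsushika Hokusai.", "es"):
        "Esta xilografía es de Hokusai Katsushika.",
}


def fake_translate(text: str, lang: str) -> str:
    return CANNED[(text, lang)]


overrides = ReviewOverrides()
# The reviewer fixes the name order once, as house style.
overrides.record("es", "Hokusai Katsushika", "Katsushika Hokusai")

first = regenerate("This print is by Katsushika Hokusai.", "es", fake_translate, overrides)
# The English source is edited later; the correction is applied again.
second = regenerate("This woodblock print is by Katsushika Hokusai.", "es", fake_translate, overrides)
print(first)
print(second)
```

Keeping the corrections in their own store means an edit to the English never overwrites them. The trade-off is that term-level replacement only covers house-style decisions such as names and terminology. A reviewer's rewrite of a whole sentence would need a different mechanism.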

The deeper argument is in “translation versus localization.” If you’re the person on the museum side responsible for not shipping something embarrassing, that piece is the one to read.

THE VISITOR ARGUMENT

What this changes for visitors who don’t read English.

The case I make to directors is this. A museum visitor arriving with limited English has traditionally received the visual collection plus the wall text and very little else. The audio guide — when it exists — has been in English, maybe Spanish, maybe one or two more. The educator-led tour, if scheduled, has been in English. The exhibition catalog they could afford to take home is in English. The institution is, in effect, fluent in their eyes and mute in their ears.

Multilingual narration is the first interpretation layer that meets a non-English-reading visitor where they are. Not because the platform is clever about translation, but because the math finally lets the institution do it without choosing which language gets cut. A Korean grandmother visiting with her grandchildren can hear the same story, in Korean, that English-speaking visitors hear in English. A French exchange student can ask the visitor guide a follow-up question in French and get an answer grounded in the same curator-approved source. The same tour, ten ways.

The visitor-side mechanics are covered in more detail on the visitor Q&A page. The point I want to make here is the audience one. The museum has always had visitors arriving in their second or third language; what has changed is that the institution can finally speak to them in their first.

This is the argument I unpack at length in the essay “one collection, many audiences.” The summary: a single collection has always had many audiences. What’s new is that the audio interpretation layer can finally have many too.

WHERE THIS ISN’T THE RIGHT ANSWER

Where the multilingual case doesn’t fit.

Two cases. The first is the institution whose audience is overwhelmingly monolingual and local: a regional historical society in a small town, a university museum whose visitors are almost entirely the student body, a tribal cultural center serving a specific community. If the visitor data shows that nine of the ten languages we ship would each be heard by under one percent of visitors, the multilingual case is thinner. The platform still works, but the argument that justifies it is a different one (production speed, conversational Q&A, accessibility), not languages.

The second is the production where a single named voice is the curatorial point. A retrospective narrated by the artist themselves. A poet reading their own work. A scholar with a recognizable cadence that an exhibition has been built around. In those productions, the voice is the message, and re-voicing it in nine other languages — even very well — is not what the institution wants. Convo can sit alongside a production like that and handle the other galleries; we don’t pretend it should replace the named voice on the marquee one.

The right way to think about Convo’s multilingual case is institutional rather than per-stop. It’s the answer when the question is: can we serve every visitor in their first language, across the permanent collection, without re-recording every time we change the script? The answer is yes, and that hasn’t been true before.

Pricing for the multilingual program — which is bundled into the same monthly tiers as the rest of the platform, not charged per language — is on the pricing page. The deeper category context, with audience data and vendor comparisons, sits at the multilingual interpretation hub.

COMMON QUESTIONS

What directors ask about going multilingual.

The ten shipped today are English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, and Arabic. Other languages are something we add as customers ask for them — the underlying models support far more than ten. If the language you need isn’t on the list, tell us; we’ve built around expanding it on request rather than gating the whole product behind a pre-set menu.
You decide. Convo produces the translations, but a translation only goes live when a person at the institution marks the script as approved. Most museums route translation review to whoever already vets multilingual wall text — an education manager, a docent, or an outside consultant for the languages no one on staff reads. We don’t pretend the platform replaces that judgment.
The difference is structural. Studio production charges per language per minute of finished audio, so adding Spanish to English doubles the bill and a ten-language tour costs roughly ten times the English one. Convo doesn’t charge per language: the ten languages come bundled, and adding or updating them after launch doesn’t trigger a new production cycle. The cost lives in the platform, not the language count.
Yes. The language picker lives in the visitor web app and follows them across stops. A family with a Korean-speaking grandparent and an English-speaking grandchild can hand the phone back and forth and the tour state persists. Questions asked of the visitor guide are answered in whatever language the question was asked in.
Most of the time, yes — but this is exactly the kind of thing review exists for. Named entities, transliterated names, the convention your institution uses for a particular artist or dynasty: these are decisions, not errors. When the translation surfaces something the institution has a house style for, you edit it on the script and the change applies on the next regeneration. We don’t pretend to know your house style; we make it cheap to enforce it.
Two paths. Either contract a reviewer for that language (the work is small enough, a few hours in total for a typical pilot, that it’s within reach of most education budgets) or limit yourself, at first, to the languages someone on staff can vet. Better to ship five carefully reviewed languages than ten unreviewed ones. The platform doesn’t force the question.

WHAT WE’RE ASKING

Pick one gallery.
We’ll ship it in ten languages.