PILLAR 03 · MULTILINGUAL INTERPRETATION

Translation vs localization for cultural interpretation.

Translation moves words across a language boundary; localization moves meaning across a cultural one. For museum audio, the difference decides whether a visitor in their second language feels addressed or merely accommodated.

ERIC DUFFY·FOUNDER·11 MIN READ·UPDATED 2026-05-29

A museum I won't name spent eighteen months and a six-figure budget producing a Mandarin version of its permanent collection tour. The translation was technically correct. The first Mandarin-speaking focus group politely told the curator that the tour referred to one of the museum's signature works using a phrase no Chinese speaker would ever use to describe that kind of object, that the dates were given in a format Chinese students stop using around middle school, and that the recurring word "we" — meant as the institutional voice — read as oddly familiar in Mandarin without the formal pronoun the register required.

Nothing in that critique is a translation failure. Every word was correctly translated. It is a localization failure: the words moved across the language boundary, but the meaning didn't fully move across the cultural one. This is the distinction that decides whether your second-language visitor feels addressed or merely accommodated, and it's the distinction that AI tooling in 2026 is both very good at and surprisingly bad at, depending on which part of the work you're looking at.

I run Convo. We voice tours in ten languages from one source. The piece below is the framework we use internally and the one we recommend to curators evaluating any platform — ours or otherwise. For the broader pillar, see our guide to multilingual interpretation.

What's the difference between translation and localization?

Translation moves words across a language boundary. Localization moves meaning across a cultural one. Translation answers "what does this sentence say in Mandarin?" Localization answers "what would a Mandarin-speaking visitor need to hear, in what register, with which references, to take away what an English-speaking visitor takes away from the same stop?"

In practice, localization is translation plus a set of additional decisions: which register to address the visitor in (formal vs. informal, distant vs. intimate), how to handle idiomatic art-historical terms that don't have a clean equivalent in the target culture, how to render dates and measurements in conventions the listener actually uses, how to treat religious vocabulary with the appropriate gravity for the target audience, and how to handle proper names (when to transliterate, when to use a culturally established equivalent, when to leave the original). Localization is what professionals in the field mean when they say a translation "lands" or "doesn't land." It's the difference between a Spanish reader recognizing themselves in the text and a Spanish reader recognizing the English in the text.

The American Alliance of Museums' 2025 review of bilingual exhibit work at the Museum of Contemporary Art Chicago makes the same point: museums "already function as 'translation zones,'" and choices about register, formality, and even whether to allow Spanglish are intentional, not incidental. The same applies in every other language pair.

Where is AI translation actually strong?

AI is now reliably strong on the high-volume, low-judgment parts of the work, and that's most of the work. Modern neural machine translation and LLM-based translation handle literal accuracy on declarative museum copy at a level that, for the bulk of an audio tour script, matches what a competent human translator would produce on a first pass. In the 2025 benchmark literature on culturally aware machine translation, the top systems are at or near human parity on neutral expository text.

The specific things AI does well in 2026, in our experience and in the published evaluations:

  • Register matching. When you brief the system on a register — "formal museum voice, second-person plural, no contractions" — it stays in that register across thousands of lines more consistently than a team of human translators working in parallel.
  • Terminology consistency. A single model translating from one approved English source will render "Impressionism" the same way in every stop, every language, every time. Human teams across multiple translators frequently won't.
  • Literal accuracy on factual sentences. Dates, dimensions, artist names, provenance facts, attribution language — the parts of a tour that are simply true or false — translate cleanly.
  • Speed and parallelism. Re-voicing ten languages from one English source in roughly a minute is a category change in what a curator can afford to ship.

For a 30-stop tour where each stop is 90 seconds of narration, roughly 80–90% of the lines are exactly this kind of work. AI translation is genuinely good at it.
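As a rough worked example of that split: if 80–90% of the tour is volume work, the remainder is 10–20%. The sketch below sizes that remainder in minutes of narration. It assumes every stop runs the full 90 seconds and that the share of lines maps evenly onto narration time, which are simplifications rather than figures from a real tour.

```python
# Rough sizing of the review workload for the 30-stop tour described above.
# Assumption: the 10-20% share of lines maps evenly onto narration time.

STOPS = 30
SECONDS_PER_STOP = 90


def review_minutes(share: float) -> float:
    """Minutes of narration that fall inside the given share of the tour."""
    total_seconds = STOPS * SECONDS_PER_STOP
    return total_seconds * share / 60


total_minutes = STOPS * SECONDS_PER_STOP / 60
low, high = review_minutes(0.10), review_minutes(0.20)

print(f"Total narration: {total_minutes:.0f} min")
print(f"Outside the volume work: {low:.1f} to {high:.1f} min")
```

On those assumptions, a 45-minute tour leaves somewhere between 4.5 and 9 minutes of narration outside the volume work.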

Where does AI translation still need human review?

The other 10–20% is the part that decides whether the tour feels localized or merely translated. The 2025 research is unusually consistent on which categories machines still get wrong. They cluster into four buckets:

  • Named entities with culturally established equivalents. "The Renaissance" translates literally into Mandarin as 文藝復興 (wényì fùxīng), which is the correct and standard rendering — but the concept a Chinese-speaking visitor brings to that word is shaped by a different intellectual history, and an audio tour that doesn't briefly anchor the term to its European context can leave a listener with the wrong frame. The literal translation is right. The interpretive footing isn't, unless the script accounts for it.
  • Religious vocabulary. Phrases like "the Holy Family," "the saints," "the Resurrection," "icon," and "altarpiece" sit at the intersection of art-historical and religious meaning. In Arabic, the literal translation of "the Holy Family" (al-ʿāʾila al-muqaddasa) is technically available, but the appropriate register, the level of theological precision, and the surrounding context that signals "this is a Christian art object, not a religious claim being made to you" all require judgment a default translation pass won't supply.
  • Calendar systems and dates. "Created in 1450 CE" is correctly translated into Arabic as a literal sentence. But Islamic art catalogs and Arabic-speaking audiences often expect Hijri dating (AH, Anno Hegirae) alongside or instead of the Gregorian. A 2026-era tour of an Islamic art collection that gives only CE dates reads as if the audio were produced for someone else and translated at you.
  • Idiomatic art-historical terms with no clean target-language equivalent. "Old Masters," "Golden Age," "the Sublime," "the picturesque" — these terms carry a load of European art-critical history that doesn't transfer cleanly. A machine translation will produce a literal target-language phrase. A localized version will gloss the term briefly, use a culturally established equivalent if one exists, or rephrase the underlying point.
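
For the calendar bullet above, a minimal sketch of dual dating. It uses the common rule-of-thumb conversion AH ≈ (CE - 622) × 33/32, which is only accurate to within about a year, because a Gregorian year usually overlaps two Hijri years. The function names are hypothetical, and a reviewer or catalog record should confirm any date that actually ships.

```python
def approx_hijri_year(gregorian_year: int) -> int:
    """Approximate Hijri (AH) year for a Gregorian (CE) year.

    Rule of thumb: 33 lunar years span roughly 32 solar years, counted
    from 622 CE. Accurate to about +/- 1 year; not for scholarly use.
    """
    if gregorian_year < 622:
        raise ValueError("The Hijri era begins in 622 CE")
    return max(1, round((gregorian_year - 622) * 33 / 32))


def dual_date(gregorian_year: int) -> str:
    """Format a year in both calendars, marking the AH value as approximate."""
    return f"{gregorian_year} CE (c. {approx_hijri_year(gregorian_year)} AH)"


print(dual_date(1450))  # prints "1450 CE (c. 854 AH)"
```

The "c." is deliberate: the script shows the listener both conventions without claiming more precision than the approximation has.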

The pattern across all four buckets is the same: the failure is not linguistic, it's interpretive. The translation is correct; the interpretation of the artwork has been written for one audience and handed unchanged to another.
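
One way to point a reviewer at those four buckets is a simple keyword pre-screen over the source script. The sketch below is illustrative only: the term lists are small hypothetical samples drawn from the examples above, the function and category names are invented, and pattern matching will miss plenty. It narrows the reading list; it does not replace the reviewer.

```python
import re

# Hypothetical sample patterns per risk category, taken from the examples above.
RISK_TERMS = {
    "named_entity": ["the Renaissance"],
    "religious_vocabulary": ["Holy Family", "saints", "Resurrection", "icon", "altarpiece"],
    "calendar": [r"\d{3,4} CE"],
    "art_historical_idiom": ["Old Masters", "Golden Age", "the Sublime", "the picturesque"],
}


def flag_lines(script: list[str]) -> list[tuple[int, str, list[str]]]:
    """Return (line_number, text, categories) for lines a reviewer should read."""
    flagged = []
    for number, text in enumerate(script, start=1):
        categories = [
            category
            for category, patterns in RISK_TERMS.items()
            # Word boundaries keep "icon" from matching "iconic".
            if any(re.search(rf"\b(?:{p})\b", text, re.IGNORECASE) for p in patterns)
        ]
        if categories:
            flagged.append((number, text, categories))
    return flagged


script = [
    "The next stop is in the East Gallery, on your left.",
    "This altarpiece shows the Holy Family at rest.",
    "Created in 1450 CE, it is often grouped with the Old Masters.",
]

for number, text, categories in flag_lines(script):
    print(number, categories)
```

With this sample script, the wayfinding line passes through untouched and the other two lines are routed to review.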

What does a defensible 2026 localization workflow look like?

The right shape is AI for volume and human review for the 10–20%. This is the workflow we've watched land cleanly at the institutions that take multilingual reach seriously. It has three parts:

  1. An AI pass from one approved English source. The platform translates and revoices the whole tour from the curator-approved English script. This is where the speed and consistency advantage lives. The 80–90% of the script that is neutral expository writing comes through clean.
  2. A targeted human review on the 10–20%. A native-speaking reviewer with cultural-context expertise reads the target-language script focused on the four risk categories above — named entities, religious vocabulary, calendar systems, idiomatic art-historical terms. They are not retranslating; they are checking interpretation. This is hours of work per language per tour, not weeks.
  3. A change loop that doesn't require re-recording. When the reviewer flags a line, the curator edits the source-language script, and the platform regenerates the target-language audio in seconds. This is the structural advantage of an AI-narrated platform over studio production for multilingual work: fixing the line is a Tuesday, not a project.
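
The change loop in step 3 depends on knowing which stops actually changed after an edit. A minimal sketch of that bookkeeping, assuming the platform stores a content hash per published stop. `regenerate_audio` is a hypothetical stand-in for whatever call a given platform exposes, not a real API, and the stop IDs and script text are made up.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable hash of a stop's script text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stops_to_regenerate(published_hashes: dict[str, str], edited: dict[str, str]) -> list[str]:
    """Stop IDs whose script is new or differs from the published fingerprint."""
    return [
        stop_id
        for stop_id, text in edited.items()
        if published_hashes.get(stop_id) != fingerprint(text)
    ]


def regenerate_audio(stop_id: str, text: str) -> None:
    """Hypothetical placeholder for a platform's narration call."""
    print(f"regenerating {stop_id}: {len(text)} characters")


# Fingerprints recorded when the tour was last published (sample data).
published_hashes = {
    "stop-01": fingerprint("Created in 1450 CE."),
    "stop-02": fingerprint("Turn left for the East Gallery."),
}

# The script after the reviewer's flagged line has been edited.
edited = {
    "stop-01": "Created in 1450 CE, during the European Renaissance.",
    "stop-02": "Turn left for the East Gallery.",
}

for stop_id in stops_to_regenerate(published_hashes, edited):
    regenerate_audio(stop_id, edited[stop_id])
```

Only the edited stop is regenerated, which is what keeps a one-line fix from turning into a full re-render.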

For comparison: a traditional studio production of the same multilingual program — one English master plus three additional languages — adds roughly 60–80% of the original production cost per language, takes weeks per language to ship, and makes the "fix this one line" step prohibitively expensive. See our piece on re-voicing tours across languages for the full production-economics comparison.
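
To make that percentage concrete, a small worked calculation. The $100,000 master cost is a made-up placeholder; only the 60–80% range and the three-language program come from the comparison above.

```python
def added_studio_cost(master_cost: float, languages: int, rate: float) -> float:
    """Extra spend when each added language costs `rate` of the master."""
    return master_cost * rate * languages


MASTER_COST = 100_000  # hypothetical English master budget, in dollars
LANGUAGES = 3          # additional languages beyond the English master

low = added_studio_cost(MASTER_COST, LANGUAGES, 0.60)
high = added_studio_cost(MASTER_COST, LANGUAGES, 0.80)

print(f"Added cost for {LANGUAGES} languages: ${low:,.0f} to ${high:,.0f}")
```

On that placeholder budget, three additional languages roughly double or triple the total spend before any line is corrected.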

How do you brief a vendor on localization, not just translation?

Skip the word "translate" in the brief and force the conversation about meaning. A useful brief for a multilingual audio vendor reads something like this:

We need a [target language] version of our tour that an [adjective for target audience] visitor would not have to mentally retranslate to follow. The audio should use [formal / familiar] register and [Hijri / Gregorian / both] dating where relevant. We will provide a native-speaking reviewer for a single focused pass on religious vocabulary, idiomatic art-historical terms, and named entities; please show us how your platform supports same-day script edits without re-recording.

The phrases that matter: would not have to mentally retranslate to follow (this names the goal), register (this surfaces a decision most platforms skip), and same-day script edits without re-recording (this names the operational requirement that makes the human-review loop affordable).

If a vendor responds with "we use professional-grade translation models" without engaging the register or edit-loop questions, they are selling you translation. If they engage both, they are selling you localization. The difference shows up in the focus-group conversation a year from now.

For the upstream question of how many languages to ship at all, see how many languages a museum audio guide actually needs.

Where the literal-translation approach is actually fine

Honesty section. Not every tour needs full localization on every language. The cases where a careful literal translation, lightly reviewed, is genuinely sufficient:

  • Wayfinding and operational copy. "The next stop is in the East Gallery, on your left." Literal translation, correctly handled for plurals and grammatical agreement, is fine.
  • Object-identifying lines. "This is a 16th-century oil-on-panel portrait of a Florentine merchant." Literal translation gets you almost all of the way.
  • Languages and audiences with high English-fluency overlap. A Dutch- or Swedish-speaking visitor who is choosing the Dutch or Swedish audio out of preference rather than necessity is typically fine with a competent translation that hasn't been deeply localized. The bar rises sharply when the second-language audio is the only thing keeping the visitor inside the tour.
  • Temporary exhibitions with short runs. A six-week show where translation costs need to be proportional to the run can ship with light localization on the high-risk categories and literal translation everywhere else.

The cases where you cannot skip localization: religious art collections, Islamic art collections, East Asian art presented to East Asian visitors in their first language, indigenous and Native American interpretation, and any tour where the source script makes recurring use of European art-historical idiom. In those, the localization pass is not a polish step — it's the difference between a tour the visitor experiences and a tour the visitor decodes.

FAQ

Is AI translation good enough for a museum audio tour on its own?

For the 80–90% of a museum tour that is neutral expository writing (object descriptions, wayfinding, art-historical framing in the source language), yes, the 2026 machine-translation literature supports it. For the remaining 10–20% (religious vocabulary, calendar systems, idiomatic terms, named entities), no, and the failure modes are the kind that focus groups politely point out a year after launch. A native-speaking reviewer doing a focused pass on those categories is hours of work, not weeks, and it's what separates a translated tour from a localized one.

Do we need a reviewer per language, or can one reviewer cover several?

Per language. The judgment calls that matter (register, religious vocabulary, calendar systems, idiomatic equivalents) are language-specific and culture-specific. A bilingual Spanish-English staffer cannot meaningfully review the Mandarin script even if they are an art historian. The good news: the review is targeted (the four risk categories, not the whole script), so the time commitment per reviewer is manageable.

How well does AI translation handle less-resourced languages?

Less well, and you should ask vendors specifically about the languages you need. The major European and East Asian languages (Spanish, French, German, Italian, Portuguese, Mandarin, Japanese, Korean) and Arabic are well covered in modern models. Less-resourced languages, including many indigenous languages and some African and South Asian languages, still require more human work per stop. Convo ships ten languages today; institutions that need more should brief vendors carefully on the specific languages and audiences.

Can we reuse translations we already have?

Usually yes. Most platforms accept existing translations as reference material, either as the canonical target-language script (and the AI just voices it) or as a baseline that the platform translates from and the curator edits. The biggest gain from migrating is usually update agility: corrections that took months to ship in the old workflow ship in seconds in the new one.

How much time does the localization review add?

For a 30-stop tour, our experience is roughly two to six additional hours per target language for the focused human review on the high-risk categories. Compared to the alternative (studio production per language, weeks per language), it is a small marginal cost on top of an AI-narrated baseline.

The verdict

Translation is necessary; localization is the bar. AI tooling in 2026 is genuinely good at the volume work and genuinely weak at the four interpretive categories — named entities, religious vocabulary, calendar systems, idiomatic art-historical terms — that decide whether a second-language visitor feels addressed in their first language or merely accommodated. The right workflow is not "AI does translation, humans don't" or "humans do translation, AI helps." It is: AI does the pass from one approved English source, a native-speaking reviewer checks the high-risk categories, and the platform regenerates without re-recording. That workflow ships in weeks, not quarters, costs a fraction of studio production per language, and produces tours that actually land.

If you want the broader category map, the multilingual interpretation pillar collects everything we've written on the topic. If you'd rather start with numbers for your own institution, our pricing is published in full and the pilot tier is free.


About the author

Eric Duffy is the founder of Convo, a platform that lets museums and cultural institutions publish multilingual audio tours their visitors can have a conversation with. He writes about the economics and craft of museum interpretation from inside the category — drawing on RFP data, discovery calls with curators and directors, and the production economics of running a tour in ten languages from one source. Reach him at eric@convo.app or on LinkedIn.
