Most American museum audio guides are English-only. A growing number ship two or three languages. Almost none ship the ten or twelve that would actually match the languages spoken in the buildings their visitors walked in from. The gap is partly a budget story: until very recently, the production math made multilingual interpretation a stretch goal. It is also a positioning story. Museums that haven't thought hard about who their non-English-speaking visitors actually are tend to default to a thin "tourist track" framing that misses most of the audience.
This is the visitor-demographics piece of our multilingual interpretation pillar. It covers who the non-English-speaking audience really is, what the data says about reach and demand, why first-language interpretation matters for emotionally significant material even when the visitor speaks fluent English, and what a serious program design looks like. I'm Eric Duffy. I run Convo, a platform that produces multilingual audio tours for museums. The framing below reflects what we've heard from curators and visitor-experience directors in roughly a hundred discovery conversations over the past year.
Who counts as a non-English-speaking museum visitor?
Three groups, and conflating them is the most common framing mistake.
- International tourists. Visitors traveling on a foreign passport, here for a trip, often with limited English. They're the easiest group to picture and the smallest of the three.
- Residents who speak a non-English language at home. The largest group by a wide margin: roughly 68 million people in the US, per the 2018–2022 ACS. Many speak English well enough to handle the visit in English; many would still prefer not to.
- Limited-English-proficient (LEP) residents. Around 29.6 million people, a subset of the second group for whom English-only interpretation is functionally inaccessible.
The reason the framing matters is that the three groups want different things from a tour. Tourists usually want a confident orientation in their travel language and don't expect cultural perfection. Bilingual residents often want the depth they'd get in English, delivered in the language their relationship to art was formed in. LEP residents need the basics done right or the visit collapses. A program designed only for tourists treats the resident audience as an afterthought. A program designed only for LEP visitors underestimates what bilingual residents will actually use.
How large is the non-English-speaking audience at most US museums?
Larger than the visitor surveys make it look, because the surveys are usually conducted in English. The US Census 2018–2022 ACS puts the national share at 21.7% of the population age 5 and older — about 68 million people. The composition is dominated by Spanish (61.1% of non-English speakers, roughly 13.3% of the total US population), followed by Chinese including all dialects (5.1% of non-English speakers), Tagalog (2.5%), Vietnamese, and Arabic.
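The percentages above chain together, and it can help to see the arithmetic spelled out. The sketch below uses only the ACS figures quoted in this section; the variable and function names are illustrative, and the headcounts are rounded estimates rather than official tabulations.

```python
# Figures quoted above from the 2018-2022 ACS; names are illustrative only.
NON_ENGLISH_SHARE = 0.217   # share of US population age 5+ speaking a non-English language at home
NON_ENGLISH_PEOPLE = 68e6   # approximate headcount behind that share

# Share of non-English speakers by language (only the languages given a figure above).
SHARE_OF_NON_ENGLISH = {"Spanish": 0.611, "Chinese": 0.051, "Tagalog": 0.025}


def language_reach(share_of_non_english: float) -> tuple[float, float]:
    """Return (share of the population age 5+, approximate number of speakers)."""
    return (
        share_of_non_english * NON_ENGLISH_SHARE,
        share_of_non_english * NON_ENGLISH_PEOPLE,
    )


for language, share in SHARE_OF_NON_ENGLISH.items():
    population_share, people = language_reach(share)
    print(f"{language}: {population_share:.1%} of population, ~{people / 1e6:.1f}M people")
```

For Spanish this reproduces the roughly 13.3% figure cited above. The same function can be pointed at a metro-level share once a museum has its own ACS pull.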
The numbers in major museum cities are dramatically higher. The American Alliance of Museums' 2022 brief on multilingual audiences cites 49% in New York, 59% in Los Angeles, and 36% in Chicago. Add international visitors, and the non-English-speaking share of visitors on a peak weekend in those cities is probably higher than the in-building survey will tell you: bilingual visitors fill out the English survey because that's the survey on offer.
A practical test: if a director walks the building on a Saturday and listens, the share of conversations that aren't in English at most major US museums is far higher than the share of the audio guide catalog that isn't in English. The interpretation program lags the audience by something like a generation.
How many international tourists actually visit US museums?
Enough that the headline numbers should reframe how directors talk about international audiences. The National Travel and Tourism Office reports 72.4 million international visitor arrivals to the US in 2024, up 9.1% over 2023 and within 9% of the 2019 peak. Per the NTTO Survey of International Air Travelers, 29.8% of overseas visitors went to an art gallery or museum during their trip, and 35.6% visited a national park or monument. The top overseas source markets for 2024 (a category that excludes Canada and Mexico, the two largest sources of arrivals overall) were the United Kingdom (4.0 million), India (2.2 million), Germany (2.0 million), Brazil (1.9 million), and Japan (1.8 million).
In the cities where most international travel concentrates, the volume is concrete. New York City hosted 13 million international visitors in 2024, alongside 51.3 million domestic visitors, per NYC Tourism + Conventions. The top international source markets for NYC specifically: the UK (1.1 million), Canada (1.0 million), France, Brazil, and Italy. Los Angeles, Miami, and Washington show similar patterns at lower volumes. The implication is direct: a major-city museum that ships English plus Spanish is leaving the Mandarin, French, German, Italian, Japanese, and Portuguese audiences with whatever Google Translate produces from the wall text.
What languages should a US museum prioritize?
Start with Spanish. Spanish is by a wide margin the largest non-English language in the US and the most-requested addition in the discovery conversations we've had. After Spanish, the answer is local: Mandarin in San Francisco and New York, Korean in LA, French and Haitian Creole in Miami and New York, Arabic in Detroit and Dearborn, Russian and Bengali in NYC. NYC's Mayor's Office of Immigrant Affairs reports that Spanish, Chinese, Russian, Bengali, and Haitian Creole are the top five languages spoken by immigrant New Yorkers (MOIA 2024).
For international tourism, the languages of the top source markets are well known and largely overlap with the resident demand: Spanish, Mandarin, French, German, Italian, Portuguese, Japanese, Korean, Arabic, and English. That set covers most international visitors to most US museums and most non-English-speaking residents of most US cities at the same time. It happens to be the set we ship at Convo. It is not a coincidence — it's what a serious modern interpretation program looks like.
For a deeper treatment of the language-count question, including how to decide which to launch first and when to add more, see "how many languages does a museum audio guide actually need."
Why does first-language interpretation matter when the visitor speaks fluent English?
This is the part of the case most museums haven't fully worked through, and it's the strongest argument for shipping more languages than the tourist-track framing would suggest. A meaningful share of bilingual residents speak English at work, on the subway, with the school principal — and would still rather hear a piece of interpretation about a painting from their grandmother's country in the language their grandmother spoke. The reason isn't comprehension. It's that emotional material lands differently in a first language than in a second.
The neurolinguistic literature points the same way. A 2017 study in Frontiers in Psychology on embodiment and emotional memory found that emotional memory effects (the enhanced recall of emotionally charged words versus neutral ones) were significant in late bilinguals' first language but effectively absent in their second. Skin conductance responses to emotional words were significantly stronger in L1 than L2 (t = 2.2, p = 0.04). Participants also showed a strong bias to incorrectly categorize slightly emotional L2 words as neutral. Translated into museum terms: the study suggests that the same line of interpretation about grief, beauty, ancestry, or place lands with more force in the visitor's first language than in their fluent second one.
This is the under-told case for multilingual audio. Tourist tracks make the case on accessibility. The first-language case makes it on what a museum is actually for — the moment when an object becomes a personal artifact rather than a public one. A program designed only for English-comprehension misses that moment for a large share of its bilingual visitors.
Is translation enough, or does the audio need to be localized?
Translation is the floor. Localization is what makes the floor feel like a tour and not a translation. The 2022 AAM brief on multilingual audiences lays out the working principles for institutional translation programs: translators should be involved from the start, not at the end; they need spatial and curatorial context, not just text; and the final product should be reviewed by a separate native speaker before it ships.
The dimensions that matter most for audio specifically:
- Register. A line that reads warmly in English can read clinically in literal Spanish, or stiffly in literal Japanese. A localized track adjusts register to match what the language actually does.
- Idiom and reference. Museums in the US lean on cultural references that don't travel — "the Gilded Age," "Reconstruction," sports metaphors. The localized line either explains the reference or replaces it with one that lands.
- Pronunciation. Foreign names, technical terms, and place names need to be pronounced as a native speaker would expect to hear them. A Mandarin track that mispronounces a Chinese artist's name is worse than no Mandarin track at all.
- Length and pacing. Spanish runs roughly 25% longer than English at equivalent content density; Japanese and Korean tend to run shorter. A localized track is paced for the language, not the English source.
- Review by a native speaker. Non-negotiable. The reviewer doesn't have to be a professional translator — a bilingual staff member, a docent, a community partner all work — but a human who speaks the target language at native-fluent level should listen to the whole track before it ships.
This is the workflow that separates a serious multilingual program from a bulk-translated one. The production economics of AI-narrated tours make all ten languages affordable; the editorial discipline of native review makes them good.
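The length-and-pacing point in the list above can be made concrete with a rough duration estimate. This is a minimal sketch, assuming narration time scales linearly with text length at a constant speaking pace, which real voices only approximate. The 1.25 factor is the "roughly 25% longer" figure quoted for Spanish; the function names, the 90-second stop, and the 100-second cap are hypothetical values chosen for illustration.

```python
SPANISH_EXPANSION = 1.25  # "roughly 25% longer than English", per the list above; an estimate


def localized_seconds(english_seconds: float, expansion: float) -> float:
    """Estimate narration length in the target language (assumes constant speaking pace)."""
    return english_seconds * expansion


def fraction_to_trim(english_seconds: float, expansion: float, cap_seconds: float) -> float:
    """Share of the localized script to cut so the stop fits under cap_seconds."""
    localized = localized_seconds(english_seconds, expansion)
    if localized <= cap_seconds:
        return 0.0  # already fits; nothing to trim
    return 1.0 - cap_seconds / localized


# Hypothetical stop: 90 seconds in English, with a 100-second cap per stop.
spanish = localized_seconds(90, SPANISH_EXPANSION)
trim = fraction_to_trim(90, SPANISH_EXPANSION, 100)
print(f"Spanish estimate: {spanish:.1f}s, trim about {trim:.0%} to fit the cap")
```

An estimate like this is only useful as a flag for the reviewer. The decision about what to cut, or whether to let the stop run long, is still an editorial one.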
How does a phone-based audio guide platform change what's feasible?
The production math is what shifted. The traditional studio model added a language at roughly 60–80% of the original English production cost — a translator, a native voice actor, a studio booking, an editor, and a mastering pass per language. Three languages was the realistic ceiling for a mid-size institution; ten languages was a vendor pitch that never closed.
A modern AI-narrated phone-based platform re-voices an entire approved English script across all ten languages in roughly a minute, with no additional studio cost per language. The delivery is QR code to the visitor's own phone, which means the visitor's device renders the right language by default and the museum doesn't carry a multilingual handset inventory. The cost of shipping ten languages is now the editorial review time per language — not the production cost.
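To see why three languages was the practical ceiling under the studio model, the 60–80% figure above can be turned into a cost multiple. This is a back-of-the-envelope sketch, not a pricing model: it assumes English is counted as one of the languages and that every added language costs the same fraction of the English production. The function name is hypothetical.

```python
def studio_cost_multiple(total_languages: int, added_language_fraction: float) -> float:
    """Total studio production cost as a multiple of the English-only cost.

    Assumes each added language costs the same fraction of the English original.
    """
    if total_languages < 1:
        raise ValueError("total_languages must include the English original")
    added_languages = total_languages - 1
    return 1.0 + added_languages * added_language_fraction


for total in (3, 10):
    low = studio_cost_multiple(total, 0.60)   # low end of the 60-80% range
    high = studio_cost_multiple(total, 0.80)  # high end of the 60-80% range
    print(f"{total} languages: {low:.1f}x to {high:.1f}x the English-only cost")
```

Under those assumptions, three languages come to roughly 2.2 to 2.6 times the English-only cost, and ten to roughly 6.4 to 8.2 times. The comparison with the AI-narrated model then turns on reviewer hours per language, which will vary by institution.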
For a fuller comparison of what AI-narrated tours change versus the traditional studio model, see "AI audio guide vs traditional audio guide." For the broader argument about how the audio guide stops being the product when the production constraint lifts, the note on "the audio guide is not the product" makes the case.
What about accessibility for visitors who can't read wall text?
The audio layer is the accessibility layer for a wide range of visitors, not just non-English speakers. Blind and low-vision visitors rely on it. Visitors with reading-related barriers (dyslexia, low literacy in any language, age-related vision loss) rely on it. Visitors whose first language uses a writing system the museum doesn't print but does narrate (Mandarin, Arabic, Hebrew, Russian) rely on it. Multilingual interpretation and accessibility interpretation are two faces of the same program.
The full accessibility treatment lives in Pillar 4; see the accessibility and inclusion hub for the legal floor, the WCAG framing, and the audio-description discussion. For the framing on serving multiple audiences from a single curatorial corpus, the note on "one collection, many audiences" is the companion piece.
What does a serious multilingual program actually look like?
Five elements, in roughly the order an institution should sequence them.
- Audience inventory. Pull the languages spoken at home in the museum's metro area from the ACS, layer the top international source markets from the city's tourism bureau, and decide on a target language set. For most major US cities, the set is the ten we listed above. For a regional or rural institution, it may be three.
- A canonical English script that the curatorial team has reviewed and signed off on. This becomes the source of truth that all other languages are produced from.
- A reviewer per language before launch. A native-fluent speaker who listens to the whole track and flags pronunciation, register, and tone. Often a docent, a community partner, a board member, or a hired specialist.
- A correction workflow. What happens when a Mandarin-speaking visitor emails to say a line is wrong? Who fixes it, how fast, and how does the correction propagate to the audio? On a modern platform this is minutes; on a legacy studio model it's a rebooking.
- Wayfinding that surfaces the languages. A QR sticker that says "Available in 10 languages / Disponible en 10 idiomas / 10 种语言" is the difference between a feature shipping and a feature being used. The fuller signage discussion will live in Pillar 5 (visitor experience) when those pieces ship.
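The audience-inventory step at the top of this list can be sketched as a simple ranking. This is one possible approach, not a prescribed method: it adds resident speaker counts and annual visitor counts one-for-one, which is a crude weighting that an institution would likely want to tune. All counts, the market-to-language mapping, and the function name below are hypothetical placeholders, not real data.

```python
from collections import Counter


def rank_languages(
    resident_speakers: dict[str, int],
    visitors_by_market: dict[str, int],
    market_language: dict[str, str],
    top_n: int,
) -> list[str]:
    """Rank languages by resident speakers plus visitors from matching source markets."""
    totals = Counter(resident_speakers)
    for market, visitors in visitors_by_market.items():
        language = market_language.get(market)
        if language is not None:  # skip markets with no mapped language
            totals[language] += visitors
    return [language for language, _ in totals.most_common(top_n)]


# Placeholder inputs: in practice, residents come from the metro-area ACS pull
# and visitors from the city tourism bureau's source-market report.
residents = {"Spanish": 500_000, "Chinese": 120_000, "Korean": 40_000}
visitors = {"France": 90_000, "Brazil": 60_000, "Mexico": 80_000}
languages = {"France": "French", "Brazil": "Portuguese", "Mexico": "Spanish"}

print(rank_languages(residents, visitors, languages, top_n=3))
```

A ranking like this is a starting point for the target language set. Community partnerships, collection strengths, and reviewer availability will reasonably move languages up or down the list.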
The pattern across the institutions that have done this well: treat the non-English-speaking visitor as a primary audience, not an accommodation. The interpretation program designed for them ends up serving everyone better.
Continue reading
This piece is part of the multilingual interpretation pillar. The companion piece on how many languages a museum audio guide actually needs is the most useful next read for institutions sizing the program. For the broader argument about serving different visitor types from a single curatorial corpus, the note on one collection, many audiences is the place to go. For the production-economics piece — why ten languages used to be impossible and now isn't — start with AI audio guide vs traditional audio guide.
About the author
Eric Duffy is the founder of Convo, a platform that lets museums and cultural institutions publish multilingual audio tours their visitors can have a conversation with. He writes about the visitor demographics, production economics, and program design of museum interpretation, drawing on discovery conversations with curators, directors, and education leads at small and mid-size US museums. Reach him at eric@convo.app or on LinkedIn.