Captions, transcripts, and hearing accommodations in museum audio guides.

KEY TAKEAWAYS

Captions, transcripts, and ASL video are three different accommodations for three different audiences. Captions help hard-of-hearing visitors who can still parse synchronized text; transcripts help anyone who'd rather read than listen; ASL video serves Deaf visitors for whom English is a second language. Most institutions need all three.
The quiet structural win of an AI-narrated platform is that the script exists in text by construction. A transcript isn't a deliverable to budget for — it's a byproduct. Default it on.
Hearing loops with T-coil are still the right answer for fixed listening positions (an auditorium, an orientation theater). Phone-based playback with Made-for-iPhone hearing aids or Bluetooth LE Audio handles ambulatory tours, and most modern hearing aids support both a T-coil and phone streaming.
ASL is a fundamentally different language, not a translation of English. AI can generate captions and translate text well; it cannot produce ASL video. For high-priority objects, plan to film a Deaf interpreter.
Roughly 15% of US adults, about 37.5 million people, report some degree of hearing difficulty (NIDCD, 2024). Among visitors over 65, the share is closer to one in four. Hearing accommodation is not a niche concern; it affects roughly a quarter of your audience over 65.

Audio tours have a built-in problem: they're audio. For a meaningful share of any museum's audience, that's the wrong delivery mechanism by default. This piece walks through how a serious institution actually accommodates Deaf and hard-of-hearing visitors in audio interpretation — what the three accommodations are, when each is required, what hardware fits where, where AI legitimately helps, and where it doesn't. It's written for the curator or accessibility lead deciding what to build into a new audio program, not for a vendor pitch.

I run Convo, an AI-narrated audio guide platform, so I have a stake in this. I've tried to keep the recommendations platform-neutral where I can and call out the Convo-specific claim when I can't.

How big is the hard-of-hearing audience, really?

It's a quarter of your visitors over 65, and it's only growing. The headline statistic from the National Institute on Deafness and Other Communication Disorders is that about 15% of American adults — roughly 37.5 million people — report some degree of hearing difficulty (NIDCD, 2024). That share rises sharply with age: 5% of adults 45–54, 10% of those 55–64, 22% of those 65–74, and 55% of those 75 and older (NIDCD, 2024). The Hearing Loss Association of America puts the total at more than 50 million Americans with some degree of hearing loss — roughly one in seven people, more common than diabetes or cancer (HLAA, 2024).

Museum audiences skew older than the general population. If your over-65 visitor share is 30% and about one in four of those visitors has hearing loss, then roughly 7.5% of all visits (0.30 × 0.25) include someone with measurable hearing loss, before you've even counted hard-of-hearing visitors of other ages. The accommodation isn't optional in any practical sense; it's the difference between an audio program working for your audience and silently excluding a quarter of its oldest segment.
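
To run this kind of estimate against your own attendance data, here is a minimal Python sketch. The age bands and rates are the NIDCD figures quoted above. The visitor age mix is invented for illustration, and the function and variable names are my own.

```python
# Rates of hearing difficulty by age band, as quoted above (NIDCD, 2024).
NIDCD_RATE_BY_BAND = {
    "45-54": 0.05,
    "55-64": 0.10,
    "65-74": 0.22,
    "75+": 0.55,
}


def estimated_hearing_loss_share(visitor_mix: dict[str, float]) -> float:
    """Return the estimated share of all visits with hearing difficulty.

    visitor_mix maps an age band to that band's share of total visits.
    Bands without a quoted rate (for example under-45s) count as zero,
    so the result is a floor rather than a full estimate.
    """
    return sum(
        share * NIDCD_RATE_BY_BAND.get(band, 0.0)
        for band, share in visitor_mix.items()
    )


# Hypothetical audience: 30% of visits are over 65, split evenly
# between the two oldest bands. Replace with your own survey data.
example_mix = {
    "under-45": 0.40,
    "45-54": 0.15,
    "55-64": 0.15,
    "65-74": 0.15,
    "75+": 0.15,
}

share = estimated_hearing_loss_share(example_mix)
print(f"Estimated share of visits: {share:.1%}")
```

The result depends heavily on how the over-65 group splits between the two oldest bands, because the quoted rate for visitors 75 and older is more than double the rate for those 65 to 74. The one-in-four figure is a blended shorthand; your own age data will give a sharper number.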

Captions vs transcripts vs ASL: what's the difference?

They serve three different audiences and aren't interchangeable. Curators often collapse these into "the accessibility stuff" and then under-deliver on all three. The distinctions matter.

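Captions are time-synchronized text that appears as the audio plays. They serve hard-of-hearing visitors who can follow synchronized text alongside the audio, and they serve hearing visitors in noisy galleries or listening in a second language. WCAG 2.1 requires captions for prerecorded audio in synchronized media (Success Criterion 1.2.2) and a text alternative, such as a transcript, for prerecorded audio-only content (Success Criterion 1.2.1). Both are Level A, so text is the baseline for any prerecorded audio in a digital product (W3C, 2018).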
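
For readers who haven't seen a caption file, here is a minimal sketch of what "time-synchronized text" means in practice. It writes a WebVTT track, the caption format browsers accept through the HTML track element. The cue timings and text are invented for illustration, and the class and helper names are my own.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Render cues as the text of a WebVTT caption file."""
    blocks = ["WEBVTT"]  # required header line
    for cue in cues:
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{cue.text}")
    # Cues are separated from the header and each other by a blank line.
    return "\n\n".join(blocks) + "\n"


# Invented sample narration for one stop.
sample_cues = [
    Cue(0.0, 3.5, "This portrait hung in the sitter's home for forty years."),
    Cue(3.5, 8.25, "Its provenance after that is still debated."),
]

print(to_webvtt(sample_cues))
```

A transcript is the same text with the timing lines removed, which is why the two are cheap to ship together.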

Transcripts are the full text of the audio, untimed, scrollable. They serve visitors who'd rather read than listen — Deaf visitors, visitors in a quiet gallery without headphones, visitors who want to skim ahead, visitors who want to re-read the part about the provenance question. The ADA's effective-communication guidance specifically names "a printed script of a stock speech (such as given on a museum or historic house tour)" as an effective auxiliary aid (ADA.gov, 2024). Transcripts are also what AI assistants and screen readers consume — the same artifact serves accessibility and SEO and AI citation at once.

ASL video is a recorded American Sign Language interpretation of the content, ideally produced by a Deaf interpreter. It serves culturally Deaf visitors for whom English is a second language, not a first. This is the accommodation most museums underfund, and the one AI cannot produce.

Why transcripts should be the default on AI-narrated platforms

Because the transcript exists before the audio does. This is the quiet structural advantage of an AI-narrated audio guide that almost no one talks about. In a traditional studio model, the audio is the canonical artifact — the script existed once, in a Word document, and the audio is what got mastered and delivered. Transcripts are a separate workstream that often doesn't get done because nobody owns it.

In an AI-narrated platform, the script is the canonical artifact. The audio is generated from it. The transcript isn't a deliverable to budget for; it's a byproduct of the workflow. There's no marginal cost to surfacing it in the visitor interface, and no marginal cost to keeping it in sync — when the curator edits the script and re-voices, the transcript updates automatically. Phone-based platforms in the category should be defaulting transcripts on for every stop, in every language.
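
One way to picture the script-first workflow is as a data model in which the audio records which script text it was voiced from. This is a sketch under my own assumptions, not a description of any platform's internals. The Stop class, its fields, and the fingerprinting approach are all hypothetical, and the sample script text is invented.

```python
import hashlib
from dataclasses import dataclass, field


def script_fingerprint(text: str) -> str:
    """Return a stable fingerprint of a script's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Stop:
    """A tour stop whose script is the canonical artifact (hypothetical)."""

    scripts: dict[str, str]  # language code -> script text
    # language code -> fingerprint of the script the audio was generated from
    voiced_from: dict[str, str] = field(default_factory=dict)

    def transcript(self, lang: str) -> str:
        # The transcript is the script itself, so it cannot drift from it.
        return self.scripts[lang]

    def mark_voiced(self, lang: str) -> None:
        # Call after audio has been generated or regenerated for this language.
        self.voiced_from[lang] = script_fingerprint(self.scripts[lang])

    def audio_is_stale(self, lang: str) -> bool:
        # Stale means never voiced, or the script changed since voicing.
        return self.voiced_from.get(lang) != script_fingerprint(self.scripts[lang])


stop = Stop(scripts={"en": "This portrait hung in one home for forty years."})
stop.mark_voiced("en")

# The curator edits the script; the transcript follows immediately,
# and the model flags that the audio needs re-voicing.
stop.scripts["en"] = "This portrait hung in one family home for forty years."
print(stop.transcript("en"))
print(stop.audio_is_stale("en"))
```

In this shape the artifact that can fall out of date is the audio, not the transcript, which is the reverse of the studio model.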

The implication is practical: if you're evaluating audio guide vendors, "is the transcript visible to visitors by default, in every language, without curator effort" is a one-question test of whether the platform treats accessibility as a feature or as paperwork. If the answer involves a separate upload step or a per-stop checkbox, it's paperwork.

What about hearing loops and T-coil?

Loops still belong in fixed listening positions; they don't ride well on phone-based tours. A hearing loop is a wire installed around a defined listening area that transmits an audio signal as a magnetic field. About 70% of modern hearing aids contain a telecoil (T-coil) that picks up that field directly, bypassing ambient noise (HLAA, 2024). For a hearing-aid wearer sitting in an orientation theater or auditorium, a loop is the cleanest possible delivery: the audio goes straight into their hearing aids with no background noise, no headphones to fit over the device, no fumbling with a borrowed listening unit.

Loops are fixed infrastructure, though. They cover a defined room, not a roaming tour. For ambulatory audio guides — the gallery-walking, stop-to-stop format that most museum audio actually is — loops are the wrong shape. What replaces them is direct phone-to-hearing-aid streaming.

Where to use what:

Auditoriums, orientation theaters, film rooms, ticket counters. Install a hearing loop. This is settled hardware with decades of standards behind it.
Walking audio tours. Rely on the visitor's phone and their hearing-aid pairing (covered below), with on-screen captions and a visible transcript as the universal fallback.
Front-desk service counters. A small counter loop is cheap, ADA-friendly, and a meaningful signal that the institution takes hearing accommodation seriously.
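
The list above can double as an audit. As a minimal sketch, assuming you keep even a rough inventory of your spaces, the following flags fixed listening positions that still lack a loop. The space kinds, the Space class, and the sample building are hypothetical.

```python
from dataclasses import dataclass

# Space kinds treated as fixed listening positions, following the list above.
FIXED_LISTENING_KINDS = {
    "auditorium",
    "orientation theater",
    "film room",
    "service counter",
}


@dataclass
class Space:
    name: str
    kind: str
    has_loop: bool = False


def loop_gaps(spaces: list[Space]) -> list[str]:
    """Return names of fixed listening spaces that still lack a hearing loop."""
    return [
        space.name
        for space in spaces
        if space.kind in FIXED_LISTENING_KINDS and not space.has_loop
    ]


# Invented building inventory.
building = [
    Space("Main auditorium", "auditorium", has_loop=True),
    Space("Orientation theater", "orientation theater"),
    Space("Ticket desk", "service counter"),
    Space("Gallery 3", "gallery"),  # walking-tour space, no loop expected
]

print(loop_gaps(building))
```

Galleries are deliberately left out of the check: for walking tours the fallback is the visitor's phone plus captions and a transcript, not more wire in the floor.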

How does phone-based playback work with hearing aids?

On iPhone, through Made-for-iPhone (MFi) pairing or Bluetooth LE Audio; on Android, through ASHA (Audio Streaming for Hearing Aids) or Bluetooth LE Audio, which also brings the newer Auracast broadcast standard. A Made-for-iPhone hearing aid pairs with the visitor's iPhone the same way a Bluetooth headset does, but with native iOS-level integration: it appears under Settings → Accessibility → Hearing Devices, and audio from any app routes through it directly (Apple, 2024). For a phone-based audio guide running in the browser, the audio routes to the visitor's hearing aids automatically with no extra steps. The visitor experience is: open the tour URL, tap play, hear the narration in their hearing aids at the level they've already tuned for their own hearing profile.

This is the part the legacy model couldn't match. A rented handset with a borrowed headphone is a poor delivery mechanism for someone wearing hearing aids: the headphones don't fit over the device, cause feedback, or override the wearer's tuning. A phone-based tour inherits a delivery channel the visitor has already optimized for their own ears.

Practical implication: when you build signage that explains the QR code and the tour, include one line — "Compatible with hearing aids paired to your phone." That sentence reaches a meaningful share of your over-65 audience and costs nothing.

Where does ASL video fit?

On your highest-priority objects, for the audience that needs it most — and you have to film it. This is the accommodation most institutions underdeliver on, partly because ASL video is genuinely expensive to produce and partly because curators conflate "captions" with "Deaf accessibility" and stop there. They aren't the same thing.

American Sign Language is a distinct language with its own grammar and syntax, not a signed version of English. For culturally Deaf visitors, English is often a second language, and reading English captions is closer to reading a translation than reading their native tongue. ASL-interpreted content is the equivalent accommodation, and the institutions doing it well — the Met, the Smithsonian American Art Museum, the New-York Historical Society, the Denver Art Museum — produce ASL tours and gallery talks with Deaf educators rather than hearing interpreters retrofitted to the work (American Alliance of Museums, 2024).

What this means practically: don't try to ASL-interpret your entire 80-stop tour. Pick 8–15 highest-priority objects, plan a video shoot with a Deaf interpreter, and host the videos at the same stops as the audio. The cost lives in the production, not the hosting.

Where AI helps with hearing accommodation — and where it doesn't

AI legitimately helps with captions, transcripts, and language coverage. It cannot produce ASL. Worth being precise about this, because the category is full of overclaims.

What AI does well:

Live transcript generation. The script exists in text by construction. Surfacing it as a live, synchronized transcript at the visitor's stop is a small UI problem, not an AI one — but AI made the script-by-construction workflow possible in the first place.
Caption translation. A caption track in English can be machine-translated into the other nine languages a serious platform supports, at no marginal cost beyond review. The same neural-translation models that produce the audio script produce the caption text.
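
To show how small the "live, synchronized transcript" UI problem is, here is a minimal sketch of the core lookup: given the start time of each script segment and the player's current position, find the segment to highlight. The segment timings are invented, and the function name is my own.

```python
from bisect import bisect_right
from typing import Optional


def active_segment(start_times: list[float], playback_time: float) -> Optional[int]:
    """Return the index of the segment being spoken at playback_time.

    start_times must be sorted ascending. Returns None if playback has
    not yet reached the first segment.
    """
    # bisect_right counts how many segments have started at or before now.
    index = bisect_right(start_times, playback_time) - 1
    return index if index >= 0 else None


# Invented timings: three segments starting at 0s, 3.5s, and 8.25s.
segment_starts = [0.0, 3.5, 8.25]

for t in (0.0, 3.4, 3.5, 60.0):
    print(t, active_segment(segment_starts, t))
```

The player calls this on each time update and scrolls the matching sentence into view. Nothing in it is AI; the AI's contribution was making sure the segmented script existed in the first place.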

What AI cannot do:

Produce ASL. ASL is a different language, in a different modality, with a different syntax. The handful of research demos producing avatar-style sign language from text are not at a quality bar where any serious institution would ship them. Plan a video shoot.
Interpret tone, irony, or rhetorical emphasis in captions reliably. The captions are accurate to the script, which is what matters; but if you're relying on captions to carry the curatorial voice for a Deaf visitor, you're underdelivering. ASL video is the right shape.
Make hardware accessibility decisions for you. Whether to install a hearing loop in your auditorium is an architecture question, not an AI question. The platform helps you ship transcripts and captions; it doesn't replace the building's accessibility plan.

Where this doesn't fit

A few honest caveats. Phone-based playback assumes the visitor brought a phone that can pair with their hearing aids. Most can; some can't. A loaner-phone fleet at the front desk closes that gap for a small fraction of visitors, and it's worth funding. Hearing loops are the right answer in fixed-listening spaces and the wrong answer for ambulatory tours; don't try to make one solve the other. And no AI-narrated platform replaces the work of producing real ASL content with Deaf collaborators for your most important objects. If your accessibility strategy stops at "we have captions," you've cleared a floor but you haven't built a program.

What's the right next step?

If you're starting from zero, the practical order is: turn on transcripts and captions for every stop in every language (this should be a default, not a project); add one line to your QR-code signage about hearing-aid compatibility; audit your fixed listening positions for hearing-loop coverage and install where missing; then plan a focused ASL video shoot for 8–15 priority objects with a Deaf educator. For where this fits in a wider accessibility program, see our accessibility pillar guide, the spoke on WCAG and audio guide web apps, and the spoke on ADA requirements for museum audio guides.

FAQ

Is a transcript enough to meet ADA requirements for an audio guide?

For most prerecorded museum audio content, yes — the ADA explicitly names "a printed script of a stock speech (such as given on a museum or historic house tour)" as an effective auxiliary aid. Where a visitor specifically requests a sign language interpreter for a live or interactive program, the ADA requires the institution to give primary consideration to that request unless an equally effective alternative exists. Captions and transcripts handle the audio guide; live programs are a separate workstream.

Should transcripts be visible by default or tucked into an accessibility menu?

Visible by default. The "accessibility menu" pattern treats text as a special accommodation rather than a parallel channel, which is the wrong frame. Hearing visitors also use transcripts — to skim, to read in noisy galleries, to share with a companion. Defaulting them on serves Deaf and hearing visitors at once and signals that the institution treats reading as a first-class way to engage with the tour.

Should captions be open or closed?

For audio-only stops, the question is moot; you're rendering text alongside an audio player, not subtitling video. For ASL video clips at high-priority stops, open captions (always-on, baked into the file) are more reliable than closed — they survive being shared, embedded, or downloaded, and they don't depend on the player respecting a track selection.

Do we still need hearing loops if visitors can pair hearing aids to their phones?

Yes, in your auditorium, orientation theater, and ticketing counter. The two solve different problems. Phone pairing covers ambulatory tour content; a fixed hearing loop covers fixed listening positions where the visitor is seated for a longer stretch and not all visitors will have brought their phone or paired it. The two together cover the building.

How many stops need ASL video?

Nobody films all 80. The realistic shape is to pick 8–15 highest-priority objects — the ones the institution itself would call its core — and produce ASL video for those, with a Deaf educator presenting. That number is defensible, fundable, and a meaningful program rather than tokenism. Expand from there based on visitor feedback and grant cycles.

Can AI generate ASL video?

No — at least not at a quality bar a serious institution should ship. ASL is a separate language with its own syntax and a strong cultural expectation that signed content is produced by Deaf signers. The handful of avatar-based sign generation demos are research artifacts, not production tools. Film with a Deaf educator. The AI helps with the captions, transcripts, and language coverage around the ASL video, not with the ASL itself.

What does hearing accommodation cost on a phone-based platform?

Captions and transcripts should be included — they're a byproduct of the script-first workflow, not a line item. If a vendor charges separately for "accessibility features" on a phone-based platform, that's a category-wide red flag. The real costs are the ones that exist outside the platform: filming ASL video (the largest), installing hearing loops in fixed-listening spaces, and the loaner-phone fleet at the front desk for visitors without a compatible device.

The verdict

Hearing accommodation in museum audio interpretation has historically been a workstream that didn't get done — a separate budget, a separate vendor, a deliverable that got cut in the second pass on the scope. The script-first workflow of an AI-narrated platform quietly removes the largest excuse: the transcript and the caption track exist as a byproduct of producing the audio at all. What remains is the work that always was the real work — installing loops where listening is fixed, signaling phone-and-hearing-aid compatibility on every piece of signage, and filming ASL video with Deaf educators for the objects the institution most cares about. None of those are AI problems. The AI just stops them from competing for budget with making the tour exist in the first place.

For where this fits in a broader accessibility program, see the accessibility pillar guide. For the legal frame, the spokes on WCAG and ADA requirements cover the standards in detail. For the production economics that make all of this affordable, our pricing is on one page.

About the author

Eric Duffy is the founder of Convo, a platform that lets museums and cultural institutions publish multilingual audio tours their visitors can have a conversation with. He writes about museum interpretation from inside the category — drawing on RFP data, discovery calls with curators and accessibility leads, and the production economics of phone-based audio programs. Reach him at eric@convo.app or on LinkedIn.