If you work in museum interpretation or visitor experience, accessibility shows up in three different conversations that often get tangled. There's the legal conversation — what does the ADA actually require, and what changed in 2024? There's the standards conversation — what does WCAG 2.1 AA mean for a web-delivered audio guide? And there's the practice conversation — what does it actually feel like for a blind visitor, a Deaf visitor, or a visitor whose first language isn't English to take your tour?
This piece is the hub for our Accessibility & inclusion pillar. I'll try to keep those three threads separate without pretending they aren't related. I'll also flag, throughout, where I'm giving you a starting point versus where you need primary sources or counsel. This is a fast-moving area of law and practice, and you should treat anything written by a software founder — including me — as an orientation, not as legal advice.
What does the ADA actually require for museum audio guides?
The ADA does not say "museums must have audio guides" or "audio guides must include audio description." What it says is that places of public accommodation and state and local government entities must provide effective communication and equal access to programs and services for people with disabilities. Auxiliary aids and services — which include audio description, captions, written materials in alternative formats, and assistive listening systems — are the means by which a museum delivers on that obligation. The choice of which aids to provide is, with some exceptions, the museum's, as long as the result is genuinely effective.
Two parts of the ADA matter for most museums: Title II covers state and local government museums, and Title III covers private nonprofit and for-profit museums. Both have the effective-communication obligation. What changed in 2024 is that the Department of Justice, under Title II, issued a final rule on web content and mobile apps that names WCAG 2.1 Level AA as the explicit technical standard. The original compliance dates have since been extended to April 26, 2027 for entities serving populations of 50,000 or more, and April 26, 2028 for smaller entities and special districts. That rule directly affects publicly operated museums; it indirectly raises the floor for private museums by making WCAG 2.1 AA the de facto standard courts and complainants point to.
Title III, which covers most independent and nonprofit museums, doesn't currently have an equivalent named technical standard, but the litigation landscape since the Robles v. Domino's line of cases has treated commercial websites as places of public accommodation. The practical answer for a private museum: WCAG 2.1 AA is the standard you should be designing toward, even though no rule explicitly names it for you.
None of this is legal advice. If your institution is sizing up obligations, get counsel involved — and make sure they read your funder agreements, because federal grants frequently carry Section 504 obligations independent of the ADA.
What is WCAG, and which version applies to a museum audio guide?
WCAG (the Web Content Accessibility Guidelines, maintained by the W3C Web Accessibility Initiative) is the international technical standard for making web content accessible to people with disabilities. It's organized into success criteria across four principles: content should be Perceivable, Operable, Understandable, and Robust. Each criterion is assigned one of three conformance levels: A (minimum), AA (the practical compliance target), and AAA (the high bar). Conforming at AA means meeting every A and AA criterion.
For a museum audio guide delivered as a web app on the visitor's phone, which is what most modern AI audio guides are, WCAG 2.1 Level AA is the applicable bar. WCAG 2.2, published in October 2023, is the current version and adds nine success criteria covering focus visibility, pointer target size and dragging, consistent help, redundant entry, and authentication; content that conforms to 2.2 also conforms to 2.1. The DOJ Title II rule still names 2.1 AA specifically, so that's the legal floor; 2.2 AA is where modern web practice is heading.
The criteria that matter most for an audio guide:
- 1.1.1 Non-text Content (A) — every image needs a text alternative. For an audio guide, this means the image of the artwork on the player screen needs alt text the screen reader can announce.
- 1.2.1 through 1.2.5 — captions, audio description, and media alternatives for time-based media. The relevant one for prerecorded video content at AA is SC 1.2.5 Audio Description (Prerecorded).
- 1.4.3 Contrast (Minimum) (AA) — text and images of text need a 4.5:1 contrast ratio (3:1 for large text). A common failure on museum tour players is light gray label text on white.
- 2.1.1 Keyboard (A) — every function must be reachable without a mouse. For a touch player, this maps to screen-reader operability.
- 2.4.6 Headings and Labels (AA) — section headings need to actually describe the section. Screen readers traverse pages by heading.
- 3.1.2 Language of Parts (AA) — when a stop switches languages, the lang attribute needs to switch with it so the screen reader pronounces it correctly.
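The 4.5:1 figure in SC 1.4.3 is something a developer can compute rather than eyeball. Here is a minimal sketch of the WCAG 2.x contrast-ratio calculation in TypeScript. The function names are mine, not from any library, and it assumes plain six-digit hex colors with no transparency, so treat it as a starting point rather than a replacement for an audit tool.

```typescript
// Minimal sketch of the WCAG 2.x contrast-ratio calculation.
// Function names are illustrative; input is assumed to be 6-digit hex, fully opaque.

type Rgb = [number, number, number];

// Parse a color such as "#767676" into 0-255 channel values.
function parseHex(hex: string): Rgb {
  const trimmed = hex.trim();
  if (!/^#?[0-9a-f]{6}$/i.test(trimmed)) {
    throw new Error("Expected a 6-digit hex color, got: " + hex);
  }
  const value = parseInt(trimmed.replace("#", ""), 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// Convert one sRGB channel (0-255) to its linear-light value.
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: 0 for black, 1 for white.
function relativeLuminance([r, g, b]: Rgb): number {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio runs from 1 (identical colors) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(parseHex(foreground));
  const b = relativeLuminance(parseHex(background));
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// SC 1.4.3 thresholds: 4.5:1 for regular text, 3:1 for large text.
function meetsAaContrast(
  foreground: string,
  background: string,
  largeText = false
): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}
```

Run against the failure described above, a light gray such as #999999 on a white background comes out below 3:1, so it fails for both regular and large text. A darker gray such as #767676 on white lands just above 4.5:1.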
A vendor whose web player has been audited against WCAG 2.1 AA should be able to share an Accessibility Conformance Report (often called a VPAT) on request. Vendors who can't are worth pressing on.
What is audio description, and when do museums need it?
Audio description is a separate narration track that describes the visual content of a work (the composition, color, posture, expression, scale, what's happening in the scene) for visitors who can't see it. It's distinct from the curator's interpretive narration, which usually assumes the visitor is looking at the work; audio description supplies the visual information that the interpretive narration takes for granted.
The American Council of the Blind's Audio Description Project is the field's primary resource and publishes guidelines for describing visual art. The Smithsonian's Office of Visitor Accessibility is another anchor — several Smithsonian museums run regularly scheduled verbal description tours, and their published Guidelines for Accessible Exhibition Design (PDF) is widely used as a starting point by museums building their own programs.
For an audio guide specifically, audio description usually shows up as one of two patterns:
- A parallel description track — when the visitor reaches a stop, they can choose between the interpretive narration and a longer track that opens with a paced description of the work before moving into the interpretation. This is what most large institutions provide for their flagship tours.
- Description woven into the primary narration — for smaller institutions or where the production team chose to write a single accessible track, the description is built into the curator's prose. This is harder to do well but reaches more visitors.
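To make the two patterns concrete, here is a rough sketch of how a player might model them. Every name in it (TourStop, NarrationTrack, chooseTrack, and so on) is hypothetical and invented for illustration; it is not how any particular platform stores its content.

```typescript
// Hypothetical data model for the two description patterns.
// None of these names come from a real platform.

interface NarrationTrack {
  url: string;
  durationSeconds: number;
}

interface TourStop {
  title: string;
  // The curator's interpretive narration; every stop has one.
  interpretation: NarrationTrack;
  // Pattern 1: a separate, longer track that opens with visual description.
  describedInterpretation?: NarrationTrack;
  // Pattern 2: description is written into the primary narration itself.
  descriptionWovenIn: boolean;
}

interface TrackChoice {
  track: NarrationTrack;
  includesDescription: boolean;
}

// Pick the track to play for a visitor, given whether they asked for description.
function chooseTrack(stop: TourStop, wantsDescription: boolean): TrackChoice {
  if (stop.descriptionWovenIn) {
    // A single accessible track serves everyone.
    return { track: stop.interpretation, includesDescription: true };
  }
  if (wantsDescription && stop.describedInterpretation) {
    return { track: stop.describedInterpretation, includesDescription: true };
  }
  return { track: stop.interpretation, includesDescription: false };
}

// Titles of stops that would leave a visitor who asked for description without it.
function stopsMissingDescription(stops: TourStop[]): string[] {
  return stops
    .filter((stop) => !chooseTrack(stop, true).includesDescription)
    .map((stop) => stop.title);
}
```

A report like stopsMissingDescription is the useful part in practice: it turns "do we have description?" into a list of specific stops someone can schedule for writing and review.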
For a private museum without a court order, the ADA does not specifically mandate audio description. What is required is effective communication for visitors who are blind or have low vision. In practice that almost always means the audio guide carries either description or a paired program that does, because the alternative, sending a docent every time a blind visitor arrives, is not a scalable or dignified solution.
Modern AI platforms can lower the production cost of description meaningfully, because the script can be drafted from the same source materials and reviewed by an audio-description specialist rather than written from scratch. The honest caveat: AI-drafted descriptions still need to be reviewed by someone trained in description practice. The category benefits enormously from the audio description principles the ACB has published over the past two decades, and the institutions that take description seriously involve blind reviewers in their process.
What about captions, transcripts, and Deaf and hard-of-hearing visitors?
The audio guide creates an obvious access problem for Deaf and hard-of-hearing visitors: if the interpretation is audio, and the visitor can't hear the audio, the program isn't reaching them. The accessible answer is straightforward and is now table stakes on serious platforms: every audio stop ships with a synchronized, on-screen transcript or captions, and the visitor can read instead of listen.
Best practice on a museum audio guide, drawn from current accessibility practice in the field:
- Provide both captions and a full transcript. Captions synchronize with playback for visitors who want the read-along experience; the transcript is a single scrollable view of the entire stop for visitors who'd rather read at their own pace.
- Include speaker labels and meaningful non-speech information. When a stop has multiple voices, label them. When ambient sound is part of the interpretation, describe it. The W3C's media accessibility documentation is the technical reference; the AAM's article on Deaf culture and accessibility — which profiles practice at the Met, the Frick, the Children's Museum of Indianapolis, the New-York Historical Society, and the Columbia River Maritime Museum — is a good orientation to how leading institutions are operationalizing this.
- Show progress and timing. A Deaf visitor reading a transcript needs the same orientation a hearing visitor gets from pacing cues. A clear "stop 4 of 12" indicator and an estimated duration are accessibility features.
- Consider ASL where the audience and budget support it. For institutions serving significant Deaf-community audiences, on-screen ASL interpretation embedded into the player is a meaningful upgrade over transcripts alone. ASL is a distinct language, not a transcription of English; the AAM piece above describes how museums are programming around this.
- Make hearing-loop and assistive-listening compatibility explicit. Visitors with hearing aids and cochlear implants benefit when the player's audio output works cleanly with their existing devices. This is one of the underappreciated upsides of BYOD delivery — the visitor's phone is already paired with their hearing technology.
A useful framing: a transcript is the floor, captions are the next step, ASL is the ceiling, and the goal across all three is that a Deaf visitor's experience is the same artifact as a hearing visitor's, not a stripped-down version.
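One way to keep captions and transcript from drifting apart is to generate both from the same timed script. The sketch below assumes a hypothetical CaptionCue shape with an optional speaker label, and emits a WebVTT caption file plus a plain-text transcript. WebVTT and its voice tag (`<v Name>`) are real; the type and function names are mine, and the sample wording in the usage note is invented.

```typescript
// Sketch: one timed script, two outputs (WebVTT captions and a plain transcript).
// CaptionCue and the function names are illustrative, not from any library.

interface CaptionCue {
  startSeconds: number;
  endSeconds: number;
  speaker?: string; // label to show when a stop has multiple voices
  text: string;     // spoken words, or a bracketed description of a sound
}

function pad(value: number, width: number): string {
  let digits = String(value);
  while (digits.length < width) {
    digits = "0" + digits;
  }
  return digits;
}

// WebVTT timestamps look like 00:01:02.500 (hours:minutes:seconds.milliseconds).
function formatTimestamp(totalSeconds: number): string {
  const totalMs = Math.round(totalSeconds * 1000);
  const ms = totalMs % 1000;
  const seconds = Math.floor(totalMs / 1000) % 60;
  const minutes = Math.floor(totalMs / 60000) % 60;
  const hours = Math.floor(totalMs / 3600000);
  return pad(hours, 2) + ":" + pad(minutes, 2) + ":" + pad(seconds, 2) + "." + pad(ms, 3);
}

// Cue text may not contain raw "&", "<", or ">" characters.
function escapeCueText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Assumes speaker labels are plain single-line names without ">" in them.
function toWebVtt(cues: CaptionCue[]): string {
  const blocks = cues.map((cue) => {
    const timing = formatTimestamp(cue.startSeconds) + " --> " + formatTimestamp(cue.endSeconds);
    const body = escapeCueText(cue.text);
    const payload = cue.speaker ? "<v " + cue.speaker + ">" + body : body;
    return timing + "\n" + payload;
  });
  return "WEBVTT\n\n" + blocks.join("\n\n") + "\n";
}

// The transcript is the same content as a single scrollable text, one line per cue.
function toTranscript(cues: CaptionCue[]): string {
  return cues
    .map((cue) => (cue.speaker ? cue.speaker + ": " + cue.text : cue.text))
    .join("\n");
}
```

For example, a cue from 0 to 4.5 seconds with speaker "Curator" becomes a block headed `00:00:00.000 --> 00:00:04.500` in the caption file and a line beginning `Curator:` in the transcript. A non-speech cue such as "[Harbor bells ring in the distance]" passes through both outputs unchanged, which is how ambient sound stays part of the reading experience.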
How does WCAG apply to the visitor web app specifically?
A web-delivered audio guide is, technically, a web application. Everything WCAG says about web applications applies to it. The most common failure modes on audio-guide players I've audited, in rough order:
- Player controls without accessible labels. The play/pause button is an icon with no aria-label. The skip-forward button is a div with no role. A screen reader announces "button" with no context. This is a 1.1.1 / 4.1.2 failure and one of the cheapest to fix.
- Focus traps and unreachable controls. Modal dialogs that don't return focus correctly. Language pickers reachable only by touch. This is a 2.1.1 / 2.4.3 issue.
- Color contrast below 4.5:1. Almost always on label text, language pills, or subtitle text. Easy to catch with an automated tool, easy to miss in design review.
- lang attribute that doesn't switch. The page is marked lang="en" but the visitor switched the tour to Spanish; the screen reader keeps pronouncing in English. A 3.1.1 / 3.1.2 failure that's invisible to sighted QA.
- Captions that can't be resized or styled by the user. WCAG 1.4.4 requires that text be resizable up to 200%. The criterion formally exempts captions, but transcript text and player labels are covered, and hard-coded caption sizes still shut out low-vision readers.
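The unlabeled-control failures at the top of that list are mechanical enough to check in code. This sketch works on a simplified, hypothetical snapshot of each control rather than a live page, so it illustrates the logic and is not a substitute for a real audit tool or a screen-reader pass. ControlSnapshot and auditControl are names I made up for the example.

```typescript
// Sketch of a label-and-role check over a simplified snapshot of player controls.
// ControlSnapshot is a hypothetical shape, not a browser or library type.

interface ControlSnapshot {
  id: string;
  tag: string;           // element name, e.g. "button" or "div"
  role?: string;         // explicit ARIA role, if one is set
  ariaLabel?: string;    // value of aria-label, if one is set
  visibleText?: string;  // text content a screen reader could use as the name
  keyboardFocusable: boolean;
}

function hasText(value: string | undefined): boolean {
  return value !== undefined && value.trim().length > 0;
}

// Returns a list of problems; an empty list means this basic check passed.
function auditControl(control: ControlSnapshot): string[] {
  const problems: string[] = [];

  // A native <button> exposes its role on its own; a <div> needs an explicit one.
  const exposesRole = control.tag.toLowerCase() === "button" || hasText(control.role);
  if (!exposesRole) {
    problems.push(control.id + ": no role exposed to assistive technology (SC 4.1.2)");
  }

  // Icon-only controls need aria-label (or similar) to have an accessible name.
  if (!hasText(control.ariaLabel) && !hasText(control.visibleText)) {
    problems.push(control.id + ": no accessible name (SC 1.1.1 / 4.1.2)");
  }

  if (!control.keyboardFocusable) {
    problems.push(control.id + ": not reachable by keyboard focus (SC 2.1.1)");
  }

  return problems;
}

function auditControls(controls: ControlSnapshot[]): string[] {
  let all: string[] = [];
  for (const control of controls) {
    all = all.concat(auditControl(control));
  }
  return all;
}
```

An icon-only play button with no aria-label yields one problem (no accessible name). A skip-forward div with no role, no label, and no focus handling yields three. Automated checks like this catch the cheap failures; they don't tell you whether the label you wrote actually makes sense to someone listening to it.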
If you're evaluating a vendor, ask for the VPAT. If they don't have one, ask whether the player has been audited by an independent accessibility firm, and which one. If they balk at either question, the floor probably isn't where it needs to be.
The WebAIM WCAG 2 checklist is the practical implementation reference most engineering teams use. It's not the standard itself, but it's the document I'd point a developer at to get started.
How does language access fit into accessibility?
Language access — providing interpretation in languages other than English — is sometimes filed under "internationalization" and treated as separate from accessibility. I think that's a mistake. For a museum visitor who reads and speaks Spanish, Mandarin, or Arabic as their first language, the English-only tour is functionally inaccessible to them in the same way a non-captioned video is functionally inaccessible to a Deaf visitor. The technical mechanisms are different. The visitor's experience of being shut out is not.
The accessibility framing matters for two reasons. The first is rhetorical: it forces an institution to treat multilingual production as a baseline obligation rather than an upgrade. The second is structural: it bundles language access into the same review workflow as audio description and captioning, which keeps it from getting deprioritized when budgets are tight.
The legal note here: federal funding recipients also have obligations under Title VI of the Civil Rights Act around language access for visitors with limited English proficiency. The Department of Justice has published guidance on what "meaningful access" looks like in different contexts. For a tourist-destination museum receiving federal grant money, that guidance is worth reading alongside the ADA materials.
The category-level shift here is that AI platforms have made multilingual production cheap enough that cost has stopped being the constraint. The constraint is review: having a native speaker for each language read the output before it ships to visitors. Pillar 3 in this library, Multilingual interpretation, goes deeper into how many languages a specific museum actually needs and how to staff the review side. The accessibility-relevant point here is that the same platform that lets you ship ten languages also has to render each of them correctly to assistive technology.
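"Render correctly to assistive technology" mostly comes down to telling the screen reader which language it is looking at. The sketch below shows two pieces under simple assumptions: choosing the lang and dir values for the page when the visitor switches tour language, and wrapping a foreign-language phrase inside a stop so it gets its own lang. The function names and the short right-to-left list are mine and deliberately incomplete.

```typescript
// Sketch: keep language metadata in step with the tour language.
// Names are illustrative; the right-to-left list is a small, incomplete sample.

interface TextSegment {
  text: string;
  lang: string; // BCP 47 tag such as "en", "es-MX", or "ar"
}

function primarySubtag(lang: string): string {
  return lang.toLowerCase().replace(/-.*$/, "");
}

// Values for the page's root element after the visitor picks a tour language.
function rootAttributes(tourLang: string): { lang: string; dir: "ltr" | "rtl" } {
  const rightToLeft = ["ar", "he", "fa", "ur"];
  const isRtl = rightToLeft.indexOf(primarySubtag(tourLang)) >= 0;
  return { lang: tourLang, dir: isRtl ? "rtl" : "ltr" };
}

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap any segment whose language differs from the page language in a
// span with its own lang, so the screen reader switches pronunciation.
function renderSegments(pageLang: string, segments: TextSegment[]): string {
  return segments
    .map((segment) => {
      const body = escapeHtml(segment.text);
      if (primarySubtag(segment.lang) === primarySubtag(pageLang)) {
        return body;
      }
      return '<span lang="' + escapeHtml(segment.lang) + '">' + body + "</span>";
    })
    .join(" ");
}
```

With an English page, a Spanish artwork title inside an English sentence comes out wrapped in `<span lang="es">`, while the surrounding English text is left alone. Switching the whole tour to Arabic changes the root values to lang "ar" and dir "rtl". This is the kind of thing a native-speaker reviewer using a screen reader will catch in seconds and sighted QA will never notice.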
What does a phone-based audio guide change for accessibility?
The shift from rented handsets to the visitor's own phone, which has happened across the audio-guide category in the past five years, turns out to be one of the most accessibility-positive things to happen to museum interpretation in a long time. Three reasons:
- The phone is already configured. A blind visitor walks into your museum with VoiceOver or TalkBack already running, with their preferred speech rate, with their hearing aids paired, with their text size set the way they need it. A rented handset starts every visitor at the default and asks them to reconfigure on the spot. The visitor's device wins on access by default.
- The QA surface is smaller. A web player can be audited once and run on every phone. A handset has hardware buttons, a separate UI, a separate audio chain, and a vendor-specific assistive layer that has to be tested and maintained independently. Most institutions never had the budget for that.
- Updates ship instantly. When a description gets revised or a caption typo gets fixed, a web app updates immediately. A handset fleet has to be re-flashed; native apps wait for the app store. For accessibility fixes, that latency matters.
The honest caveat is that BYOD assumes the visitor has a smartphone, knows how to use it, and is willing to use it in the gallery. None of that is universal, and a small subset of visitors — older, lower-income, or those who came to the museum precisely to put their phone away — will be poorly served. The accessible answer is to keep a small inventory of pre-configured loaner devices at the visitor services desk, available on request, with the assistive settings already enabled. That hybrid pattern is what I'd recommend to a director treating BYOD as the default.
For the broader visitor-experience shift, see Pillar 5 — Visitor experience — and the note on the 2026 museum visitor.
Where does this approach stop working?
Every Pillar in this library has a section on where the category doesn't fit. For accessibility, the honest list:
- Visitors who can't or won't use a phone. Older adults with limited tech access, visitors who came to disconnect, and visitors whose assistive setup is on a specific device that isn't a smartphone. Loaner devices and printed alternatives still matter.
- Highly specialized assistive workflows. A visitor using a refreshable Braille display, a switch-access user, a visitor pairing through a specific cochlear-implant streamer — these benefit from a setup pass with staff, not a self-service QR code. Your access desk is still the right entry point for them.
- Programs the audio guide isn't replacing. Verbal description tours led by trained staff, ASL-led tours, sensory-friendly hours, touch tours of the collection. These are programs the audio guide augments; it doesn't replace them. The institutions getting this right run both.
- The legal floor in jurisdictions outside the US. EU institutions are subject to the European Accessibility Act and to the national laws implementing it, which generally point to the EN 301 549 standard. Canadian institutions look at the Accessible Canada Act. UK institutions look at the Equality Act 2010. The principles transfer; the specific obligations don't.
The category point: a phone-based audio guide with description, captions, transcripts, language support, and a strong web player is a much better accessibility floor than what most museums have today. It is not the ceiling, and it isn't sufficient on its own for every visitor.
What should an institution actually do?
A pragmatic order of operations for an institution that wants to move from "we have basic accessibility" to "we take this seriously":
- Get clear on which laws apply. Title II (state/local government), Title III (private), Section 504 (federal funding), state law, and — for some institutions — international standards. A short memo from counsel is worth more than a long blog post.
- Audit the current visitor web experience against WCAG 2.1 AA. Either internally with the WebAIM checklist or via an outside firm. Get a VPAT from any vendor in the audio-guide pipeline.
- Decide on description scope. Every stop? A subset of flagship works? Tier the program — most institutions can't ship description for everything on day one, and a clear plan beats an aspirational policy.
- Ship captions and transcripts on every audio stop. This is the cheapest, highest-leverage accessibility move and should be the default for any tour going forward.
- Build the review workflow before the volume. A native speaker for each language. A description specialist on the description tracks. A Deaf reviewer on the caption choices. The platform makes the volume cheap; the editorial review is what makes the volume accountable.
- Publish your accessibility commitments. A page on your site that says what's available, where, and how a visitor with a specific need can plan their visit. The Smithsonian's accessibility resources are a useful structural reference.
This is the work the audio guide can carry once it's been set up correctly. The platform is the infrastructure; the editorial program is the institution.
How does this relate to the rest of Convo's resources?
This is the hub for Pillar 4. The spokes under this pillar will go deeper into each piece — audio description in practice, ADA and WCAG specifics for non-lawyers, captioning workflows, language access as an accessibility strategy, and how to brief a board on the 2024 rule.
If you're earlier in your evaluation, the AI audio guides hub covers what the category actually is and how it works. The spoke on AI audio guides vs traditional audio guides covers the production-economics shift that makes broader accessibility coverage affordable in the first place. The Pillar 3 hub on multilingual interpretation covers language access in more depth.
For the broader argument about how the audio guide isn't the product but the visit is, the note titled "The audio guide is not the product" sets the frame. For Convo specifically, our product page describes the platform's accessibility-relevant capabilities, and our security and trust page covers the operational side.
Continue reading
The spokes under this pillar will cover each piece in more depth — audio description in practice, captioning workflows, the 2024 DOJ rule for non-lawyers, and language access as an accessibility strategy. They'll publish as we move through the year.
For the broader category, the AI audio guides hub is the place to start. For the production-economics piece that makes broader accessibility coverage affordable, the spoke on AI audio guides vs traditional audio guides is the next read. For the framing argument behind everything in this library, "The audio guide is not the product" is the note I'd point you at.
About the author
Eric Duffy is the founder of Convo, a platform that lets museums and cultural institutions publish multilingual audio tours their visitors can have a conversation with. He writes about how museums could afford to be more ambitious with interpretation, drawing on discovery conversations with curators, directors, education leads, and accessibility specialists at small and mid-size US museums. This article is general orientation, not legal advice; consult counsel and the linked primary sources for your specific obligations. Reach him at eric@convo.app or on LinkedIn.